Bridging the Gap: Task Design and Data Efficiency for Broader RL Adoption

Speaker: Pierre-Luc Bacon, University of Montreal
Time: Tuesday, Feb 20, 2024, 10:00 AM - 11:00 AM, Eastern Time
Zoom Link: contact ymao@uottawa.ca

Abstract:

While traditional RL excels at solving pre-defined problems, it often neglects a crucial first step: specifying real-world tasks that involve complex environments and human preferences. This challenge is further amplified by real-world scenarios where extensive online data collection is costly or infeasible, but rich historical observations (e.g., sensor readings) exist. Current RL methods, however, struggle to effectively utilize and generalize from such data, especially in unseen situations.

This talk presents my group's recent advances on these two fronts. First, I will introduce the "Motif" methodology, which facilitates task specification by distilling knowledge from Large Language Models (LLMs) into intrinsic rewards. Second, I will discuss how these advances connect to the important question of representation learning in RL. This includes our work on disentangling memory and credit assignment in Transformer architectures, as well as developing sample-efficient methods for learning approximate information states for Partially Observable Markov Decision Processes (POMDPs) using sequence models.

Through these advancements in task specification and data efficiency, we pave the way for broader real-world applications of RL, moving beyond the confines of academic benchmarks.

Speaker's Bio:

Pierre-Luc Bacon is an Assistant Professor at the University of Montreal in the Department of Computer Science and Operations Research. He holds a CIFAR AI Chair at Mila, the Quebec Artificial Intelligence Institute. His research focuses on deep reinforcement learning and how to scale these methods up, particularly in the face of the curse of dimensionality. Driven by the goal of making RL more practical, he explores applications in areas like HVAC control and drug discovery.