Gabriel Dulac-Arnold - Challenges of Real-world RL: Definition, Implementation, Analysis

Date:

October 1, 2020

Author:

Hrvoje Stojic

Challenges of Real-world RL: Definition, Implementation, Analysis



Abstract

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each of these challenges, we specify its exact meaning, present some approaches from the literature, and specify some metrics for evaluating it. An approach that addresses all nine challenges would be applicable to a large number of real-world problems.

Offline learning is a key part of making RL usable in real systems. Offline RL looks at scenarios where there is data from a system’s operation, but no direct access to the system when learning a policy. Recent work on training RL policies from offline data has shown results both with model-free policies learned directly from the data and with planning on top of learnt models of the data. Model-free policies tend to be more performant, but are more opaque, harder to command externally, and harder to integrate into larger systems. We propose an offline learner that generates a model that can be used to control the system directly through planning. This gives us easily controllable policies learned directly from data, without ever interacting with the system. We show the performance of our algorithm, Model-Based Offline Planning (MBOP), on a series of robotics-inspired tasks, and demonstrate its ability to leverage planning to respect environmental constraints. We are able to find near-optimal policies for certain simulated systems from as little as 50 seconds of real-time system interaction, and create zero-shot goal-conditioned policies on a series of environments.
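
The planning-on-a-learned-model idea behind MBOP can be illustrated with a short sketch: components trained offline (a dynamics model, a behaviour-cloned action prior, and a value estimate) are used inside a receding-horizon planner that samples action sequences around the prior, rolls them out in the model, and returns a return-weighted average of the first actions. The snippet below is a minimal sketch of this idea, not the paper's implementation: `dynamics_model`, `behavior_prior`, `value_fn`, and all dimensions and hyperparameters are hypothetical toy stand-ins for the learned ensembles used in MBOP.

```python
import numpy as np

# --- Hypothetical stand-ins for the learned components (assumptions). ---
# In MBOP these would be small ensembles of networks trained offline:
#   dynamics_model(s, a)      -> (next_state, reward)
#   behavior_prior(s, a_prev) -> action   (behaviour-cloned action prior)
#   value_fn(s, a)            -> scalar   (value estimate beyond the horizon)

STATE_DIM, ACTION_DIM = 4, 2
rng = np.random.default_rng(0)
A_dyn = rng.normal(scale=0.1, size=(STATE_DIM, STATE_DIM + ACTION_DIM))
A_bc = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM + ACTION_DIM))

def dynamics_model(s, a):
    x = np.concatenate([s, a])
    s_next = s + A_dyn @ x            # toy linear dynamics
    reward = -np.sum(s_next ** 2)     # toy reward: stay near the origin
    return s_next, reward

def behavior_prior(s, a_prev):
    return np.tanh(A_bc @ np.concatenate([s, a_prev]))

def value_fn(s, a):
    return -np.sum(s ** 2)            # toy terminal value estimate

def mbop_plan(state, a_prev, horizon=10, n_traj=64, sigma=0.1, kappa=1.0):
    """One planning step: sample trajectories around the behaviour prior,
    roll them out in the learned model, and return a return-weighted
    average of the first actions (MPPI-style trajectory optimisation)."""
    first_actions, returns = [], []
    for _ in range(n_traj):
        s, a, total = state.copy(), a_prev.copy(), 0.0
        for t in range(horizon):
            a = behavior_prior(s, a) + sigma * rng.normal(size=ACTION_DIM)
            if t == 0:
                a0 = a
            s, r = dynamics_model(s, a)
            total += r
        total += value_fn(s, a)          # bootstrap value beyond the horizon
        first_actions.append(a0)
        returns.append(total)
    w = np.exp(kappa * (np.array(returns) - np.max(returns)))
    return (w[:, None] * np.array(first_actions)).sum(0) / w.sum()

# Receding-horizon control loop; here the toy model doubles as the "real" system.
s, a = rng.normal(size=STATE_DIM), np.zeros(ACTION_DIM)
for step in range(20):
    a = mbop_plan(s, a)
    s, r = dynamics_model(s, a)
print("final state norm:", np.linalg.norm(s))
```

Because the policy is just a planner over a learned model, constraints or new goals can in principle be imposed at planning time (e.g. by penalising constraint-violating rollouts in the scoring step) rather than by retraining, which is the controllability property the abstract highlights.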


Notes


  • An arXiv pre-print is available here.

  • Dr Gabriel Dulac-Arnold is a researcher at Google Research. His publication record on Google Scholar can be found here, his personal website here, and on Twitter you can find him at @gabepsilon.

Related Seminars

Mickael Binois - Leveraging replication in active learning

We were recently joined by Mickael Binois, to talk about 'Leveraging replication in active learning'.

Jun 24, 2024

Ilija Bogunovic - From Data to Confident Decisions

We were recently joined by Ilija Bogunovic, to talk about 'Robust and Efficient Algorithmic Decision Making'.

Jun 13, 2024

Dario Azzimonti - Preference learning with Gaussian processes

We were recently joined by Dario Azzimonti, to talk about 'Preference learning with Gaussian processes'.

May 23, 2024

Mojmír Mutný - Optimal Experiment Design in Markov Chains

We were recently joined by Mojmír Mutný (ETH Zurich), to talk about 'Optimal Experiment Design in Markov Chains'.

Mar 28, 2024
