Publication Details

Reference Type: Conference Paper
Author(s): Muratore, F.; Treede, F.; Gienger, M.; Peters, J.
Title: Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment
Journal/Conference/Book Title: Conference on Robot Learning (CoRL)
Keywords: domain randomization, simulation optimization, sim-2-real
Abstract: Exploration-based reinforcement learning on real robot systems is generally time-intensive and can lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of robot reinforcement learning. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by perturbing the simulator’s parameters and training from model ensembles. We propose a new algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses a biased estimator of the SOB to formulate a stopping criterion for training. We show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.

