Publication Details

Reference Type: Thesis
Author(s): Treede, F.
Title: Learning Robust Control Policies from Simulations with Perturbed Parameters
Journal/Conference/Book Title: Master Thesis
Keywords: Domain Randomization, Perturbed Simulations, Robot Reinforcement Learning
Abstract: Deep reinforcement learning has great potential for solving robotic motor tasks. However, deep learning algorithms suffer from high sample complexity, so training on the real robot is infeasible. Training in simulated environments has been successful, but transferring the learned policies to the real robot is difficult. Recent research has therefore focused on learning robust policies that can be transferred successfully. In this thesis, we developed a framework to learn robust policies and evaluate them in environments with varied physics parameters. The simulation was implemented using the Rcs framework, developed by Honda Research Institute Europe, which can control both simulated and real-world robots. The applied learning algorithms are based on the rllab framework. To allow these algorithms to use Rcs-based environments, we wrote RcsPySim as a bridge. Throughout the thesis, we used RcsPySim to evaluate and compare the robust learning algorithms Ensemble Policy Optimization (EPOpt) and Robust Adversarial Reinforcement Learning (RARL) with each other and with a baseline policy trained using the standard policy optimization algorithm Trust Region Policy Optimization (TRPO). The policies were trained and evaluated on the ball-on-plate task, in which a robot has to stabilize a ball at the center of a plate mounted on the robot's end-effector. We varied different physics parameters in order to analyze the robustness of the learned policies against these changes, and additionally performed a sensitivity analysis of the parameters. The results show that the relevant physics parameters for the ball-on-plate task are the ball's friction properties and mass distribution, whereas the ball's mass and radius do not have a significant influence. Moreover, we observed that the baseline policy, trained solely on the nominal physics world, is already quite robust. Both RARL and EPOpt increase the robustness for some parameter ranges, but reduce it in others.
EPOpt does not perform as well as expected. In general, EPOpt prefers the more cautious approach, which means it copes better with more unstable simulations but cannot solve the task in setups with strong friction. The solution trained by RARL is more aggressive, making it well suited for cases with higher friction, but less attractive for more unstable environments with lower friction. All learned policies could be transferred to the real world. The policy learned by EPOpt should be preferred, as it is the most cautious. Since EPOpt does not work well with friction values higher than the nominal parameters, choosing those values to be higher than the measured mean would likely increase the robustness over the whole parameter space.
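The evaluation protocol described in the abstract — rolling out a policy in simulations whose physics parameters are perturbed around their nominal values — can be sketched as follows. This is a minimal illustration, not code from the thesis or from RcsPySim; the parameter names, nominal values, and perturbation range are assumptions.

```python
import random

# Illustrative nominal physics parameters for a ball-on-plate task
# (hypothetical values, not taken from the thesis).
NOMINAL = {"ball_mass": 0.05, "ball_radius": 0.03, "friction": 0.5}

def perturb_params(nominal, scale=0.2, rng=random):
    """Return a copy of `nominal` with each value scaled by a uniform random
    factor in [1 - scale, 1 + scale], as in uniform domain randomization."""
    return {k: v * (1.0 + rng.uniform(-scale, scale)) for k, v in nominal.items()}

def robustness_sweep(evaluate, nominal, n_trials=50, scale=0.2, seed=0):
    """Average the return of `evaluate` (a hypothetical callback that builds an
    environment from the given parameters and rolls out the policy) over
    `n_trials` independently perturbed configurations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += evaluate(perturb_params(nominal, scale, rng))
    return total / n_trials
```

Averaging returns over many perturbed configurations, rather than evaluating only on the nominal world, is what exposes the robustness differences between TRPO, EPOpt, and RARL reported above.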
Last Modified Date: 28.06.2018

