Publication Details

Reference Type: Thesis
Author(s): Laux, M.
Title: Deep Adversarial Reinforcement Learning for Object Disentangling
Journal/Conference/Book Title: Master Thesis
Abstract: Deep learning in combination with improved training techniques and high computational power has led to recent advances in the field of reinforcement learning. Deep reinforcement learning methods have been used to teach agents how to play Go and Atari video games with human-level performance. However, modern RL methods often fail to generalise to environments that differ from the one they were trained in, or to transfer from simulation to real-world applications. These problems are caused by the fact that it is often impossible to train an agent in the entire environment it is intended to be deployed in. For example, an autonomous car cannot be trained for every scenario it might ever encounter. Thus, generalisation from training to test scenarios often fails due to overfitting and a lack of exploration.

In this thesis, we present the adversarial reinforcement learning (ARL) framework, which utilises an adversarial agent, the adversary, that is trained to steer the original agent, the protagonist, to unknown and difficult states. The protagonist and the adversary are trained jointly in order to allow them to adapt to the changing policy of their respective opponent. We show that our method is able to generalise by training an end-to-end system for robot control to solve the challenging object disentangling task for robotic arms. We perform an ablation study to investigate the effects of our method's hyperparameters, which shows that our method is robust to changes in most hyperparameters. Our extensive experiments demonstrate that our method is indeed able to learn more robust policies that improve generalisation from training in simulation to both modified test scenarios and real-world environments.

As learning under adversarial conditions may lead to a sparse distribution of positive rewards, we additionally propose a new form of prioritised experience replay for off-policy RL algorithms: advantage-based experience replay (ABER). This novel method of prioritising samples that are likely to increase the current policy's performance can increase learning speed and thereby reduce sample complexity.
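The adversary/protagonist idea from the abstract can be sketched on a toy problem. The following is a minimal, illustrative interpretation only, not the thesis implementation: the protagonist does tabular Q-learning on a small chain environment, while an "adversary" periodically chooses the start state the protagonist currently values least, i.e. a difficult state. All names, the chain environment, and the adversary's difficulty heuristic are assumptions made for this sketch.

```python
import random

N = 10          # chain states 0..N-1, goal at state N-1
GAMMA = 0.9
ALPHA = 0.5

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N)]   # actions: 0 = left, 1 = right

def step(s, a):
    """Deterministic chain: reward 1 only on reaching the goal state."""
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

def protagonist_episode(start):
    """One epsilon-greedy tabular Q-learning episode from a given start."""
    s = start
    for _ in range(4 * N):
        if random.random() < 0.2:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        target = r + (0.0 if done else GAMMA * max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        if done:
            break
        s = s2

def adversary_start():
    """Adversary steers training toward 'difficult' states: here, the
    non-goal state whose current value estimate is lowest."""
    return min(range(N - 1), key=lambda s: max(Q[s]))

for episode in range(500):
    # Alternate random and adversarial starts so both sides keep adapting.
    start = adversary_start() if episode % 2 else random.randrange(N - 1)
    protagonist_episode(start)

# The greedy policy can then be read off the learned Q-table.
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N - 1)]
print(greedy)
```

In the thesis both agents are learned policies trained jointly end-to-end; here the adversary is reduced to a value-based start-state heuristic purely to keep the sketch self-contained.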
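The ABER idea, replaying transitions that are likely to improve the current policy more often, can likewise be sketched as a replay buffer whose sampling weights come from advantage estimates. The priority rule below (positive-clipped advantage plus a small floor) and all names are illustrative assumptions, not the exact formulation from the thesis.

```python
import random

class AdvantageReplayBuffer:
    """Replay buffer that samples transitions proportionally to a
    priority derived from their advantage under the current policy."""

    def __init__(self, capacity=10_000, floor=1e-3):
        self.capacity = capacity
        self.floor = floor          # keeps every transition sampleable
        self.transitions = []       # (state, action, reward, next_state)
        self.priorities = []

    def add(self, transition, advantage):
        # Favour positive-advantage transitions: replaying them pushes the
        # policy toward actions that beat its current baseline.
        priority = max(advantage, 0.0) + self.floor
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.priorities.pop(0)
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample with replacement, weighted by priority.
        return random.choices(self.transitions,
                              weights=self.priorities, k=batch_size)

# Toy usage: one high-advantage transition among many neutral ones.
random.seed(0)
buf = AdvantageReplayBuffer()
buf.add(("s0", "a_good", 1.0, "s1"), advantage=2.0)
for i in range(20):
    buf.add((f"s{i}", "a_neutral", 0.0, f"s{i + 1}"), advantage=0.0)

batch = buf.sample(100)
good = sum(1 for t in batch if t[1] == "a_good")
print(good)  # the high-advantage transition dominates the batch
```

A real implementation would also recompute priorities as the policy changes and correct the sampling bias (e.g. with importance weights, as in standard prioritised experience replay); both are omitted here for brevity.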

