Publication Details

Reference Type: Thesis
Author(s): Thai, H. L.
Title: Deep Reinforcement Learning for POMDPs
Journal/Conference/Book Title: Master Thesis
Abstract: Deep reinforcement learning has made a big impact in recent years by achieving human-level game play in the Atari 2600 games using only images as input, and by learning robot control policies end-to-end from images to motor signals, tasks which were previously intractable for classical reinforcement learning. For learning these control policies based on deep neural networks, policy search methods such as deep deterministic policy gradient (Lillicrap et al., 2015) and especially trust region policy optimization (Schulman et al., 2015) have been very successful, also in transferring insights and past approaches from classical reinforcement learning to deep neural networks. In this thesis we build on this previous work and derive a new algorithm for deep neural networks, called compatible policy search (COPS), based on the ideas of the natural gradient, compatible value function approximation, entropy regularization, and relative entropy policy search (Peters et al., 2010). In our experiments we investigate the capability of COPS and other policy search methods in challenging partially observable environments: RockSample, where the agent needs to take information gathering into account, and Pocman, a large-scale partially observable Markov decision process (POMDP) with approximately 10^56 underlying states. We present results where COPS outperforms six other policy search methods, and where the additional entropy regularization constraint, which bounds the entropy of the new policy, is essential for exploration and for finding a good policy in these partially observable environments. Furthermore, to encourage additional exploration in these partially observable environments, we propose a factored context tree switching model for POMDPs, which we use for a pseudo-count-based exploration bonus and which leads to additional performance gains.
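As a rough sketch of the kind of update the abstract describes (assuming a REPS-style formulation; the thesis's exact objective may differ), the policy update can be written as a constrained optimization, where A is the advantage under the old policy, mu a state distribution, epsilon the KL bound of relative entropy policy search, and kappa an illustrative lower bound on policy entropy that keeps the new policy stochastic:

    % Sketch only: REPS-style update with an additional entropy constraint
    \max_{\pi}\; \mathbb{E}_{s \sim \mu,\, a \sim \pi}\big[ A^{\pi_{\mathrm{old}}}(s,a) \big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{s \sim \mu}\big[ \mathrm{KL}\big( \pi(\cdot \mid s) \,\|\, \pi_{\mathrm{old}}(\cdot \mid s) \big) \big] \le \epsilon,
    \qquad
    \mathbb{E}_{s \sim \mu}\big[ H\big( \pi(\cdot \mid s) \big) \big] \ge \kappa

For the pseudo-count exploration bonus mentioned at the end, the generic recipe augments the reward with a term of the form beta / sqrt(N(s)), where N(s) is a pseudo-count derived from a density model (here, the factored context tree switching model); this is the standard pattern, not necessarily the thesis's exact variant.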

