Publication Details

Reference Type: Thesis
Author(s): Saoud, H.
Title: Improving Sample-Efficiency with a Model-Based Deterministic Policy Gradient
Journal/Conference/Book Title: Master Thesis
Abstract: In recent years we have seen enormous advances in the fields of Artificial Intelligence (AI) and Machine Learning (ML). With the development of deep learning techniques, ML has been able to solve unexpectedly complex tasks, often surpassing human performance. Examples include image classification, automatic text generation, sentiment analysis, and text-to-speech. Reinforcement Learning (RL) is a sub-field of ML whose goal is to discover the best way for an agent to interact with an environment, obtaining the highest reward possible. RL has recently solved some very difficult and impressive tasks, such as Go (a complex board game), Atari games, and StarCraft, to mention a few. However, most of these tasks can be simulated, and the learning techniques used require billions of data points: several years of interaction with the system, shrunk down to several hours by fast simulation and massively parallel computation. Some tasks, such as robotics or other real systems, can hardly be parallelized or simulated with precision, which implies the failure of such RL methods for these kinds of tasks. Model-based RL promises to reduce the data complexity of model-free RL, requiring less interaction with the system. Moreover, learning a model of the system can also ensure safety (enabling the avoidance of unsafe states), enable transfer learning (the model can be reused for solving similar tasks), and enable the generation of synthetic data. For these reasons we are interested in developing a model-based RL (MBRL) algorithm. In this thesis, we will focus on uncertainty with respect to the model and use it to obtain both safe and efficient exploration.
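The abstract's central point, that a learned model lets the agent generate synthetic data and so needs fewer real interactions, can be illustrated with a minimal Dyna-style tabular sketch. This is only an illustration of the general model-based RL idea, not the thesis's deterministic policy gradient method; the chain environment, hyperparameters, and variable names are all assumptions made for the example.

```python
import random

# Illustrative Dyna-style sketch (NOT the thesis algorithm): the agent
# learns a tabular model of a small deterministic chain MDP and replays
# synthetic transitions from that model, reducing real interactions.

N_STATES, ACTIONS = 5, (-1, +1)   # chain states 0..4; actions: move left/right
GOAL = N_STATES - 1

def step(s, a):
    """Real environment: deterministic chain, reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, 1.0 if s2 == GOAL else 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                         # learned model: (s, a) -> (s', r)
alpha, gamma, planning_steps = 0.5, 0.9, 20
rng = random.Random(0)

for episode in range(30):
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection on the real system
        if rng.random() < 0.1:
            a = rng.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)                     # one real interaction
        model[(s, a)] = (s2, r)                # record the transition in the model
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        # planning: extra updates from synthetic transitions drawn from the model
        for _ in range(planning_steps):
            (ps, pa), (ps2, pr) = rng.choice(list(model.items()))
            ptarget = pr + gamma * max(Q[(ps2, b)] for b in ACTIONS)
            Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])
        s = s2

# The greedy policy learned this way moves right along the chain toward the goal.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

The synthetic planning updates do most of the value propagation here, which is the sample-efficiency argument the abstract makes; the thesis pursues the same goal with a learned model and uncertainty estimates rather than a tabular replay.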

