Publication Details

Reference Type: Thesis
Author(s): Arenz, O.
Title: Feature Extraction for Inverse Reinforcement Learning
Journal/Conference/Book Title: Master Thesis
Keywords: Inverse Reinforcement Learning
Abstract: Equipping an agent with the ability to infer the intentions behind observed behavior is a prerequisite for creating truly autonomous robots. By deducing the purpose of other agents' actions, the robot would be able to react sensibly and, furthermore, to imitate the strategy at a high level. Inverse Reinforcement Learning (IRL) can be applied to learn a reward function that is consistent with the observed behavior and is thus a step towards this overall goal.

Some strategies can only be modeled properly by assuming an underlying time-dependent reward function. Maximum Causal Entropy Inverse Reinforcement Learning (MaxCausalEnt-IRL) is a method that can learn such non-stationary reward functions. However, it depends on gradient-based optimization, and its performance can therefore suffer if too many parameters have to be learned. This can be problematic, since the number of parameters increases significantly if a separate reward function is learned for each time step. Furthermore, since only a few time steps might be relevant for the observed task, a second challenge of applying IRL to non-stationary reward functions consists in properly extracting such sparseness.

This thesis investigates how to meet these practical requirements. A novel approach, Inverse Reinforcement Learning by Matching State Distributions (IRL-MSD), is developed for that purpose. Unlike some previous IRL methods, IRL-MSD does not aim to match feature counts but instead learns a reward function by matching state distributions. This approach has two interesting properties. Firstly, the features do not have to be defined explicitly but arise naturally from the structure of the observed state distribution. Secondly, it does not require gradient-based optimization. The experiments show that IRL-MSD converges faster than existing IRL methods and properly recovers the goals of a sparse reward function. Therefore, IRL-MSD suggests itself for future research.

