Publication Details

Reference Type: Conference Proceedings
Author(s): Tosatto, S.; D'Eramo, C.; Pajarinen, J.; Restelli, M.; Peters, J.
Title: Exploration Driven By an Optimistic Bellman Equation
Journal/Conference/Book Title: Proceedings of the International Joint Conference on Neural Networks (IJCNN)
Keywords: exploration; reinforcement learning; intrinsic motivation; Bosch-Forschungstiftung
Abstract: Exploring high-dimensional state spaces and finding sparse rewards are central problems in reinforcement learning. Exploration strategies are frequently either naïve (e.g., simplistic epsilon-greedy or Boltzmann policies), intractable (i.e., full Bayesian treatment of reinforcement learning) or rely heavily on heuristics. The lack of a tractable but principled exploration approach unnecessarily complicates the application of reinforcement learning to a broader range of problems. Efficient exploration can be accomplished by relying on the uncertainty of the state-action value function. To obtain the uncertainty, we maintain an ensemble of value function estimates and present an optimistic Bellman equation (OBE) for such ensembles. This OBE is derived from a relative entropy maximization principle and yields an implicit exploration bonus resulting in improved exploration during action selection. The implied exploration bonus can be seen as a well-principled type of intrinsic motivation and exhibits favorable theoretical properties. OBE can be applied to a wide range of algorithms. We propose two algorithms as an application of the principle: Optimistic Q-learning and Optimistic DQN, which outperform comparison methods on standard benchmarks.
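
The abstract does not state the optimistic Bellman equation itself, so the following is only an illustrative sketch of how an ensemble of value estimates could yield an implicit exploration bonus. It assumes a log-mean-exp aggregate over ensemble members, a form that commonly arises from relative-entropy arguments; the function names (optimistic_value, exploration_bonus, optimistic_target) and the parameters eta and gamma are hypothetical and are not taken from the paper.

```python
import numpy as np

def optimistic_value(q_ensemble, eta=1.0):
    """Assumed aggregate: (1/eta) * log(mean_m exp(eta * Q_m)) per action.

    q_ensemble has shape (M, A): M ensemble members, A actions.
    The per-action maximum is subtracted first for numerical stability.
    Requires eta > 0; the result lies between the ensemble mean and max.
    """
    q = np.asarray(q_ensemble, dtype=float)
    q_max = q.max(axis=0)
    return q_max + np.log(np.mean(np.exp(eta * (q - q_max)), axis=0)) / eta

def exploration_bonus(q_ensemble, eta=1.0):
    """Implicit bonus: optimistic aggregate minus the plain ensemble mean."""
    q = np.asarray(q_ensemble, dtype=float)
    return optimistic_value(q, eta) - q.mean(axis=0)

def optimistic_target(reward, next_q_ensemble, gamma=0.99, eta=1.0):
    """One-step target that bootstraps from the optimistic aggregate."""
    return reward + gamma * optimistic_value(next_q_ensemble, eta).max()

# Two ensemble members, two actions: the members agree on action 0
# and disagree on action 1.
q = np.array([[1.0, 1.0],
              [1.0, 3.0]])
bonus = exploration_bonus(q, eta=1.0)
greedy_action = int(np.argmax(optimistic_value(q, eta=1.0)))
target = optimistic_target(0.0, q, gamma=0.5, eta=1.0)
```

In this sketch the bonus vanishes where the ensemble members agree and grows where they disagree, so action selection is drawn toward actions with uncertain value estimates. Larger eta pushes the aggregate toward the ensemble maximum, while eta close to zero recovers the ensemble mean.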

