Publication Details

Reference Type: Conference Proceedings
Author(s): Tateo, D.; D'Eramo, C.; Nuara, A.; Bonarini, A.; Restelli, M.
Title: Exploiting structure and uncertainty of Bellman updates in Markov decision processes
Journal/Conference/Book Title: 2017 IEEE Symposium Series on Computational Intelligence (SSCI)
Abstract: In many real-world problems, stochasticity is a critical issue for the learning process. The sources of stochasticity are the transition model, the explorative component of the policy, or, even worse, noisy observations of the reward function. For a finite number of samples, traditional Reinforcement Learning (RL) methods provide biased estimates of the action-value function, possibly leading to poor estimates that are then propagated by the application of the Bellman operator. While some approaches assume that the estimation bias is the key problem in the learning process, we show that in some cases this assumption does not necessarily hold. We propose a method that exploits the structure of the Bellman update and the uncertainty of the estimation in order to make better use of the information provided by the samples. We present theoretical considerations about this method and its relation to Q-Learning. Moreover, we test it in environments from the literature in order to demonstrate its effectiveness against other algorithms that focus on bias and sample efficiency.
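The abstract refers to the bias introduced when the Bellman operator is applied to finite-sample action-value estimates, and relates the proposed method to Q-Learning. As background only (the paper's own method is not reproduced here), the following is a minimal sketch of the standard max-based Q-Learning update that such work typically uses as a baseline; the function name and the toy problem sizes are illustrative, not from the paper.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-Learning update (sample-based Bellman optimality backup).

    The max over the next state's action values is the term that, with
    finitely many noisy samples, yields biased estimates of the
    action-value function -- the issue the abstract discusses.
    """
    td_target = r + gamma * np.max(Q[s_next])       # bootstrapped target
    Q[s, a] += alpha * (td_target - Q[s, a])        # move estimate toward target
    return Q

# Toy example: 2 states, 2 actions, a single observed transition.
Q = np.zeros((2, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
```

With a zero-initialized table, this single update moves Q[0, 1] to alpha * r = 0.1, since the bootstrapped term contributes nothing yet.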

