Reference Type: Report
Author(s): Tosatto, S.; D'Eramo, C.; Pajarinen, J.; Restelli, M.; Peters, J.
Title: Technical Report: "Exploration Driven by an Optimistic Bellman Equation"
Abstract: This technical report contains proofs and technical details regarding "Exploration Driven by an Optimistic Bellman Equation". In more detail, it contains a derivation of the Optimistic Bellman Equation (OBE) from an entropic-regularization principle; a technical definition of Optimistic Value Iteration (OVI) and Optimistic Q-Learning (OQL); convergence proofs of OVI and OQL; and a derivation of the implicitly defined exploration bonus and its properties in the tabular case. Regarding the empirical analysis, we provide a detailed list of the hyper-parameters used, as well as a list of empirical evaluations under different combinations of hyper-parameters.
