Robotics, Machine Learning, Reinforcement Learning, Partial Observability
Mail. Samuele Tosatto
TU Darmstadt, Fachgebiet IAS
Office. Room E325,
In recent years we have seen important technological advances in the field of artificial intelligence, mainly due to improvements in machine learning, which has become a very active field in both research and industry. These advances have changed the way we program robots: instead of hand-crafting an ad-hoc program to accomplish a given task, it is now often preferred to let the robot learn how to solve the task. This latter approach has many benefits; to mention just two, it relieves the programmer of implementing difficult solutions, and it can find very good ways to solve the task. Reinforcement learning is a machine learning technique that aims to solve sequential decision problems by learning from interaction with the environment. Despite recent advances in the field, especially in solving games and simulated environments, the direct application of reinforcement learning to real robotic tasks is often unsatisfactory and leads to poor results.
The main reason is that online methods are very sample-inefficient: they require an amount of interaction with the environment that is often infeasible for real systems. Moreover, for safety reasons, we often want to collect data with a policy different from the one being optimized.
Off-policy techniques try to overcome these issues, but the current state of the art often fails to provide a truly off-policy estimate. Samuele derived an analytical off-policy estimate of the gradient that is more accurate than the current state of the art.
Samuele has also focused his research on the problem of exploration under sparse rewards, with the goal of exploring sample-efficiently when the reward signal carries little information.
He is now focusing on bringing his findings closer to real-world applications and robotics.
Reinforcement Learning, Off-Policy Gradient Estimation, Exploration, Risk-Awareness
Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
Tosatto, S.; D'Eramo, C.; Pajarinen, J.; Restelli, M.; Peters, J. (2019). Exploration Driven By an Optimistic Bellman Equation, Proceedings of the International Joint Conference on Neural Networks (IJCNN).
Tosatto, S.; D'Eramo, C.; Pajarinen, J.; Restelli, M.; Peters, J. (2018). Technical Report:.
Tosatto, S.; D'Eramo, C.; Pirotta, M.; Restelli, M. (2017). Boosted Fitted Q-Iteration, Polytechnic University of Milan.
Tosatto, S.; Pirotta, M.; D'Eramo, C.; Restelli, M. (2017). Boosted Fitted Q-Iteration, Proceedings of the International Conference on Machine Learning (ICML).
Rueckert, E.; Nakatenus, M.; Tosatto, S.; Peters, J. (2017). Learning Inverse Dynamics Models in O(n) time with LSTM networks, Proceedings of the International Conference on Humanoid Robots (HUMANOIDS).