I am now a Post-Doc at University of Alberta.
Robotics, Machine Learning, Reinforcement Learning, Partial Observability
Mail. Samuele Tosatto
In the recent years we saw an important technology advancement in the field of artificial intelligent, mainly devoted to improvements in the field of machine learning, which has become nowadays a very active field both in the perspective of research and industry. This technological improvement had an impact in the way we program robots: instead of an ad-hoc programming to accomplish a given task, it is now preferred to make the robot learn how to solve such task. This former approach has a lot of benefits: it relives the programmer to implement difficult solutions, it could find a very optimal way to solve the task, just to mention a few. Reinforcement leanring is a machine learning technique which aims to solve sequential decision problems by learning. Despite the recent advances in the field, especially in solving games and simulated environment, often the direct application of reinforcement learning to real-robotic task is unsatisfactory and leads to poor results.
The main reason is that online methods are very sample inefficient, meaning they require an unreasonable amount of interaction with the environment, which is often intractable for real systems. Moreover, to enable safety, we want to use a different policy than the optimization one.
Off-policy techniques tries to overcome the issues presented, but the current state of the art often fails to provide a true off-policy estimation. Samuele worked out an analytical off-policy estimation of the gradient which results to be more accurate than the current state off-the art.
Samuele also focused his research on the problem of exploration under sparse reward, so to obtain a sample-efficient way to explore with a low-informative reward signal.
He is focusing now to bring his findings closer to real application and robotics.
Reinforcement Learning, Off-Policy Gradient Estimation, Exploration, Risk-Awareness