|Organizers:||Pieter Abbeel (University of California, Berkeley), Jan Peters (Technische Universitaet Darmstadt)|
|Date and Time:||Friday, 14 May 2012, 9:00-15:30|
|Location:||Room 11, St. Paul River Centre|
This all-day tutorial introduces the audience to reinforcement learning. Prior experience in this area is not assumed. In the first half of this tutorial we will cover the foundations of reinforcement learning: Markov decision processes, value iteration, policy iteration, linear programming for solving an MDP, function approximation, model-free versus model-based learning, Q-learning, TD-learning, policy search, the likelihood ratio policy gradient, the policy gradient theorem, actor-critic, natural gradient and importance sampling. In the second half of this tutorial we will discuss example success stories and open problems.
|2||9:10||Jan Peters||Background: Supervised Learning||Attach:Part2_Background_Supervised_Learning.pdf|
|3a||9:35||Pieter Abbeel||Optimal Control: Foundations||Attach:Part_3a_Optimal_Control.pdf|
|3b||10:30||Jan Peters||Optimal Control with Learned Forward Models||Attach:Part3b_Optimal_control_with_Learned_Models.pdf|
|4||10:55||Pieter Abbeel||Value Function Methods||Attach:Part4_Value_Function_Methods.pdf|
|5||14:00 (2pm)||Jan Peters||Policy Search Methods||Attach:Part5_Policy_Search.pdf|
|6||14:55 (2:55pm)||Pieter Abbeel||Exploration in Reinforcement Learning||Attach:Part6_Exploration.pdf|
|7||15:10 (3:10pm)||Both||Wrap-Up and Conclusion||||
Pieter Abbeel received a BS/MS in Electrical Engineering from KU Leuven (Belgium) and a Ph.D. in Computer Science from Stanford University in 2008. He joined the faculty at UC Berkeley in Fall 2008, with an appointment in the Department of Electrical Engineering and Computer Sciences.
He has won various awards, including best paper awards at ICML and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) award, the Okawa Foundation award, and the 2011 TR35. He has developed apprenticeship learning algorithms that have enabled advanced helicopter aerobatics, including maneuvers such as tic-tocs, chaos and auto-rotation, which only exceptional human pilots can perform. His group has also achieved the first reliable end-to-end completion of picking up a crumpled laundry article and folding it. His work has been featured in many popular press outlets, including BBC, New York Times, MIT Technology Review, Discovery Channel, SmartPlanet and Wired. His current research is in robotics and machine learning, with a particular focus on challenges in personal robotics, surgical robotics and connectomics.