From our perspective, reinforcement learning (RL) is the automatic design of approximately optimal controllers from measurements. In traditional (optimal) control, a human expert in the loop decides how to measure and model the system. In RL, by contrast, the learning system constructs the controller directly from measurements; however, getting to the optimal controller can require extensive prestructuring through structured policies, value functions or models. On this page, I list some of the projects I am working on or have worked on, although this list will always be incomplete.
Our general goal in reinforcement learning is to develop methods that scale to the dimensionality of humanoid robots and can generate actions for seven or more degrees of freedom, e.g., for a human arm.
Such problems are a tremendous challenge for reinforcement learning as they require a state space of 21 or more dimensions (one dimension for each joint position, velocity and acceleration) and an action space of seven dimensions.
While supervised statistical learning techniques have significant applications for model and imitation learning, they do not suffice for all motor learning problems, particularly when no expert teacher or idealized desired behavior is available. Thus, both robotics and the understanding of human motor control require reward (or cost) related self-improvement. The development of efficient reinforcement learning methods is therefore essential for the success of learning in motor control.
However, reinforcement learning in high-dimensional domains such as manipulator and humanoid robotics is extremely difficult: a complete exploration of the underlying state-action space is impossible, and few existing techniques scale to this domain.
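To get a feel for why complete exploration is out of reach, one can count the cells of a uniform grid over the state-action space of a seven degree-of-freedom arm. This is only an illustrative sketch: the resolution of 10 bins per dimension and the helper name `grid_cells` are assumptions made for the example and do not come from any of the methods listed on this page.

```python
# Illustrative count only: the bin resolution is an assumption for this example.
DOF = 7                   # degrees of freedom of the arm
STATE_DIM = 3 * DOF       # position, velocity and acceleration per joint
ACTION_DIM = DOF          # one command per joint
BINS_PER_DIM = 10         # hypothetical, coarse discretization


def grid_cells(state_dim, action_dim, bins):
    """Number of cells in a uniform grid over the joint state-action space."""
    return bins ** (state_dim + action_dim)


cells = grid_cells(STATE_DIM, ACTION_DIM, BINS_PER_DIM)
print(f"state dimensions:  {STATE_DIM}")
print(f"action dimensions: {ACTION_DIM}")
print(f"grid cells:        {cells:.1e}")
```

Even at this coarse resolution, visiting every cell once is hopeless on a physical robot, which is why the local search methods discussed here matter.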
Nevertheless, humans clearly do not need such extensive exploration in order to learn new motor skills; instead, they rely on a combination of watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation, and then it is improved using reinforcement learning. It is essential that only local policy search techniques, e.g., policy gradient methods, are applied, as a rapid change to the policy would result in a complete unlearning of the policy and might also result in unstable control policies which can damage the robot.
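The two-stage scheme described above can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the implementation used in any of the papers listed on this page: the one-dimensional linear-Gaussian policy, the quadratic toy reward, the constants `K_STAR`, `K_TEACHER` and `SIGMA`, and the function names are all hypothetical. The policy gain is first fitted to noisy teacher demonstrations by least squares, and then improved with a likelihood-ratio (REINFORCE-style) policy gradient whose change per update is bounded so that the policy only moves locally.

```python
import numpy as np

rng = np.random.default_rng(0)

K_STAR = 1.0      # hypothetical gain that maximizes the toy reward
K_TEACHER = 0.6   # hypothetical, suboptimal teacher gain
SIGMA = 0.2       # fixed exploration noise of the Gaussian policy


def reward(s, a):
    """Toy reward: penalize the squared deviation from the best action."""
    return -(a - K_STAR * s) ** 2


def imitate(n_demos=50):
    """Stage 1: fit the policy gain to noisy teacher demonstrations."""
    s = rng.uniform(-1.0, 1.0, n_demos)
    a = K_TEACHER * s + 0.05 * rng.standard_normal(n_demos)
    return float(s @ a / (s @ s))  # least-squares estimate of the gain


def policy_gradient_step(theta, n_samples=200, step=0.1, max_change=0.05):
    """Stage 2: one likelihood-ratio gradient update with a bounded change."""
    s = rng.uniform(-1.0, 1.0, n_samples)
    a = theta * s + SIGMA * rng.standard_normal(n_samples)
    r = reward(s, a)
    # Derivative of log N(a | theta * s, SIGMA^2) with respect to theta.
    grad_log_pi = (a - theta * s) * s / SIGMA**2
    # Subtracting the batch-mean reward as a baseline reduces the variance.
    grad = np.mean((r - r.mean()) * grad_log_pi)
    # Clipping keeps each update local, so the policy never changes abruptly.
    return theta + float(np.clip(step * grad, -max_change, max_change))


theta_init = imitate()
theta = theta_init
for _ in range(200):
    theta = policy_gradient_step(theta)

print(f"gain after imitation:   {theta_init:.3f}")
print(f"gain after improvement: {theta:.3f}")
```

The clipping in `policy_gradient_step` is one simple way to keep updates local; the natural-gradient methods listed below pursue the same goal in a more principled way.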
In order to bring reinforcement learning to robotics and computational motor control, we have both improved existing reinforcement learning methods and developed a variety of novel algorithms. At this point, we can only give a short overview of these methods:
Hachiya, H.; Akiyama, T.; Sugiyama, M.; Peters, J. (2009). Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning, Neural Networks, 22, 10, pp. 1399-1410
Peters, J.; Kober, J.; Nguyen-Tuong, D. (2008). Policy Learning – a unified perspective with applications in robotics, Proceedings of the European Workshop on Reinforcement Learning (EWRL)
Peters, J.; Schaal, S. (2008). Natural actor critic, Neurocomputing, 71, 7-9, pp. 1180-1190
Peters, J.; Schaal, S. (2008). Learning to control in operational space, International Journal of Robotics Research, 27, pp. 197-212
Peters, J.; Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp. 682-697
Peters, J.; Schaal, S. (2007). Policy Learning for Motor Skills, Proceedings of the 14th International Conference on Neural Information Processing (ICONIP)
Wierstra, D.; Foerster, A.; Peters, J.; Schmidhuber, J. (2007). Solving Deep Memory POMDPs with Recurrent Policy Gradients, Proceedings of the International Conference on Artificial Neural Networks (ICANN)
Theodorou, E.; Peters, J.; Schaal, S. (2007). Reinforcement Learning for Optimal Control of Arm Movements, Abstracts of the 37th Meeting of the Society for Neuroscience
Peters, J. (2007). Machine Learning of Motor Skills for Robotics, Ph.D. Thesis, Department of Computer Science, University of Southern California
Peters, J.; Schaal, S. (2007). Reinforcement learning for operational space control, International Conference on Robotics and Automation (ICRA2007), pp. 2111-2116
Peters, J.; Schaal, S. (2007). Using reward-weighted regression for reinforcement learning of task space control, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Peters, J.; Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning, Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN)
Peters, J.; Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, Proceedings of the International Conference on Machine Learning (ICML2007)
Peters, J.; Theodorou, E.; Schaal, S. (2007). Policy gradient methods for machine learning, INFORMS Conference of the Applied Probability Society
Riedmiller, M.; Peters, J.; Schaal, S. (2007). Evaluation of policy gradient methods and variants on the cart-pole benchmark, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Peters, J.; Schaal, S. (2006). Reinforcement Learning for Parameterized Motor Primitives, Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006)
Peters, J.; Schaal, S. (2006). Policy gradient methods for robotics, Proceedings of the IEEE International Conference on Intelligent Robotics Systems (IROS 2006)
Peters, J.; Vijayakumar, S.; Schaal, S. (2005). Natural Actor-Critic, in: Gama, J.; Camacho, R.; Brazdil, P.; Jorge, A.; Torgo, L. (eds.), Proceedings of the 16th European Conference on Machine Learning (ECML 2005), 3720, pp. 280-291, Springer
Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Reinforcement learning for humanoid robotics, IEEE-RAS International Conference on Humanoid Robots (Humanoids2003)
Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Scaling reinforcement learning paradigms for motor learning, Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003)
Kwee, I.; Hutter, M.; Schmidhuber, J. (2001). Gradient-based reinforcement planning in policy-search methods, IDSIA