The Natural Actor-Critic is a policy-gradient-based actor-critic architecture for reinforcement learning that employs both natural policy gradients and compatible function approximation. It has been described as "blazingly fast" (David Wingate from MIT at NIPS 2007), as "the current method of choice" (Douglas Aberdeen from NICTA at MLSS 2006), and as "a great algorithm for improving upon imitations" (Aude Billard from EPFL at NIPS 2007). For more information, see

Peters, J.; Schaal, S. (2008). Natural Actor-Critic. Neurocomputing, 71(7-9), pp. 1180-1190. [pdf]

Peters, J.; Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21, pp. 682-697. [pdf]

Peters, J.; Schaal, S. (2006). Policy gradient methods for robotics. Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS 2006). [pdf]

Peters, J.; Vijayakumar, S.; Schaal, S. (2005). Natural Actor-Critic. In: Gama, J.; Camacho, R.; Brazdil, P.; Jorge, A.; Torgo, L. (eds.), Proceedings of the 16th European Conference on Machine Learning (ECML 2005), LNCS 3720, pp. 280-291, Springer. [pdf]

Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Reinforcement learning for humanoid robotics. Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots. [pdf]

These publications should be sufficient for understanding this architecture.
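As a toy illustration of the core idea in these papers, the sketch below runs episodic natural-actor-critic updates on a one-step continuous bandit: returns are regressed onto the compatible features (the score function of a Gaussian policy), and by the compatible-function-approximation result the regression weight on those features is the natural gradient of the expected return. The bandit task, constants, and variable names are illustrative choices of mine, not taken from the papers above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step "bandit" (illustrative, not from the papers above):
# action a ~ N(theta, sigma^2), reward = -(a - target)^2.
theta = 0.0      # policy mean, the only policy parameter
sigma = 1.0      # fixed exploration noise
target = 2.0     # unknown optimum the policy should find

for _ in range(200):
    # Sample a batch of single-step episodes from the current policy.
    actions = theta + sigma * rng.standard_normal(50)
    returns = -(actions - target) ** 2

    # Compatible features: the score d/dtheta log N(a; theta, sigma^2).
    psi = (actions - theta) / sigma ** 2

    # Least-squares fit of the returns on [psi, 1]. The coefficient on
    # psi equals the Fisher-preconditioned (natural) policy gradient --
    # the key identity behind the episodic Natural Actor-Critic.
    X = np.stack([psi, np.ones_like(psi)], axis=1)
    w, baseline = np.linalg.lstsq(X, returns, rcond=None)[0]

    theta += 0.1 * w  # natural-gradient ascent on the expected return

print(round(theta, 2))  # ends up near the target of 2.0
```

Note that no explicit Fisher matrix is ever inverted: the regression onto the compatible features delivers the natural gradient directly, which is what makes the approach attractive for high-dimensional policies such as those used in robotics.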

Follow-up Work by Other Researchers

There has been some follow-up work on the Natural Actor-Critic by other researchers, and I would like to point the interested reader to these papers:

Aberdeen D., Buffet O., Thomas O., Policy-Gradients for PSRs and POMDPs. Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS'07), San Juan, Puerto Rico, 2007. [pdf]

Silvia Richter, Doug Aberdeen and Jin Yu. Natural Actor-Critic for Road Traffic Optimisation. Neural Information Processing Systems (NIPS) Conference, pp. 1169-1176, 2006, Vancouver, B.C., Canada.

Xinhua Zhang, Douglas Aberdeen, S. V. N. Vishwanathan. Conditional random fields for multi-agent reinforcement learning. ICML 2007, pp. 1143-1150.

Sertan Girgin and Philippe Preux, Basis Expansion in Natural Actor Critic Methods, EWRL 2008

Buffet O., Dutech A., Charpillet F., Shaping Multi-Agent Systems with Gradient Reinforcement Learning. Autonomous Agents and Multi-Agent Systems Journal, volume 15(2), 2007, pp 197-220.

Verena Heidrich-Meisner and Christian Igel, Variable Metric Reinforcement Learning Methods Applied to the Noisy Mountain Car Problem, EWRL 2008

Silvia Richter. Learning Road Traffic Control -- Towards Practical Traffic Control Using Policy Gradients. Diploma Thesis, University of Freiburg, 2006.

Yutaka Nakamura, Takeshi Mori, Masa-aki Sato and Shin Ishii. (2007). Reinforcement Learning for a Biped Robot Based on a CPG-Actor-Critic Method. Neural Networks, 20(6), pp.723-735.

Y. Nakamura, T. Mori, Y. Tokita, T. Shibata, & S. Ishii. (2005) Off-policy natural policy gradient method for a biped walking using a CPG controller. Journal of Robotics and Mechatronics, 17(6), pp. 636--644.

Y. Nakamura, T. Mori, & S. Ishii. (2006) Natural policy gradient reinforcement learning method for a looper-like robot. International Symposium on Artificial Life and Robotics (AROB 11th '06), GS3-3.

Y. Nakamura, T. Mori, & S. Ishii. (2005) An off-policy natural gradient method for a partial observable Markov decision process. International Conference on Artificial Neural Networks (ICANN 2005), Lecture Notes in Computer Science, 3697, pp.431-436.

Y. Nakamura, T. Mori, and S. Ishii. (2004) Natural policy gradient reinforcement learning for a CPG control of a biped robot. International conference on parallel problem solving from nature (PPSN VIII), LNCS 3242, pp.972--981.

Tsuyoshi Ueno, Motoaki Kawanabe, Takeshi Mori, Shin-ichi Maeda and Shin Ishii. (2008). A semiparametric statistical approach to model-free policy evaluation. International Conference on Machine Learning (ICML).

T. Ueno, Y. Nakamura, T. Shibata, K. Hosoda, & S. Ishii. (2006) Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

Tetsuro Morimura, Eiji Uchibe, Junichiro Yoshimoto, Kenji Doya. A New Natural Policy Gradient by Stationary Distribution Metric. ECML 2008.

Morimura T, Uchibe E, Doya K (2008). Natural policy gradient with baseline adjustment function for variance reduction. Artificial Life and Robotics (AROB 08), B-Con Plaza, Beppu, Oita, Japan.

Morimura T, Uchibe E, Doya K (2005). Utilizing natural gradient in temporal difference reinforcement learning with eligibility traces. 2nd International Symposium on Information Geometry and its Applications.

Makino, K., Nakamura, Y., Shibata, T., Ishii, S. (2008) Adaptive Control of a Looper-like Robot based on the CPG-Actor-Critic Method. Artificial Life and Robotics, 12(1-2), pp. 129-132.

Makino, K., Nakamura, Y., Shibata, T., & Ishii, S. (2007) Adaptive Control of a Looper-like Robot Based on the CPG-Actor-Critic Method. International Symposium on Artificial Life and Robotics (AROB 12th '07), GS24-2.

Y. Tokita, J. Yoshimoto, Y. Nakamura, & S. Ishii. (2006) Reinforcement learning of switching multiple controllers to control a real robot. International Symposium on Artificial Life and Robotics (AROB 11th '06), GS22-3.

Takeshi Mori, Yutaka Nakamura and Shin Ishii. (2005). Efficient Sample Reuse by Off-policy Natural Actor-critic Learning. Advances in Neural Information Processing Systems, Workshop (NIPS Workshop).

Takeshi Mori, Yutaka Nakamura and Shin Ishii. (2005). Off-Policy Natural Actor-Critic. NAIST Technical Report, 2005007.

Jooyoung Park, Jongho Kim, Daesung Kang, An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm, Proceedings of Computational Intelligence and Security: International Conference, CIS 2005, Xi’an, China, December 15-19, 2005. [pdf]

Kim, Byungchan; Kang, Byungduk; Park, Shinsuk; Kang, Sung-Chul, Learning Robot Stiffness for Contact Tasks Using the Natural Actor-Critic, ICRA 2008, Pasadena, CA.

If you think your work should be added here, drop me a line!