Topics for the Autonomous Learning Systems Seminar

The following topics are offered by different supervisors. The topics will be assigned jointly during the preliminary meeting. Topics of your own choice - as far as they fit the scope of the seminar - are also allowed.

DEADLINE FOR TOPIC SELECTION: Friday, November 2, 2012

A) Temporal Difference Learning, Difficulty: 3 (Supervisor: Gerhard Neumann)

  1. R. S. Sutton; Learning to predict by the methods of temporal differences; Machine Learning, 1988 [paper]
  2. M. G. Lagoudakis & R. Parr; Least-Squares Policy Iteration; Journal of Machine Learning Research, 2003 [paper]
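
To make the topic concrete, here is a minimal sketch of tabular TD(0) value prediction. It is illustrative only and not code from the papers; the env_step(s, a) -> (s_next, reward, done) interface and the start state 0 are assumptions.

     import numpy as np

     def td0_prediction(env_step, policy, n_states, episodes=500, alpha=0.1, gamma=0.99):
         """Tabular TD(0) prediction (Sutton 1988): after every transition, move
         V(s) a small step towards the bootstrapped target r + gamma * V(s')."""
         V = np.zeros(n_states)
         for _ in range(episodes):
             s, done = 0, False                     # assumed: every episode starts in state 0
             while not done:
                 s_next, r, done = env_step(s, policy(s))
                 target = r + (0.0 if done else gamma * V[s_next])
                 V[s] += alpha * (target - V[s])    # update proportional to the TD error
                 s = s_next
         return V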

B) Q-Learning, Difficulty: 2 (Supervisor: Jan Peters)

  1. C. Watkins, P. Dayan; Q-learning; Machine Learning, 8, 279-292, 1992. [paper]
  2. S. Singh, T. Jaakkola, M. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000. [paper]
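
For orientation, a minimal tabular Q-learning sketch with epsilon-greedy exploration; the env_step interface and start state are hypothetical placeholders, not the setting studied in the convergence paper.

     import numpy as np

     def q_learning(env_step, n_states, n_actions, episodes=1000,
                    alpha=0.1, gamma=0.99, eps=0.1, seed=0):
         """Tabular Q-learning (Watkins & Dayan): off-policy one-step updates
         towards r + gamma * max_a' Q(s', a')."""
         rng = np.random.default_rng(seed)
         Q = np.zeros((n_states, n_actions))
         for _ in range(episodes):
             s, done = 0, False                     # assumed start state
             while not done:
                 # epsilon-greedy behaviour policy
                 a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
                 s_next, r, done = env_step(s, a)
                 target = r + (0.0 if done else gamma * Q[s_next].max())
                 Q[s, a] += alpha * (target - Q[s, a])
                 s = s_next
         return Q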

C) Batch-Mode Reinforcement Learning, Difficulty: 2 (Supervisor: Gerhard Neumann)

  1. D. Ernst, P. Geurts and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, 2005 [paper]
  2. M. Riedmiller, T. Gabel, R. Hafner, S. Lange, Reinforcement learning for robot soccer, Autonomous Robots, 2011 [paper]
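
The Ernst et al. paper is an instance of fitted Q-iteration with tree-based regression. The sketch below shows the general scheme using scikit-learn's ExtraTreesRegressor on a fixed batch of transitions; the data layout (arrays S, A, R, S_next, done) is an assumed convention, not the paper's code.

     import numpy as np
     from sklearn.ensemble import ExtraTreesRegressor

     def fitted_q_iteration(S, A, R, S_next, done, n_actions, gamma=0.99, iterations=50):
         """Batch-mode fitted Q-iteration: regress Q(s, a) on bootstrapped targets
         computed from a fixed batch of transitions; no further interaction needed."""
         X = np.column_stack([S, A])
         Q = None
         for _ in range(iterations):
             if Q is None:
                 y = R                                # first iteration: Q_1 = immediate reward
             else:
                 # evaluate the current Q-estimate for every action in the next state
                 q_next = np.column_stack([
                     Q.predict(np.column_stack([S_next, np.full(len(R), a)]))
                     for a in range(n_actions)
                 ])
                 y = R + gamma * (1.0 - done) * q_next.max(axis=1)
             Q = ExtraTreesRegressor(n_estimators=50).fit(X, y)
         return Q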

D) Policy Gradient Methods, Difficulty: 4 (Supervisor: Jan Peters)

  1. S. Kakade. A Natural Policy Gradient. In Advances in Neural Information Processing Systems 14, 2002. [paper]
  2. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y. (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems 12 (Proceedings of the 1999 conference), pp. 1057-1063. MIT Press. [paper]
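
As an entry point into the topic, the following sketch implements the plain likelihood-ratio ("vanilla") policy gradient with a tabular softmax policy; Kakade's natural gradient is not shown, and run_episode(policy) -> list of (s, a, r) tuples is a hypothetical interface.

     import numpy as np

     def softmax_probs(theta_s):
         p = np.exp(theta_s - theta_s.max())
         return p / p.sum()

     def reinforce(run_episode, n_states, n_actions, iterations=200, alpha=0.01, gamma=0.99):
         """Monte-Carlo policy gradient: weight the score function
         d log pi(a|s) / d theta by the return observed from time t onwards."""
         theta = np.zeros((n_states, n_actions))
         policy = lambda s: int(np.random.choice(n_actions, p=softmax_probs(theta[s])))
         for _ in range(iterations):
             episode = run_episode(policy)
             grad, G = np.zeros_like(theta), 0.0
             for t in reversed(range(len(episode))):  # accumulate returns backwards
                 s, a, r = episode[t]
                 G = r + gamma * G
                 score = -softmax_probs(theta[s])
                 score[a] += 1.0                      # gradient of log pi(a|s) w.r.t. theta[s]
                 grad[s] += (gamma ** t) * G * score
             theta += alpha * grad
         return theta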

E) Human Motion Analysis and Recognition, Difficulty: 3 (Supervisor: Heni Ben Amor)

  1. R. Poppe; A survey on vision-based human action recognition. Image and Vision Computing, Volume 28, Issue 6, Pages 976-990, 2010. [paper]
  2. I. Akhter, T. Simon, S. Khan, I. Matthews, and Y. Sheikh, Bilinear Spatiotemporal Basis Models, ACM Transactions on Graphics, 2012. [website]

F) Expectation-Maximization-Based Policy Search, Difficulty: 4 (Supervisor: Gerhard Neumann)

  1. J. Kober and J. Peters, Policy search for motor primitives in robotics, Machine Learning, 2011, [paper]
  2. P. Dayan and G. Hinton, Using expectation-maximization for reinforcement learning, Neural Computation, 1997, [paper]
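
A minimal sketch of the reward-weighted, EM-flavoured update idea behind these papers: sample policy parameters from a Gaussian search distribution, turn returns into weights, and re-fit the mean by weighted maximum likelihood. The evaluate(theta) interface and the exponential reward transformation are illustrative assumptions.

     import numpy as np

     def reward_weighted_search(evaluate, dim, iterations=100, samples=20,
                                sigma=0.5, beta=5.0, seed=0):
         """EM-style policy search sketch: the M-step is a reward-weighted mean
         of the sampled parameter vectors (cf. Dayan & Hinton; Kober & Peters)."""
         rng = np.random.default_rng(seed)
         mean = np.zeros(dim)
         for _ in range(iterations):
             thetas = mean + sigma * rng.standard_normal((samples, dim))
             returns = np.array([evaluate(th) for th in thetas])
             # exponential transformation turns returns into (pseudo-)probabilities
             w = np.exp(beta * (returns - returns.max()))
             w /= w.sum()
             mean = w @ thetas                        # weighted maximum-likelihood mean
         return mean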

G) Inverse Reinforcement Learning with Linearly-Solvable MDPs, Difficulty: 3.5 (Supervisor: Jan Peters)

  1. K. Dvijotham, E. Todorov; Inverse Optimal Control with Linearly-Solvable MDPs; International Conference on Machine Learning, 2010 [paper]

H) Planning for Relational Rules, Difficulty: 3.5 (Supervisor: Abdeslam Boularias)

  1. Tobias Lang, Marc Toussaint (2010). Planning with Noisy Probabilistic Relational Rules. Journal of Artificial Intelligence Research, 39, pp 1-49. [paper]

I) Probabilistic inference for autonomous learning, Difficulty: 2 (Supervisor: Duy Nguyen-Tuong)

  1. D. J. C. MacKay; Probable Networks and plausible predictions – a review of practical Bayesian methods for supervised neural networks; Network: Computation in Neural Systems, 6, 469-505, 1995 [paper]
  2. C. M. Bishop, M. E. Tipping, Bayesian regression and classification; Advances in Learning Theory: Methods, Models and Applications, Volume 190, pp. 267-285. (2003) IOS Press, NATO Science Series III: Computer and Systems Sciences. [paper]
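
To make the Bayesian regression theme concrete, here is a small textbook-style sketch of Bayesian linear regression with a zero-mean Gaussian prior on the weights and known noise precision (notation and hyperparameter values are assumptions, not taken from the papers).

     import numpy as np

     def bayesian_linear_regression(Phi, y, alpha=1.0, beta=25.0):
         """Posterior over weights for y = Phi w + noise, with prior w ~ N(0, alpha^-1 I)
         and Gaussian observation noise of precision beta."""
         d = Phi.shape[1]
         S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi     # posterior precision
         S = np.linalg.inv(S_inv)                           # posterior covariance
         m = beta * S @ Phi.T @ y                           # posterior mean
         return m, S

     def predictive(Phi_star, m, S, beta=25.0):
         """Predictive mean and variance at new feature rows Phi_star."""
         mean = Phi_star @ m
         var = 1.0 / beta + np.einsum('ij,jk,ik->i', Phi_star, S, Phi_star)
         return mean, var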

J) Optimization, Difficulty: 5 (Supervisor: Marc Deisenroth)

  1. J. E. Dennis Jr and J. J. Moré: Quasi-Newton Methods: Motivation and Theory; SIAM Review, 19, 1, 46-89, 1977 [paper]
  2. D. R. Jones, M. Schonlau and W. J. Welch: Efficient Global Optimization of Expensive Black-Box Functions; Journal of Global Optimization, 13, 4, 455-492, 1998 [paper]
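
For intuition on the quasi-Newton part of this topic, a compact BFGS sketch with a backtracking line search; this is a didactic toy under simplifying assumptions, not the robust implementations discussed by Dennis and Moré.

     import numpy as np

     def bfgs_minimize(f, grad, x0, iterations=100, tol=1e-6):
         """Quasi-Newton minimisation: maintain an inverse-Hessian estimate H from
         gradient differences and step along the direction -H g."""
         x = np.asarray(x0, dtype=float)
         H = np.eye(len(x))                         # initial inverse-Hessian estimate
         g = grad(x)
         for _ in range(iterations):
             if np.linalg.norm(g) < tol:
                 break
             d = -H @ g                             # quasi-Newton search direction
             t, fx, gd = 1.0, f(x), g @ d
             for _ in range(30):                    # backtracking (Armijo) line search
                 if f(x + t * d) <= fx + 1e-4 * t * gd:
                     break
                 t *= 0.5
             x_new = x + t * d
             g_new = grad(x_new)
             s, yk = x_new - x, g_new - g
             if s @ yk > 1e-10:                     # curvature condition, else skip update
                 rho = 1.0 / (s @ yk)
                 I = np.eye(len(x))
                 H = (I - rho * np.outer(s, yk)) @ H @ (I - rho * np.outer(yk, s)) + rho * np.outer(s, s)
             x, g = x_new, g_new
         return x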

K) Partially Observable Markov Decision Problems, Difficulty: 1.5 (Supervisor: Abdeslam Boularias)

  1. Kaelbling, L.P., Littman, M.L., Cassandra, A.R. (1998). "Planning and acting in partially observable stochastic domains". Artificial Intelligence Journal 101: 99–134. [paper]
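
The central computational object in POMDPs is the belief state; here is a one-function sketch of the Bayes-filter belief update. The array convention (T[a][s, s'] for transitions, O[a][s', o] for observations) is an assumption for illustration.

     import numpy as np

     def belief_update(b, a, o, T, O):
         """One step of the POMDP belief update after taking action a and observing o:
         predict with the transition model, correct with the observation likelihood."""
         b_pred = b @ T[a]                 # sum_s b(s) * P(s' | s, a)
         b_new = O[a][:, o] * b_pred       # weight by P(o | s', a)
         return b_new / b_new.sum()        # renormalise to a probability distribution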

L) Planning with Multiple Agents, Difficulty: 4.5 (Supervisor: Abdeslam Boularias)

  1. Sven Seuken and Shlomo Zilberstein (2008). Formal Models and Algorithms for Decentralized Decision Making Under Uncertainty. In Journal of Autonomous Agents and Multi-Agent Systems, 17:2, pp. 190-250 [paper]

M) Exploration-Exploitation Trade-Off in Bandits, Difficulty: 5 (Supervisor: Jan Peters)

  1. P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem; Machine learning, 47, 2, pp.235–256, 2002 [paper]
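
For a quick impression of the trade-off, a short UCB1 sketch following the reference above; pull(arm) is a hypothetical reward oracle returning values in [0, 1].

     import numpy as np

     def ucb1(pull, n_arms, horizon=10000):
         """UCB1 (Auer et al. 2002): play each arm once, then always pick the arm
         with the largest empirical mean plus sqrt(2 ln t / n_i) exploration bonus."""
         counts = np.zeros(n_arms)
         means = np.zeros(n_arms)
         for t in range(1, horizon + 1):
             if t <= n_arms:
                 arm = t - 1                                   # initialisation: try every arm once
             else:
                 arm = int(np.argmax(means + np.sqrt(2.0 * np.log(t) / counts)))
             r = pull(arm)
             counts[arm] += 1
             means[arm] += (r - means[arm]) / counts[arm]      # incremental mean
         return means, counts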

O) Learning in adversarial systems and games, Difficulty: 4.5 (Supervisor: Jan Peters)

  1. P. Auer, N. Cesa-Bianchi, Y. Freund, R. E. Schapire; Gambling in a rigged casino: The adversarial multi-armed bandit problem; Foundations of Computer Science, 1995 [paper]
  2. L. Kocsis, C. Szepesvari (2006). Bandit based Monte-Carlo planning; Proceedings of the 17th European Conference on Machine Learning, pp. 282-293 [paper]
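
For the adversarial setting, a sketch of the Exp3 algorithm from the first reference; rewards are assumed to lie in [0, 1] and pull(arm) is again a hypothetical interface.

     import numpy as np

     def exp3(pull, n_arms, horizon=10000, gamma=0.1, seed=0):
         """Exp3 (Auer et al.): exponential weights with importance-weighted reward
         estimates, so no statistical assumptions on the reward sequence are needed."""
         rng = np.random.default_rng(seed)
         w = np.ones(n_arms)
         for _ in range(horizon):
             p = (1 - gamma) * w / w.sum() + gamma / n_arms    # mix in uniform exploration
             arm = int(rng.choice(n_arms, p=p))
             x_hat = pull(arm) / p[arm]                        # unbiased reward estimate
             w[arm] *= np.exp(gamma * x_hat / n_arms)
         return w / w.sum()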

P) Learning with Options, Difficulty: 4 (Supervisor: Gerhard Neumann)

  1. Richard Sutton, Doina Precup, and Satinder Singh (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, Vol 112, pp 181-211. [paper]
  2. G.D. Konidaris, S.R. Kuindersma, A.G. Barto and R.A. Grupen (2010). Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories. Advances in Neural Information Processing Systems 23, pp 1162-1170. [paper]

Q) Reinforcement Learning in Games, Difficulty: 2.5 (Supervisor: Gerhard Neumann)

  1. Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3 [paper]
  2. Junling Hu, Michael P. Wellman (2003). Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research, Vol 4, pp 1039-1069. [paper]

R) Games with Incomplete Information, Difficulty: 5 (Supervisor: Abdeslam Boularias)

  1. Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione (2007). Regret Minimization in Games with Incomplete Information. Advances in Neural Information Processing Systems 20. [paper]
  2. Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. Advances in Neural Information Processing Systems 22. [paper]

S) Autonomous Chess-Playing, Difficulty: 4 (Supervisor: Marc Deisenroth)

  1. Matuszek et al.: Gambit: A Robust Chess-Playing Robotic System, ICRA 2011. [paper]

T) Learning control, Difficulty: 1 (Supervisor: Duy Nguyen-Tuong)

  1. S. Schaal and C. G. Atkeson, Learning Control in Robotics, IEEE Robotics & Automation Magazine, 17, 20-29, 2010 [paper]

U) Policy Search with Forward Models, Difficulty: 3.5 (Supervisor: Jan Peters)

  1. M. P. Deisenroth, C. E. Rasmussen; PILCO: A Model-Based and Data-Efficient Approach to Policy Search; Proceedings of the 28th International Conference on Machine Learning, 2011 [paper]

V) Applications of Autonomous Learning Systems in Software Agents, Difficulty: 4 (Supervisor: Marc Deisenroth)

  1. R. Herbrich, T. Minka, T. Graepel; TrueSkill™: A Bayesian skill rating system; Advances in Neural Information Processing Systems, 20, 569–576, 2007 [paper]

W) Success stories of Learning Methods, Difficulty: 2 (Supervisor: Duy Nguyen-Tuong)

  1. David Ferrucci et al. (2010). Building Watson: An Overview of the DeepQA Project. AI Magazine, Fall 2010. [paper]

X) Reinforcement Learning with Inaccurate Models, Difficulty: 4 (Supervisor: Marc Deisenroth)

  1. Schneider: Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning (1997)
  2. Bagnell and Schneider: Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (2001)
  3. Abbeel et al.: Using inaccurate models in reinforcement learning (2006)
  4. Deisenroth et al.: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (2011)

Y) Learning and Inference with Graphical Models, Difficulty: 2 (Supervisor: Abdeslam Boularias)

  1. Philipp Krähenbühl and Vladlen Koltun: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials (2011)[paper]
  2. John Lafferty, Andrew McCallum and Fernando Pereira: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (2001)[paper]
  3. Charles Sutton and Andrew McCallum: An Introduction to Conditional Random Fields for Relational Learning (2006)[paper]

Z) Experience and Consciousness, Difficulty: 1 (Supervisor: Jan Peters)

  1. Benjamin Kuipers (2008). Drinking from the firehose of experience. Artificial Intelligence in Medicine, 44: 155-170. [paper]

ZZ) Movie/Image Reconstruction from Brain Imaging by Machine Learning (Supervisor: Jan Peters)

  1. Reconstructing visual experiences from brain activity evoked by natural movies (Current Biology 2011). This paper presents the first successful approach for reconstructing natural movies from brain activity.
  2. Encoding and decoding in fMRI (Neuroimage 2010). This paper reviews the current state of "brain decoding" research, and advocates one particularly powerful approach.
  3. Bayesian reconstruction of natural images from human brain activity (Neuron 2009). This paper presents the first successful approach for reconstructing natural images from brain activity.