Topics for the Autonomous Learning Systems Seminar
The following topics are offered by different supervisors (Betreuer). Topics will be assigned jointly during the preliminary meeting. Topics of your own choosing are also welcome, as long as they fit within the scope of the seminar.
DEADLINE FOR TOPIC SELECTION: Friday, November 2, 2012
A) Temporal Difference Learning, Difficulty: 3 (Betreuer: Gerhard Neumann)
- R. S. Sutton; Learning to predict by the methods of temporal differences; Machine learning, 1988 [paper]
- M. G. Lagoudakis & R. Parr; Least-Squares Policy Iteration; Journal of Machine Learning Research, 2003 [paper]
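For orientation, the core TD(0) update studied in Sutton (1988) can be sketched in a few lines of Python. This is purely illustrative: the data layout (lists of (state, reward, next_state) transitions) and all parameter values are our own, not taken from the cited papers.

```python
def td0(episodes, alpha=0.1, gamma=0.9):
    """Tabular TD(0): nudge V(s) toward the one-step bootstrapped target
    r + gamma * V(s'). A terminal transition has next_state = None."""
    V = {}  # state -> estimated value, implicitly 0 for unseen states
    for episode in episodes:
        for s, r, s_next in episode:
            v_next = V.get(s_next, 0.0) if s_next is not None else 0.0
            target = r + gamma * v_next
            V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V
```

Running this on repeated one-step episodes with reward 1 shows the estimate converging geometrically toward the true value, which is the behaviour analyzed in the paper.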
B) Q-Learning, Difficulty: 2 (Betreuer: Jan Peters)
- C. Watkins, P. Dayan; Q-learning; Machine Learning, 8, 279-292, 1992. [paper]
- S. Singh, T. Jaakkola, M. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287, 2000. [paper]
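A minimal tabular Q-learning sketch, in the spirit of Watkins & Dayan, on a toy chain MDP. The environment (a short chain with a goal on the right) and every parameter value are illustrative assumptions, not taken from the cited papers.

```python
import random

def q_learning(n_states=4, n_episodes=300, alpha=0.1, gamma=0.9,
               epsilon=0.5, seed=0):
    """Tabular Q-learning on a chain: action 0 = left, action 1 = right.
    Entering the rightmost state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(n_episodes):
        s = 0
        while s < n_states - 1:
            # epsilon-greedy behaviour policy
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: Q[s][i])
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            # off-policy update: bootstrap with max over actions in s_next
            best_next = 0.0 if s_next == n_states - 1 else max(Q[s_next])
            Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
            s = s_next
    return Q
```

Note that the update bootstraps with the greedy max regardless of the action actually taken; this off-policy property is exactly what the convergence analyses in the two references above are concerned with.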
C) Batch-Mode Reinforcement Learning, Difficulty 2 (Betreuer: Gerhard Neumann)
- D. Ernst, P. Geurts, and L. Wehenkel, Tree-based batch mode reinforcement learning, Journal of Machine Learning Research, 2005 [paper]
- M. Riedmiller, T. Gabel, R. Hafner, S. Lange, Reinforcement learning for robot soccer, Autonomous Robots, 2011 [paper]
D) Policy Gradient Methods, Difficulty: 4 (Betreuer: Jan Peters)
- S. Kakade. A Natural Policy Gradient. In Advances in Neural Information Processing Systems 14 2002. [paper]
- Sutton, R.S., McAllester, D., Singh, S., Mansour, Y. (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems 12 (Proceedings of the 1999 conference), pp. 1057-1063. MIT Press. [paper]
E) Human Motion Analysis and Recognition, Difficulty: 3 (Betreuer: Heni Ben Amor)
- R. Poppe; A survey on vision-based human action recognition. Image and Vision Computing, Volume 28, Issue 6, Pages 976-990, 2010. [paper]
- I. Akhter, T. Simon, S. Khan, I. Matthews, and Y. Sheikh, Bilinear Spatiotemporal Basis Models, ACM Transactions on Graphics, 2012. [website]
F) Expectation-Maximization-Based Policy Search, Difficulty 4 (Betreuer: Gerhard Neumann)
- J. Kober and J. Peters, Policy search for motor primitives in robotics, Machine Learning, 2011, [paper]
- P. Dayan and G. Hinton, Using expectation-maximization for reinforcement learning, Neural Computation, 1997, [paper]
G) Inverse Reinforcement Learning with Linearly-Solvable MDPs, Difficulty: 3.5 (Betreuer: Jan Peters)
- K. Dvijotham, E. Todorov; Inverse Optimal Control with Linearly-Solvable MDPs; International Conference on Machine Learning, 2010 [paper]
H) Planning for Relational Rules, Difficulty: 3.5 (Betreuer: Abdeslam Boularias)
- Tobias Lang, Marc Toussaint (2010). Planning with Noisy Probabilistic Relational Rules. Journal of Artificial Intelligence Research, 39, pp 1-49. [paper]
I) Probabilistic inference for autonomous learning, Difficulty: 2 (Betreuer: Duy Nguyen-Tuong)
- D. J. C. MacKay; Probable Networks and plausible predictions – a review of practical Bayesian methods for supervised neural networks; Network: Computation in Neural Systems, 6, 469-505 [paper]
- C. M. Bishop, M. E. Tipping; Bayesian regression and classification; Advances in Learning Theory: Methods, Models and Applications, Volume 190, pp. 267-285. (2003) IOS Press, NATO Science Series III: Computer and Systems Sciences. [paper]
J) Optimization, Difficulty: 5 (Betreuer: Marc Deisenroth)
- J. E. Dennis Jr and J. J. Moré: Quasi-Newton Methods: Motivation and Theory; SIAM Review, 19, 1, 46-89, 1977 [paper]
- D. R. Jones, M. Schonlau and W. J. Welch: Efficient Global Optimization of Expensive Black-Box Functions; Journal of Global Optimization, 13, 4, 455-492 [paper]
K) Partially Observable Markov Decision Problems, Difficulty: 1.5 (Betreuer: Abdeslam Boularias)
- Kaelbling, L.P., Littman, M.L., Cassandra, A.R. (1998). "Planning and acting in partially observable stochastic domains". Artificial Intelligence Journal 101: 99–134. [paper]
L) Planning with Multiple Agents, Difficulty: 4.5 (Betreuer: Abdeslam Boularias)
- Sven Seuken and Shlomo Zilberstein (2008). Formal Models and Algorithms for Decentralized Decision Making Under Uncertainty. In Journal of Autonomous Agents and Multi-Agent Systems, 17:2, pp. 190-250 [paper]
M) Exploration-Exploitation Trade-Off in Bandits, Difficulty: 5 (Betreuer: Jan Peters)
- P. Auer, N. Cesa-Bianchi, P. Fischer; Finite-time analysis of the multiarmed bandit problem; Machine Learning, 47, 2, pp. 235-256, 2002 [paper]
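The UCB1 algorithm analyzed in the Auer et al. paper can be sketched as follows. The Bernoulli arms, horizon, and seed are illustrative assumptions; only the index rule (empirical mean plus a sqrt(2 ln t / n_i) exploration bonus) comes from the paper.

```python
import math
import random

def ucb1(means, horizon=5000, seed=0):
    """UCB1 on Bernoulli arms: after playing each arm once, always pull the
    arm maximizing empirical mean + sqrt(2 ln t / n_i). Returns pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k     # n_i: number of pulls of arm i
    sums = [0.0] * k     # cumulative reward of arm i
    for t in range(1, horizon + 1):
        if t <= k:  # initialization round: play each arm once
            arm = t - 1
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

With a clear gap between the arms, the pull counts concentrate on the better arm while the suboptimal arm is sampled only logarithmically often, which is the finite-time regret bound the paper establishes.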
O) Learning in adversarial systems and games, Difficulty: 4.5 (Betreuer: Jan Peters)
- P. Auer, N. Cesa-Bianchi, Y. Freund, R. E. Schapire; Gambling in a rigged casino: The adversarial multi-armed bandit problem; Foundations of Computer Science, 1995 [paper]
- L. Kocsis and C. Szepesvari (2006). Bandit based Monte-Carlo planning; Proceedings of the European Conference on Machine Learning (ECML), pp. 282-293 [paper]
P) Learning with Options, Difficulty: 4 (Betreuer: Gerhard Neumann)
- Richard Sutton, Doina Precup, and Satinder Singh (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, Vol 112, pp 181-211. [paper]
- G.D. Konidaris, S.R. Kuindersma, A.G. Barto and R.A. Grupen (2010). Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories. Advances in Neural Information Processing Systems 23, pp 1162-1170.[paper]
Q) Reinforcement Learning in Games, Difficulty: 2.5 (Betreuer: Gerhard Neumann)
- Gerald Tesauro (1995). Temporal Difference Learning and TD-Gammon. Communications of the ACM, Vol. 38, No. 3 [paper]
- Junling Hu, Michael P. Wellman (2003). Nash Q-Learning for General-Sum Stochastic Games. Journal of Machine Learning Research, Vol 4, pp 1039-1069. [paper]
R) Games with Incomplete Information, Difficulty: 5 (Betreuer: Abdeslam Boularias)
- Martin Zinkevich, Michael Johanson, Michael Bowling, Carmelo Piccione (2007). Regret Minimization in Games with Incomplete Information. Advances in Neural Information Processing Systems 20. [paper]
- Joel Veness, David Silver, William Uther, Alan Blair (2009). Bootstrapping from Game Tree Search. Advances in Neural Information Processing Systems 22. [paper]
S) Autonomous Chess-Playing, Difficulty: 4 (Betreuer: Marc Deisenroth)
- Matuszek et al.: Gambit: A Robust Chess-Playing Robotic System, ICRA 2011. [paper]
T) Learning control, Difficulty: 1 (Betreuer: Duy Nguyen-Tuong)
- S. Schaal and C. G. Atkeson ,Learning Control in Robotics, IEEE Robotics & Automation Magazine, 17, 20-29, 2010 [paper]
U) Policy Search with Forward Models, Difficulty: 3.5 (Betreuer: Jan Peters)
- M. P. Deisenroth, C. E. Rasmussen; PILCO: A Model-Based and Data-Efficient Approach to Policy Search; Proceedings of the 28th International Conference on Machine Learning, 2011 [paper]
V) Applications of Autonomous Learning Systems in Software Agents, Difficulty: 4 (Betreuer: Marc Deisenroth)
- R. Herbrich, T. Minka, T. Graepel; TrueSkill™: A Bayesian skill rating system; Advances in Neural Information Processing Systems, 20, 569-576, 2007 [paper]
W) Success stories of Learning Methods, Difficulty: 2 (Betreuer: Duy Nguyen-Tuong)
- David Ferrucci et al. (2010). Building Watson: An Overview of the DeepQA Project. AI Magazine Fall, 2010. [paper]
X) Reinforcement Learning with Inaccurate Models, Difficulty: 4 (Betreuer: Marc Deisenroth)
- Schneider: Exploiting Model Uncertainty Estimates for Safe Dynamic Control Learning (1997)
- Bagnell and Schneider: Autonomous Helicopter Control using Reinforcement Learning Policy Search Methods (2001)
- Abbeel et al.: Using inaccurate models in reinforcement learning (2006)
- Deisenroth et al.: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (2011)
Y) Learning and Inference with Graphical Models, Difficulty: 2 (Betreuer: Abdeslam Boularias)
- Philipp Krähenbühl and Vladlen Koltun: Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials (2011)[paper]
- John Lafferty, Andrew McCallum and Fernando Pereira: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (2001)[paper]
- Charles Sutton and Andrew McCallum: An Introduction to Conditional Random Fields for Relational Learning (2006)[paper]
Z) Experience and Consciousness, Difficulty: 1 (Betreuer: Jan Peters)
- Benjamin Kuipers (2008). Drinking from the firehose of experience. Artificial Intelligence in Medicine 44: 155-170. [paper]
ZZ) Movie/Image Reconstruction from Brain Imaging by Machine Learning (Betreuer: Jan Peters)
- Reconstructing visual experiences from brain activity evoked by natural movies (Current Biology 2011). This paper presents the first successful approach for reconstructing natural movies from brain activity.
- Encoding and decoding in fMRI (Neuroimage 2010). This paper reviews the current state of "brain decoding" research, and advocates one particularly powerful approach.
- Bayesian reconstruction of natural images from human brain activity (Neuron 2009). This paper presents the first successful approach for reconstructing natural images from brain activity.