We offer these current topics directly to Bachelor and Master students at TU Darmstadt, who should feel free to contact the thesis advisor DIRECTLY if interested in one of these topics. **Excellent external students from another university may be accepted, but please first email Jan Peters. Note that we cannot provide funding for any of these thesis projects.**

We highly recommend that you take either our robotics and machine learning lectures (Robot Learning, Statistical Machine Learning) or those of our colleagues (Grundlagen der Robotik, Probabilistic Graphical Models and/or Deep Learning). Even more important to us is that you take both Robot Learning: Integrated Project, Part 1 (Literature Review and Simulation Studies) and Part 2 (Evaluation and Submission to a Conference) before doing a thesis with us.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please **DIRECTLY** contact the thesis advisor if you are interested in one of these topics. When you contact the advisor, it would be nice if you could mention (1) **WHY** you are interested in the topic (dreams, parts of the problem, etc.), and (2) **WHAT** makes you special for the projects (e.g., class work, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc.) are highly appreciated. Of course, such materials are not mandatory, but they help the advisor to see whether the topic is too easy, just about right, or too hard for you.

**Only contact *ONE* potential advisor at a time! If you contact a second one without first concluding discussions with the first advisor (i.e., deciding for or against the thesis with her or him), we may not consider you at all. Only if you are very excited about at most two topics, send an email to both supervisors so that they are aware of the additional interest.**

**FOR FB16+FB18 STUDENTS: If you are a student from another department at TU Darmstadt (e.g., ME, EE, IST), you need an additional formal supervisor who officially issues the topic. Please do not try to arrange your home department advisor by yourself, but let the supervising IAS member get in touch with that person instead. Multiple professors from other departments have complained that they were asked to co-supervise before being contacted by our advising lab member.**

**Scope:** Master Thesis **Advisor:** Dorothea Koert, Joni Pajarinen **Added:** 2021-06-08 **Start:** ASAP

The ability to model the beliefs and goals of a partner is an essential part of cooperative tasks. While humans develop theory of mind models for this purpose at a very early age [1], it is still an open question how to implement and make use of such models for cooperative robots [2,3,4]. In shared workspaces in particular, human-robot collaboration could profit from such models, e.g., if the robot can detect and react to a human's planned goals or false beliefs during task execution. To make such robots a reality, the goal of this thesis is to investigate the use of first- and second-order mental models in a cooperative manipulation task under partial observability. Partially observable Markov decision processes (POMDPs) and interactive POMDPs (I-POMDPs) [5] define an optimal solution to the mental modeling task and may provide a solid theoretical basis for modeling. The thesis may also compare related approaches from the literature and set up an experimental design for evaluation with the bi-manual robot platform Kobo.
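
As a rough illustration of the machinery behind such models, a POMDP agent summarizes its uncertainty in a belief, i.e., a probability distribution over hidden states, and updates it with Bayes' rule after each observation. The minimal sketch below tracks a belief over a human partner's goal; the two goals, the observation labels, and all probabilities are hypothetical, and actions are omitted for brevity. Roughly speaking, an I-POMDP would additionally place a model of the partner's own belief inside the hidden state, which is what enables second-order reasoning.

```python
from typing import Dict

State = str
Obs = str


def belief_update(
    belief: Dict[State, float],
    transition: Dict[State, Dict[State, float]],
    observation: Dict[State, Dict[Obs, float]],
    obs: Obs,
) -> Dict[State, float]:
    """Discrete Bayes filter: b'(s') is proportional to O(obs | s') * sum_s T(s' | s) * b(s)."""
    # Prediction: push the current belief through the (action-independent) transition model.
    predicted = {s_next: 0.0 for s_next in observation}
    for s, p in belief.items():
        for s_next, t in transition[s].items():
            predicted[s_next] += t * p
    # Correction: weight each state by the likelihood of the received observation.
    unnormalized = {s: observation[s].get(obs, 0.0) * p for s, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical example: which object is the human partner reaching for?
transition = {
    "goal_cup": {"goal_cup": 0.9, "goal_plate": 0.1},
    "goal_plate": {"goal_cup": 0.1, "goal_plate": 0.9},
}
observation = {
    "goal_cup": {"reach_left": 0.8, "reach_right": 0.2},
    "goal_plate": {"reach_left": 0.3, "reach_right": 0.7},
}
belief = {"goal_cup": 0.5, "goal_plate": 0.5}
for o in ["reach_left", "reach_left"]:
    belief = belief_update(belief, transition, observation, o)
print(belief)
```

Two consistent "reach left" observations shift most of the probability mass to the cup goal; a robot could react to such a belief before the human's motion is complete.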

Highly motivated students can apply by sending an e-mail expressing their interest to dorothea.koert@tu-darmstadt.de, attaching their CV and transcripts.

References:

- Wimmer, H., & Perner, J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception (1983)
- Sandra Devin and Rachid Alami. An implemented theory of mind to improve human-robot shared plans execution (2016)
- Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami, and Matthew Botvinick. Machine theory of mind (2018)
- Connor Brooks and Daniel Szafir. Building second-order mental models for human-robot interaction. (2019)
- Prashant Doshi, Xia Qu, Adam Goodie, and Diana Young. Modeling recursive reasoning by humans using empirically informed interactive POMDPs. (2010)

**Scope:** Bachelor/Master Thesis **Advisor:** Vignesh Prasad **Added:** 2021-06-08 **Start:** ASAP **Topic:**

Handshaking, as a synchronous haptic interaction, actively involves sensing the pressure exerted by the partner and adjusting one's own force accordingly. This is an important aspect of robotic handshaking, since touch can convey complex emotions. It is therefore important for a robot to be able to sense its partner during handshaking and to respond adequately and synchronously. One such approach is presented by Vigni et al. [1], who estimate the human's force using sensors on a robot hand, which then adjusts its grip to give a comfortable handshake.

This can also be used to give robots a perceived personality through handshakes: for example, a stronger grip is perceived as more confident/extroverted [1,2] and can convey higher arousal and dominance, while a weaker one conveys greater pleasantness/valence [3,4]. In [3,4], it is additionally shown that a strong-grip handshake can strengthen the effect of visual emotional expressions.

The goal of this thesis would be to use the Festo BionicSoftHand to explore the following:

- Extending [1] to follow a more principled approach for the interaction modelling for handshaking (for example by using force-impedance models).
- Performing haptic emotion detection based on the measured force exerted on the robot hand.
- Incorporating Emotion Contagion Models [5] for such a haptic interaction.
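
As a rough illustration of the first item, a one-dimensional impedance model relates a scalar finger coordinate x of the hand to the force f measured from the partner: M x'' + D x' + K (x - x_ref) = f. The sketch below integrates this model with semi-implicit Euler steps; the scalar finger coordinate, the parameter values, and the constant force input are hypothetical and are not taken from [1] or from the BionicSoftHand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImpedanceModel:
    """1-D impedance model M*x_dd + D*x_d + K*(x - x_ref) = f (all values hypothetical)."""

    mass: float = 0.1        # virtual inertia M
    damping: float = 2.0     # virtual damping D
    stiffness: float = 20.0  # virtual stiffness K
    x_ref: float = 0.0       # rest position of the scalar finger coordinate

    def simulate(self, forces: List[float], dt: float = 0.001) -> List[float]:
        """Integrate the model with semi-implicit Euler for a sequence of measured forces."""
        x, v = self.x_ref, 0.0
        trajectory = []
        for f in forces:
            # Acceleration implied by the impedance equation.
            a = (f - self.damping * v - self.stiffness * (x - self.x_ref)) / self.mass
            v += a * dt  # update velocity first ...
            x += v * dt  # ... then position with the new velocity (semi-implicit Euler)
            trajectory.append(x)
        return trajectory


model = ImpedanceModel()
# Hypothetical measurement: the partner squeezes with a constant 2 N for 3 s.
traj = model.simulate([2.0] * 3000)
print(f"final: {traj[-1]:.4f}, steady state f/K: {2.0 / model.stiffness:.4f}")
```

In steady state the deflection is f/K, so a larger stiffness K makes the hand yield less to the partner's squeeze; varying M, D, and K is one conceivable knob for the grip-strength effects discussed above.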

Highly motivated students can apply by sending an e-mail expressing their interest to vignesh.prasad@tu-darmstadt.de, attaching their CV and transcripts.

References:

- F. Vigni, E. Knoop, D. Prattichizzo, and M. Malvezzi, “The role of closed-loop hand control in handshaking interactions,” IEEE Robotics and Automation Letters, 2019.
- P.-H. Orefice, M. Ammi, M. Hafez, and A. Tapus, “Let’s handshake and I’ll know who you are: Gender and personality discrimination in human-human and human-robot handshaking interaction,” in IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS), 2016.
- M. Ammi, V. Demulier, S. Caillou, Y. Gaffary, Y. Tsalamlal, J.-C. Martin, and A. Tapus, “Haptic human-robot affective interaction in a handshaking social protocol,” in ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2015.
- M. Y. Tsalamlal, J.-C. Martin, M. Ammi, A. Tapus, and M.-A. Amorim, “Affective handshake with a humanoid robot: How do participants perceive and combine its facial and haptic expressions?” In IEEE International Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
- E. Hatfield, J. T. Cacioppo, and R. L. Rapson, “Emotional contagion,” Current directions in psychological science, vol. 2, no. 3, 1993.

**Scope:** Master Thesis **Advisor:** Georgia Chalvatzaki, Despoina Paschalidou **Added:** 2021-05-19 **Start:** ASAP **Topic:**

In this thesis, we will investigate the use of 3D primitive representations of objects using Invertible Neural Networks (INNs). Through INNs, we can learn the implicit surface function of the objects and their mesh. Apart from extracting the object's shape, we can parse the object into semantically interpretable parts. Our main focus will be to segment the object parts that are semantically related to object affordances. Moreover, the implicit representation of the primitives can allow us to directly compute grasp configurations for the object, enabling grasp planning. Interested students are expected to have experience with Computer Vision and Deep Learning, and to be able to program in Python using DL libraries such as PyTorch.
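
INNs are commonly built by stacking coupling layers whose inverse is available in closed form. The sketch below shows a single affine coupling layer acting on a 3D point: the first coordinate passes through unchanged and conditions a scale and a shift of the other two, so the layer can be inverted exactly. The scale and shift functions are fixed toy functions standing in for learned networks; this is a generic illustration, not the architecture used by Paschalidou et al. in Neural Parts.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]


def _log_scales(x: float) -> Tuple[float, float]:
    # Toy stand-in for a learned network s(x) producing log-scales for (y, z).
    return math.tanh(0.5 * x), math.tanh(-0.3 * x)


def _shifts(x: float) -> Tuple[float, float]:
    # Toy stand-in for a learned network t(x) producing shifts for (y, z).
    return 0.2 * x, math.sin(x)


def coupling_forward(p: Point) -> Point:
    """Affine coupling: x passes through, (y, z) are scaled and shifted conditioned on x."""
    x, y, z = p
    s1, s2 = _log_scales(x)
    t1, t2 = _shifts(x)
    return x, y * math.exp(s1) + t1, z * math.exp(s2) + t2


def coupling_inverse(q: Point) -> Point:
    """Closed-form inverse: recompute s and t from the unchanged x and undo the affine map."""
    x, y, z = q
    s1, s2 = _log_scales(x)
    t1, t2 = _shifts(x)
    return x, (y - t1) * math.exp(-s1), (z - t2) * math.exp(-s2)


p = (0.3, -1.2, 0.7)
q = coupling_forward(p)
p_rec = coupling_inverse(q)
print(q, p_rec)
```

Stacking several such layers with alternating conditioning coordinates gives an expressive yet exactly invertible mapping, for example between points on a simple template shape and points on a primitive's surface.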

The thesis will be co-supervised by Despoina Paschalidou (Ph.D. candidate at the Max Planck Institute for Intelligent Systems and the Max Planck ETH Center for Learning Systems). Highly motivated students can apply by sending an e-mail expressing their interest to georgia.chalvatzaki@tu-darmstadt.de, attaching their CV and transcripts.

References:

- Paschalidou, Despoina, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. "Neural Parts: Learning expressive 3D shape abstractions with invertible neural networks." arXiv preprint arXiv:2103.10429 (2021).
- Karunratanakul, Korrawe, Jinlong Yang, Yan Zhang, Michael Black, Krikamol Muandet, and Siyu Tang. "Grasping Field: Learning Implicit Representations for Human Grasps." arXiv preprint arXiv:2008.04451 (2020).
- Chao, Yu-Wei, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang et al. "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." arXiv preprint arXiv:2104.04631 (2021).
- Do, Thanh-Toan, Anh Nguyen, and Ian Reid. "AffordanceNet: An end-to-end deep learning approach for object affordance detection." In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 5882-5889. IEEE, 2018.

**Scope:** Master Thesis **Advisor:** Julen Urain De Jesus, Puze Liu, Georgia Chalvatzaki **Added:** 2021-04-23 **Start:** ASAP **Topic:**

In robotics, we deal with the problem of solving complex task planning problems in highly unstructured environments. While end-to-end learning algorithms have been proposed in recent years to solve these problems, the lack of clear abstractions for defining policies appears to be a bottleneck for the generalization of the learned skills. In this project, we hypothesize that a proper understanding of the objects that make up the workspace could help the robot obtain better generalization properties.

This project deals with the problem of predicting the properties of articulated objects. Given an RGB-D image of a scene, a robotics-oriented perception system should be able to extract relevant object-centric features, such as the axis of rotation, the handle position, or the object's position and size.

Given a large supervised dataset of scenes, the student is expected to train Deep Learning models that predict (1) where the relevant objects are in the scene and (2) what the features of these objects are. The master thesis is aimed at students with strong coding skills and solid experience with PyTorch. Additionally, the student should be interested in Computer Vision and in working with networks that process RGB-D data, such as CNNs or PointNet.
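
To illustrate how such object-centric features could be used downstream, the sketch below takes a hypothetical predicted revolute joint (a point on the hinge axis and its direction) together with a predicted handle position, and computes where the handle moves as the door opens, using Rodrigues' rotation formula. All feature values are made up; predicting them from RGB-D data is the actual subject of the thesis.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(point: Vec3, axis_point: Vec3, axis_dir: Vec3, angle: float) -> Vec3:
    """Rotate `point` by `angle` (rad) about the line through `axis_point` along `axis_dir`."""
    norm = math.sqrt(_dot(axis_dir, axis_dir))
    k = (axis_dir[0] / norm, axis_dir[1] / norm, axis_dir[2] / norm)  # unit axis
    v = _sub(point, axis_point)  # position relative to a point on the axis
    c, s = math.cos(angle), math.sin(angle)
    kxv, kdv = _cross(k, v), _dot(k, v)
    # Rodrigues' formula: v_rot = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
    v_rot = tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3))
    return _add(v_rot, axis_point)


# Hypothetical predicted features of a cabinet door with a vertical hinge.
hinge_point = (0.5, 0.0, 0.0)
hinge_dir = (0.0, 0.0, 1.0)
handle = (0.5, 0.4, 0.9)  # handle 0.4 m away from the hinge
for deg in (0, 45, 90):
    print(deg, rotate_about_axis(handle, hinge_point, hinge_dir, math.radians(deg)))
```

Sampling such handle positions over a range of angles yields a candidate opening trajectory, which is one way the predicted axis and handle features could feed into manipulation.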

Highly motivated students can apply by sending an e-mail expressing their interest to georgia.chalvatzaki@tu-darmstadt.de, urain@ias.informatik.tu-darmstadt.de or liu@ias.informatik.tu-darmstadt.de, attaching their CV and transcripts.

References:

- Jain, Ajinkya and Lioutikov, Rudolf and Chuck, Caleb and Niekum, Scott. "ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory" (2021).
- Li, Xiaolong, et al. "Category-level articulated object pose estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (2020).
- Mo, Kaichun and Guibas, Leonidas and Mukadam, Mustafa and Gupta, Abhinav and Tulsiani, Shubham. "Where2Act: From Pixels to Actions for Articulated 3D Objects" (2021)

**Scope:** Master Thesis **Advisor:** Georgia Chalvatzaki, Daniel Leidner **Added:** 2021-04-22 **Start:** ASAP **Topic:**

Grasp planning is one of the most challenging tasks in robot manipulation. Apart from perceptual ambiguity, grasp robustness and successful execution rely heavily on the dynamics of the robotic hands. The student is expected to research and develop benchmarking environments and evaluation metrics for grasp planning. Development in simulation environments such as Isaac Sim and Gazebo will allow us to integrate and evaluate different robotic hands for grasping a variety of everyday objects. We will evaluate grasp performance using different metrics (e.g., object-category-wise, affordance-wise, etc.), and finally test the sim2real gap when transferring such approaches from popular simulators to real robots. The student will have the chance to work with different robotic hands (Justin hand, PAL TIAGo hands, Robotiq gripper, Panda gripper, etc.) and is expected to transfer the results to at least two robots (Rollin’ Justin at DLR and TIAGo++ at TU Darmstadt). The results of this thesis, both the data and the benchmarking framework, are intended to be made public for the benefit of the robotics community. As this thesis is offered in collaboration with the DLR Institute of Robotics and Mechatronics in Oberpfaffenhofen near Munich, the student is expected to work at DLR for a period of 8 months. On-site work at DLR is expected but cannot be guaranteed due to COVID-19 restrictions. A large part of the project can be carried out remotely.
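
One of the simplest metrics mentioned above is an object-category-wise success rate. The sketch below aggregates hypothetical trial records per (hand, object category) pair; the record fields, the success criterion, and the example values are invented for illustration and do not come from any existing benchmark.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class GraspTrial:
    hand: str       # e.g. "panda_gripper" (hypothetical label)
    category: str   # object category, e.g. "mug"
    success: bool   # object lifted and held (hypothetical success criterion)


def success_rates(trials: Iterable[GraspTrial]) -> Dict[Tuple[str, str], float]:
    """Fraction of successful grasps per (hand, object category) pair."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])  # [successes, attempts]
    for t in trials:
        key = (t.hand, t.category)
        counts[key][0] += int(t.success)
        counts[key][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}


trials = [
    GraspTrial("panda_gripper", "mug", True),
    GraspTrial("panda_gripper", "mug", False),
    GraspTrial("panda_gripper", "bottle", True),
    GraspTrial("robotiq", "mug", True),
]
rates = success_rates(trials)
for (hand, category), rate in sorted(rates.items()):
    print(f"{hand:14s} {category:8s} {rate:.2f}")
```

Affordance-wise metrics could be computed the same way by grouping on an affordance label instead of the object category.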

Highly motivated students can apply by sending an e-mail expressing their interest to georgia.chalvatzaki@tu-darmstadt.de and daniel.leidner@dlr.de, attaching their CV and transcripts.

References:

- Collins, Jack, Shelvin Chand, Anthony Vanderkop, and David Howard. "A Review of Physics Simulators for Robotic Applications." IEEE Access (2021).
- Bekiroglu, Y., Marturi, N., Roa, M. A., Adjigble, K. J. M., Pardi, T., Grimm, C., ... & Stolkin, R. (2019). Benchmarking protocol for grasp planning algorithms. IEEE Robotics and Automation Letters, 5(2), 315-322.

**Scope:** Master Thesis **Advisor:** Tianyu Ren, Georgia Chalvatzaki **Added:** 2021-04-14 **Start:** ASAP **Topic:**
Research and implementation of a state-of-the-art (SOTA) task and motion planner (TAMP) that combines two types of AI reasoning, symbolic planning and metric decision-making, to enable general-purpose robot manipulation in daily activities.
TAMP is a well-suited policy representation for long-horizon robot skills, to which many advanced ML algorithms can be applied. In addition, it is of practical use in the robotics and AI industries.
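
The interplay of the two reasoning levels can be sketched as a symbolic breadth-first search over STRIPS-style actions in which every candidate action must additionally pass a geometric feasibility check. In the sketch below, the toy pick-and-place domain and the `motion_feasible` stub, which stands in for a motion planner, are hypothetical; this is a didactic simplification and not the algorithm implemented in TMKit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional

Fact = str
State = FrozenSet[Fact]


@dataclass(frozen=True)
class Action:
    name: str
    pre: State     # facts that must hold before the action
    add: State     # facts made true by the action
    delete: State  # facts made false by the action


def act(name: str, pre: Iterable[Fact], add: Iterable[Fact], delete: Iterable[Fact]) -> Action:
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def plan(
    init: State,
    goal: State,
    actions: List[Action],
    motion_feasible: Callable[[State, Action], bool],
) -> Optional[List[str]]:
    """Breadth-first symbolic search; an action is only expanded if the geometric check accepts it."""
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for a in actions:
            if a.pre <= state and motion_feasible(state, a):
                nxt = (state - a.delete) | a.add
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None


actions = [
    act("pick_cup", {"cup_on_table", "hand_empty"}, {"holding_cup"}, {"cup_on_table", "hand_empty"}),
    act("place_cup_shelf", {"holding_cup"}, {"cup_on_shelf", "hand_empty"}, {"holding_cup"}),
    act("place_cup_table", {"holding_cup"}, {"cup_on_table", "hand_empty"}, {"holding_cup"}),
]


def motion_feasible(state: State, action: Action) -> bool:
    # Hypothetical stub for a motion planner: the shelf is unreachable while an open drawer blocks it.
    return not (action.name == "place_cup_shelf" and "drawer_open" in state)


init = frozenset({"cup_on_table", "hand_empty"})
print(plan(init, frozenset({"cup_on_shelf"}), actions, motion_feasible))
print(plan(init | {"drawer_open"}, frozenset({"cup_on_shelf"}), actions, motion_feasible))
```

The second query shows the key point of TAMP: a symbolically valid plan can still be rejected at the geometric level, so the two levels have to be reasoned about jointly.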

Requirement: Good knowledge of Python and C++; basic knowledge of robot kinematics.

**References**

- Dantam, Neil T., Swarat Chaudhuri, and Lydia E. Kavraki. "The task-motion kit: An open source, general-purpose task and motion-planning framework." IEEE Robotics & Automation Magazine 25.3 (2018): 61-70.
- TMKit: A task-motion planning framework. [Online]. Available: http://tmkit.kavrakilab.org

**Scope:** Master's thesis **Advisor:** Tuan Dam, Carlo D'Eramo, Joni Pajarinen **Start:** ASAP **Topic:** Applying reinforcement learning to autonomous driving is a promising but challenging research direction due to the high uncertainty and varying environmental conditions of the task, so efficient reinforcement learning is needed. For efficient reinforcement learning, recent work has suggested solving the Bellman optimality equation with stability guarantees [1, 2], but, unfortunately, no guarantee of zero bias has been established in this context, which leaves reinforcement learning susceptible to getting stuck in dangerous solutions. In this work, we formulate the Bellman equation as a convex-concave saddle point problem and solve it using a recently proposed accelerated primal-dual algorithm [3]. We will test the algorithm on benchmark problems and in an autonomous driving task, such as the one shown in this video: https://youtu.be/Hp8Dz-Zek2E, where an efficient unbiased solution is needed.
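
The update structure of a primal-dual method in the spirit of [3] can be illustrated on a toy scalar problem min_x max_y g(x) + Phi(x, y) - h(y): a dual proximal step driven by an extrapolated (momentum) dual gradient, followed by a primal proximal step. The sketch below uses constant step sizes tau = sigma = 0.5 and momentum theta = 1 (our choices) on a problem picked so that both proximal steps have closed forms; it is not the Bellman saddle-point formulation of this thesis, and [3] covers far more general step-size rules.

```python
from typing import Tuple


def apd_toy(num_iters: int = 200, tau: float = 0.5, sigma: float = 0.5, theta: float = 1.0) -> Tuple[float, float]:
    """Primal-dual iterations for min_x max_y g(x) + x*y - h(y),
    with g(x) = 0.5 x^2 and h(y) = 0.5 y^2 + y (toy problem, saddle point at (0.5, -0.5))."""
    x_prev = x = 0.0
    y = 0.0
    for _ in range(num_iters):
        # Extrapolated dual gradient: grad_y Phi(x, y) = x for the bilinear coupling Phi = x*y.
        s = (1.0 + theta) * x - theta * x_prev
        # Dual proximal step: argmax_y  y*s - h(y) - (y - y_k)^2 / (2 sigma), in closed form.
        y = (sigma * (s - 1.0) + y) / (sigma + 1.0)
        # Primal proximal step: argmin_x  g(x) + y*x + (x - x_k)^2 / (2 tau), in closed form.
        x_prev, x = x, (x - tau * y) / (1.0 + tau)
    return x, y


x_star, y_star = apd_toy()
print(x_star, y_star)
```

At the saddle point the gradients vanish (x + y = 0 and x - y - 1 = 0), which gives (0.5, -0.5), the point the iterates converge to.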

[1] Ofir Nachum, Yinlam Chow, and Mohammad Ghavamzadeh. Path consistency learning in Tsallis entropy regularized MDPs. arXiv preprint arXiv:1802.03501, 2018.

[2] Dai, Bo, et al. "Sbeed: Convergent reinforcement learning with nonlinear function approximation." International Conference on Machine Learning. PMLR, 2018.

[3] Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm for general convex-concave saddle point problems. arXiv preprint arXiv:1803.01401, 2018.

**Scope:** Master's thesis **Advisor:** Michael Lutter, Joni Pajarinen **Start:** ASAP

**Topic:** Imagine a future where a user teaches a robot how to combine parts into objects or structures. Now imagine that the robot is able to go even further: build objects with desired properties, for example height, stability, or shape, using object parts it has not seen before. In this thesis, we use reinforcement learning and Monte Carlo tree search to train a robot to build novel objects from novel object parts, based on a database of previously demonstrated object part assemblies. Object parts and objects will be modeled as graphs in which each node specifies which kinds of other nodes it can be connected to. Putting two object parts together then results in a bigger graph merged from the two object part graphs. Experiments will be performed mainly in simulation, but, if desired, the approach can also be evaluated on a real robot. Suitable background knowledge for this thesis can be gained, for example, in robot learning or reinforcement learning lectures.
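
A minimal sketch of the graph representation described above, assuming that each node carries a connector type and a set of connector types it may attach to (hypothetical names and rules): merging two part graphs unions their nodes and adds one edge between a compatible pair of free nodes. The compatibility rule and the example parts are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class PartGraph:
    node_types: Dict[str, str] = field(default_factory=dict)    # node id -> connector type
    accepts: Dict[str, Set[str]] = field(default_factory=dict)  # node id -> connector types it can attach to
    edges: Set[Tuple[str, str]] = field(default_factory=set)
    used: Set[str] = field(default_factory=set)                 # nodes that are already connected

    def add_node(self, node: str, ntype: str, accepts: Set[str]) -> None:
        self.node_types[node] = ntype
        self.accepts[node] = set(accepts)


def compatible(a: PartGraph, na: str, b: PartGraph, nb: str) -> bool:
    """Two free nodes are compatible if each accepts the other's connector type."""
    return (
        na not in a.used
        and nb not in b.used
        and b.node_types[nb] in a.accepts[na]
        and a.node_types[na] in b.accepts[nb]
    )


def merge(a: PartGraph, b: PartGraph) -> Optional[PartGraph]:
    """Join two part graphs via the first compatible pair of free nodes (node ids assumed disjoint)."""
    for na in sorted(a.node_types):
        for nb in sorted(b.node_types):
            if compatible(a, na, b, nb):
                merged = PartGraph()
                for g in (a, b):
                    merged.node_types.update(g.node_types)
                    merged.accepts.update({k: set(v) for k, v in g.accepts.items()})
                    merged.edges |= g.edges
                    merged.used |= g.used
                merged.edges.add((na, nb))
                merged.used |= {na, nb}
                return merged
    return None


leg1, leg2, top = PartGraph(), PartGraph(), PartGraph()
for leg, prefix in ((leg1, "leg1"), (leg2, "leg2")):
    leg.add_node(f"{prefix}.top", "peg", {"hole"})
    leg.add_node(f"{prefix}.foot", "foot", set())
top.add_node("top.c1", "hole", {"peg"})
table = merge(leg1, top)
print(table.edges if table else None, merge(leg1, leg2))
```

In the thesis setting, the choice of which pair to connect would be made by the learned policy or the tree search rather than by the first-match rule used here.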

**Scope:** Master's thesis **Advisor:** Tuan Dam, Joni Pajarinen **Start:** ASAP

**Topic:** Google DeepMind recently showed how Monte Carlo Tree Search (MCTS) combined with neural networks can be used to play Go at a super-human level. However, one disadvantage of MCTS is that the search tree grows exponentially with the planning horizon. In this Master thesis, the student will integrate the advantages of MCTS, that is, optimistic decision making, into a policy representation whose size is limited with respect to the planning horizon. The outcome will be an approach that can plan further into the future. The application domain will include partially observable problems where decisions can have far-reaching consequences.
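
In MCTS, optimism typically enters through an upper-confidence rule such as UCB1, as in the UCT algorithm: each child node is scored by its mean return plus an exploration bonus that shrinks as the child is visited more often. The sketch below shows only this selection rule, not a full search; the exploration constant c = sqrt(2) and the visit statistics are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ChildStats:
    visits: int
    total_return: float


def ucb1_scores(children: List[ChildStats], c: float = math.sqrt(2)) -> List[float]:
    """UCB1 score per child: mean return + c * sqrt(ln N / n); unvisited children score +inf."""
    parent_visits = sum(ch.visits for ch in children)
    scores = []
    for ch in children:
        if ch.visits == 0:
            scores.append(math.inf)  # always try an unvisited action first
        else:
            mean = ch.total_return / ch.visits
            bonus = c * math.sqrt(math.log(parent_visits) / ch.visits)
            scores.append(mean + bonus)
    return scores


def select(children: List[ChildStats], c: float = math.sqrt(2)) -> int:
    """Index of the child with the highest optimistic score."""
    scores = ucb1_scores(children, c)
    return max(range(len(children)), key=scores.__getitem__)


children = [ChildStats(visits=50, total_return=30.0), ChildStats(visits=5, total_return=2.5)]
print(ucb1_scores(children), select(children))
```

Although the first child has the higher mean return, the rarely visited second child is selected because of its larger bonus; with c = 0 the rule becomes greedy and picks the first child.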

**Scope:** Master's thesis **Advisor:** Joe Watson **Start:** ASAP **Topic:**

Recent work has presented a control-as-inference formulation that frames optimal control as input estimation. The linear Gaussian assumption can be shown to be equivalent to the LQR solution, while approximate inference through linearization can be viewed as a Gauss–Newton method, similar to popular trajectory optimization methods (e.g. iLQR). However, the linearization approximation limits both the tolerable environment stochasticity and exploration during inference.

The aim of this thesis is to use alternative approximate inference methods (e.g., quadrature, Monte Carlo, variational) and to investigate their benefits for stochastic optimal control and trajectory optimization. Ideally, prospective students are interested in optimal control, approximate inference methods, and model-based reinforcement learning.
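
The effect of the approximation can be seen on the simplest building block of such inference schemes: propagating a Gaussian x ~ N(mu, var) through a nonlinearity. For f(x) = sin(x), the exact mean is E[sin x] = sin(mu) exp(-var/2). The sketch below compares this with first-order linearization, a three-point Gauss-Hermite quadrature, and Monte Carlo sampling; the choice of sin and of the test values is purely illustrative.

```python
import math
import random
from typing import Callable

Fn = Callable[[float], float]


def mean_linearized(f: Fn, mu: float, var: float) -> float:
    # First-order Taylor expansion around mu: E[f(x)] ~ f(mu); the variance is ignored entirely.
    return f(mu)


def mean_gauss_hermite3(f: Fn, mu: float, var: float) -> float:
    # Three-point Gauss-Hermite rule for N(0, 1): nodes 0, +/-sqrt(3) with weights 2/3, 1/6, 1/6.
    sd = math.sqrt(var)
    nodes = (0.0, math.sqrt(3.0), -math.sqrt(3.0))
    weights = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)
    return sum(w * f(mu + sd * z) for w, z in zip(weights, nodes))


def mean_monte_carlo(f: Fn, mu: float, var: float, n: int = 100_000, seed: int = 0) -> float:
    # Plain sample average over n Gaussian draws.
    rng = random.Random(seed)
    sd = math.sqrt(var)
    return sum(f(rng.gauss(mu, sd)) for _ in range(n)) / n


mu, var = 0.5, 1.0
exact = math.sin(mu) * math.exp(-var / 2.0)  # closed form of E[sin(x)] for x ~ N(mu, var)
lin = mean_linearized(math.sin, mu, var)
gh = mean_gauss_hermite3(math.sin, mu, var)
mc = mean_monte_carlo(math.sin, mu, var)
for name, est in [("linearized", lin), ("gauss-hermite", gh), ("monte carlo", mc)]:
    print(f"{name:14s} {est:.4f}  |error| = {abs(est - exact):.4f}")
```

With a unit variance, linearization overestimates the mean considerably, while three quadrature points already come close to the exact value; this is the kind of gap that matters when the environment is strongly stochastic.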

**Scope:** Master's thesis **Advisor:** Julen Urain De Jesus, Puze Liu **Start:** ASAP **Topic:**

Many robotics tasks are multimodal. Grasping is one example: the robot can grasp an object in several different configurations. However, most episodic RL methods are limited to Gaussian policy distributions.

In this project, we want to use Deep Reinforcement Learning to learn complex policy distributions and solve difficult multimodal problems. Although we will start exploring this problem in simulation, we expect to adapt the algorithms to real robots by the end of the thesis.
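
Why a single Gaussian can be a poor fit is easy to see on a one-dimensional toy problem: if two grasp angles, say -1 and +1 rad, both work, a Gaussian fitted by maximum likelihood to the successful actions centers between them, where neither grasp works. The sketch below contrasts sampling from that Gaussian with sampling from a two-component Gaussian mixture; the grasp-angle interpretation, the failure region |a| < 0.5, and all numbers are hypothetical.

```python
import random
import statistics
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, std)


def sample_mixture(components: List[Component], n: int, rng: random.Random) -> List[float]:
    """Draw n samples from a 1-D Gaussian mixture."""
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(n):
        _, mean, std = rng.choices(components, weights=weights)[0]  # pick a mode, then sample it
        samples.append(rng.gauss(mean, std))
    return samples


def frac_in_gap(actions: List[float], half_width: float = 0.5) -> float:
    """Fraction of actions in the hypothetical failure region |a| < half_width."""
    return sum(abs(a) < half_width for a in actions) / len(actions)


rng = random.Random(0)
# Successful actions: two equally good grasp angles (rad), each with a small spread.
good_actions = sample_mixture([(0.5, -1.0, 0.1), (0.5, 1.0, 0.1)], 2000, rng)

# Maximum-likelihood single Gaussian: mean and (population) std of the successful actions.
gauss_mean = statistics.fmean(good_actions)
gauss_std = statistics.pstdev(good_actions)
gauss_samples = [rng.gauss(gauss_mean, gauss_std) for _ in range(2000)]

print(f"Gaussian: mean {gauss_mean:.3f}, std {gauss_std:.3f}, in gap {frac_in_gap(gauss_samples):.2f}")
print(f"Mixture:  in gap {frac_in_gap(good_actions):.2f}")
```

Roughly a third of the Gaussian's samples fall into the failure region, while the mixture almost never does; expressive policy classes like mixtures or normalizing flows avoid this averaging effect.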

**Scope:** Master's thesis, Bachelor's thesis **Advisor:** Michael Lutter **Start:** Anytime Soon **Topic:** One way to achieve reinforcement learning with few samples is model-based reinforcement learning, but historically these approaches have lacked the asymptotic performance of model-free approaches. Only very recently, two papers showed comparable asymptotic performance with lower sample complexity using probabilistic models composed of network ensembles.

Within this thesis, you should develop a probabilistic version of Deep Lagrangian Networks (Lutter et al., ICLR 2019), a physics-derived architecture that only allows physically plausible models, and use this probabilistic representation for model-based exploration and policy improvement. For the probabilistic version, you should use the deterministic and robust Bayesian neural network approach presented earlier this year (Wu et al., ICLR 2019).
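
A central ingredient of DeLaN is that the network outputs a lower-triangular matrix L(q) with a strictly positive diagonal, so that the mass matrix H(q) = L(q) L(q)^T in the Euler-Lagrange equation tau = H(q) q_dd + c(q, q_d) + g(q) is symmetric positive definite by construction. The sketch below shows only this construction in plain Python; the raw "network outputs" are fixed hypothetical numbers, and using a softplus plus a small offset for the diagonal is our assumption of one common way to enforce positivity. A probabilistic DeLaN would additionally propagate uncertainty through these quantities.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def mass_matrix(l_diag_raw: Sequence[float], l_offdiag: Sequence[float], eps: float = 1e-3) -> Matrix:
    """H = L L^T with L lower-triangular; `l_offdiag` fills the strictly lower part row by row."""
    n = len(l_diag_raw)
    L = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        L[i][i] = softplus(l_diag_raw[i]) + eps  # strictly positive diagonal => L is invertible
        for j in range(i):
            L[i][j] = l_offdiag[k]
            k += 1
    return [[sum(L[i][m] * L[j][m] for m in range(n)) for j in range(n)] for i in range(n)]


# Hypothetical raw network outputs for a 2-DoF system at one configuration q.
H = mass_matrix(l_diag_raw=[-2.0, 0.5], l_offdiag=[3.0])
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print(H, det)
```

Whatever values the network produces, H stays symmetric with a positive determinant, so every model in the hypothesis space is physically plausible in this respect.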

Finally, you should demonstrate your sample-efficient approach on the physical Cartpole & Furuta Pendulum, learn the swing-up using only the physical system, and publish a paper about it :D. So if you are excited to try out Bayesian Deep Learning and want to get your hands dirty with model-based RL, this thesis is perfect for you. If you are interested, just message me (michael@robot-learning.de) and I am happy to discuss more details.

**TL;DR:**

- Extend DeLaN to Bayesian deep learning
- Use this probabilistic model for efficient exploration and policy improvement
- Impress everybody by learning the swing-up only using the physical cartpole
- Publish your thesis at a machine learning conference
- Good knowledge of machine learning & deep learning required
- Good programming skills in Python required

**Scope:** Master's or Bachelor's thesis **Advisor:** Riad Akrour, Oleg Arenz **Start:** ASAP **Topic:** Correlated exploration is any exploration mechanism that enforces correlation of the action noise with respect to time or states. Correlated exploration is important for robotics to reduce or eliminate jerky exploratory motions and to maintain the physical integrity of the robot. Correlated exploration has been studied for low-dimensional policy representations [1, 2], and we demonstrated the suitability of such a learning scheme for specialized policies directly on a robotic platform [3]. It has also been shown that correlated exploration can be applied to larger, neural-network-based policies [4]. However, the exploration scheme of [4], viewed as an episodic contextual policy search algorithm, is rather primitive in how it adapts the exploration noise and does not offer the guarantees necessary for direct application on a robot. In this thesis, we propose to leverage our expertise in entropy-regularized policy search algorithms [5, 6] to overcome these shortcomings and provide a safe and efficient correlated exploration algorithm for robotics. The successful candidate is expected to investigate the following topics:

- Set up a baseline by integrating the correlated exploration of [4] into recent versions of DDPG such as [7].
- Compare uncorrelated and correlated exploration on simulated tasks and on the Quanser robots.
- Improve over existing correlated exploration formulations by, for example, integrating the gradient update of DDPG to our well founded formulations of entropy regularized episodic policy search algorithms [5, 6].

The successful candidate is expected to conduct their thesis with scientific rigor and a drive for quality such that their work finds its place at a top machine learning or robotics conference.
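
The difference between uncorrelated and temporally correlated action noise can be illustrated with a first-order autoregressive (AR(1)) process, a discrete-time counterpart of the Ornstein-Uhlenbeck noise used in the original DDPG paper. The sketch below compares its step-to-step changes with white noise of the same marginal standard deviation; the correlation coefficient rho = 0.95 and the jerkiness proxy (mean squared difference of consecutive samples) are illustrative choices, and this is neither the parameter-space noise of [4] nor the method to be developed in the thesis.

```python
import math
import random
from typing import List


def white_noise(n: int, sigma: float, rng: random.Random) -> List[float]:
    """Independent Gaussian noise per time step."""
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def ar1_noise(n: int, sigma: float, rho: float, rng: random.Random) -> List[float]:
    """AR(1): e_t = rho * e_{t-1} + sqrt(1 - rho^2) * w_t keeps the stationary std equal to sigma."""
    e = rng.gauss(0.0, sigma)  # start in the stationary distribution
    out = []
    for _ in range(n):
        out.append(e)
        e = rho * e + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, sigma)
    return out


def jerkiness(xs: List[float]) -> float:
    """Mean squared difference between consecutive noise samples."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (len(xs) - 1)


rng = random.Random(1)
iid = white_noise(10_000, 0.2, rng)
corr = ar1_noise(10_000, 0.2, rho=0.95, rng=rng)
print(f"white: {jerkiness(iid):.4f}, AR(1): {jerkiness(corr):.4f}")
```

For a stationary AR(1) process the expected squared step is 2 sigma^2 (1 - rho), compared with 2 sigma^2 for white noise, so rho = 0.95 reduces it roughly twentyfold while keeping the same exploration magnitude.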

[1] Rückstieß, T. et al.; State-dependent exploration for policy gradient methods; ECML 2008.

[2] van Hoof, H. et al.; Generalized exploration in policy search; MLJ 2017.

[3] Parisi, S. et al.; Reinforcement learning vs human programming in tetherball robot games; IROS 2015.

[4] Plappert, M. et al.; Parameter space noise for exploration; ICLR 2018.

[5] Akrour, R. et al.; Model-free trajectory-based policy optimization with monotonic improvement; JMLR 2018.

[6] Arenz, O. et al.; Efficient gradient-free variational inference using policy search; ICML 2018.

[7] Fujimoto, S. et al.; Addressing function approximation error in actor-critic methods; ICML 2018.

**Scope:** Bachelor's thesis/Project **Advisor:** Davide Tateo **Start:** ASAP **Topic:**
Hierarchical Reinforcement Learning (HRL) is the field of Reinforcement Learning (RL) that considers structured agents. In this field, a high-level task is decomposed into simpler subtasks. The resulting control policy is represented as a hierarchy of policies, where each policy solves a subtask. While the original HRL literature focuses on how to exploit domain knowledge and structured exploration to speed up learning, more recent approaches based on Deep Learning focus on using the hierarchical structure to solve tasks that cannot be solved, or are difficult to learn, with classical Deep RL approaches. While classical HRL approaches are particularly well suited for MDPs with finite state-action spaces, the more recent Deep HRL approaches can handle complex robotic tasks with continuous state and action spaces.

One major drawback of the recent literature is that Deep HRL approaches share one of the major issues of "flat" Deep RL: the resulting policy is difficult for humans to interpret and thus cannot be trusted in safety-critical applications, as we cannot analyze and predict its global behavior. Another major drawback of Deep HRL algorithms is that it is difficult to insert prior knowledge of the environment into the policy structure, which makes it even more difficult to apply these kinds of algorithms in real-world scenarios.

To solve these issues, we propose a novel HRL framework, inspired by control theory, where the design of the hierarchical agent is performed using block diagrams. This framework simplifies the design of hierarchical agents and proposes a different paradigm for HRL: we build structured agents that do not execute policies following the stack principle (i.e., function calls), but are instead composed of a set of parallel controllers. More details about this framework can be found here.
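
To give a flavor of the block-diagram paradigm, the sketch below wires blocks into a diagram that is stepped synchronously: at every time step, each block reads the previous outputs of its input blocks, so the controllers run in parallel instead of calling one another like functions on a stack. The `Block` and `Diagram` classes and the toy high-level/low-level controller pair are hypothetical and do not reflect the API of the actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Block:
    name: str
    inputs: List[str]                   # names of blocks whose outputs this block reads
    fn: Callable[[List[float]], float]  # maps the input values to this block's new output
    output: float = 0.0


@dataclass
class Diagram:
    blocks: Dict[str, Block] = field(default_factory=dict)

    def add(self, block: Block) -> None:
        self.blocks[block.name] = block

    def step(self) -> None:
        # All blocks read the outputs of the previous step, then update together,
        # i.e. they run in parallel instead of calling each other like functions.
        new_outputs = {
            name: b.fn([self.blocks[i].output for i in b.inputs]) for name, b in self.blocks.items()
        }
        for name, value in new_outputs.items():
            self.blocks[name].output = value


diagram = Diagram()
# High-level controller: moves a subgoal towards the final target 1.0 in steps of at most 0.5.
diagram.add(Block("high", ["env"], lambda v: min(v[0] + 0.5, 1.0)))
# Low-level controller: proportional control towards the current subgoal.
diagram.add(Block("low", ["high", "env"], lambda v: 0.5 * (v[0] - v[1])))
# Toy environment: the state integrates the low-level action.
diagram.add(Block("env", ["env", "low"], lambda v: v[0] + v[1]))
for _ in range(200):
    diagram.step()
print(diagram.blocks["env"].output)
```

Because the structure is explicit, each block and each connection can be inspected separately, which is the property that graphical design and analysis tools could build on.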

The objective of this thesis is to simplify the design of hierarchical agents with the above-mentioned framework by implementing graphical tools to easily define the structure of an agent and to analyze its behavior while it interacts with the environment. We also need to improve the existing codebase by refactoring interfaces and implementing new features.

**Minimum knowledge**

- Good Python programming skills.

**Preferred knowledge**

- Knowledge of Python graphical and graph libraries;
- Basic knowledge of Reinforcement Learning;
- Knowledge of recent Deep RL methodologies;

**Accepted candidate will**

- Learn the basics of the proposed framework by studying the existing codebase;
- Implement graphical tools to design and analyze Hierarchical Reinforcement Learning Agents;
- Refactor the currently existing framework to design HRL agents;
- Add new functionalities to the Hierarchical Reinforcement Learning Framework;
- Test the developed framework in toy problems or, optionally, on real robots;
- Optionally, implement some standard Hierarchical Reinforcement Learning algorithms.

**Scope:** Master's thesis | Bachelor's thesis **Advisor:** Julen Urain **Start:** ASAP **Topic:**

Object segmentation algorithms have shown that it is possible to segment data according to the information it contains. This opens the door to considering time-related data such as trajectories or videos. Being able to segment human movements according to the different actions being performed would provide a powerful method to understand human tasks, predict them, and hopefully mimic them with a robot. In this project, the student is expected to study different algorithms for unsupervised segmentation of human actions and to evaluate how well the learned models can predict human motion.
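
As a simple unsupervised baseline (chosen here for illustration, not prescribed by the topic), a one-dimensional motion signal can be split into k contiguous segments by dynamic programming so that the total squared error of the samples around their segment means is minimal. The synthetic, noisy piecewise-constant signal below is a hypothetical stand-in for, e.g., a joint velocity recorded during three consecutive actions; real human motion would call for multivariate features and a data-driven choice of k.

```python
import random
from typing import List, Tuple


def segment(signal: List[float], k: int) -> Tuple[List[int], float]:
    """Optimal split of `signal` into k contiguous segments minimizing the within-segment SSE.
    Returns the start indices of segments 2..k and the total cost."""
    n = len(signal)
    # Prefix sums of x and x^2 give the SSE of any segment signal[i:j] in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(signal):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting signal[:j] into c segments; back[c][j]: start of the last one.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + sse(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    # Backtrack the segment start indices from the end of the signal.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = back[c][j]
        starts.append(j)
    return sorted(starts)[1:], cost[k][n]


rng = random.Random(0)
signal = [rng.gauss(mu, 0.05) for mu, length in [(0.0, 20), (1.0, 15), (-0.5, 10)] for _ in range(length)]
boundaries, total_cost = segment(signal, k=3)
print(boundaries, round(total_cost, 4))
```

The recovered boundaries coincide with the points where the synthetic "action" changes; the same learned segment models could then be used to predict how a partially observed motion continues.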

**Scope:** Master's thesis **Advisor:** Vincent Berenz (a collaborator at the Max Planck Institute for Intelligent Systems in Tübingen) **Start:** ASAP **Topic:**

Scripted robotic dance is common. On the other hand, interactive dance, in which the robot uses runtime sensory information to continuously adapt its moves to those of its (human) partner, remains challenging. It requires the integration of various sensors, action modalities, and cognitive processes. The selected candidate's objective will be to develop such an interactive dance, based on the software suite for simultaneous perception and motion generation that our department has built over the years. The target robot for the dance is the wheeled robot Softbank Robotics Pepper. This master thesis is with the Max Planck Institute for Intelligent Systems and is located in Tübingen. More information: https://am.is.tuebingen.mpg.de/jobs/master-thesis-interactive-dance-performed-by-sofbank-robotics-pepper

**Scope:** Master's Thesis, Bachelor's thesis **Advisor:** Tuan Dam, Pascal Klink **Start:** ASAP **Topic:**
Reinforcement Learning under partial observability of the true system state, despite its great potential, is still an open problem. A critical ingredient of recent model-free RL approaches in partially observable domains is the choice of memory model, which has so far been limited to recurrent neural networks or full histories [1][2]. The goal of this project is to investigate and compare the performance of different memory models, including ones used in Computer Vision or Natural Language Processing (e.g., Recurrent Ladder Networks [3]), in partially observable domains to gain new insights. The student will compare the performance of the memory models on selected tasks in simulation. If desired, the student also has the chance to test a few of the memory models in a real robotic task of playing Mikado.
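
The role of the memory model can be illustrated with a cue-recall toy problem (a hypothetical setup in the spirit of a T-maze): the relevant cue is observed only at the first time step and must be recalled after a delay. A fixed-length observation window loses the cue once the delay exceeds the window, while a full-history memory keeps it. The classes below are illustrative assumptions, not the memory models of [1], [2], or [3].

```python
from collections import deque
from typing import Deque, List, Optional, Union


class WindowMemory:
    """Keeps only the last `size` observations (a fixed-length observation window)."""

    def __init__(self, size: int) -> None:
        self.buffer: Deque[int] = deque(maxlen=size)

    def update(self, obs: int) -> None:
        self.buffer.append(obs)

    def contents(self) -> List[int]:
        return list(self.buffer)


class FullHistoryMemory:
    """Keeps every observation since the start of the episode."""

    def __init__(self) -> None:
        self.history: List[int] = []

    def update(self, obs: int) -> None:
        self.history.append(obs)

    def contents(self) -> List[int]:
        return list(self.history)


def recall_cue(memory: Union[WindowMemory, FullHistoryMemory], cue: int, delay: int) -> Optional[int]:
    """Show a nonzero cue, then `delay` uninformative zeros; return the cue if it is still remembered."""
    memory.update(cue)
    for _ in range(delay):
        memory.update(0)
    remembered = [o for o in memory.contents() if o != 0]
    return remembered[0] if remembered else None


for delay in (3, 10):
    print(delay, recall_cue(WindowMemory(size=4), cue=2, delay=delay), recall_cue(FullHistoryMemory(), cue=2, delay=delay))
```

Recurrent networks sit between these two extremes: they compress the history into a fixed-size state whose ability to retain such cues must be learned, which is one reason why comparing memory models is worthwhile.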

**Minimum knowledge**

- Good Python programming skills.

**Preferred knowledge**

- Knowledge of deep neural networks and deep recurrent neural networks
- Basic knowledge of Reinforcement Learning, POMDP, Memory Representation in POMDP
- Knowledge of recent Deep RL methodologies;

[1] Deep recurrent q-learning for partially observable mdps, Hausknecht et al. https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/download/11673/11503

[2] Learning deep neural network policies with continuous memory states, Zhang et al. https://ieeexplore.ieee.org/iel7/7478842/7487087/07487174.pdf

[3] Recurrent Ladder Networks, Prémont-Schwarz et al. http://papers.nips.cc/paper/7182-recurrent-ladder-networks.pdf

**Scope:** Master's thesis, Bachelor thesis **Advisor:** Jan Peters **Start:** ASAP **Topic:** Inspired by results in neuroscience, especially on the cerebellum, Kawato & Wolpert introduced the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. In this thesis, we want to focus on the problem of learning to control a robot system with hysteresis in its friction.
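
The module-selection step can be sketched as follows: each forward model predicts the next state, and its "responsibility" is a softmax over the negative squared prediction errors, assuming Gaussian prediction noise with a shared standard deviation sigma and equal priors. Roughly speaking, MOSAIC uses such responsibilities to weight both the learning and the outputs of the paired inverse models. The two linear forward models (one per hypothetical friction regime), the commands, and sigma are made-up values.

```python
import math
from typing import Callable, List, Sequence


def responsibilities(predictions: Sequence[float], observed: float, sigma: float = 0.02) -> List[float]:
    """Softmax over -(prediction error)^2 / (2 sigma^2): Gaussian likelihoods with equal priors."""
    logits = [-((observed - p) ** 2) / (2.0 * sigma ** 2) for p in predictions]
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blended_command(resp: Sequence[float], commands: Sequence[float]) -> float:
    """Responsibility-weighted sum of the commands proposed by the paired inverse models."""
    return sum(r * u for r, u in zip(resp, commands))


# Two hypothetical forward models of the next velocity: low vs. high friction regime.
forward_models: List[Callable[[float, float], float]] = [
    lambda v, u: v + 0.1 * u - 0.01 * v,
    lambda v, u: v + 0.1 * u - 0.10 * v,
]
v, u, v_next_observed = 1.0, 0.5, 0.95
preds = [f(v, u) for f in forward_models]
resp = responsibilities(preds, v_next_observed)
print(preds, resp, blended_command(resp, [0.3, 0.8]))
```

Since the observed velocity matches the high-friction model, nearly all responsibility goes to it, and the blended command is essentially that of its paired inverse model; with hysteresis, the responsible module would switch as the friction regime changes.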

**Scope:** Master's Thesis, Bachelor's thesis **Advisor:** Hany Abdulsamad **Start:** ASAP **Topic:** Model-based Reinforcement Learning is an approach to learning complex tasks given local approximations of the nonlinear dynamics of the environment and cost functions. It has proven to be a sample-efficient approach for learning on real robots. Classical approaches for learning such local models impose certain restrictions on the overall structure, for example, the number of local components and the switching dynamics. State-of-the-art research has recently moved to more general settings with nonparametric approaches that require less structure. The aim of this thesis is to review the literature on this subject and to compare existing algorithms on real robots such as the BioRob or the Barrett WAM.

**Scope:** Master's Thesis **Advisor:** Hany Abdulsamad **Start:** ASAP **Topic:** Standard learning control techniques focus on learning deterministic controllers. Even advanced policy search methods that rely on stochastic search distributions use stochastic controllers only for exploration; the final policy is applied only in its deterministic form. There are, however, cases in which a deterministic controller is always sub-optimal, such as scenarios with random unstructured disturbances. In this thesis, we want to address the problem of learning truly stochastic optimal controllers in an adversarial setting, and to investigate whether adversarial learning can be used to generalize standard policy search methods. This topic has very interesting and deep connections to robust control, game theory, and multi-agent learning.
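
A minimal example of why a deterministic controller can be strictly worse than a stochastic one against an adversary is a matching-pennies-style game: the controller picks one of two actions, the adversarial disturbance picks one of two directions, and the controller pays 1 if the adversary matches its action, else 0. Any deterministic choice can be matched, giving a worst-case cost of 1, whereas mixing uniformly bounds the expected cost at 0.5. The payoff matrix below is a textbook toy, not a robot control problem.

```python
from typing import List, Sequence, Tuple

# cost[a][d]: cost for controller action a under disturbance d chosen by an adversary (toy values).
COST = [[1.0, 0.0],
        [0.0, 1.0]]


def worst_case_cost(policy: Sequence[float], cost: List[List[float]] = COST) -> float:
    """Expected cost of a (possibly stochastic) policy against the adversary's best response."""
    return max(
        sum(p * cost[a][d] for a, p in enumerate(policy)) for d in range(len(cost[0]))
    )


def best_two_action_policy(cost: List[List[float]] = COST, grid: int = 1000) -> Tuple[float, float]:
    """Grid search over policies (p, 1 - p) minimizing the worst-case expected cost."""
    candidates = [(i / grid, 1.0 - i / grid) for i in range(grid + 1)]
    return min(candidates, key=lambda pol: worst_case_cost(pol, cost))


for name, pol in [("always a=0", (1.0, 0.0)), ("always a=1", (0.0, 1.0)), ("best mixed", best_two_action_policy())]:
    print(f"{name:11s} {pol} worst-case cost {worst_case_cost(pol):.3f}")
```

The minimax-optimal controller here is genuinely stochastic, which is the kind of solution that learning against an adversary should be able to recover in richer control settings.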

**Scope:** Master's Thesis **Advisor:** Joe Watson **Start:** Anytime **Topic:** Model-based Reinforcement Learning for robotics typically requires learning nonlinear stochastic dynamical systems. This project aims to combine the Koopman operator, Bayesian machine learning, and neural network models to represent these systems as linear Gaussian dynamical systems in a high-dimensional embedding. See this write-up for more details.
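To illustrate the core idea of a linear model in an embedding, the sketch below uses a toy nonlinear system that is known to have an exact finite-dimensional Koopman embedding, lifts states with hand-chosen observables, and fits the linear operator by least squares (in the spirit of extended dynamic mode decomposition). It is deterministic and noise-free; in the project the observables would instead be learned by a neural network, and a Bayesian treatment would add a prior on the operator and Gaussian noise. The system, constants, and function names are hypothetical.

```python
import numpy as np

# Hypothetical toy system with an exact finite Koopman embedding:
#   x1' = A * x1,   x2' = B * x2 + C * x1**2
A, B, C = 0.9, 0.5, 0.3

def step(x):
    """One step of the nonlinear toy dynamics."""
    return np.array([A * x[0], B * x[1] + C * x[0] ** 2])

def lift(x):
    """Hand-chosen observables; a neural network would learn these in practice."""
    return np.array([x[0], x[1], x[0] ** 2])

def fit_linear_operator(states, next_states):
    """Least-squares fit of K such that lift(x_next) is approximately K @ lift(x)."""
    Z = np.stack([lift(x) for x in states])             # shape (N, 3)
    Z_next = np.stack([lift(x) for x in next_states])   # shape (N, 3)
    K_T, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)    # solves Z @ K_T ~= Z_next
    return K_T.T

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
next_states = np.array([step(x) for x in states])
K = fit_linear_operator(states, next_states)
```

Because the lifted dynamics of this toy system are exactly linear, the fit recovers `[[A, 0, 0], [0, B, C], [0, 0, A**2]]`; for real robots no such exact embedding is known, which is where learned features come in.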

**Scope:** Master's Thesis, Bachelor's thesis **Advisor:** Hany Abdulsamad **Start:** ASAP **Topic:** A great challenge in applying Reinforcement Learning approaches is the need for human intervention to reset the scenario of a learned task, which makes the process very tedious and time-consuming. A clear example is learning table tennis, where we are limited either to a ball gun with a predictable pattern of initial positions or to a human playing against the robot. Given a second robot player, however, we propose a new setup in which the two agents cooperate to develop two different strategies: one agent learns to support the other in becoming a great table tennis player. It will be interesting to see whether, in such a scenario, the agents discover what might resemble a defensive and an aggressive strategy in table tennis. The thesis will concentrate on developing the concept of cooperation and testing the results in simulation and on our real table tennis setup.

**Scope:** Master's Thesis **Advisor:** Hany Abdulsamad **Start:** ASAP **Topic:** Let's stop reinventing the wheel. Most recent approaches to model-based RL revolve around trajectory optimization (e.g., iLQG, GPS, and PILCO, all related to DDP). They can all be categorized as direct or indirect shooting methods, two classes of optimal control methods that have existed for a very long time. It is time to clarify this connection. The aim of this thesis is, first, to dive into the literature on control and model-based RL and, second, to investigate applying information-theoretic bounds to standard optimal control techniques.
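As a concrete reference point for "direct shooting", the sketch below treats only the control sequence as decision variables, obtains the states by rolling out the dynamics, and descends the cost gradient. The gradient is computed by a backward adjoint pass, a recursion closely related to the costate equation used by indirect methods and to the backward pass of DDP-style algorithms. The scalar linear system, quadratic cost, constants, step size, and function names are all hypothetical choices for illustration, not any specific published method.

```python
# Hypothetical scalar system and quadratic cost:
#   x_{t+1} = A*x_t + B*u_t,   J = sum_t (Q*x_t^2 + R*u_t^2) + QF*x_T^2
A, B, Q, R, QF = 1.0, 0.5, 1.0, 0.1, 1.0

def rollout(x0, us):
    """Simulate the dynamics forward; states are not decision variables."""
    xs = [x0]
    for u in us:
        xs.append(A * xs[-1] + B * u)
    return xs

def cost(x0, us):
    xs = rollout(x0, us)
    running = sum(Q * x ** 2 + R * u ** 2 for x, u in zip(xs[:-1], us))
    return running + QF * xs[-1] ** 2

def gradient(x0, us):
    """dJ/du_t via a backward adjoint pass through the rollout."""
    xs = rollout(x0, us)
    horizon = len(us)
    lam = 2.0 * QF * xs[horizon]  # adjoint lambda_T = dJ/dx_T
    grad = [0.0] * horizon
    for t in reversed(range(horizon)):
        grad[t] = 2.0 * R * us[t] + B * lam  # lam holds lambda_{t+1} here
        lam = 2.0 * Q * xs[t] + A * lam      # propagate to lambda_t
    return grad

def direct_single_shooting(x0, horizon=10, lr=0.02, iters=2000):
    """Plain gradient descent on the control sequence (step size chosen small enough to be stable)."""
    us = [0.0] * horizon
    for _ in range(iters):
        g = gradient(x0, us)
        us = [u - lr * gi for u, gi in zip(us, g)]
    return us

x0 = 1.0
us = direct_single_shooting(x0)
```

An indirect method would instead solve the optimality conditions for the adjoints directly, and DDP/iLQG additionally propagate second-order information in the backward pass.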

**Scope:** Master's Thesis **Advisor:** Pascal Klink, Carlo D'Eramo **Start:** ASAP **Topic:** The idea of gradually learning to accomplish a complicated task via a guiding sequence of intermediate ones, referred to as Curriculum Learning, has shown great experimental success. The goal of this project is to investigate a recent take on Curriculum Learning in the domain of Reinforcement Learning that interprets it as a form of Expectation Maximization. More precisely, the goal is to push the capabilities of this formulation by using advanced sampling methods to sample learning tasks instead of the simple approximations used so far. The ideal candidate:

- is knowledgeable in Reinforcement Learning (as this is the basis for the project)
- has a basic understanding of Mixed-Integer Programming (or is not afraid of diving into this domain)
- has basic knowledge of Variational Inference

**Scope:** Master's thesis **Advisors:** Boris Belousov, Georgia Chalvatzaki, Bastian Wibranek **Start:** ASAP

**Topic:** Many real-world problems can be reduced to combinatorial optimization over graphs. For example, searching for a combination of elements that produces a desired structure satisfying given design constraints, such as load bearing or form matching, is a ubiquitous problem in architecture. Commonly, heuristic search algorithms are employed, which require expert input from the architect to guide the search. This thesis will investigate optimization approaches based on graph embedding techniques, such as graph neural networks, to improve the state of the art in combinatorial optimization in the architectural domain. The thesis will involve collaboration with the Digital Design Unit from FB Architektur.

**Scope:** Bachelor's / Master's Thesis **Advisor:** Joe Watson **Start:** Anytime **Topic:**
For learning in robotic manipulation, there are two cultures: 'end-to-end' learning vs. learning with inductive biases.
While the former is purely data-driven, the latter incorporates ideas from computer vision and visual servoing for more interpretable and sample-efficient performance.
This is an open-ended project aimed at investigating novel frameworks for perception-based manipulation that combine 'structure' (computer vision) with learning.

The ideal candidate:

- has knowledge of both robotics and (geometric) computer vision
- is interested in working on real robotic manipulators
- can write clean, maintainable software

**Scope:** Master's thesis **Advisor:** Joni Pajarinen **Start:** ASAP

**Topic:** Efficient exploration is one of the most prominent challenges in deep reinforcement learning. In reinforcement learning, exploration of the state space is critical for finding high-value states and connecting them to the actions that lead there. Exploration in model-free reinforcement learning has relied on classical techniques, empirical uncertainty estimates of the value function, or random policies. In model-based reinforcement learning, value bounds have been used successfully to direct exploration. In this Master's thesis project, the student will investigate how lower and upper value bounds can be used to target exploration in model-free reinforcement learning at the most promising parts of the state space. This thesis topic requires background knowledge in reinforcement learning gained, e.g., through machine learning or robot learning courses.
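One common way to use value bounds for exploration, shown here in a tabular setting with the bounds already given, is to discard actions whose upper bound falls below the best lower bound (they cannot be optimal) and to explore optimistically among the rest. This is only an illustration of interval-based action elimination plus optimism; how to obtain such bounds for deep model-free agents is exactly the open question of the thesis, and the function names are hypothetical.

```python
def plausible_actions(lower, upper):
    """Keep actions whose upper value bound is not below the best lower bound."""
    best_lower = max(lower)
    return [a for a, ub in enumerate(upper) if ub >= best_lower]

def optimistic_action(lower, upper):
    """Among plausible actions, explore the one with the highest upper bound."""
    candidates = plausible_actions(lower, upper)
    return max(candidates, key=lambda a: upper[a])
```

For example, with lower bounds `[1.0, 2.0, 0.0]` and upper bounds `[3.0, 2.5, 1.5]`, action 2 is eliminated and action 0 is explored.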

**Scope:** Master's Thesis **Advisors:** Joe Watson, Michael Lutter **Start:** Anytime **Topic:** Model-based Reinforcement Learning for robotics typically requires learning the nonlinear dynamics of complex multibody mechanical systems. The Recursive Newton-Euler Algorithm (RNEA) is an established method for efficiently computing the inverse dynamics of such systems. This project looks at using the Lie algebra perspective of RNEA to implement the algorithm on a differentiable computation graph, the basis of deep learning models. This could offer a means of learning high-fidelity and interpretable models for robotics from data. See this write-up for more details.
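For orientation, the sketch below implements the classical (non-Lie-algebraic) RNEA for a planar serial chain of revolute joints: a forward pass propagates angular and linear accelerations outward from the base, and a backward pass propagates forces and moments inward to obtain joint torques. It assumes each link's center of mass lies on the link axis, gravity acts along -y (handled by accelerating the base upward by `g`), and a hypothetical `(mass, length, com_distance, inertia_about_com)` link format. It is written in plain Python; implementing the same recursion in an autodiff framework is what would make the link parameters learnable.

```python
import math

def cross(u, v):
    """z-component of the 2D cross product."""
    return u[0] * v[1] - u[1] * v[0]

def planar_rnea(q, qd, qdd, links, g=9.81):
    """Joint torques of a planar revolute chain (hypothetical link format, see lead-in)."""
    n = len(q)
    theta = omega = alpha = 0.0
    a_joint = (0.0, g)  # linear acceleration of the current joint origin (gravity trick)
    kin = []            # per link: unit vector along link, angular accel, COM accel
    # Forward pass: base to tip.
    for i in range(n):
        m, l, r, inertia = links[i]
        theta += q[i]
        omega += qd[i]
        alpha += qdd[i]
        e = (math.cos(theta), math.sin(theta))  # along the link
        nrm = (-e[1], e[0])                     # perpendicular to the link
        a_com = (a_joint[0] + alpha * r * nrm[0] - omega ** 2 * r * e[0],
                 a_joint[1] + alpha * r * nrm[1] - omega ** 2 * r * e[1])
        kin.append((e, alpha, a_com))
        a_joint = (a_joint[0] + alpha * l * nrm[0] - omega ** 2 * l * e[0],
                   a_joint[1] + alpha * l * nrm[1] - omega ** 2 * l * e[1])
    # Backward pass: tip to base.
    f_next = (0.0, 0.0)  # force exerted by link i on link i+1
    n_next = 0.0         # moment exerted by link i on link i+1
    tau = [0.0] * n
    for i in reversed(range(n)):
        m, l, r, inertia = links[i]
        e, alpha_i, a_com = kin[i]
        f = (m * a_com[0] + f_next[0], m * a_com[1] + f_next[1])  # Newton
        moment = (inertia * alpha_i + n_next                       # Euler, about joint i
                  + cross((r * e[0], r * e[1]), (m * a_com[0], m * a_com[1]))
                  + cross((l * e[0], l * e[1]), f_next))
        tau[i] = moment
        f_next, n_next = f, moment
    return tau
```

For a single link this reproduces the familiar pendulum result `tau = (I + m*r**2) * qdd + m*g*r*cos(q)`.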