# Currently Offered Thesis Topics

We offer the topics below to Bachelor and Master students at TU Darmstadt. If you are interested in one of them, feel free to DIRECTLY contact the thesis advisor. Excellent external students from another university may be accepted, but they are required to first email Jan Peters before contacting any other lab member about a thesis topic. Note that we cannot provide funding for any of these thesis projects.

We highly recommend that you take either our robotics and machine learning lectures (Robot Learning, Statistical Machine Learning) or those of our colleagues (Grundlagen der Robotik, Probabilistic Graphical Models and/or Deep Learning). Even more important to us is that you take both Robot Learning: Integrated Project, Part 1 (Literature Review and Simulation Studies) and Part 2 (Evaluation and Submission to a Conference) before doing a thesis with us.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please DIRECTLY contact the thesis advisor if you are interested in one of these topics. When you contact the advisor, it would be nice if you could mention (1) WHY you are interested in the topic (dreams, parts of the problem, etc.), and (2) WHAT makes you special for the project (e.g., class work, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc.) are highly appreciated. Of course, such materials are not mandatory, but they help the advisor to see whether the topic is too easy, just about right, or too hard for you.

Only contact *ONE* potential advisor at a time! If you contact a second one without first concluding discussions with the first advisor (i.e., deciding for or against the thesis with her or him), we may not consider you at all. Only if you are truly excited about two topics (and no more than two) may you send an email to both supervisors, so that the supervisors are aware of the additional interest.

FOR FB16+FB18 STUDENTS: If you are a student from another department at TU Darmstadt (e.g., ME, EE, IST), you need an additional formal supervisor who officially issues the topic. Please do not try to arrange your home department advisor by yourself; let the supervising IAS member get in touch with that person instead. Multiple professors from other departments have complained that they were asked to co-supervise before being contacted by our advising lab member.

## Planning Strategies for Ubongo3D

Scope: Bachelor/Master thesis
Start: ASAP
Topic: Ubongo3D is a challenging game for humans and robots, as it involves game solving and fine manipulation. Humans are particularly good at manipulation, but perhaps a robot is better at game solving? Our goal in this thesis is to build an agent that solves the Ubongo3D game faster than a human does! There are many similarities between this problem and a bin packing task.

The goals of this thesis are:

• Develop a deterministic algorithm that works as a baseline (e.g., mixed-integer programming)
• Use stochastic planning methods to get faster results (e.g., MCTS)
• Learn a generalization of the task (e.g., using GNN)
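
To make the idea of a deterministic baseline concrete, here is a minimal sketch of an exhaustive solver for a simplified puzzle. It is only an illustration under loud assumptions: it uses plain backtracking instead of mixed-integer programming, it works on a flat 2D grid instead of the 3D Ubongo board, it allows rotations but no flips, and all names (`solve`, `rotations`, `normalize`) are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def normalize(cells: Iterable[Cell]) -> FrozenSet[Cell]:
    """Shift a shape so that its smallest row and column index are both zero."""
    cells = list(cells)
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def rotations(piece: Iterable[Cell]) -> Set[FrozenSet[Cell]]:
    """All distinct 90-degree rotations of a 2D piece (no reflections)."""
    shapes = set()
    current = list(piece)
    for _ in range(4):
        current = [(c, -r) for r, c in current]  # rotate by 90 degrees
        shapes.add(normalize(current))
    return shapes


def solve(board: FrozenSet[Cell], pieces: List[List[Cell]]) -> Optional[List[FrozenSet[Cell]]]:
    """Cover `board` exactly with all `pieces`; return the placements or None.

    Placements are returned in the order in which they were placed,
    which is not necessarily the order of `pieces`.
    """
    if not pieces:
        return [] if not board else None
    if not board:
        return None  # pieces are left over but no free cell remains
    target = min(board)  # this free cell must be covered by some remaining piece
    for i, piece in enumerate(pieces):
        remaining = pieces[:i] + pieces[i + 1:]
        for shape in rotations(piece):
            for anchor_r, anchor_c in shape:  # cell of the shape that covers `target`
                placed = frozenset(
                    (r - anchor_r + target[0], c - anchor_c + target[1]) for r, c in shape
                )
                if placed <= board:
                    rest = solve(board - placed, remaining)
                    if rest is not None:
                        return [placed] + rest
    return None


# Usage: tile a 2x3 board with two L-shaped trominoes.
board = frozenset((r, c) for r in range(2) for c in range(3))
l_piece = [(0, 0), (1, 0), (1, 1)]
solution = solve(board, [l_piece, l_piece])
```

Because the search always branches on the first free cell, it is complete for this toy setting, but its runtime grows quickly with the number of pieces, which is one motivation for the stochastic and learned planners listed above.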

If you are interested in this thesis, please send an e-mail with your CV and transcripts to

Requirements

• Python and recent ML/DL libraries, e.g. PyTorch
• Potentially C++
• Knowledge of Optimization

## RGB-D 6D Pose Estimation for Ubongo3D

Scope: Bachelor/Master thesis
Start: ASAP
Topic: Ubongo3D is a challenging game for humans and robots, as it involves game solving and fine manipulation. Our goal is to make a robot play Ubongo3D based on visual input (RGB-D)! While computers are extremely fast at game solving (see recent successes in Go, Chess, StarCraft), humans are much better at manipulating objects. One of the reasons is that they have a fine manipulation feedback controller that, among other things, uses the 6D poses of the objects as part of the state. Therefore, a reliable 6D pose estimation method is a crucial part of the control pipeline. We want to do vision-based pose estimation for several reasons: RGB-D cameras are ubiquitous and easy to set up; marker-based methods (such as round OptiTrack markers) cannot be used on the Ubongo pieces, since the pieces would no longer fit into each other; and markers do not work well with occlusions.

The goals of this thesis are:

• Develop (and/or compare) reliable 6D pose estimation and tracking algorithms for the Ubongo3D task.
• Implement a baseline that, given the solution of the game, uses a simple feedback controller that uses your pose estimation algorithm.
• Make this work on the real robot (Franka Emika Panda).

If you are interested in this thesis, please send an e-mail with your CV and transcripts to

Requirements

• Python and recent ML/DL libraries, e.g. PyTorch
• Some knowledge of Robotics
• Computer Vision

References

1. Ubongo3D, https://www.kosmos.de/spielware/spiele/familienspiele/7333/ubongo-3-d
2. Wen, B., et al., se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains, IROS, 2020, https://github.com/wenbowen123/iros20-6d-pose-tracking

## Projected Bellman Operator

Scope: Master's thesis
Start: ASAP
Topic: The Bellman operator is the cornerstone of RL, with applications ranging from early value-based methods (e.g., value iteration, Q-learning) to more recent actor-critic approaches. Using collected data, the Bellman operator acts on the vector space of state values, moving from one point to another until the fixed point (e.g., the optimal value function) is reached. However, the Bellman operator is structurally unable to extrapolate beyond the observed data; its effectiveness is therefore strongly influenced by the quality of exploration and hindered by overfitting. In this work, we will investigate how to find a parameterized model of the Bellman operator by finding an effective projection into a suitable function space. A parameterized Bellman operator has the advantages of being independent of data and of curbing overfitting; furthermore, it is promising for enabling few-shot learning in transfer RL. We will show the advantageous theoretical guarantees of convergence and sample efficiency of our projected Bellman operator, and assess its benefit in several RL problems. The student will be mostly involved in developing the code for experiments, first on toy problems and later on more complex RL environments. Particularly motivated students are welcome to participate in the theoretical study of our projected Bellman operator.
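
As a reminder of what "acting on the vector space of state values until the fixed point is reached" means, the following sketch applies the standard (tabular, model-based) Bellman optimality operator repeatedly on a tiny MDP. The three-state chain, its rewards, and the function names are made up for illustration; this is plain value iteration, not the projected operator proposed in the topic.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP: mdp[state][action] = list of (probability, next_state, reward).
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

MDP: Transitions = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(1.0, 2, 1.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing state
}


def bellman_optimality_operator(values: List[float], mdp: Transitions, gamma: float) -> List[float]:
    """Map a value vector V to TV, with (TV)(s) = max_a E[r + gamma * V(s')]."""
    return [
        max(
            sum(p * (reward + gamma * values[nxt]) for p, nxt, reward in outcomes)
            for outcomes in mdp[state].values()
        )
        for state in sorted(mdp)
    ]


def value_iteration(mdp: Transitions, gamma: float = 0.9, tol: float = 1e-10,
                    max_iters: int = 1000) -> List[float]:
    """Apply the operator repeatedly until the value vector stops changing."""
    values = [0.0] * len(mdp)
    for _ in range(max_iters):
        new_values = bellman_optimality_operator(values, mdp, gamma)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values
        values = new_values
    return values


optimal_values = value_iteration(MDP, gamma=0.9)
```

In this tabular form every state has its own entry in the value vector, so nothing is extrapolated to states that were never updated; this is the limitation that a parameterized operator is meant to address.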

Scope: Master's thesis, Bachelor's thesis
Start: ASAP
Topic: TL;DR: Learning a mixture of mean-field Gaussians by combining VIPS with VADAM

ADAM is a popular method for minimizing a loss function in deep learning. Khan et al. [1] showed that a slight modification of ADAM (called VADAM) can be applied to find the parameters of a Gaussian (with diagonal covariance) that approximates the posterior in Bayesian inference (with NN loss functions). VIPS [2] is a method for optimizing a Gaussian mixture model to obtain better, multimodal posterior approximations; it enables us to optimize the individual Gaussian components independently. However, VIPS learns full covariance matrices (using the MORE algorithm) and thus does not scale to very high-dimensional problems, e.g. neural network parameters. In this thesis, you will replace the MORE optimizer within the VIPS framework with VADAM to efficiently learn mixtures of mean-field Gaussians for high-dimensional, multimodal variational inference. We will likely use our implementation of VIPS, which is written in TensorFlow 2.

Requirements
The topic is suitable for a Bachelor thesis, as it should be relatively straightforward to implement (if you are familiar with Python/TF2). However, the topic also has a lot of potential to be useful for a wide audience, and we should aim to publish the result, which would require extra effort from you. To apply, first try to grasp the main insights and the mechanics of VADAM and VIPS ([1] and [2]) and arrange a meeting with me.

References
[1] Khan, Mohammad, et al. "Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam." ICML 2018.
[2] Oleg Arenz, Mingjun Zhong, Gerhard Neumann. "Efficient Gradient-Free Variational Inference using Policy Search". ICML. 2018.

## Hierarchical Motherships for Space Exploration

Scope: Master's thesis, Bachelor's thesis
Start: ASAP
Topic: TL;DR: Learning a partitioning of the search space based on a hierarchy of Gaussian mixture models.

Background: Exploring the search space is arguably the most challenging problem for global optimization and multimodal variational inference, and it is critical for achieving high-quality solutions. Our method for learning GMM approximations of intractable probability distributions, Variational Inference by Policy Search (VIPS, [1]), hence uses a sophisticated procedure to identify promising regions for dynamically adding and initializing new components of the mixture model. However, modes that have not been found at an early stage will likely not be found at any later stage either, as we only draw samples from the current approximation. While we could always evaluate additional samples from a high-entropy prior to maintain exploration, such a procedure would be inefficient, because it is unlikely to find modes by randomly sampling from an uninformed prior, in particular in high-dimensional search spaces. Instead, we want to investigate a technique that has been shown to be very efficient for global exploration, which involves optimization at different temperature levels (e.g. "simulated annealing" or parallel tempering Monte Carlo). Using a temperature ladder $\beta_1 > \dots > \beta_n = 1$, we can construct several related optimization problems with target functions $r_j(x) = \frac{1}{\beta_j} r(x)$. By increasing the temperature, the different modes begin to overlap and cluster, making them easier to detect. In this thesis, we will extend VIPS to learn a hierarchy of GMMs $q^{\beta}(x)$ that approximate a target distribution at different temperature levels. To further improve the efficiency of exploration and optimization, we will connect the GMMs at different temperature levels using "Mothership<->Drone" relations; that is, every Gaussian component at temperature level $\beta_j$ ($j > 1$) is connected to a component at level $\beta_{j-1}$ (its mothership) and is not allowed to leave its search space (in an information-theoretic sense).
This additional constraint induces a partition tree of the search space: if two components $q_a^j$ and $q_b^j$ at the same temperature level $j$ are "non-overlapping", all their drones (and, by transitivity, their sub-drones) are also non-overlapping. This tree partitioning not only allows us to use bandit strategies for a principled exploration-exploitation trade-off (similar to Monte Carlo Tree Search), but also enables us to optimize non-overlapping GMMs independently, which lets us scale to larger mixture models.
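
The effect of the temperature ladder can be illustrated with a small numerical sketch. The bimodal one-dimensional target below, the ladder values, and all function names are assumptions chosen for illustration; the code only evaluates the tempered targets $r_j(x) = r(x) / \beta_j$ and does not implement VIPS or the hierarchy itself.

```python
import math
from typing import List


def log_target(x: float) -> float:
    """Unnormalized log-density r(x) of a hypothetical target with modes near -3 and +3."""
    return math.log(math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2))


def tempered_log_target(x: float, beta: float) -> float:
    """Tempered target r_j(x) = r(x) / beta_j; beta = 1 recovers the original target."""
    return log_target(x) / beta


def valley_to_peak_ratio(beta: float, valley: float = 0.0, peak: float = 3.0) -> float:
    """Ratio of the tempered density between the modes to the density at a mode.

    A ratio close to 1 means the modes have merged into one broad region;
    a ratio close to 0 means they are separated by a deep valley.
    """
    return math.exp(tempered_log_target(valley, beta) - tempered_log_target(peak, beta))


# Temperature ladder beta_1 > ... > beta_n = 1 (values are illustrative).
ladder: List[float] = [8.0, 4.0, 2.0, 1.0]
ratios = [valley_to_peak_ratio(beta) for beta in ladder]
```

At the hottest level the valley between the two modes is only moderately less likely than the modes themselves, so a single broad component can cover both; at $\beta = 1$ the valley is roughly fifty times less likely than the peaks, and a sampler started in one mode rarely reaches the other.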

Requirements
You should be driven and aim to turn this thesis into a high-quality conference submission. To apply, read [1] or [2] and arrange a Zoom call with me (it is fine if you have many questions regarding the prior work).

• Programming with Python (you will likely need to use TF2, so ideally you already have experience with it)
• Solid math background

References
[1] Oleg Arenz, Mingjun Zhong, Gerhard Neumann. "Trust-Region Variational Inference with Gaussian Mixture Models". JMLR. 2020.
[2] Oleg Arenz, Mingjun Zhong, Gerhard Neumann. "Efficient Gradient-Free Variational Inference using Policy Search". ICML. 2018.

## Causal inference of human behavior dynamics for physical Human-Robot Interactions

Scope: Master's thesis
Start: ASAP
Topic: In this thesis, we will study and develop ways of approximating an efficient behavior model of a human in close interaction with a robot. We will extend our prior work on graph-based representations of the human into a method that leverages multiple attention mechanisms to encode the relative dynamics of the human body. Inspired by methods in causal discovery, we will treat motion prediction as a causal discovery problem. A differentiable and accurate human motion model is essential for efficient tracking and optimization of HRI dynamics. You will test your method in the context of motion prediction, especially for HRI tasks like human-robot handovers, and you could demonstrate your results in a real-world experiment.

Highly motivated students can apply by sending an e-mail expressing your interest to , attaching your CV and transcripts.

Minimum knowledge

• Good knowledge of Python and/or C++;
• Good knowledge of Robotics;
• Good knowledge of Deep Learning frameworks, e.g., PyTorch.

References

1. Li, Q., Chalvatzaki, G., Peters, J., Wang, Y., Directed Acyclic Graph Neural Network for Human Motion Prediction, 2021 IEEE International Conference on Robotics and Automation (ICRA).
2. Löwe, S., Madras, D., Zemel, R. and Welling, M., 2020. Amortized causal discovery: Learning to infer causal graphs from time-series data. arXiv preprint arXiv:2006.10833.
3. Yang, W., Paxton, C., Mousavian, A., Chao, Y.W., Cakmak, M. and Fox, D., 2020. Reactive human-to-robot handovers of arbitrary objects. arXiv preprint arXiv:2011.08961.

## Learning human models for safe human-robot handovers

Scope: Master's thesis
Advisor: Georgia Chalvatzaki, Puze Liu, Davide Tateo
Start: ASAP
Topic: In this thesis, we want to study ways of approximating the safety manifold of the human when interacting with a robot, particularly during object handovers. While most works define a hard-coded workspace representing the safety manifold of the human, such definitions do not apply to most real-world interactions. We will record human-human demonstrations of handover actions and explore their use to encode human motion and to learn the human-body constraint manifold and its evolution during the interaction. These constraints and the human motion model can be used to construct a safe action space in which to explore how the robot should approach and pass objects to the human receiver in a model-based learning setting. The initial tasks of the master thesis will include: i. a literature review of human-robot handovers, ii. recording human-human demonstrations, iii. constructing a simulation environment with replay of human motions, iv. learning the constrained human workspace.

Highly motivated students can apply by sending an e-mail expressing your interest to , attaching your CV and transcripts.

Minimum knowledge

• Good knowledge of Python and/or C++;
• Good knowledge of robotics
• Good knowledge of Reinforcement Learning;

Preferred knowledge

• Experience with recent deep RL methods;
• Experience with deep learning libraries;
• Experience with the PyBullet simulator and Gym environments;

References

1. Vogt, David, et al. "A system for learning continuous human-robot interactions from human-human demonstrations." 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.
2. Liu, Changliu, and Masayoshi Tomizuka. "Safe exploration: Addressing various uncertainty levels in human robot interactions." 2015 American Control Conference (ACC). IEEE, 2015.
3. Calinon, Sylvain, Irene Sardellitti, and Darwin G. Caldwell. "Learning-based control strategy for safe human-robot interaction exploiting task and robot redundancies." 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2010.
4. Sutanto, Giovanni, et al. "Learning Equality Constraints for Motion Planning on Manifolds." arXiv preprint arXiv:2009.11852 (2020).

## Learning the Low-level Policy for Robot Air Hockey

Scope: Bachelor's thesis/Master's thesis
Start: ASAP
Topic:

Robot air hockey is a complex, highly dynamic, and strictly constrained task, which makes it challenging for a robot to play. The playing policy can generally be decomposed into a two-level structure. The high-level policy determines the playing strategy, such as hitting or defending. The low-level policy determines the trajectory given the high-level strategy. The low-level policy is particularly challenging to obtain, because constraints in joint space and action space must be satisfied. The objective of this thesis is to learn the low-level policies for different strategies. The safety constraints will be satisfied using our recent safe exploration algorithms. In addition, to improve sample efficiency, recent advances that leverage previous experience (e.g., GCSL [1], HER [2]) will also be explored.
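
To illustrate how previous experience can be reused, the sketch below shows the goal-relabeling step at the core of hindsight experience replay, using the "final" strategy from [2]: a failed episode is stored a second time as if the position that was actually reached had been the goal. The `Transition` fields, the sparse reward, and the tolerance are assumptions made for this example and are not taken from the lab's code base.

```python
from typing import List, NamedTuple, Tuple

Vec = Tuple[float, ...]


class Transition(NamedTuple):
    state: Vec
    action: Vec
    achieved_goal: Vec  # e.g. the puck position reached after the action
    desired_goal: Vec   # the goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Vec, desired: Vec, tol: float = 0.05) -> float:
    """Return 0 if the goal is reached within `tol`, otherwise -1."""
    dist = sum((a - d) ** 2 for a, d in zip(achieved, desired)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_with_final_goal(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, replacing the desired goal by the finally achieved one."""
    new_goal = episode[-1].achieved_goal
    return [
        t._replace(desired_goal=new_goal, reward=sparse_reward(t.achieved_goal, new_goal))
        for t in episode
    ]


# Usage: an episode that never reaches the original goal (1.0, 0.0).
goal = (1.0, 0.0)
episode = [
    Transition((0.0, 0.0), (0.1, 0.0), (0.1, 0.0), goal, -1.0),
    Transition((0.1, 0.0), (0.1, 0.0), (0.2, 0.0), goal, -1.0),
    Transition((0.2, 0.0), (0.1, 0.0), (0.3, 0.0), goal, -1.0),
]
relabeled = relabel_with_final_goal(episode)
```

Both the original and the relabeled transitions would typically be added to the replay buffer, so that even unsuccessful shots provide a learning signal for the goal-conditioned low-level policy.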

Minimum knowledge

• Good knowledge of Python;
• Basic knowledge of Reinforcement Learning;
• Basic knowledge of Robotics.

Preferred knowledge

• Knowledge / Experience of recent Deep-RL methods;
• Experience with the PyBullet simulator and Gym environments;

References

1. Ghosh, Dibya, et al. "Learning to reach goals via iterated supervised learning." arXiv preprint arXiv:1912.06088 (2019).
2. Andrychowicz, Marcin, et al. "Hindsight experience replay." arXiv preprint arXiv:1707.01495 (2017).

## Learning Motion Filters for Skillful Manipulation

Scope: Master Thesis
Start: ASAP
Topic: Sample-based planners are extensively used for generating geometrically feasible motions. These motions are, however, insufficient for skillful manipulation such as (i) balancing a glass of water, (ii) pouring water from one container to another, or (iii) avoiding workspace collisions in real time. To address this limitation, we would like to use composable filters that refine the original motions so that they comply with certain task constraints. The filters would be learned from data by reinforcement learning.

Minimum knowledge

• Good Python programming skills
• Knowledge of robotics, and deep RL methods

## Genetic programming for Interpretable RL

Scope: Bachelor's thesis/Master's thesis
Start: ASAP
Topic:

Gradient-based methods, in particular in the scope of Deep Reinforcement Learning, have shown the ability to learn complex policies for high-dimensional control tasks. However, gradient methods require differentiable policies, such as neural networks, which are often difficult to interpret. An alternative to deep neural networks is linear policies, which are based on a linear combination of hand-crafted features. This alternative has two drawbacks. Firstly, it lacks expressivity, limiting the performance of the method in complex tasks. Secondly, it requires expert knowledge to design useful and interpretable features. In robotics tasks, it is important to obtain policies that perform well but that can also be analyzed by an expert before deployment, to prevent damage and avoid harming people. The objective of this thesis is to develop an algorithm that combines gradient-based optimization (to learn policy parameters that achieve good performance) with genetic programming (to learn the appropriate policy structure). The algorithm will be tested on standard reinforcement learning control tasks as well as on simulated robotics tasks.

Minimum knowledge

• Good Python programming skills.
• Basic knowledge of Reinforcement Learning;

Preferred knowledge

• Knowledge of genetic algorithms;
• Knowledge of recent Deep RL methodologies;

Accepted candidate will

• Learn the basics of interpretable RL by studying the existing literature;
• Implement the proposed algorithm framework;
• Test the developed algorithm in simulated environments, comparing the results with available state-of-the-art methods;

## Learning distance metrics for Interpretable RL

Scope: Bachelor's thesis/Master's thesis
Start: ASAP
Topic: Deep Reinforcement Learning is a powerful tool that can learn policies for complex control tasks. However, the neural approximators are difficult to understand and evaluate, and they give rise to many safety concerns when deploying the agent in the real world. These concerns are particularly relevant for robotics: it is important to have an interpretable policy that can be analyzed to ensure that the robot will not cause damage or harm people. One important class of interpretable policies is based on state prototypes, i.e., specific states selected as salient ones. These policies select an action by 1) searching for the prototype closest to the current state and 2) applying the prototype policy, which can be as simple as a constant action or a linear policy. This family of policies can be seen as a computer program composed of a set of if-clauses: "if close to state x, apply action a". However, to be truly interpretable, the notion of "closeness" has to be properly defined. The objective of this thesis is to learn an appropriate closeness metric from data and to show that it can generate interpretable policies that still maintain performance comparable to deep learning approaches. The candidate will work on the existing code base, extending the current approach to work with image-based robotic tasks, such as car racing and air hockey.
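
The two-step action selection described above can be written down in a few lines. The sketch below is a hypothetical illustration, not the existing code base: prototypes carry constant actions, and the metric is passed in as a function so that a hand-written distance (here Euclidean) can later be replaced by a learned one.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Metric = Callable[[State, State], float]


def euclidean(a: State, b: State) -> float:
    """Default closeness measure: the Euclidean distance between two states."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def prototype_policy(state: State, prototypes: List[Tuple[State, str]],
                     metric: Metric = euclidean) -> str:
    """Return the constant action attached to the prototype closest to `state`."""
    _, action = min(prototypes, key=lambda proto: metric(state, proto[0]))
    return action


def first_dim_only(a: State, b: State) -> float:
    """A metric that ignores everything except the first state dimension."""
    return abs(a[0] - b[0])


# Usage: "if close to state x, apply action a".
prototypes = [((0.0, 1.0), "brake"), ((1.0, 0.0), "accelerate")]
state = (0.2, 0.1)
action_euclidean = prototype_policy(state, prototypes)
action_first_dim = prototype_policy(state, prototypes, metric=first_dim_only)
```

The same state is mapped to different actions under the two metrics, which shows why the definition of "closeness" decides both the behavior and the interpretability of such a policy.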

Minimum knowledge

• Good Python programming skills.
• Basic knowledge of Reinforcement Learning;

Preferred knowledge

• Knowledge of Neural networks and Autoencoders;
• Knowledge of recent Deep RL methodologies;

Accepted candidate will

• Learn the basics of the proposed algorithm by studying the existing codebase;
• Implement a metric learning framework using Autoencoders;
• Test the developed algorithm in simulated environments, particularly considering images as input;

## Incorporating First and Second Order Mental Models for Human-Robot Cooperative Manipulation Under Partial Observability

Scope: Master Thesis
Start: ASAP

The ability to model the beliefs and goals of a partner is an essential part of cooperative tasks. While humans develop theory-of-mind models for this purpose at a very early age [1], it is still an open question how to implement and make use of such models for cooperative robots [2,3,4]. In particular, human-robot collaboration in shared workspaces could profit from the use of such models, e.g., if the robot can detect and react to planned human goals or to a human's false beliefs during task execution. To make such robots a reality, the goal of this thesis is to investigate the use of first- and second-order mental models in a cooperative manipulation task under partial observability. Partially observable Markov decision processes (POMDPs) and interactive POMDPs (I-POMDPs) [5] define an optimal solution to the mental modeling task and may provide a solid theoretical basis for modeling. The thesis may also compare related approaches from the literature and set up an experimental design for evaluation with the bi-manual robot platform Kobo.

Highly motivated students can apply by sending an e-mail expressing your interest to attaching your CV and transcripts.

References:

1. Wimmer, H., & Perner, J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception (1983)
2. Sandra Devin and Rachid Alami. An implemented theory of mind to improve human-robot shared plans execution (2016)
3. Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami, and Matthew Botvinick. Machine theory of mind (2018)
4. Connor Brooks and Daniel Szafir. Building second-order mental models for human-robot interaction. (2019)
5. Prashant Doshi, Xia Qu, Adam Goodie, and Diana Young. Modeling recursive reasoning by humans using empirically informed interactive pomdps. (2010)

## Discovering neural parts in objects with invertible NNs for robot grasping

Scope: Master Thesis
Start: ASAP
Topic:

In this thesis, we will investigate the use of 3D primitive representations in objects using Invertible Neural Networks (INNs). Through INNs we can learn the implicit surface function of the objects and their mesh. Apart from extracting the object’s shape, we can parse the object into semantically interpretable parts. In our work our main focus will be to segment the parts in objects that are semantically related to object affordances. Moreover, the implicit representation of the primitive can allow us to compute directly the grasp configuration of the object, allowing grasp planning. Interested students are expected to have experience with Computer Vision and Deep Learning, but also know how to program in Python using DL libraries like PyTorch.

The thesis will be co-supervised by Despoina Paschalidou (Ph.D. candidate at the Max Planck Institute for Intelligent Systems and the Max Planck ETH Center for Learning Systems). Highly motivated students can apply by sending an e-mail expressing your interest to , attaching your CV and transcripts.

References:

1. Paschalidou, Despoina, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. "Neural Parts: Learning expressive 3D shape abstractions with invertible neural networks." arXiv preprint arXiv:2103.10429 (2021).
2. Karunratanakul, Korrawe, Jinlong Yang, Yan Zhang, Michael Black, Krikamol Muandet, and Siyu Tang. "Grasping Field: Learning Implicit Representations for Human Grasps." arXiv preprint arXiv:2008.04451 (2020).
3. Chao, Yu-Wei, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang et al. "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." arXiv preprint arXiv:2104.04631 (2021).
4. Do, Thanh-Toan, Anh Nguyen, and Ian Reid. "Affordancenet: An end-to-end deep learning approach for object affordance detection." In 2018 IEEE international conference on robotics and automation (ICRA), pp. 5882-5889. IEEE, 2018.

## Deep Articulation Prediction

Scope: Master Thesis
Advisor: Julen Urain De Jesus, Puze Liu, Georgia Chalvatzaki
Start: ASAP
Topic:

In robotics, we deal with the problem of solving complex task planning problems in highly unstructured environments. While end-to-end learning algorithms have been proposed in recent years to solve these problems, the lack of clear abstractions for defining policies seems to be a bottleneck for the generalization of the learned skills. In this project, we consider that a proper understanding of the objects that make up the workspace could help the robot generalize better.

This project deals with the problem of predicting the properties of articulated objects. Given an RGB-D image of a scene, a robotics-oriented perception system should be able to extract relevant object-centric features, such as the axis of rotation, the handle position, and the position or size of the objects.

The student is expected to train deep learning models on a large supervised dataset of scenes to predict (1) where the relevant objects are in the scene and (2) what the features of these objects are. The master thesis is aimed at students with strong coding skills and solid experience with PyTorch. Additionally, the student should be interested in computer vision and in working with networks that handle RGB-D data, such as CNNs or PointNet.

Highly motivated students can apply by sending an e-mail expressing your interest to , or , attaching your CV and transcripts.

References:

1. Jain, Ajinkya and Lioutikov, Rudolf and Chuck, Caleb and Niekum, Scott. "ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory" (2021).
2. Li, Xiaolong, et al. "Category-level articulated object pose estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (2020).
3. Mo, Kaichun and Guibas, Leonidas and Mukadam, Mustafa and Gupta, Abhinav and Tulsiani, Shubham. "Where2Act: From Pixels to Actions for Articulated 3D Objects" (2021)

## Cross-platform Benchmark of Robot Grasp Planning

Scope: Master Thesis
Start: ASAP
Topic:

Grasp planning is one of the most challenging tasks in robot manipulation. Apart from perception ambiguity, grasp robustness and successful execution rely heavily on the dynamics of the robotic hands. The student is expected to research and develop benchmarking environments and evaluation metrics for grasp planning. Development in simulation environments such as Isaac Sim and Gazebo will allow us to integrate and evaluate different robotic hands for grasping a variety of everyday objects. We will evaluate grasp performance using different metrics (e.g., object-category-wise, affordance-wise, etc.), and finally test the sim2real gap when transferring such approaches from popular simulators to real robots. The student will have the chance to work with different robotic hands (Justin hand, PAL TIAGo hands, Robotiq gripper, Panda gripper, etc.) and is expected to transfer the results to at least two robots (Rollin’ Justin at DLR and TIAGo++ at TU Darmstadt). The results of this thesis are intended to be made public (both the data and the benchmarking framework) for the benefit of the robotics community. As this thesis is offered in collaboration with the DLR Institute of Robotics and Mechatronics in Oberpfaffenhofen near Munich, the student is expected to work at DLR for a period of 8 months for the thesis. On-site work at the premises of DLR can be expected but not guaranteed due to COVID-19 restrictions. A large part of the project can be carried out remotely.

Highly motivated students can apply by sending an e-mail expressing your interest to and , attaching your CV and transcripts.

References:

1. Collins, Jack, Shelvin Chand, Anthony Vanderkop, and David Howard. "A Review of Physics Simulators for Robotic Applications." IEEE Access (2021).
2. Bekiroglu, Y., Marturi, N., Roa, M. A., Adjigble, K. J. M., Pardi, T., Grimm, C., ... & Stolkin, R. (2019). Benchmarking protocol for grasp planning algorithms. IEEE Robotics and Automation Letters, 5(2), 315-322.

## AADD: Reinforcement Learning for Unbiased Autonomous Driving

Scope: Master's thesis
Advisor: Tuan Dam, Carlo D'Eramo, Joni Pajarinen
Start: ASAP
Topic: Applying reinforcement learning to autonomous driving is a promising but challenging research direction due to the high uncertainty and the environmental conditions of the task. Efficient reinforcement learning is needed. To this end, recent work has suggested solving the Bellman optimality equation with stability guarantees, but unfortunately no guarantee of zero bias has been proposed in this context, which makes reinforcement learning susceptible to getting stuck in dangerous solutions. In this work, we formulate the Bellman equation as a convex-concave saddle point problem and solve it using a recently proposed accelerated primal-dual algorithm [3]. We will test the algorithm on benchmark problems and on an autonomous driving task where an efficient unbiased solution is needed.

References
[1] Ofir Nachum, Yinlam Chow, and Mohammad Ghavamzadeh. Path consistency learning in Tsallis entropy regularized MDPs. arXiv preprint arXiv:1802.03501, 2018.
[2] Dai, Bo, et al. "SBEED: Convergent reinforcement learning with nonlinear function approximation." International Conference on Machine Learning. PMLR, 2018.
[3] Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. A primal-dual algorithm for general convex-concave saddle point problems. arXiv preprint arXiv:1803.01401, 2018.

## Above Average Decision Making Under Uncertainty

Scope: Master's thesis
Start: ASAP
Topic: Google DeepMind recently showed how Monte Carlo Tree Search (MCTS) combined with neural networks can be used to play Go at a super-human level. However, one disadvantage of MCTS is that the search tree grows exponentially with the planning horizon. In this Master thesis, the student will integrate the advantages of MCTS, that is, optimistic decision making, into a policy representation whose size is limited with respect to the planning horizon. The outcome will be an approach that can plan further into the future. The application domain will include partially observable problems where decisions can have far-reaching consequences.
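
"Optimistic decision making" in MCTS usually refers to selecting children by an upper confidence bound on their value. The following sketch shows the classic UCB1 rule as used in UCT; it is given here only as background under the assumption that this is the kind of optimism meant, and the function names and the exploration constant are illustrative.

```python
import math
from typing import List


def ucb1_scores(total_returns: List[float], visit_counts: List[int],
                exploration: float = math.sqrt(2.0)) -> List[float]:
    """Mean return plus an optimism bonus that shrinks with the visit count."""
    parent_visits = sum(visit_counts)
    scores = []
    for total, visits in zip(total_returns, visit_counts):
        if visits == 0:
            scores.append(math.inf)  # unvisited children are tried first
        else:
            bonus = exploration * math.sqrt(math.log(parent_visits) / visits)
            scores.append(total / visits + bonus)
    return scores


def select_child(total_returns: List[float], visit_counts: List[int]) -> int:
    """Index of the child with the highest optimistic score."""
    scores = ucb1_scores(total_returns, visit_counts)
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: a rarely visited child can win over a well-explored child
# that has a higher mean return.
chosen = select_child(total_returns=[60.0, 0.5], visit_counts=[100, 1])
```

Every node in the tree stores such statistics for each child, which is why the memory footprint grows with the planning horizon; a policy representation of bounded size would have to keep the optimism without storing the full tree.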

## Approximate Inference Methods for Stochastic Optimal Control

Scope: Master's thesis
Start: ASAP
Topic:

Recent work has presented a control-as-inference formulation that frames optimal control as input estimation. Under a linear Gaussian assumption, this formulation can be shown to be equivalent to the LQR solution, while approximate inference through linearization can be viewed as a Gauss–Newton method, similar to popular trajectory optimization methods (e.g., iLQR). However, the linearization approximation limits both the tolerable environment stochasticity and the exploration during inference.

The aim of this thesis is to use alternative approximate inference methods (e.g., quadrature, Monte Carlo, variational) and to investigate their benefits for stochastic optimal control and trajectory optimization. Ideally, prospective students are interested in optimal control, approximate inference methods and model-based reinforcement learning.
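
For reference, the LQR solution mentioned above is obtained by a backward Riccati recursion. The following is a minimal sketch, assuming discrete-time dynamics x_{t+1} = A x_t + B u_t, a stage cost x'Qx + u'Ru and a terminal cost x'Qx; the function name and the example matrices are illustrative and not taken from the referenced work.

```python
import numpy as np


def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Returns the feedback gains K_0, ..., K_{horizon-1} (for u_t = -K_t x_t)
    and the cost-to-go matrix P at time 0.
    """
    P = Q.copy()  # terminal cost-to-go
    gains = []
    for _ in range(horizon):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # recursion runs backwards in time
    return gains, P


# Example: discretized double integrator (position, velocity).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
gains, P0 = lqr_backward_pass(A, B, Q, R, horizon=100)
```

The gains depend only on the model and the cost, not on the state or the noise level, which is one reason why going beyond the linearized Gaussian case requires different approximate inference methods.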

## Development of a Hierarchical Reinforcement Learning Toolbox

Scope: Bachelor's thesis/Project
Start: ASAP
Topic: Hierarchical Reinforcement Learning (HRL) is the field of Reinforcement Learning (RL) that considers structured agents. In this field, a high-level task is decomposed into simpler subtasks. The resulting control policy is represented as a hierarchy of policies, where each policy solves a subtask. While the original HRL literature focuses on how to exploit domain knowledge and structured exploration to speed up learning, the more recent approaches, based on Deep Learning, focus on using the hierarchical structure to solve tasks that cannot be solved, or that are difficult to learn, using classical Deep RL approaches. While classical HRL approaches are particularly well suited for MDPs with finite state-action spaces, the more recent Deep HRL approaches can work in complex robotic tasks with continuous state and action spaces.

One major drawback of the recent literature is that Deep HRL approaches share one of the major issues of "flat" Deep RL: the resulting policy is difficult for humans to interpret and thus cannot be trusted in safety-critical applications, as we cannot analyze and predict its global behavior. Another major drawback of Deep HRL algorithms is that it is difficult to insert prior knowledge of the environment into the policy structure, which makes it even more difficult to apply these kinds of algorithms in real-world scenarios.

To solve these issues, we propose a novel HRL framework, inspired by control theory, where the design of the hierarchical agent is performed using block diagrams. This framework simplifies the design of hierarchical agents and proposes a different paradigm for HRL: we build structured agents that do not execute a policy following the stack principle, i.e., function calls, but are instead composed of a set of different parallel controllers. More details about this framework can be found here.

The objective of this thesis is to simplify the design of hierarchical agents using the above-mentioned framework by implementing graphical tools to easily define the structure of the agent and to analyze its behavior while it interacts with the environment. In addition, the existing codebase needs to be improved by refactoring interfaces and implementing new features.

Minimum knowledge

• Good Python programming skills.

Preferred knowledge

• Knowledge of Python graphical and graph libraries;
• Basic knowledge of Reinforcement Learning;
• Knowledge of recent Deep RL methodologies;

Accepted candidate will

• Learn the basics of the proposed framework by looking at the existing codebase;
• Implement graphical tools to design and analyze Hierarchical Reinforcement Learning Agents;
• Refactor the currently existing framework to design HRL agents;
• Add new functionalities to the Hierarchical Reinforcement Learning Framework;
• Test the developed framework in toy problems or, optionally, on real robots;
• Optionally, implement some standard Hierarchical Reinforcement Learning algorithms.

## How can we learn to segment tasks in Humans?

Scope: Master's thesis | Bachelor's thesis
Start: ASAP
Topic:

Object segmentation algorithms have shown that it is possible to segment data with respect to the information it contains. This opens the door to considering time-related data such as trajectories or videos. Being able to segment human movements with respect to the different actions being performed will provide a powerful method to understand human tasks, predict them and, hopefully, mimic them with a robot. In this project, the student is expected to study different algorithms for unsupervised segmentation of human actions and to study how well the learned models can predict human motion.

## Interactive dance: at the interface of high-level reactive robotic control and human-robot interactions.

Scope: Master's thesis
Advisor: Vincent Berenz (a collaborator in Tübingen at the Max Planck Institute for Intelligent Systems)
Start: ASAP
Topic:

Scripted robotic dance is common. On the other hand, interactive dance, in which the robot uses runtime sensory information to continuously adapt its moves to those of its (human) partner, remains challenging. It requires integrating various sensors, action modalities and cognitive processes. The selected candidate's objective will be to develop such an interactive dance, based on the software suite for simultaneous perception and motion generation that our department has built over the years. The target robot on which the dance will be applied is the wheeled robot SoftBank Robotics Pepper. This Master's thesis is with the Max Planck Institute for Intelligent Systems and is located in Tübingen. More information: https://am.is.tuebingen.mpg.de/jobs/master-thesis-interactive-dance-performed-by-sofbank-robotics-pepper

## Investigating Memory Models for Deep Reinforcement Learning in POMDPs

Scope: Master's Thesis, Bachelor's thesis
Start: ASAP
Topic: Reinforcement Learning under partial observability of the true system state, albeit having great potential, is still an open problem. A critical ingredient of recent model-free RL approaches in partially observable domains is the right choice of memory model, which has so far been limited to recurrent neural networks or full histories [1][2]. The goal of this project is to investigate and compare the performance of different models, including ones used in Computer Vision or Natural Language Processing (e.g., Recurrent Ladder Networks [3]), in partially observable domains to gain new insights. The student will compare the performance of the memory models on selected tasks in simulation. If desired, the student also has the chance to test a few of the memory models in a real robotic task of playing Mikado.

Minimum knowledge

• Good Python programming skills.

Preferred knowledge

• Knowledge of deep neural networks and deep recurrent neural networks
• Basic knowledge of Reinforcement Learning, POMDPs and memory representations in POMDPs
• Knowledge of recent Deep RL methodologies

[1] Deep recurrent Q-learning for partially observable MDPs, Hausknecht et al. https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/download/11673/11503
[2] Learning deep neural network policies with continuous memory states, Zhang et al. https://ieeexplore.ieee.org/iel7/7478842/7487087/07487174.pdf

## Learning a Friction Hysteresis with MOSAIC

Scope: Master's thesis, Bachelor's thesis
Start: ASAP
Topic: Inspired by results in neuroscience, especially on the cerebellum, Kawato & Wolpert introduced the idea of the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. In this thesis, we want to focus on the problem of learning to control a robot system with hysteresis in its friction.

## Learning Stochastic Nonlinear Dynamical Systems with the Koopman Operator

Scope: Master's Thesis
Start: Anytime
Topic: Model-based Reinforcement Learning for robotics typically requires learning nonlinear stochastic dynamical systems. This project aims to combine the Koopman operator, Bayesian machine learning and neural network models to represent these systems as linear Gaussian dynamical systems in some high-dimensional embedding. See this write up for more details.
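
To illustrate the idea of a linear model in an embedding, the sketch below fits a Koopman matrix by least squares on lifted states (in the spirit of extended dynamic mode decomposition). It is a deterministic toy baseline, not the Bayesian or neural network approach targeted by the project; the toy system, the hand-picked feature map `lift` and all function names are assumptions made for this example.

```python
import numpy as np


def step(x, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear system: x1' = a*x1, x2' = b*x2 + c*x1**2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])


def lift(x):
    """Hand-picked embedding in which the toy system is exactly linear."""
    return np.array([x[0], x[1], x[0] ** 2])


def fit_koopman(states, next_states):
    """Least-squares fit of K such that lift(x') ~= K @ lift(x)."""
    Z = np.stack([lift(x) for x in states])
    Z_next = np.stack([lift(x) for x in next_states])
    # Solve Z @ K.T ~= Z_next in the least-squares sense.
    K_transposed, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)
    return K_transposed.T


rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))
next_states = np.stack([step(x) for x in states])
K = fit_koopman(states, next_states)
```

For this toy system the lifted dynamics are exactly linear, so `K` recovers the matrix with rows `[a, 0, 0]`, `[0, b, c]` and `[0, 0, a**2]` up to numerical precision. In realistic systems the embedding is unknown and the dynamics are noisy, which is where learned features and Bayesian treatment come in.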

## Optimal Sampling for Self-Paced Reinforcement Learning

Scope: Master's Thesis
Start: ASAP
Topic: The idea of gradually learning to accomplish a complicated task via a guiding sequence of intermediate ones - referred to as Curriculum Learning - has shown great experimental success. The goal of this project is to investigate a recent take on Curriculum Learning in the domain of Reinforcement Learning, which interprets it as a form of Expectation Maximization. More precisely, the goal is to push the capabilities of this formulation by using advanced sampling methods to sample tasks for learning instead of the simple approximations that have been used so far.
The ideal candidate:

• is knowledgeable in Reinforcement Learning (as this is the basis for the project)
• has a basic understanding of Mixed-Integer Programming (or is not afraid of diving into this domain)
• has basic knowledge of Variational Inference

## Reinforcement Learning for Architectural Combinatorial Optimization

Scope: Master's thesis
Advisors: Boris Belousov, Georgia Chalvatzaki, Bastian Wibranek
Start: ASAP
Topic: Many real-world problems can be reduced to combinatorial optimization over graphs. For example, the search for a combination of elements that produces a desired structure satisfying given design constraints, such as load bearing or form matching, is a ubiquitous problem in architecture. Commonly, heuristic search algorithms are employed, which require expert input from the architect to guide the search. This thesis will investigate optimization approaches based on graph embedding techniques, such as graph neural networks, to improve the state of the art in combinatorial optimization in the architectural domain. The thesis will involve collaboration with the Digital Design Unit from FB Architektur.

## Structured Perception for Robotic Manipulation

Scope: Bachelor's / Master's Thesis
Start: Anytime
Topic: For learning in robotic manipulation, there are two cultures: 'end-to-end' vs inductive biases. While the former is purely data-driven, the latter incorporates ideas from computer vision and visual servoing for more interpretable and sample-efficient performance. This is an open-ended project aimed at investigating novel frameworks for perception-based manipulation that combine 'structure' (computer vision) with learning.
The ideal candidate:

• has knowledge of both robotics and (geometric) computer vision
• is interested in working on real robotic manipulators
• can write clean, maintainable software

## Targeted Exploration Using Value Bounds

Scope: Master's thesis