Currently Offered Thesis Topics

We offer these current topics directly to Bachelor and Master students at TU Darmstadt; if you are interested in one of them, feel free to contact the thesis advisor DIRECTLY. Excellent external students from other universities may be accepted, but they are required to first email Jan Peters before contacting any other lab member about a thesis topic. Note that we cannot provide funding for any of these thesis projects.

We highly recommend that you take either our robotics and machine learning lectures (Robot Learning, Statistical Machine Learning) or those of our colleagues (Grundlagen der Robotik, Probabilistic Graphical Models and/or Deep Learning). Even more important to us is that you take both Robot Learning: Integrated Project, Part 1 (Literature Review and Simulation Studies) and Part 2 (Evaluation and Submission to a Conference) before doing a thesis with us.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please DIRECTLY contact the thesis advisor if you are interested in one of these topics. When you contact the advisor, please mention (1) WHY you are interested in the topic (dreams, parts of the problem, etc.), and (2) WHAT makes you well suited for the project (e.g., class work, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc.) are highly appreciated. Of course, such materials are not mandatory, but they help the advisor to see whether the topic is too easy, just about right, or too hard for you.

Only contact *ONE* potential advisor at a time! If you contact a second one without first concluding discussions with the first advisor (i.e., deciding for or against the thesis with him or her), we may not consider you at all. Only if you are extremely excited about at most two topics should you email both supervisors, so that they are aware of the additional interest.

FOR FB16+FB18 STUDENTS: If you are a student from another department at TU Darmstadt (e.g., ME, EE, IST), you need an additional formal supervisor who officially issues the topic. Please do not try to arrange an advisor from your home department yourself; instead, let the supervising IAS member get in touch with that person. Multiple professors from other departments have complained that they were asked to co-supervise before being contacted by our advising lab member.


Self-Supervised Correspondence Learning for Cross-Domain Imitation Learning

Scope: Master thesis
Advisor: An Thai Le
Added: 2022-05-21
Start: October 2022
Topic: Imitation Learning has achieved huge successes over decades in the acquisition of new motor skills from expert demonstrations. However, most of these successes assume that the expert demonstrations lie in the same domain as the learner, which hinders the application of Imitation Learning in a variety of cases, e.g., teaching skills from videos. Recently, a body of work has addressed the correspondence problem by directly learning the mapping between expert and learner state-action spaces [1, 2] or an embodiment-agnostic task-progress indicator [3] from supervised datasets. However, such supervised datasets can be hard or impossible to collect.

This project explores various techniques and insights from Self-Supervised Learning to design a method that learns the correspondence without human supervision. A promising direction could be applying Optimal Transport cost [4] to measure the similarity of state-action spaces having different supports in the self-supervised setting.
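To illustrate the optimal-transport direction, the entropic-regularized OT cost used in works like [4] is typically computed with Sinkhorn iterations. A minimal NumPy sketch (the 1-D feature sets and uniform weights are purely illustrative placeholders, not part of the project):

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularized OT via Sinkhorn iterations.
    cost: (n, m) pairwise cost matrix; a, b: marginal weights."""
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # scale columns to match b
        u = a / (K @ v)                  # scale rows to match a
    P = u[:, None] * K * v[None, :]      # transport plan
    return np.sum(P * cost)              # entropic OT cost

# toy example: similarity between two hypothetical 1-D "state feature" sets
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.2, 1.2, 5)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost
w = np.full(5, 0.2)                      # uniform marginals
d = sinkhorn(C, w, w)
```

In the thesis setting, the cost matrix would instead compare expert and learner state-action features whose supports differ.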

The ideal candidate for this thesis has good programming skills in Python as well as solid knowledge of (deep) RL algorithms/techniques.

[1] Raychaudhuri, Dripta S., et al. "Cross-domain imitation from observations." International Conference on Machine Learning. PMLR, 2021.
[2] Kim, Kuno, et al. "Domain adaptive imitation learning." International Conference on Machine Learning. PMLR, 2020.
[3] Zakka, Kevin, et al. "Xirl: Cross-embodiment inverse reinforcement learning." Conference on Robot Learning. PMLR, 2022.
[4] Fickinger, Arnaud, et al. "Cross-Domain Imitation Learning via Optimal Transport." arXiv preprint arXiv:2110.03684 (2021).

Learning 3D Inverted Pendulum Stabilization

Scope: Master thesis
Advisor: Pascal Klink, Kai Ploeger
Added: 2022-05-18
Start: End of 2022
Topic: The concept of starting small is widely applied in reinforcement learning in order to improve the learning speed and stability of autonomous agents [1, 2, 3]. This project focuses on applying such a concept to the task of controlling a 3D inverted pendulum with a Barrett WAM robot (on the right - in this image connected to a badminton racket). More precisely, the robot is tasked to follow increasingly complex trajectories with its end-effector while simultaneously stabilizing a pole that is attached to said end-effector. The generation of increasingly complex target trajectories will be performed by a novel algorithm that has been developed at IAS. The design of the overall learning agent will first be done in a simulator and then transferred to the real system. The transfer to the real system also comprises the design and assembly of the physical 3D inverted pendulum.

The ideal candidate for this thesis has a solid knowledge of robotics and simulators, good programming skills as well as knowledge of (deep) RL algorithms/techniques.

[1] OpenAI: Andrychowicz, Marcin, et al. "Learning dexterous in-hand manipulation." IJRR, 2020.
[2] Silver, David, et al. "Mastering the game of go without human knowledge." Nature, 2017.
[3] Rudin, Nikita, et al. "Learning to walk in minutes using massively parallel deep reinforcement learning." CoRL, 2022.

Adaptive Human-Robot Interactions with Human Trust Maximization

Scope: Master thesis
Advisor: Kay Hansel, Georgia Chalvatzaki
Added: 2022-03-18
Start: April
Topic: Building trust between humans and robots is a major goal of Human-Robot Interaction (HRI). Usually, trust in HRI has been associated with risk aversion: a robot is trustworthy when its actions do not put the human at risk. However, we believe that trust is a bilateral concept that governs the behavior of, and participation in, the collaborative task by both interacting parties. On the one hand, the human has to trust the robot regarding its actions, e.g., delivering the requested object, acting safely, and interacting in a reasonable time horizon. On the other hand, the robot should trust the human regarding their actions, e.g., have a reliable belief about the human's next action that would not lead to task failure, i.e., certainty about the requested task. However, providing a computational model of trust is extremely challenging.
Therefore, this thesis explores trust maximization as a partially observable problem, where trust is considered as a latent variable that needs to be inferred. This consideration results in a dual optimization problem for two reasons: (i) the robot behavior must be optimized to maximize the human's latent trust distribution; (ii) an optimization of the human's prediction model must be performed to maximize the robot's trust. To address this challenging optimization problem, we will rely on variational inference and metrics like Mutual Information for optimization.
Highly motivated students can apply by sending an e-mail expressing your interest to , attaching your letter of motivation and possibly your CV.

Requirements:

  • Good knowledge of Python and/or C++;
  • Good knowledge in Robotics and Machine Learning;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch;

References:
[1] Xu, Anqi, and Gregory Dudek. "Optimo: Online probabilistic trust inference model for asymmetric human-robot collaborations." ACM/IEEE HRI, IEEE, 2015;
[2] Kwon, Minae, et al. "When humans aren’t optimal: Robots that collaborate with risk-aware humans." ACM/IEEE HRI, IEEE, 2020;
[3] Chen, Min, et al. "Planning with trust for human-robot collaboration." ACM/IEEE HRI, IEEE, 2018;
[4] Poole, Ben et al. “On variational bounds of mutual information”. ICML, PMLR, 2019.

Policy Learning for Tactile Insertion

Scope: Master thesis
Advisor: Niklas Funk
Added: 2022-02-01
Start: March / April
Topic: Solving tight assembly tasks, such as putting a plug into a socket or creating a figure from LEGO elements, remains a challenging problem in robotics. This is mainly due to the problem's partial observability, in addition to its precision requirements. While recent works [1,2] present approaches based on force/torque sensing, in this thesis we want to investigate how vision-based tactile sensors can be exploited in these scenarios [3].
As insertion tasks are difficult to simulate, the thesis offers the possibility to work on a real robotic manipulator (the Franka Panda robot) using our custom-built vision-based tactile sensors. While such sensors have lately become increasingly popular, the main goal of this thesis is to investigate how to best integrate them into a reactive policy for solving the aforementioned tasks.
Highly motivated students can apply by sending an e-mail expressing your interest to , attaching your letter of motivation and possibly your CV.

Minimum Requirements:

  • Good knowledge of C++/Python
  • Good knowledge of robotics and machine learning
  • Willingness to work on a real robotic system

Preferred Knowledge:

  • Experience with deep learning libraries (e.g. Pytorch)
  • Experience with image processing
  • Experience with reinforcement learning

References:
[1] InsertionNet - A Scalable Solution for Insertion; Oren Spector and Dotan Di Castro
[2] Learning Robotic Assembly from CAD; Garrett Thomas, Pieter Abbeel et al.
[3] Tactile-RL for Insertion: Generalization to Objects of Unknown Geometry; Siyuan Dong, Alberto Rodriguez et al.

Statistical Model-based Reinforcement Learning

Scope: Master thesis
Advisor: Joe Watson
Added: 2022-01-25
Start: ASAP
Topic: Revisit the idea of Gaussian processes for data-driven control. The project is well defined and similar to previous works such as PILCO and guided policy search. The student will learn about approximate inference, optimal control and Gaussian processes. Preferably, applicants have taken the Statistical Machine Learning, Robot Learning and/or Robot Learning: Integrated Project courses.

The goals of this thesis are:

  • Use Gaussian processes and approximate inference for model-based reinforcement learning
  • Implement and evaluate algorithms on real robotic systems
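As a rough sketch of the first goal, standard GP regression with an RBF kernel could serve as the learned dynamics model; all inputs and hyperparameters below are illustrative assumptions, not part of the project specification:

```python
import numpy as np

def rbf(X1, X2, ls=0.5, var=1.0):
    """Squared-exponential kernel between two sets of inputs."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d / ls ** 2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP at test inputs."""
    K = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = rbf(Xtr, Xte)
    Kss = rbf(Xte, Xte)
    L = np.linalg.cholesky(K)                      # stable inversion
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - (v ** 2).sum(0)
    return mean, var

# fit a toy 1-D "dynamics" dataset and predict at a new state
Xtr = np.linspace(-1.0, 1.0, 20)[:, None]
ytr = np.sin(3.0 * Xtr[:, 0])
mean, var = gp_posterior(Xtr, ytr, np.array([[0.0]]))
```

In a model-based RL loop, the posterior variance is what makes planning uncertainty-aware, as in PILCO.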

If you are interested in this thesis, please send an e-mail with your CV and transcripts to

Requirements

  • Python software development
  • Patience for research on real robotic systems

References

  1. PILCO: A Model-Based and Data-Efficient Approach to Policy Search, Deisenroth et al. (2011)
  2. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics, Levine et al. (2014)
  3. Stochastic Control as Approximate Input Inference, Watson et al. (2021)

Hierarchical VADAM

Scope: Master's thesis, Bachelor's thesis
Advisor: Oleg Arenz
Added: 2021-11-04
Start: ASAP
Topic: TL;DR: Learning a mixture of mean-field Gaussians by combining VIPS with VADAM

ADAM is a popular method for minimizing a loss function in deep learning. Khan et al. [1] showed that a slight modification of ADAM (called VADAM) can be applied to find the parameters of a Gaussian (with diagonal covariance) that approximates the posterior in Bayesian inference (with NN loss functions). VIPS [2] is a method for optimizing a Gaussian mixture model for better, multimodal posterior approximations, by enabling us to optimize the individual Gaussian components independently. However, VIPS learns full covariance matrices (using the MORE algorithm) and thus does not scale to very high-dimensional problems, e.g., neural network parameters. In this thesis, you will replace the MORE optimizer within the VIPS framework with VADAM to efficiently learn mixtures of mean-field Gaussians for high-dimensional, multimodal variational inference. We will likely use our implementation of VIPS, which is written in Tensorflow 2.
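To give a flavor of the weight-perturbation idea behind VADAM, here is a heavily simplified sketch on a 1-D quadratic loss. This is not the full algorithm of [1] (no bias correction, illustrative step sizes and precision update); every name and constant below is a hypothetical placeholder:

```python
import numpy as np

def vadam_sketch(grad, n_data, dim, steps=2000, lr=0.1,
                 beta2=0.999, prior=1.0, seed=0):
    """Sketch of variational Adam: track the mean mu of a mean-field
    Gaussian; the Adam-style second moment s acts like a precision,
    and weights are sampled (perturbed) before each gradient."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)
    s = np.ones(dim)
    for _ in range(steps):
        sigma = 1.0 / np.sqrt(n_data * (s + prior))   # posterior std
        w = mu + sigma * rng.standard_normal(dim)     # weight perturbation
        g = grad(w)
        s = beta2 * s + (1 - beta2) * g * g           # precision estimate
        mu -= lr * (g + prior * mu / n_data) / (np.sqrt(s) + 1e-8)
    return mu, 1.0 / np.sqrt(n_data * (s + prior))

# toy posterior: quadratic loss 0.5 * (w - 2)^2, so grad(w) = w - 2
mu, sigma = vadam_sketch(lambda w: w - 2.0, n_data=100, dim=1)
```

In the thesis, an update of this mean-field form would replace MORE's full-covariance update for each mixture component inside VIPS.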

Requirements
The topic is suitable for a Bachelor thesis, as it should be relatively straightforward to implement (if you are familiar with Python/TF2). However, the topic also has a lot of potential to be useful to a wide audience, and we should aim to publish the result, which would require extra effort from you. To apply, first try to grasp the main insights and the mechanics of VADAM and VIPS ([1] and [2]) and then arrange a meeting with me.

References
[1] Khan, Mohammad, et al. "Fast and scalable bayesian deep learning by weight-perturbation in adam." ICML 2018.
[2] Oleg Arenz, Mingjun Zhong, Gerhard Neumann. "Efficient Gradient-Free Variational Inference using Policy Search". ICML. 2018.

Causal inference of human behavior dynamics for physical Human-Robot Interactions

Scope: Master's thesis
Advisor: Georgia Chalvatzaki, Kay Hansel
Added: 2021-10-16
Start: ASAP
Topic: In this thesis, we will study and develop ways of approximating an efficient behavior model of a human in close interaction with a robot. We will research the extension of our prior work on the graph-based representation of the human into a method that leverages multiple attention mechanisms to encode relative dynamics in the human body. Inspired by methods in causal discovery, we will treat motion prediction as a causal discovery problem. In essence, a differentiable and accurate human motion model is essential for efficient tracking and optimization of HRI dynamics. You will test your method in the context of motion prediction, especially for HRI tasks like human-robot handovers, and you could demonstrate your results in a real-world experiment.

Highly motivated students can apply by sending an e-mail expressing your interest to , attaching a letter of motivation and possibly your CV.

Minimum knowledge

  • Good knowledge of Python and/or C++;
  • Good knowledge of Robotics;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch

References

  1. Li, Q., Chalvatzaki, G., Peters, J., Wang, Y., Directed Acyclic Graph Neural Network for Human Motion Prediction, 2021 IEEE International Conference on Robotics and Automation (ICRA).
  2. Löwe, S., Madras, D., Zemel, R. and Welling, M., 2020. Amortized causal discovery: Learning to infer causal graphs from time-series data. arXiv preprint arXiv:2006.10833.
  3. Yang, W., Paxton, C., Mousavian, A., Chao, Y.W., Cakmak, M. and Fox, D., 2020. Reactive human-to-robot handovers of arbitrary objects. arXiv preprint arXiv:2011.08961.

Genetic Programming for Interpretable RL

Scope: Bachelor's thesis/Master's thesis
Advisor: Davide Tateo, Riad Akrour
Added: 2021-06-30
Start: ASAP
Topic:

Gradient-based methods, in particular in the scope of Deep Reinforcement Learning, have shown the ability to learn complex policies for high-dimensional control tasks. However, gradient methods require differentiable policies, such as neural networks, which are often difficult to interpret. An alternative to deep neural networks are linear policies, which are based on a linear combination of hand-crafted features. This alternative has two drawbacks: firstly, it lacks expressivity, limiting the performance of the method in complex tasks; secondly, it requires expert knowledge to design useful and interpretable features. In robotics tasks, it is important not only to obtain policies with good performance but also to have a policy that can be analyzed by an expert before deployment, to prevent damage and avoid harming people. The objective of this thesis is to develop an algorithm that mixes gradient-based optimization (to learn policy parameters that achieve good performance) with genetic programming (to learn the appropriate policy structure). The algorithm will be tested on standard reinforcement learning control tasks as well as on simulated robotics tasks.

Minimum knowledge

  • Good Python programming skills.
  • Basic knowledge of Reinforcement Learning;

Preferred knowledge

  • Knowledge of genetic algorithms;
  • Knowledge of recent Deep RL methodologies;

Accepted candidate will

  • Learn the basics of interpretable RL by reviewing the existing literature;
  • Implement the proposed algorithmic framework;
  • Test the developed algorithm in simulated environments, comparing the results with available state-of-the-art methods;

Learning distance metrics for Interpretable RL

Scope: Bachelor's thesis/Master's thesis
Advisor: Davide Tateo, Riad Akrour
Added: 2021-06-30
Start: ASAP
Topic: Deep Reinforcement Learning is a powerful tool that can learn policies for complex control tasks. However, the neural approximators are difficult to understand and evaluate, and they give rise to many safety concerns when deploying the agent in the real world. These concerns are particularly relevant for robotics: it is important to have an interpretable policy that can be analyzed to ensure that the robot will not cause damage or harm people. One important class of interpretable policies is based on state prototypes, i.e., specific states selected as salient ones. These policies select an action by 1) searching for the closest prototype to the current state and 2) applying the prototype policy, which can be as simple as a constant action or a linear policy. This family of policies can be seen as a computer program composed of a set of if-clauses: "if close to state x, apply action a". However, to be truly interpretable, the notion of "closeness" has to be properly defined. The objective of this thesis is to learn an appropriate closeness metric from data and to show that it can generate interpretable policies that still maintain performance comparable to deep learning approaches. The candidate will work on the existing code base, extending the current approach to work with image-based robotic tasks, such as car racing and air hockey.
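The prototype-based policy described above can be sketched in a few lines; the metric matrix M stands in for the closeness metric to be learned, and all states and actions below are toy placeholders:

```python
import numpy as np

class PrototypePolicy:
    """Interpretable policy: pick the nearest state prototype under a
    (possibly learned) metric M, then apply that prototype's action."""
    def __init__(self, prototypes, actions, M=None):
        self.prototypes = np.asarray(prototypes)   # (k, d) salient states
        self.actions = np.asarray(actions)         # (k,) constant actions
        d = self.prototypes.shape[1]
        self.M = np.eye(d) if M is None else M     # metric matrix

    def __call__(self, state):
        diff = self.prototypes - state
        # squared Mahalanobis distance under M defines "closeness"
        dist = np.einsum('kd,de,ke->k', diff, self.M, diff)
        return self.actions[np.argmin(dist)]

# toy rule set: "if close to state x, apply action a"
policy = PrototypePolicy(prototypes=[[0.0, 0.0], [1.0, 1.0]],
                         actions=[-1.0, +1.0])
a = policy(np.array([0.9, 0.8]))   # nearest prototype is [1, 1]
```

The thesis would replace the identity metric with one learned from data, e.g., in the latent space of an autoencoder for image inputs.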

Minimum knowledge

  • Good Python programming skills.
  • Basic knowledge of Reinforcement Learning;

Preferred knowledge

  • Knowledge of Neural networks and Autoencoders;
  • Knowledge of recent Deep RL methodologies;

Accepted candidate will

  • Learn the basics of the proposed algorithm by studying the existing codebase;
  • Implement a metric learning framework using Autoencoders;
  • Test the developed algorithm in simulated environments, particularly considering images as input;

Incorporating First and Second Order Mental Models for Human-Robot Cooperative Manipulation Under Partial Observability

Scope: Master Thesis
Advisor: Dorothea Koert, Joni Pajarinen
Added: 2021-06-08
Start: ASAP

The ability to model the beliefs and goals of a partner is an essential part of cooperative tasks. While humans develop theory-of-mind models for this aim at a very early age [1], it is still an open question how to implement and make use of such models for cooperative robots [2,3,4]. In particular, in shared workspaces, human-robot collaboration could profit from such models, e.g., if the robot can detect and react to planned human goals or a human's false beliefs during task execution. To make such robots a reality, the goal of this thesis is to investigate the use of first- and second-order mental models in a cooperative manipulation task under partial observability. Partially observable Markov decision processes (POMDPs) and interactive POMDPs (I-POMDPs) [5] define an optimal solution to the mental modeling task and may provide a solid theoretical basis for modelling. The thesis may also compare related approaches from the literature and set up an experimental design for evaluation with the bi-manual robot platform Kobo.

Highly motivated students can apply by sending an e-mail expressing your interest to attaching your CV and transcripts.

References:

  1. Wimmer, H., & Perner, J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception (1983)
  2. Sandra Devin and Rachid Alami. An implemented theory of mind to improve human-robot shared plans execution (2016)
  3. Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami,and Matthew Botvinick. Machine theory of mind (2018)
  4. Connor Brooks and Daniel Szafir. Building second-order mental models for human-robot interaction. (2019)
  5. Prashant Doshi, Xia Qu, Adam Goodie, and Diana Young. Modeling recursive reasoning by humans using empirically informed interactive pomdps. (2010)

Discovering neural parts in objects with invertible NNs for robot grasping

Scope: Master Thesis
Advisor: Georgia Chalvatzaki, Despoina Paschalidou
Added: 2021-05-19
Start: ASAP
Topic:

In this thesis, we will investigate the use of 3D primitive representations of objects using Invertible Neural Networks (INNs). Through INNs we can learn the implicit surface function of the objects and their mesh. Apart from extracting the object's shape, we can parse the object into semantically interpretable parts. Our main focus will be to segment those object parts that are semantically related to object affordances. Moreover, the implicit representation of the primitive allows us to compute the grasp configuration of the object directly, enabling grasp planning. Interested students are expected to have experience with Computer Vision and Deep Learning, and to know how to program in Python using DL libraries like PyTorch.

The thesis will be co-supervised by Despoina Paschalidou (Ph.D. candidate at the Max Planck Institute for Intelligent Systems and the Max Planck ETH Center for Learning Systems). Highly motivated students can apply by sending an e-mail expressing your interest to , attaching a letter of motivation and possibly your CV.

References:

  1. Paschalidou, Despoina, Angelos Katharopoulos, Andreas Geiger, and Sanja Fidler. "Neural Parts: Learning expressive 3D shape abstractions with invertible neural networks." arXiv preprint arXiv:2103.10429 (2021).
  2. Karunratanakul, Korrawe, Jinlong Yang, Yan Zhang, Michael Black, Krikamol Muandet, and Siyu Tang. "Grasping Field: Learning Implicit Representations for Human Grasps." arXiv preprint arXiv:2008.04451 (2020).
  3. Chao, Yu-Wei, Wei Yang, Yu Xiang, Pavlo Molchanov, Ankur Handa, Jonathan Tremblay, Yashraj S. Narang et al. "DexYCB: A Benchmark for Capturing Hand Grasping of Objects." arXiv preprint arXiv:2104.04631 (2021).
  4. Do, Thanh-Toan, Anh Nguyen, and Ian Reid. "Affordancenet: An end-to-end deep learning approach for object affordance detection." In 2018 IEEE international conference on robotics and automation (ICRA), pp. 5882-5889. IEEE, 2018.

Deep Articulation Prediction

Scope: Master Thesis
Advisor: Julen Urain De Jesus, Puze Liu, Georgia Chalvatzaki
Added: 2021-04-23
Start: ASAP
Topic:

In robotics, we deal with the problem of solving complex task planning problems in highly unstructured environments. While, in recent years, end-to-end learning algorithms have been proposed to solve these problems, the lack of clear abstractions for defining policies seems to be a bottleneck for the generalization of the learned skills. In this project, we consider that a proper understanding of the objects that compose the workspace could help the robot obtain better generalization properties.

This project deals with the problem of predicting the properties of articulated objects. Given an RGB-D image of a scene, a robotics-oriented perception system should be able to extract relevant object-centric features, such as the axis of rotation, handle position, object positions, or size.

Given a large supervised dataset of scenes, the student is expected to train deep learning models to predict (1) where the relevant objects are in the scene and (2) what the features of these objects are. This master thesis is oriented toward students with strong coding skills and solid experience with PyTorch. Additionally, the student should be interested in Computer Vision and in working with networks that deal with RGB-D data, such as CNNs or PointNet.

Highly motivated students can apply by sending an e-mail expressing your interest to , or , attaching a letter of motivation and possibly your CV.

References:

  1. Jain, Ajinkya and Lioutikov, Rudolf and Chuck, Caleb and Niekum, Scott. "ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory" (2021).
  2. Li, Xiaolong, et al. "Category-level articulated object pose estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. (2020).
  3. Mo, Kaichun and Guibas, Leonidas and Mukadam, Mustafa and Gupta, Abhinav and Tulsiani, Shubham. "Where2Act: From Pixels to Actions for Articulated 3D Objects" (2021)

Cross-platform Benchmark of Robot Grasp Planning

Scope: Master Thesis
Advisor: Georgia Chalvatzaki, Daniel Leidner
Added: 2021-04-22
Start: ASAP
Topic:

Grasp planning is one of the most challenging tasks in robot manipulation. Apart from perception ambiguity, grasp robustness and successful execution rely heavily on the dynamics of the robotic hands. The student is expected to research and develop benchmarking environments and evaluation metrics for grasp planning. Development in simulation environments such as Isaac Sim and Gazebo will allow us to integrate and evaluate different robotic hands for grasping a variety of everyday objects. We will evaluate grasp performance using different metrics (e.g., object-category-wise, affordance-wise, etc.), and finally, test the sim2real gap when transferring such approaches from popular simulators to real robots. The student will have the chance to work with different robotic hands (Justin hand, PAL TIAGo hands, Robotiq gripper, Panda gripper, etc.) and is expected to transfer the results to at least two robots (Rollin' Justin at DLR and TIAGo++ at TU Darmstadt). The results of this thesis are intended to be made public (both the data and the benchmarking framework) for the benefit of the robotics community. As this thesis is offered in collaboration with the DLR Institute of Robotics and Mechatronics in Oberpfaffenhofen near Munich, the student is expected to work at DLR for a period of 8 months for the thesis. On-site work at the premises of DLR can be expected but not guaranteed due to COVID-19 restrictions. A large part of the project can be carried out remotely.

Highly motivated students can apply by sending an e-mail expressing your interest to and , attaching a letter of motivation and possibly your CV.

References:

  1. Collins, Jack, Shelvin Chand, Anthony Vanderkop, and David Howard. "A Review of Physics Simulators for Robotic Applications." IEEE Access (2021).
  2. Bekiroglu, Y., Marturi, N., Roa, M. A., Adjigble, K. J. M., Pardi, T., Grimm, C., ... & Stolkin, R. (2019). Benchmarking protocol for grasp planning algorithms. IEEE Robotics and Automation Letters, 5(2), 315-322.

AADD: Reinforcement Learning for Unbiased Autonomous Driving

Scope: Master's thesis
Advisor: Tuan Dam, Carlo D'Eramo, Joni Pajarinen
Start: ASAP
Topic: Applying reinforcement learning to autonomous driving is a promising but challenging research direction due to the high uncertainty and varying environmental conditions of the task, so efficient reinforcement learning is needed. Recent work has suggested solving the Bellman optimality equation with stability guarantees, but no guarantee of zero bias has been proposed in this context, making reinforcement learning susceptible to getting stuck in dangerous solutions. In this work, we formulate the Bellman equation as a convex-concave saddle point problem and solve it using a newly proposed accelerated primal-dual algorithm [3]. We will test the algorithm on benchmark problems and on an autonomous driving task such as the one shown on the right (see video), where an efficient, unbiased solution is needed.
[1] Ofir Nachum, Yinlam Chow, and Mohammad Ghavamzadeh. "Path consistency learning in Tsallis entropy regularized MDPs." arXiv preprint arXiv:1802.03501, 2018.
[2] Dai, Bo, et al. "SBEED: Convergent reinforcement learning with nonlinear function approximation." International Conference on Machine Learning. PMLR, 2018.
[3] Erfan Yazdandoost Hamedani and Necdet Serhat Aybat. "A primal-dual algorithm for general convex-concave saddle point problems." arXiv preprint arXiv:1803.01401, 2018.
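As a toy illustration of solving a convex-concave saddle point problem, here is the basic extragradient method (not the accelerated primal-dual algorithm of [3]) applied to f(x, y) = x·y, whose saddle point is at the origin:

```python
def extragradient(gx, gy, x, y, lr=0.1, steps=2000):
    """Extragradient for min_x max_y f(x, y): take a half step, then
    update with the gradients evaluated at the midpoint. This avoids
    the cycling that plain gradient descent-ascent exhibits on
    bilinear problems."""
    for _ in range(steps):
        x_half = x - lr * gx(x, y)            # descend in x
        y_half = y + lr * gy(x, y)            # ascend in y
        x = x - lr * gx(x_half, y_half)       # corrected descent step
        y = y + lr * gy(x_half, y_half)       # corrected ascent step
    return x, y

# f(x, y) = x * y, so df/dx = y and df/dy = x; saddle point at (0, 0)
x, y = extragradient(gx=lambda x, y: y, gy=lambda x, y: x,
                     x=1.0, y=1.0)
```

On this bilinear problem the iterates contract toward (0, 0); plain simultaneous gradient descent-ascent would instead spiral outward.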

Above Average Decision Making Under Uncertainty

Scope: Master's thesis
Advisor: Tuan Dam, Joni Pajarinen
Start: ASAP
Topic: Google Deepmind recently showed how Monte Carlo Tree Search (MCTS) combined with neural networks can be used to play Go on a super-human level. However, one disadvantage of MCTS is that the search tree explodes exponentially with respect to the planning horizon. In this Master thesis the student will integrate the advantages of MCTS, that is, optimistic decision making into a policy representation that is limited in size with respect to the planning horizon. The outcome will be an approach that can plan further into the future. The application domain will include partially observable problems where decisions can have far reaching consequences.
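The optimistic decision making at the heart of MCTS node selection is typically the UCB1 rule. A toy bandit sketch (the two reward probabilities are made up for illustration):

```python
import math
import random

def ucb1(counts, values, c=1.4):
    """UCT-style selection: pick the child maximizing mean value plus
    an optimism bonus that shrinks with the visit count."""
    total = sum(counts)
    def score(i):
        if counts[i] == 0:
            return float('inf')              # visit unexplored children first
        return (values[i] / counts[i]
                + c * math.sqrt(math.log(total) / counts[i]))
    return max(range(len(counts)), key=score)

# toy 2-armed bandit: arm 0 pays with prob 0.2, arm 1 with prob 0.8
random.seed(0)
counts, values = [0, 0], [0.0, 0.0]
for _ in range(500):
    i = ucb1(counts, values)
    reward = 1.0 if random.random() < (0.2, 0.8)[i] else 0.0
    counts[i] += 1
    values[i] += reward
```

In MCTS this rule is applied recursively at every tree node, which is exactly what makes the tree grow exponentially with the planning horizon.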

Approximate Inference Methods for Stochastic Optimal Control

Scope: Master's thesis
Advisor: Joe Watson
Start: ASAP
Topic:

Recent work has presented a control-as-inference formulation that frames optimal control as input estimation. The linear Gaussian assumption can be shown to be equivalent to the LQR solution, while approximate inference through linearization can be viewed as a Gauss–Newton method, similar to popular trajectory optimization methods (e.g. iLQR). However, the linearization approximation limits both the tolerable environment stochasticity and exploration during inference.

The aim of this thesis is to use alternative approximate inference methods (e.g. quadrature, monte carlo, variational), and investigate the benefits to stochastic optimal control and trajectory optimization. Ideally, prospective students are interested in optimal control, approximate inference methods and model-based reinforcement learning.
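For context, the LQR solution that the linear Gaussian inference formulation recovers can be computed with a backward Riccati recursion. A sketch on an illustrative double-integrator system (all matrices and horizons are assumptions for the example):

```python
import numpy as np

def lqr(A, B, Q, R, T):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.
    Returns time-varying feedback gains K_t for u_t = -K_t x_t."""
    P = Q.copy()
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)        # Riccati backup
        gains.append(K)
    return gains[::-1]                       # reorder: t = 0 .. T-1

# illustrative double integrator (position, velocity) with hand-picked costs
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])
K = lqr(A, B, Q, R, T=50)

# roll out the closed loop from x0 = [1, 0]; the state is regulated to 0
x = np.array([[1.0], [0.0]])
for Kt in K:
    x = A @ x + B @ (-Kt @ x)
```

The thesis would go beyond this linearized setting by swapping in alternative approximate inference schemes.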

How can we learn to segment human tasks?

Scope: Master's thesis | Bachelor's thesis
Advisor: Julen Urain
Start: ASAP
Topic:

Object segmentation algorithms have shown that segmenting data with respect to the information it contains is possible. This opens the door to considering time-related data such as trajectories or videos. Being able to segment the movements of a human with respect to the different actions they are performing would provide a powerful method to understand human tasks, predict them, and hopefully mimic them with a robot. In this project, the student is expected to study different algorithms for the unsupervised segmentation of human actions and to study how well the learned models can predict human motion.

Interactive dance: at the interface of high-level reactive robotic control and human-robot interactions.

Scope: Master's thesis
Advisor: Vincent Berenz (a collaborator at Tübingen at the at the Max Planck Institute for Intelligent Systems)
Start: ASAP
Topic:

Robotic scripted dance is common. On the other hand, interactive dance, in which the robot uses runtime sensory information to continuously adapt its moves to those of its (human) partner, remains challenging. It requires integrating various sensors, action modalities and cognitive processes. The selected candidate's objective will be to develop such an interactive dance, based on the software suite for simultaneous perception and motion generation that our department has built over the years. The target robot on which the dance will be deployed is the wheeled robot Softbank Robotics Pepper. This master thesis is with the Max Planck Institute for Intelligent Systems and is located in Tuebingen. More information: https://am.is.tuebingen.mpg.de/jobs/master-thesis-interactive-dance-performed-by-sofbank-robotics-pepper

Minimum knowledge

  • Good Python programming skills.

Preferred knowledge

  • Knowledge of deep neural network, deep recurrent neural networks
  • Basic knowledge of Reinforcement Learning, POMDP, Memory Representation in POMDP
  • Knowledge of recent Deep RL methodologies;

[1] Deep recurrent q-learning for partially observable mdps, Hausknecht et al. https://www.aaai.org/ocs/index.php/FSS/FSS15/paper/download/11673/11503
[2] Learning deep neural network policies with continuous memory states, Zhang et al.https://ieeexplore.ieee.org/iel7/7478842/7487087/07487174.pdf
[3] Recurrent Ladder Networks, Prémont-Schwarz et al. http://papers.nips.cc/paper/7182-recurrent-ladder-networks.pdf

Learning a Friction Hysteresis with MOSAIC

Scope: Master's thesis, Bachelor thesis
Advisor: Jan Peters
Start: ASAP
Topic: Inspired by results in neuroscience, especially in the Cerebellum, Kawato & Wolpert introduced the idea of the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. In this thesis, we want to focus on the problem of learning to control a robot system with a hysteresis in its friction.
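The module-selection step of MOSAIC can be sketched as a soft competition between forward models, where prediction accuracy determines each module's responsibility for control; the Gaussian error model and its scale below are illustrative assumptions, not the architecture's prescribed form:

```python
import numpy as np

def mosaic_responsibilities(predictions, observation, sigma=0.1):
    """MOSAIC-style soft selection: each forward model predicts the next
    state; models with lower prediction error receive higher
    responsibility, which then weights the paired inverse models."""
    errors = np.array([np.sum((p - observation) ** 2) for p in predictions])
    w = np.exp(-errors / (2.0 * sigma ** 2))   # Gaussian likelihood weights
    return w / w.sum()                         # normalize to responsibilities

# two hypothetical forward models; the first matches the observation well,
# as would a model fit to the current friction regime of the hysteresis
preds = [np.array([0.0]), np.array([0.5])]
obs = np.array([0.05])
r = mosaic_responsibilities(preds, obs)
```

For the friction-hysteresis task, each local forward model would cover one branch of the hysteresis loop, and the responsibilities would switch between the corresponding inverse models.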

Targeted Exploration Using Value Bounds

Scope: Master's thesis
Advisor: Joni Pajarinen
Start: ASAP
Topic: Efficient exploration is one of the most prominent challenges in deep reinforcement learning. In reinforcement learning, exploration of the state space is critical for finding high-value states and connecting them to the actions that caused them. Exploration in model-free reinforcement learning has relied on classical techniques, empirical uncertainty estimates of the value function, or random policies. In model-based reinforcement learning, value bounds have been used successfully to direct exploration. In this Master thesis project, the student will investigate how lower and upper value bounds can be used to target exploration in model-free reinforcement learning toward the most promising parts of the state space. This thesis topic requires background knowledge in reinforcement learning gained e.g. through machine learning or robot learning courses.

  
