Currently Available Thesis Topics

We offer these current topics directly to Bachelor and Master students at TU Darmstadt, who should feel free to DIRECTLY contact the thesis advisor if they are interested in one of these topics. Excellent external students from another university may be accepted but are required to first email Jan Peters before contacting any other lab member for a thesis topic. Note that we cannot provide funding for any of these thesis projects.

We highly recommend that you take either our robotics and machine learning lectures (Robot Learning, Statistical Machine Learning) or those of our colleagues (Grundlagen der Robotik, Probabilistic Graphical Models and/or Deep Learning). Even more important to us is that you take both Robot Learning: Integrated Project, Part 1 (Literature Review and Simulation Studies) and Part 2 (Evaluation and Submission to a Conference) before doing a thesis with us.

In addition, we are usually happy to devise new topics on request to suit the abilities of excellent students. Please DIRECTLY contact the thesis advisor if you are interested in one of these topics. When you contact the advisor, it would be nice if you could mention (1) WHY you are interested in the topic (dreams, parts of the problem, etc), and (2) WHAT makes you special for the projects (e.g., class work, project experience, special programming or math skills, prior work, etc.). Supplementary materials (CV, grades, etc) are highly appreciated. Of course, such materials are not mandatory but they help the advisor to see whether the topic is too easy, just about right or too hard for you.

Only contact *ONE* potential advisor at a time! If you contact a second one without first concluding discussions with the first advisor (i.e., deciding for or against the thesis with her or him), we may not consider you at all. Only if you are super excited about two topics (and no more than two) should you send an email to both supervisors, so that the supervisors are aware of the additional interest.

FOR FB16+FB18 STUDENTS: If you are a student from another department at TU Darmstadt (e.g., ME, EE, IST), you need an additional formal supervisor who officially issues the topic. Please do not try to arrange your home department advisor by yourself but let the supervising IAS member get in touch with that person instead. Multiple professors from other departments have complained that they were asked to co-supervise before being contacted by our advising lab member.

NEW THESES START HERE

Data-Driven Bimanual Robotic Grasping

Scope: Bachelor/Master thesis
Advisor: Vignesh Prasad and Alap Kshirsagar
Added: 2024-04-25
Start: ASAP
Topic:

Grasping is one of the most fundamental and challenging tasks in the robotic manipulation of objects. Most of the prior work on robotic grasping has focused on grasping with a single gripper and several large-scale datasets have been developed in recent years to tackle the problem of single-arm grasping in 3D by utilizing deep-learning techniques [1,2]. But many tasks in industrial and domestic environments require bimanual grasps. Bimanual grasps are required for manipulation of large, deformable or fragile objects. This project seeks to develop a data-driven technique for bimanual robotic grasp generation from visual input. We will utilize a large-scale dataset of simulated bimanual grasps [3] to train a bimanual grasp pose generation model. The method will be evaluated in simulation as well as on a real robot.

Requirements

  • Strong Python programming skills
  • Knowledge in Machine Learning / Supervised Learning
  • Experience with deep learning libraries is a plus

Interested students can apply by sending an e-mail to alap.kshirsagar@ias.tu-darmstadt.de and attaching the documents mentioned below:

  • Curriculum Vitae
  • Motivation letter explaining why you would like to work on this topic and why you are the perfect candidate

References
[1] C. Eppner, A. Mousavian, and D. Fox, “ACRONYM: A Large-Scale Grasp Dataset Based on Simulation,” in Proceedings - IEEE International Conference on Robotics and Automation, 2021, vol. 2021-May, pp. 6222–6227, doi: 10.1109/ICRA48506.2021.9560844.
[2] A. Mousavian, C. Eppner, and D. Fox, “6-DOF GraspNet: Variational grasp generation for object manipulation,” in Proceedings of the IEEE International Conference on Computer Vision, 2019, vol. 2019-Octob, pp. 2901–2910, doi: 10.1109/ICCV.2019.00299.
[3] G. Zhai et al., “DA2 Dataset: Toward Dexterity-Aware Dual-Arm Grasping,” IEEE Robot. Autom. Lett., vol. 7, no. 4, pp. 8941–8948, 2022.

Imitation Learning for High-Speed Robot Air Hockey

Scope: Master thesis
Advisor: Puze Liu and Julen Urain De Jesus
Start: ASAP
Topic:

High-speed reactive motion is one of the fundamental capabilities robots need to achieve human-level behavior. Optimization-based methods struggle to meet real-time requirements when the problem is non-convex and constrained. Reinforcement learning requires extensive reward engineering to achieve the desired performance. Imitation learning, on the other hand, draws on human knowledge directly through collected demonstrations and enables robots to learn natural movements efficiently. In this thesis, we explore how imitation learning can be performed in a complex robot air hockey task. The robot needs to learn not only low-level skills, but also high-level tactics from human demonstrations.

Requirements

  • Strong Python programming skills
  • Knowledge in Machine Learning / Supervised Learning
  • Good Knowledge in Robotics
  • Experience with deep learning libraries is a plus

References
* Chi, Cheng, et al. "Diffusion policy: Visuomotor policy learning via action diffusion." arXiv preprint arXiv:2303.04137 (2023).
* Liu, Puze, et al. "Robot reinforcement learning on the constraint manifold." Conference on Robot Learning. PMLR (2022).
* Pan, Yunpeng, et al. "Imitation learning for agile autonomous driving." The International Journal of Robotics Research 39.2-3 (2020).

Interested students can apply by sending an e-mail to puze.liu@ias.tu-darmstadt.de and attaching the required documents mentioned above.

Walk your network: investigating neural network’s location in Q-learning methods.

Scope: Master thesis
Advisor: Theo Vincent
Start: Flexible
Topic:

Q-learning methods are at the heart of Reinforcement Learning. They have been shown to outperform humans on some complex tasks such as playing video games [1]. In robotics, where the action space is in most cases continuous, actor-critic methods rely on Q-learning methods to learn the critic [2]. Although Q-learning methods have been extensively studied in the past, little attention has been paid to the way the online neural network explores the space of Q functions. Most approaches focus on crafting a loss that would make the agent learn better policies [3]. Here, we offer a thesis that focuses on the position of the online Q neural network in the space of Q functions. The student will first investigate this idea on simple problems before comparing the performance to strong baselines such as DQN or REM [1, 4] on Atari games. Depending on the results, the student may also move on to MuJoCo and compare the results with SAC [2]. The student is also welcome to propose their own ideas.
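For orientation, the basic Q-learning update that these methods build on can be sketched in tabular form. This is only a minimal illustration and not the method to be developed in the thesis; the toy states, the function name `q_learning_update`, and the values of `alpha` and `gamma` are assumptions made for the example.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning step in place and return the TD error."""
    # Greedy bootstrap value; a state without actions is treated as terminal.
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    td_target = reward + gamma * best_next
    td_error = td_target - q[state][action]
    q[state][action] += alpha * td_error
    return td_error


# Hypothetical toy table: two non-terminal states and one terminal state.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 1.0},
    "end": {},
}
err = q_learning_update(q, "s0", "right", reward=0.0, next_state="s1")
```

In deep Q-learning the table is replaced by an online network trained to regress toward the same target, which is where the question of how that network moves through the space of Q functions arises.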

Highly motivated students can apply by sending an email to theovincentjourdat@gmail.com. Please attach your CV, a grade sheet and clearly state why you are interested in this topic. Students who have followed the Reinforcement Learning or Robot Learning course will be prioritized.

Requirements

  • Strong Python programming skills
  • Knowledge in Reinforcement Learning
  • Experience with deep learning libraries is a plus

References
[1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." nature 518.7540 (2015): 529-533.
[2] Haarnoja, Tuomas, et al. "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor." International conference on machine learning. PMLR, 2018.
[3] Hessel, Matteo, et al. "Rainbow: Combining improvements in deep reinforcement learning." Proceedings of the AAAI conference on artificial intelligence. Vol. 32. No. 1. 2018.
[4] Agarwal, R., Schuurmans, D., & Norouzi, M. (2020). An Optimistic Perspective on Offline Reinforcement Learning. International Conference on Machine Learning (ICML).

Co-optimizing Hand and Action for Robotic Grasping of Deformable Objects

Scope: Master thesis
Advisor: Alap Kshirsagar, Boris Belousov, Guillaume Duret
Added: 2024-01-15
Start: ASAP
Topic: The current standard approach to robotic manipulation involves distinct stages of manipulator design and control. However, the interdependence of a robot gripper's morphology and control suggests that jointly optimizing these aspects can significantly enhance performance. Existing methods for such a co-optimization [1] are limited to rigid objects whereas manipulation of deformable objects is critical for several real-world applications such as food handling and robotic surgery.

This project aims to advance deformable object manipulation by co-optimizing robot gripper morphology and control policies. The project will involve utilizing existing simulation environments for deformable object manipulation [2] and implementing a method to jointly optimize gripper morphology and grasp policies within the simulation.

Required Qualification:

  • Strong Python programming skills
  • Familiarity with deep learning libraries such as PyTorch or Tensorflow

Preferred Qualification:

  • Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and "Robot Learning"

Application Requirements:

  • Curriculum Vitae
  • Motivation letter explaining why you would like to work on this topic and why you are the perfect candidate

Interested students can apply by sending an e-mail to alap.kshirsagar@tu-darmstadt.de and attaching the required documents mentioned above.

References:
[1] Xu, Jie, et al. "An End-to-End Differentiable Framework for Contact-Aware Robot Design." Robotics: Science & Systems. 2021.
[2] Huang, Isabella, et al. "DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets." arXiv preprint arXiv:2303.16138 (2023).

Geometry-Aware Diffusion Models for Robotics

Scope: Master thesis
Advisor: Joao Carvalho, An Thai Le
Added: 2023-11-20
Start: ASAP
Topic: Diffusion-based generative models have shown increasing performance in image generation [1] and, more recently, in reinforcement learning and robotics, e.g., in motion planning [2], grasping [3], imitation learning [4], and offline reinforcement learning [5]. Two important properties of diffusion models that we are interested in exploring are their ability to encode multimodal demonstrations and to combine gradients of multiple costs (as in optimization-based motion planning).

In this thesis, you will work on developing an imitation learning algorithm using diffusion models for robotic manipulation tasks, such as the ones in [2, 3, 4], but taking into account the geometry of the task space.

If this sounds interesting, please send an email to joao@robot-learning.de and an@robot-learning.de, and possibly attach your CV, highlighting the relevant courses you took in robotics and machine learning.

What's in it for you:

  • You get to work on an exciting topic at the intersection of deep-learning and robotics
  • We will supervise you closely throughout your thesis
  • Depending on the results, we will aim for an international conference publication

Requirements:

  • Be motivated -- we will support you a lot, but we expect you to contribute a lot too
  • Robotics knowledge
  • Experience setting up deep learning pipelines -- from data collection, architecture design, training, and evaluation
  • PyTorch -- especially experience writing good parallelizable code (i.e., code that runs fast on the GPU)

References:
[1] https://arxiv.org/abs/2112.10752
[2] https://arxiv.org/abs/2308.01557
[3] https://arxiv.org/abs/2209.03855
[4] https://arxiv.org/abs/2303.04137
[5] https://arxiv.org/abs/2205.09991

Learning Latent Representations for Embodied Agents

Scope: Master Thesis
Advisor: Michael Drolet, Oleg Arenz
Added: 2023-11-05
Start: ASAP
Topic: Learning from Demonstration [1] and Policy Search [2] are fundamental approaches in training robot policies, but there's a critical challenge: choosing the right expert. Typically, experts are expected to operate within the same state space and dynamics as the robot to facilitate learning within a specific environment. While this approach can yield desirable results, it limits us to a single expert, often overlooking valuable insights from experts with different dynamics and actuators. This is where your thesis steps in. Our goal is to harness the collective power of experts with diverse embodiments. We seek to create a more comprehensive representation of prior experiences by identifying common attributes in a lower-dimensional latent space. You will play a crucial role in discovering efficient encoding and decoding techniques for this latent space, ensuring it can be applied to various robot platforms. Your work will involve mastering dimensionality reduction, imitation learning, and the transfer of robot skills. While prior experience in these areas is a bonus, we welcome anyone with a passion for robotics and sufficient qualifications as described below.

Interested students can apply by sending an E-Mail to michael.drolet@tu-darmstadt.de and attaching the required documents mentioned below.

Required Qualification:

  • Strong Python programming skills
  • Experience with TensorFlow/PyTorch
  • Familiarity with core Machine Learning topics

Preferred Qualification:

  • Experience programming/controlling robots (either simulated or real world)
  • Knowledgeable about different robot platforms (quadrupeds and bipedal robots)

Application Requirements:

  • Resume / CV
  • Cover letter explaining why this topic fits you well and why you are an ideal candidate

References:
[1] Ho and Ermon. "Generative adversarial imitation learning"
[2] Arenz, et al. "Efficient Gradient-Free Variational Inference using Policy Search"

Characterizing Fear-induced Adaptation of Balance by Inverse Reinforcement Learning

Scope: Bachelor or Master thesis
Advisor: Alap Kshirsagar
Added: 2023-10-30
Start: ASAP
Topic: Fear of falling has been found to correlate with increased postural sway, i.e., the oscillations of the center of pressure (CoP) that we experience even while standing quietly [1]. Attempts at a deeper understanding of this correlation have tied behavioral to neurophysiological changes [2], but a rigorous computational explanation for the behavioral and neurophysiological changes is currently lacking. In this project, we seek to understand how the computational goals underlying human balance control change when standing at a height. To tackle this challenge, we utilize inverse reinforcement learning (IRL), which tries to recover the agent's objective function from observed behavior. The thesis will involve conducting VR-based human participant experiments in collaboration with cognitive scientists from Philipps University Marburg, implementing an IRL algorithm (for example, GAIL [3]), and evaluating the IRL algorithm on data obtained from the human participant experiments.

Interested students can apply by sending an E-Mail to alap.kshirsagar@tu-darmstadt.de and attaching the required documents mentioned below.

Required Qualification:

  • Strong Python programming skills
  • Basic knowledge of reinforcement learning

Preferred Qualification:

  • Hands-on experience with reinforcement learning or inverse reinforcement learning
  • Cognitive science background

Application Requirements:

  • Curriculum Vitae
  • Motivation letter explaining why you would like to work on this topic and why you are the perfect candidate

References:
[1] Maki, et al. "Fear of Falling and Postural Performance in the Elderly"
[2] Davis et al. "The relationship between fear of falling and human postural control"
[3] Ho and Ermon. "Generative adversarial imitation learning"

Timing is Key: CPGs for regularizing Quadruped Gaits learned with DRL

Scope: Master thesis
Advisor: Nico Bohlinger, Davide Tateo
Added: 2023-10-20
Start: ASAP
Topic: Current model-free Deep Reinforcement Learning (DRL) approaches for quadruped locomotion can learn highly agile locomotion (like parkour [1]), while being far more flexible than model-based or optimal control approaches. But to achieve a natural-looking gait, they rely on complex reward functions with 12 or more reward terms and heavy tuning of their coefficients. This tuning is quite fragile and often has to be redone for different robots or new environments.

To tackle this problem we want to utilize Central Pattern Generators (CPGs), which can generate timings for ground contacts for the four feet. The policy gets rewarded for complying with the contact patterns of the CPGs. This leads to a straightforward way of regularizing and steering the policy to a natural gait without posing too strong restrictions on it. We first want to manually find fitting CPG parameters for different gait velocities and later move to learning those parameters in an end-to-end fashion.
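The contact-timing idea can be sketched as follows. This is a simplified illustration, not the thesis implementation: a plain phase oscillator stands in for a full CPG, and the trot phase offsets, `frequency`, `duty_factor`, foot names, and both function names are assumptions chosen for the example.

```python
# Hypothetical phase offsets (as a fraction of one gait cycle) for a trot:
# diagonal leg pairs share the same phase.
TROT_OFFSETS = {"FL": 0.0, "RR": 0.0, "FR": 0.5, "RL": 0.5}


def desired_contacts(t, frequency=2.0, duty_factor=0.5, offsets=TROT_OFFSETS):
    """Return, per foot, whether the pattern asks for ground contact at time t."""
    contacts = {}
    for foot, offset in offsets.items():
        phase = (t * frequency + offset) % 1.0  # phase in [0, 1)
        contacts[foot] = phase < duty_factor    # stance during the first part of the cycle
    return contacts


def contact_reward(actual, desired):
    """Fraction of feet whose measured contact state matches the pattern."""
    matches = sum(actual[foot] == desired[foot] for foot in desired)
    return matches / len(desired)


pattern = desired_contacts(0.0)
measured = {"FL": True, "RR": True, "FR": True, "RL": False}
reward = contact_reward(measured, pattern)
```

A single term like this could replace several hand-tuned gait terms in the reward, and `frequency`, `duty_factor`, and the offsets are the kind of parameters that would first be set by hand and later learned.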

Highly motivated students can apply by sending an E-Mail to nico.bohlinger@tu-darmstadt.de and attaching the required documents mentioned below.

Minimum Qualification:

  • Good Python programming skills
  • Basic knowledge of the PyTorch library
  • Basic knowledge of Reinforcement Learning

Preferred Qualification:

  • Good knowledge of the PyTorch library
  • Basic knowledge of the MuJoCo simulator

Application Requirements:

  • Curriculum Vitae
  • Motivation letter explaining why you would like to work on this topic and why you are the perfect candidate

References:
[1] Cheng, Xuxin, et al. "Extreme Parkour with Legged Robots."

Damage-aware Reinforcement Learning for Deformable and Fragile Objects

Scope: Master thesis
Advisor: Guillaume Duret, Tim Schneider
Added: 2023-10-16
Start: ASAP
Topic: Dealing with soft or fragile objects introduces a host of challenges that surpass traditional rigid object manipulation, such as deformability and the risk of damage. Tasks in which we have to deal with such objects include, for example, squeezing a mustard bottle or picking up fragile fruits without causing damage. In this thesis we will tackle this problem with model-based reinforcement learning by using existing models to predict stress and deformability of objects.

The goal of this thesis is the development and application of a model-based reinforcement learning method on real robots. Your tasks will include:
1. Setting up a simulation environment for deformable object manipulation
2. Utilizing existing models for stress and deformability prediction [1]
3. Implementing a reinforcement learning method that works in simulation and, if possible, on the real robot.

If you are interested in this thesis topic and believe you possess the necessary skills and qualifications, please submit your application, including a resume and a brief motivation letter explaining your interest and relevant experience. Please send your application to guillaume.duret@ec-lyon.fr.

Required Qualification:

  • Enthusiasm for and experience in robotics, machine learning, and simulation
  • Strong programming skills in Python
  • Familiarity with deep learning libraries such as PyTorch or Tensorflow

Desired Qualification:

  • Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and (optionally) "Robot Learning"

References:
[1] Huang, I., Narang, Y., Bajcsy, R., Ramos, F., Hermans, T., & Fox, D. (2023). DefGraspNets: Grasp Planning on 3D Fields with Graph Neural Nets. arXiv preprint arXiv:2303.16138.

Imitation Learning meets Diffusion Models for Robotics

Scope: Master thesis
Advisor: Julen Urain De Jesus, Firas Al-Hafez
Added: 2023-10-19
Start: ASAP
Topic: The use of Diffusion Models for generating high-resolution images based on text has yielded impressive results [1]. These models enable the generation of novel images that correspond to a given prompt. Despite a few previous attempts [2], the application of Diffusion Models in Robotics has been under-explored.

The objective of this thesis is to build upon prior research [2, 3] to establish a connection between Diffusion Models and Imitation Learning. We aim to explore how to exploit Diffusion Models and improve the performance of Imitation learning algorithms that interact with the world.

We welcome highly motivated students to apply for this opportunity by sending an email expressing their interest to Firas Al-Hafez (firas.al-hafez@tu-darmstadt.de) and Julen Urain (urain@ias.informatik.tu-darmstadt.de). Please attach your letter of motivation and CV, and clearly state why you are interested in this topic and why you are the ideal candidate for this position.

Required Qualification:
1. Strong Python programming skills
2. Basic Knowledge in Imitation Learning
3. Interest in Diffusion models, Reinforcement Learning

Desired Qualification:
1. Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and/or "Reinforcement Learning: From Fundamentals to the Deep Approaches"

References:
[1] Song, Yang, and Stefano Ermon. "Generative modeling by estimating gradients of the data distribution." Advances in neural information processing systems 32 (2019).
[2] Ho, Jonathan, and Stefano Ermon. "Generative adversarial imitation learning." Advances in neural information processing systems 29 (2016).
[3] Garg, D., Chakraborty, S., Cundy, C., Song, J., & Ermon, S. (2021). Iq-learn: Inverse soft-q learning for imitation. Advances in Neural Information Processing Systems, 34, 4028-4039.
[4] Chen, R. T., & Lipman, Y. (2023). Riemannian flow matching on general geometries. arXiv preprint arXiv:2302.03660.

Scaling Behavior Cloning to Humanoid Locomotion

Scope: Bachelor / Master thesis
Advisor: Joe Watson
Added: 2023-10-07
Start: ASAP
Topic: In a previous project [1], I found that behavior cloning (BC) was a surprisingly poor baseline for imitating humanoid locomotion. I suspect the issue may lie in the challenges of regularizing high-dimensional regression.

The goal of this project is to investigate BC for humanoid imitation, understand the scaling issues present, and evaluate possible solutions, e.g. regularization strategies from the regression literature.
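As a small illustration of what a regularization strategy from the regression literature looks like in BC, the sketch below fits a scalar linear policy with ridge (L2-penalized) least squares. It is an assumption-laden toy, not the project's method: the function name `ridge_bc_weight`, the one-dimensional states and actions, and the penalty values are all made up for the example, and humanoid imitation involves high-dimensional inputs and outputs.

```python
def ridge_bc_weight(states, actions, lam):
    """Closed-form ridge solution for a scalar linear policy a = w * s.

    Minimizes sum((a_i - w * s_i)^2) + lam * w^2 over the demonstrations.
    """
    numerator = sum(s * a for s, a in zip(states, actions))
    denominator = sum(s * s for s in states) + lam
    return numerator / denominator


# Hypothetical demonstrations generated by an expert with a = 2 * s.
demo_states = [1.0, 2.0, 3.0]
demo_actions = [2.0, 4.0, 6.0]

w_unregularized = ridge_bc_weight(demo_states, demo_actions, lam=0.0)
w_regularized = ridge_bc_weight(demo_states, demo_actions, lam=14.0)
```

The penalty shrinks the learned weight toward zero, trading fit to the demonstrations for lower variance. How such trade-offs behave as the state and action dimensions grow is the kind of question the project would study.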

The project will build on Google DeepMind's Acme library [2], which already has BC algorithms and humanoid demonstration datasets [3] implemented and will serve as the foundation of the project.

To apply, email joe@robot-learning.de, ideally with a CV and transcript so I can assess your suitability.

Requirements:

  • Experience, interest and enthusiasm for the intersection of robot learning and machine learning
  • Experience with Acme and JAX would be a benefit, but not necessary

References:
[1] https://arxiv.org/abs/2305.16498
[2] https://github.com/google-deepmind/acme
[3] https://arxiv.org/abs/2106.00672

Robot Gaze for Communicating Collision Avoidance Intent in Shared Workspaces

Scope: Bachelor/Master thesis
Advisor: Alap Kshirsagar, Dorothea Koert
Added: 2023-09-27
Start: ASAP

Topic: In order to operate close to non-experts, future robots require both an intuitive form of instruction accessible to lay users and the ability to react appropriately to a human co-worker. Instruction by imitation learning with probabilistic movement primitives (ProMPs) [1] allows capturing tasks by learning robot trajectories from demonstrations including the motion variability. However, appropriate responses to human co-workers during the execution of the learned movements are crucial for fluent task execution, perceived safety, and subjective comfort. To facilitate such appropriate responsive behaviors in human-robot interaction, the robot needs to be able to react to its human workspace co-inhabitant online during the execution. Also, the robot needs to communicate its motion intent to the human through non-verbal gestures such as eye and head gazes [2][3]. In particular for humanoid robots, combining motions of arms with expressive head and gaze directions is a promising approach that has not yet been extensively studied in related work.

Goals of the thesis:

  • Develop a method to combine robot head/gaze motion with ProMPs for online collision avoidance
  • Implement the method on a Franka Emika Panda robot
  • Evaluate and compare the implemented behaviors in a study with human participants

Highly motivated students can apply by sending an email to alap.kshirsagar@tu-darmstadt.de. Please attach your CV and transcript, and clearly state your prior experiences and why you are interested in this topic.

Required Qualification:

  • Strong programming skills in Python
  • Prior experience with Robot Operating System (ROS) and user studies would be beneficial
  • Strong motivation for human-centered robotics including design and implementation of a user study

References:
[1] Koert, Dorothea, et al. "Learning intention aware online adaptation of movement primitives." IEEE Robotics and Automation Letters 4.4 (2019): 3719-3726.
[2] Admoni, Henny, and Brian Scassellati. "Social eye gaze in human-robot interaction: a review." Journal of Human-Robot Interaction 6.1 (2017): 25-63.
[3] Lemasurier, Gregory, et al. "Methods for expressing robot intent for human–robot collaboration in shared workspaces." ACM Transactions on Human-Robot Interaction (THRI) 10.4 (2021): 1-27.

Tactile Sensing for the Real World

Scope: Master thesis
Advisor: Theo Gruner, Daniel Palenicek, and Tim Schneider
Start: ASAP

Topic: Tactile sensing is a crucial sensing modality that allows humans to perform dexterous manipulation [1]. In recent years, the development of artificial tactile sensors has made substantial progress, with current models relying on cameras inside the fingertips to extract information about the points of contact [2]. However, robotic tactile sensing is still a largely unsolved topic despite these developments. A central challenge of tactile sensing is the extraction of usable representations of sensor readings, especially since these generally contain an incomplete view of the environment.

Recent model-based reinforcement learning methods like Dreamer [3] leverage latent state-space models to reason about the environment from partial and noisy observations. However, more work has yet to be done to apply such methods to real-world manipulation tasks. Hence, this thesis will explore whether Dreamer can solve challenging real-world manipulation tasks by leveraging tactile information. Initial results suggest that tasks like peg-in-a-hole can indeed be solved with Dreamer in simulation (see figure above), but the applicability of this method in the real world has yet to be shown.

In this work, you will work with state-of-the-art hardware and compute resources on a hot research topic with the option of publishing your work at a scientific conference.

Highly motivated students can apply by sending an email to theo_sunao.gruner@tu-darmstadt.de. Please attach a transcript of records and clearly state your prior experiences and why you are interested in this topic.

Requirements

  • Strong Python programming skills
  • Ideally experience with deep learning libraries like JAX or PyTorch
  • Experience with reinforcement learning is a plus
  • Experience with Linux

References
[1] 2S Match Anest2, Roland Johansson Lab (2005), https://www.youtube.com/watch?v=HH6QD0MgqDQ
[2] Gelsight Inc., Gelsight Mini, https://www.gelsight.com/gelsightmini/
[3] Hafner, D., Lillicrap, T., Ba, J., & Norouzi, M. (2019). Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603.

Large Vision-Language Neural Networks for Open-Vocabulary Robotic Manipulation

Scope: Master's thesis
Advisor: Snehal Jauhri, Ali Younes
Topic:

Robots are expected to soon leave their factory/laboratory enclosures and operate autonomously in everyday unstructured environments such as households. Semantic information is especially important when considering real-world robotic applications where the robot needs to re-arrange objects as per a set of language instructions or human inputs (as shown in the figure). Many sophisticated semantic segmentation networks exist [1]. However, a challenge when using such methods in the real world is that the semantic classes rarely align perfectly with the language input received by the robot. For instance, a human language instruction might request a ‘glass’ or ‘water’, but the semantic classes detected might be ‘cup’ or ‘drink’.

Nevertheless, with the rise of large language and vision-language models, we now have capable segmentation models that do not directly predict semantic classes but use learned associations between language queries and classes to give us ‘open-vocabulary’ segmentation [2]. Some models are especially powerful since they can be used with arbitrary language queries.

In this thesis, we aim to build on advances in 3D vision-based robot manipulation and large open-vocabulary vision models [2] to build a full pick-and-place pipeline for real-world manipulation. We also aim to find synergies between scene reconstruction and semantic segmentation to determine if knowing the object semantics can aid the reconstruction of the objects and, in turn, aid manipulation.

Highly motivated students can apply by sending an e-mail expressing their interest to Snehal Jauhri (email: snehal.jauhri@tu-darmstadt.de) or Ali Younes (email: ali.younes@tu-darmstadt.de), attaching your letter of motivation and possibly your CV.

Topic in detail: Thesis_Doc.pdf

Requirements:
Enthusiasm, ambition, and a curious mind go a long way. There will be ample supervision provided to help the student understand basic as well as advanced concepts. However, prior knowledge of computer vision, robotics, and Python programming would be a plus.

References:
[1] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2”, https://github.com/facebookresearch/detectron2, 2019.
[2] F. Liang, B. Wu, X. Dai, K. Li, Y. Zhao, H. Zhang, P. Zhang, P. Vajda, and D. Marculescu, “Open-vocabulary semantic segmentation with mask-adapted clip,” in CVPR, 2023, pp. 7061–7070, https://github.com/facebookresearch/ov-seg

Dynamic Tiles for Deep Reinforcement Learning

Scope: Master's thesis
Advisor: Davide Tateo, Carlo D'Eramo
Added: 2023-06-27
Start: 2023-09-18
Topic:

Linear approximators in Reinforcement Learning are well-studied and come with an in-depth theoretical analysis. However, linear methods require defining a set of features of the state to be used by the linear approximation. Unfortunately, constructing these features is a particularly challenging task. Deep Reinforcement Learning methods have been introduced to mitigate the feature construction problem: these methods do not require handcrafted features, as features are extracted automatically by the network during learning, using gradient descent techniques.

In simple reinforcement learning tasks, however, it is possible to use tile coding as features: Tiles are simply a convenient discretization of the state space that allows us to easily control the generalization capabilities of the linear approximator. The objective of this thesis is to design a novel algorithm for automatic feature extraction that generates a set of features similar to tile coding, but that can arbitrarily partition the state space and deal with arbitrarily complex state spaces, such as images. The idea is to combine the feature extraction problem directly with Linear Reinforcement Learning methods, defining an algorithm that retains both the theoretical guarantees and good convergence properties of these methods and the flexibility of Deep Learning approaches.
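For readers unfamiliar with tile coding, the following is a minimal sketch of classic, fixed tile coding for a continuous state in a box. The function name, the uniform offsets, and the default numbers of tilings and bins are illustrative assumptions; the learned, arbitrarily shaped tiles targeted by the thesis are not shown.

```python
import numpy as np

def tile_features(state, low, high, n_tilings=4, n_bins=8):
    """Binary tile-coding features for a continuous state inside the box [low, high]."""
    state, low, high = (np.asarray(x, dtype=float) for x in (state, low, high))
    dim = state.size
    scaled = (state - low) / (high - low) * n_bins     # position in units of one tile width
    tiles_per_tiling = (n_bins + 1) ** dim             # one extra tile per axis for shifted tilings
    features = np.zeros(n_tilings * tiles_per_tiling)
    for t in range(n_tilings):
        offset = t / n_tilings                         # each tiling is shifted by a fraction of a tile
        idx = np.clip(np.floor(scaled + offset).astype(int), 0, n_bins)
        flat = np.ravel_multi_index(tuple(idx), (n_bins + 1,) * dim)
        features[t * tiles_per_tiling + flat] = 1.0    # exactly one active tile per tiling
    return features

# A linear value estimate is then a dot product with a weight vector.
phi = tile_features([0.5, 0.5], low=[0.0, 0.0], high=[1.0, 1.0])
weights = np.zeros_like(phi)
value = weights @ phi
```

Nearby states activate many of the same tiles and therefore share weights, while distant states share none; the number of tilings and bins controls how broadly the linear approximator generalizes.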

Requirements

  • Curriculum Vitae (CV);
  • A motivation letter explaining the reason for applying for this thesis and academic/career objectives.

Minimum knowledge

  • Good Python programming skills;
  • Basic knowledge of Reinforcement Learning.

Preferred knowledge

  • Knowledge of the PyTorch library;
  • Knowledge of the Atari environments (ale-py library);
  • Knowledge of the MushroomRL library.

Accepted candidate will

  • Define a generalization of tile coding working with an arbitrary input set (including images);
  • Design a learning algorithm to adapt the tiles using data of interaction with the environment;
  • Combine feature learning with standard linear methods for Reinforcement Learning;
  • Verify the novel methodology in simple continuous state and discrete actions environments;
  • (Optionally) Extend the experimental analysis to the Atari environment setting.

Deep Learning Meets Teleoperation: Constructing Learnable and Stable Inductive Guidance for Shared Control

Scope: Master thesis
Advisor: Kay Hansel, An Thai Le
Added: 2023-06-14
Start: July / August 2023
Topic: Teleoperation is one of the biggest challenges in robotics [1]. It allows us to bypass human physical limitations and more rapidly and efficiently distribute skills and hands-on expertise to distant geographic locations where they are needed. However, transferring skills and hands-on expertise to robots in remote, sometimes dangerous environments must meet specific requirements, e.g., high precision and safety. Additional difficulties, such as communication delays and partial observability, complicate the transfer. Therefore, prior work introduced assistive policies that guide the user during execution, also known as shared control [2]. However, most of these policies are task-specific, manually created, or lack properties such as stability.

This work considers policies as learnable inductive guidance for shared control. In particular, we use the class of Riemannian motion policies (RMPs) [3] and consider them as differentiable optimization layers [4]. We analyze whether RMPs can be (i) pre-trained by learning from demonstrations [5] or reinforcement learning [6] given a specific context, and (ii) subsequently employed seamlessly for human-guided teleoperation thanks to their physically consistent properties, such as stability [3]. We believe this step eliminates the laborious process of constructing complex policies and leads to improved and generalizable shared control architectures.
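As a rough illustration of how Riemannian motion policies are combined, the sketch below implements the metric-weighted average of several acceleration policies. It makes a simplifying assumption: all policies already live in the same space, so the pullback through task-map Jacobians used in RMPflow [3] is omitted. The function name and the numbers are hypothetical.

```python
import numpy as np

def combine_rmps(accelerations, metrics):
    """Metric-weighted combination a = (sum_i M_i)^+ sum_i M_i a_i."""
    metric_sum = sum(metrics)
    force_sum = sum(M @ a for M, a in zip(metrics, accelerations))
    return np.linalg.pinv(metric_sum) @ force_sum

# Policy 1: an attractor that cares equally about both directions.
a_goal, M_goal = np.array([1.0, 0.0]), np.eye(2)
# Policy 2: an avoidance behaviour that only cares about the second direction.
a_avoid, M_avoid = np.array([0.0, 2.0]), np.diag([0.0, 3.0])

a_combined = combine_rmps([a_goal, a_avoid], [M_goal, M_avoid])
print(a_combined)
```

Because every step is a differentiable matrix operation, the same computation could in principle be written in PyTorch and embedded as a layer whose metrics are learned.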

Highly motivated students can apply by sending an e-mail expressing your interest to kay.hansel@tu-darmstadt.de and an.le@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Requirements:

  • Strong Python programming skills
  • Experience with deep learning libraries (in particular PyTorch)
  • Knowledge in reinforcement learning and/or machine learning

References:
[1] Niemeyer, Günter, et al. "Telerobotics." Springer handbook of robotics (2016);
[2] Selvaggio, Mario, et al. "Autonomy in physical human-robot interaction: A brief survey." IEEE RAL (2021);
[3] Cheng, Ching-An, et al. "RMP flow: A Computational Graph for Automatic Motion Policy Generation." Springer (2020);
[4] Jaquier, Noémie, et al. "Learning to sequence and blend robot skills via differentiable optimization." IEEE RAL (2022);
[5] Mukadam, Mustafa, et al. "Riemannian motion policy fusion through learnable lyapunov function reshaping." CoRL (2020);
[6] Xie, Mandy, et al. "Neural geometric fabrics: Efficiently learning high-dimensional policies from demonstration." CoRL (2023).

Dynamic symphony: Seamless human-robot collaboration through hierarchical policy blending

Scope: Master thesis
Advisor: Kay Hansel, Berk Gueler
Added: 2023-06-14
Start: July / August 2023
Topic: Teleoperation is one of the biggest challenges in robotics [1]. It allows us to bypass human physical limitations and more rapidly and efficiently distribute skills and hands-on expertise to distant geographic locations where they are needed. However, transferring skills and hands-on expertise to robots in remote, sometimes dangerous environments must meet specific requirements, e.g., high precision and safety. Additional difficulties, such as communication delays and partial observability, complicate the transfer. Therefore, previous work introduced assistive policies that guide the user during execution, also known as shared control, and studied how to arbitrate between user and autonomy [2].

This work focuses on arbitration between the user and assistive policy, i.e., shared autonomy. Various works allow the user to influence the dynamic behavior explicitly and, therefore, cannot satisfy stability guarantees [3]. We pursue the idea of formulating arbitration as a trajectory-tracking problem that implicitly considers the user's desired behavior as an objective [4]. To this end, we extend the work of Hansel et al. [5], who employed probabilistic inference for policy blending in robot motion control. The proposed method corresponds to a sampling-based online planner that superposes reactive policies given a predefined objective. This method enables the user to implicitly influence the behavior without injecting energy into the system, thus satisfying stability properties. We believe this step leads to an alternative view of shared autonomy with an improved and generalizable framework.
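The idea of a sampling-based planner that superposes reactive policies can be sketched in a few lines. This is a toy illustration under strong assumptions, not the method of [5]: the two policies, the point-mass dynamics, the hand-written cost, and all names are hypothetical, and the best sample is simply picked instead of performing probabilistic inference.

```python
import numpy as np

def goal_policy(x, goal):
    return goal - x                                   # reactive attractor toward the goal

def avoid_policy(x, obstacle):
    d = x - obstacle
    return d / (np.linalg.norm(d) ** 3 + 1e-6)        # reactive repulsor away from the obstacle

def rollout_cost(w, x0, goal, obstacle, steps=20, dt=0.1):
    """Simulate the blended policy for a short horizon and accumulate the objective."""
    x, cost = x0.copy(), 0.0
    for _ in range(steps):
        v = w[0] * goal_policy(x, goal) + w[1] * avoid_policy(x, obstacle)
        x = x + dt * v
        cost += np.linalg.norm(x - goal) + 1.0 / (np.linalg.norm(x - obstacle) + 1e-6)
    return cost

def sample_blend(x0, goal, obstacle, n_samples=64, seed=0):
    """Sample blending weights on the simplex and keep the lowest-cost candidate."""
    rng = np.random.default_rng(seed)
    candidates = rng.dirichlet(np.ones(2), size=n_samples)
    costs = np.array([rollout_cost(w, x0, goal, obstacle) for w in candidates])
    return candidates[np.argmin(costs)], costs

x0 = np.array([0.0, 0.0])
goal = np.array([2.0, 0.0])
obstacle = np.array([1.0, 0.1])
best_w, costs = sample_blend(x0, goal, obstacle)
```

In a shared-autonomy setting, the hand-written cost would be replaced by an objective that reflects the user's desired behavior, so the user shapes the blend only through that objective.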

Highly motivated students can apply by sending an e-mail expressing your interest to kay.hansel@tu-darmstadt.de or berk.gueler@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Requirements:

  • Strong Python programming skills
  • Experience with deep learning libraries (in particular PyTorch)
  • Knowledge in reinforcement learning and/or machine learning

References:
[1] Niemeyer, Günter, et al. "Telerobotics." Springer handbook of robotics (2016);
[2] Selvaggio, Mario, et al. "Autonomy in physical human-robot interaction: A brief survey." IEEE RAL (2021);
[3] Dragan, Anca D., and Siddhartha S. Srinivasa. "A policy-blending formalism for shared control." IJRR (2013);
[4] Javdani, Shervin, et al. "Shared autonomy via hindsight optimization for teleoperation and teaming." IJRR (2018);
[5] Hansel, Kay, et al. "Hierarchical Policy Blending as Inference for Reactive Robot Control." IEEE ICRA (2023).

Feeling the Heat: Igniting Matches via Tactile Sensing and Human Demonstrations

Scope: Master thesis
Advisor: Tim Schneider, Niklas Funk
Added: 2023-03-27
Start: April / May 2023
Topic: Humans heavily rely on tactile sensing for a range of dynamic tasks, such as igniting matches, catching balls, and shuffling cards. Yet, only a few works augment robotic systems with a sense of touch. In this thesis, we want to employ tactile sensors for solving the highly dynamic task of igniting matches. This task has been of great interest for understanding the importance of tactile sensing in humans [1], which makes it a great candidate for studying robot tactile sensing.

In this thesis, we want to investigate the effectiveness of vision-based tactile sensors for solving dynamic tasks (igniting matches). Since the whole task is difficult to simulate, we directly collect real-world data to learn policies from the human demonstrations [2,3]. We believe that this work is an important step towards more advanced tactile skills.

Highly motivated students can apply by sending an e-mail expressing your interest to niklas.funk@tu-darmstadt.de and tim.schneider1@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Requirements:

  • Good knowledge of Python
  • Experience with deep learning libraries (in particular PyTorch)
  • Prior experience with real robots and Linux is a plus

References:
[1] https://www.youtube.com/watch?v=HH6QD0MgqDQ
[2] Learning Compliant Manipulation through Kinesthetic and Tactile Human-Robot Interaction; Klas Kronander and Aude Billard.
[3] https://www.youtube.com/watch?v=jAtNvfPrKH8

Inverse Reinforcement Learning for Neuromuscular Control of Humanoids

Scope: Master thesis
Advisor: Firas Al-Hafez, Davide Tateo
Added: February 6, 2023
Start: ASAP
Topic: Reinforcement Learning (RL) has recently achieved remarkable success on locomotion tasks, such as for quadrupeds and humanoids. Despite this success, approaches building on RL usually require huge effort and expert knowledge to define a reward function that yields a smooth and natural-looking gait. In contrast, Inverse Reinforcement Learning (IRL) infers a reward function given a set of expert demonstrations. While it is easy to get a set of observations from motion capture, the underlying actions of a human are unobservable. Moreover, muscular systems are overactuated, as several muscles generally act on the same joint. This overactuation hinders common exploration strategies like Gaussian noise from efficiently exploring the state and action space.

Within this thesis, the problems of learning from observations and efficient exploration in overactuated systems should be addressed. Regarding the former, novel methods incorporating inverse dynamics models into the inverse reinforcement learning problem [1] should be adapted and applied. To address the problem of efficient exploration in overactuated systems, two approaches should be implemented and compared. The first approach uses a handcrafted action space, which disables and modulates actions in different phases of the gait based on biomechanics knowledge [2]. The second approach uses a stateful policy to incorporate an inductive bias into the policy [3]. The thesis will be supervised in conjunction with Guoping Zhao (guoping.zhao@tu-darmstadt.de) from the locomotion lab.

Highly motivated students can apply by sending an e-mail expressing their interest to Firas Al-Hafez (firas.al-hafez@tu-darmstadt.de), attaching their letter of motivation and possibly their CV. Try to make clear why you would like to work on this topic, and why you would be the perfect candidate for it.

Required Qualification:
1. Strong Python programming skills
2. Knowledge in Reinforcement Learning
3. Interest in understanding human locomotion

Desired Qualification:
1. Hands-on experience on robotics-related RL projects
2. Prior experience with different simulators
3. Attendance of the lectures "Statistical Machine Learning", "Computational Engineering and Robotics" and/or "Reinforcement Learning: From Fundamentals to the Deep Approaches"

References:
[1] Al-Hafez, F.; Tateo, D.; Arenz, O.; Zhao, G.; Peters, J. (2023). LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning, International Conference on Learning Representations (ICLR).
[2] Ong CF; Geijtenbeek T.; Hicks JL; Delp SL (2019) Predicting gait adaptations due to ankle plantarflexor muscle weakness and contracture using physics-based musculoskeletal simulations. PLoS Computational Biology
[3] Srouji, M.; Zhang, J.; Salakhutdinov, R. (2018). Structured Control Nets for Deep Reinforcement Learning, International Conference on Machine Learning (ICML).

Robotic Tactile Exploratory Procedures for Identifying Object Properties

Scope: Master's thesis
Advisor: Tim Schneider, Alap Kshirsagar
Added: 2022-09-06
Start: ASAP
Topic: Identifying properties such as shape, deformability, roughness etc. is important for successful manipulation of an object. Humans use specific “exploratory procedures (EPs)” [1] to identify object properties, for example, lateral motion to detect texture and pressure to detect deformability. Our goal is to understand whether these exploratory procedures are optimal for robotic arms equipped with tactile sensors. We specifically focus on three properties and their corresponding EPs: texture (lateral motion), shape (contour following) and deformability (pressure).

Goals of the thesis

  • Literature review of robotic EPs for identifying object properties [2,3,4]
  • Develop and implement robotic EPs for a Digit tactile sensor
  • Compare performance of robotic EPs with human EPs

Desired Qualifications

  • Interested in working with real robotic systems
  • Python programming skills

Literature
[1] Lederman and Klatzky, “Haptic perception: a tutorial”
[2] Seminara et al., “Active Haptic Perception in Robots: A Review”
[3] Chu et al., “Using robotic exploratory procedures to learn the meaning of haptic adjectives”
[4] Kerzel et al., “Neuro-Robotic Haptic Object Classification by Active Exploration on a Novel Dataset”

Scaling learned, graph-based assembly policies

Scope: Master thesis
Advisor: Niklas Funk
Added: 2022-08-04
Start: ASAP
Topic: Solving assembly tasks with a large number of building blocks and arbitrary desired designs (similar to playing LEGO) has recently attracted considerable interest [1-3], yet it remains a challenging task for machine learning algorithms.
The goal of this thesis would be to build upon our very recent work on assembly [1,2] and to scale the methods to handle a larger number of blocks. This thesis could go in multiple directions, including the following (just to give you an idea):

  • scaling our previous methods to incorporate mobile manipulators or the Kobo bi-manual manipulation platform. The increased workspace of both would allow for handling a wider range of objects
  • the approach in [2] has proven more powerful, yet it requires solving a mixed-integer linear program (MILP) for every desired structure. Thus, another idea could be to investigate approaches that approximate this solution
  • adapting the methods to handle more irregular-shaped objects / investigate curriculum learning

Highly motivated students can apply by sending an e-mail expressing your interest to niklas.funk@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Requirements:

  • Good knowledge of Python
  • Experience with deep learning libraries (in particular PyTorch) is a plus
  • Experience with reinforcement learning / having taken Robot Learning is also a plus

References:
[1] Learn2Assemble with Structured Representations and Search for Robotic Architectural Construction; Niklas Funk et al.
[2] Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery; Niklas Funk et al.
[3] Structured agents for physical construction; Victor Bapst et al.

Long-Horizon Manipulation Tasks from Visual Imitation Learning (LHMT-VIL): Algorithm

Scope: Master thesis
Advisor: Suman Pal, Ravi Prakash, Vignesh Prasad, Aiswarya Menon
Added: 2022-06-16
Start: Immediately
Topic: The objective of this thesis is to create a method to solve long-horizon robot manipulation tasks from Visual Imitation Learning (VIL). Thus, given a video demonstration of a human performing a long-horizon manipulation task such as an assembly sequence, a robot should imitate the identical task by analyzing the video.

The proposed architecture can be broken down into the following sub-tasks:
1. Multi-object 6D pose estimation from video: Identify the object 6D poses in each video frame to generate the object trajectories
2. Action segmentation from video: Classify the action being performed in each video frame
3. High-level task representation learning: Learn the sequence of robotic movement primitives with the associated object poses such that the robot completes the demonstrated task
4. Low-level movement primitives: Create a database of low-level robotic movement primitives which can be sequenced to solve the long-horizon task
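
To make the interface between sub-tasks 2 to 4 concrete, here is a minimal sketch that collapses per-frame action labels into segments and maps them to named primitives. The label names, the primitive database, and both function names are hypothetical; pose estimation and the learned task representation are not covered.

```python
from itertools import groupby

def segment_actions(frame_labels):
    """Collapse per-frame action labels into (label, start_frame, end_frame) segments."""
    segments, start = [], 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments

def to_primitive_sequence(segments, primitive_db):
    """Map each segment to a low-level primitive name, skipping labels without a primitive."""
    return [primitive_db[label] for label, _, _ in segments if label in primitive_db]

frame_labels = ["reach", "reach", "grasp", "move", "move", "move", "place"]
primitive_db = {"reach": "approach_pose", "grasp": "close_gripper",
                "move": "cartesian_move", "place": "open_gripper"}

segments = segment_actions(frame_labels)
plan = to_primitive_sequence(segments, primitive_db)
print(segments)
print(plan)
```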

Desired Qualification:
1. Strong Python programming skills
2. Prior experience in Computer Vision and/or Robotics is preferred

Long-Horizon Manipulation Tasks from Visual Imitation Learning (LHMT-VIL): Dataset

Scope: Master thesis
Advisor: Suman Pal, Ravi Prakash, Vignesh Prasad, Aiswarya Menon
Added: 2022-06-16
Start: Immediately
Topic: The objective of this thesis is to create a large-scale dataset to solve long-horizon robot manipulation tasks from Visual Imitation Learning (VIL). Thus, given a video demonstration of a human performing a long-horizon manipulation task such as an assembly sequence, a robot should imitate the identical task by analyzing the video.

During the project, we will create a large-scale dataset of videos of humans demonstrating industrial assembly sequences. The dataset will contain information on the 6D poses of the objects, the hand and body poses of the human, and the action sequences, among numerous other features. The dataset will be open-sourced to encourage further research on VIL.

Desired Qualification:
1. Strong Python programming skills
2. Prior experience in Computer Vision and/or Robotics is preferred

References:
[1] F. Sener, et al. "Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities". CVPR 2022.
[2] P. Sharma, et al. "Multiple Interactions Made Easy (MIME) : Large Scale Demonstrations Data for Imitation." CoRL, 2018.

Adaptive Human-Robot Interactions with Human Trust Maximization

Scope: Master thesis
Advisor: Kay Hansel, Georgia Chalvatzaki
Added: 2022-03-18
Start: April
Topic: Building trust between humans and robots is a major goal of Human-Robot Interaction (HRI). Usually, trust in HRI has been associated with risk aversion: a robot is trustworthy when its actions do not put the human at risk. However, we believe that trust is a bilateral concept that governs the behavior and participation in the collaborative tasks of both interacting parties. On the one hand, the human has to trust the robot about its actions, e.g., delivering the requested object, acting safely, and interacting in a reasonable time horizon. On the other hand, the robot should trust the human regarding their actions, e.g., hold a reliable belief that the human's next action will not lead to task failure and be certain about the requested task. However, providing a computational model of trust is extremely challenging.
Therefore, this thesis explores trust maximization as a partially observable problem, where trust is considered as a latent variable that needs to be inferred. This consideration results in a dual optimization problem for two reasons: (i) the robot behavior must be optimized to maximize the human's latent trust distribution; (ii) an optimization of the human's prediction model must be performed to maximize the robot's trust. To address this challenging optimization problem, we will rely on variational inference and metrics like Mutual Information for optimization.
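
For reference, the mutual information between two generic random variables X and Y, together with a standard variational lower bound of the kind surveyed in [4], can be written as follows. Here q(x | y) is an arbitrary variational distribution; how X and Y would map to trust and behavior is not specified by this topic description and is left open.

```latex
I(X;Y)
  = \mathbb{E}_{p(x,y)}\!\left[\log \frac{p(x \mid y)}{p(x)}\right]
  = H(X) - H(X \mid Y)
  \;\ge\; H(X) + \mathbb{E}_{p(x,y)}\!\left[\log q(x \mid y)\right]
```

The bound is tight when q(x | y) equals the true posterior p(x | y), which is why maximizing it over q is a common surrogate for maximizing mutual information.
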
Highly motivated students can apply by sending an e-mail expressing your interest to kay.hansel@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Requirements:

  • Good knowledge of Python and/or C++;
  • Good knowledge of Robotics and Machine Learning;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch.

References:
[1] Xu, Anqi, and Gregory Dudek. "Optimo: Online probabilistic trust inference model for asymmetric human-robot collaborations." ACM/IEEE HRI, IEEE, 2015;
[2] Kwon, Minae, et al. "When humans aren’t optimal: Robots that collaborate with risk-aware humans." ACM/IEEE HRI, IEEE, 2020;
[3] Chen, Min, et al. "Planning with trust for human-robot collaboration." ACM/IEEE HRI, IEEE, 2018;
[4] Poole, Ben et al. “On variational bounds of mutual information”. ICML, PMLR, 2019.

Causal inference of human behavior dynamics for physical Human-Robot Interactions

Scope: Master's thesis
Advisor: Georgia Chalvatzaki, Kay Hansel
Added: 2021-10-16
Start: ASAP
Topic: In this thesis, we will study and develop ways of approximating an efficient behavior model of a human in close interaction with a robot. We will research the extension of our prior work on the graph-based representation of the human into a method that leverages multiple attention mechanisms to encode relative dynamics in the human body. Inspired by methods in causal discovery, we will treat motion prediction as a causal discovery problem. In essence, a differentiable and accurate human motion model is essential for efficient tracking and optimization of HRI dynamics. You will test your method in the context of motion prediction, especially for HRI tasks like human-robot handovers, and you could demonstrate your results in a real-world experiment.

Highly motivated students can apply by sending an e-mail expressing your interest to georgia.chalvatzaki@tu-darmstadt.de, attaching your letter of motivation and possibly your CV.

Minimum knowledge

  • Good knowledge of Python and/or C++;
  • Good knowledge of Robotics;
  • Good knowledge of Deep Learning frameworks, e.g., PyTorch

References

  1. Li, Q., Chalvatzaki, G., Peters, J., Wang, Y., Directed Acyclic Graph Neural Network for Human Motion Prediction, 2021 IEEE International Conference on Robotics and Automation (ICRA).
  2. Löwe, S., Madras, D., Zemel, R. and Welling, M., 2020. Amortized causal discovery: Learning to infer causal graphs from time-series data. arXiv preprint arXiv:2006.10833.
  3. Yang, W., Paxton, C., Mousavian, A., Chao, Y.W., Cakmak, M. and Fox, D., 2020. Reactive human-to-robot handovers of arbitrary objects. arXiv preprint arXiv:2011.08961.

Incorporating First and Second Order Mental Models for Human-Robot Cooperative Manipulation Under Partial Observability

Scope: Master Thesis
Advisor: Dorothea Koert, Joni Pajarinen
Added: 2021-06-08
Start: ASAP

The ability to model the beliefs and goals of a partner is an essential part of cooperative tasks. While humans develop theory of mind models for this aim already at a very early age [1], it is still an open question how to implement and make use of such models for cooperative robots [2,3,4]. In particular, in shared workspaces, human-robot collaboration could potentially profit from the use of such models, e.g., if the robot can detect and react to planned human goals or a human's false beliefs during task execution. To make such robots a reality, the goal of this thesis is to investigate the use of first and second order mental models in a cooperative manipulation task under partial observability. Partially observable Markov decision processes (POMDPs) and interactive POMDPs (I-POMDPs) [5] define an optimal solution to the mental modeling task and may provide a solid theoretical basis for modelling. The thesis may also compare related approaches from the literature and set up an experimental design for evaluation with the bi-manual robot platform Kobo.
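
The core computation in a POMDP is the Bayesian belief update over hidden states. The sketch below shows it for a discrete model, using a toy example in which the hidden state is one of two human goals; the array layout, the numbers, and the function name are assumptions for illustration. It covers only a first-order model; the nested beliefs about another agent's beliefs that I-POMDPs [5] add are not shown.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = belief @ T[action]                         # sum_s b(s) T(s' | s, a)
    unnormalized = O[action, :, observation] * predicted   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Toy model: two hidden human goals that do not change, one robot action, two observations.
T = np.array([[[1.0, 0.0],
               [0.0, 1.0]]])          # T[a, s, s']
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])          # O[a, s', o]

belief = np.array([0.5, 0.5])
belief = belief_update(belief, action=0, observation=0, T=T, O=O)
print(belief)
```

After seeing observation 0, which is more likely under the first goal, the belief shifts toward that goal (8/11 versus 3/11 with these numbers).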

Highly motivated students can apply by sending an e-mail expressing your interest to dorothea.koert@tu-darmstadt.de, attaching your CV and transcripts.

References:

  1. Wimmer, H., & Perner, J. Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception (1983)
  2. Sandra Devin and Rachid Alami. An implemented theory of mind to improve human-robot shared plans execution (2016)
  3. Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, SM Ali Eslami, and Matthew Botvinick. Machine theory of mind (2018)
  4. Connor Brooks and Daniel Szafir. Building second-order mental models for human-robot interaction. (2019)
  5. Prashant Doshi, Xia Qu, Adam Goodie, and Diana Young. Modeling recursive reasoning by humans using empirically informed interactive pomdps. (2010)