OLD/DEPRECATED: Offered Topics for Theses

We are always searching for highly motivated students to do their Bachelor's and Master's theses at IAS. Here is a sample of current topics (for past and ongoing theses, see here).

Active Learning for In-Hand Object Pose Estimation

Scope: Master's thesis, Bachelor's thesis, Projects. Start: ASAP

Understanding the correct pose of an object with respect to the robot's hand is essential for successful manipulation. It is one of the most basic requirements for tasks such as inserting a pin into a hole, and yet it is extremely challenging. In industry, this is usually achieved by structuring the environment and designing end-effectors and holders such that the robot always picks a part in the same manner. Outside the factories, robots must be able to estimate the pose of the picked object. This can be achieved by registering many possible images of the in-hand object at different poses, thereby increasing the certainty about the kinematic configuration. Once the remaining uncertainty is low enough, we can allow the robot to proceed with the task.

Naturally, we want the robot to register such poses in the most efficient way possible. This leads to the investigation of active learning methods under the constraints of the kinematics of the robot arm. This project will potentially involve collaboration and co-advising with members of the group Intelligent and Interactive Systems.
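To make the idea concrete, here is a minimal sketch of active view selection, assuming (our simplification, not part of the project description) a discretized belief over pose hypotheses and a precomputed observation likelihood table per candidate view:

```python
import numpy as np

# Hedged sketch: pick the next wrist view that most reduces the expected
# entropy of the belief over in-hand pose hypotheses, subject to kinematics.

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def expected_posterior_entropy(belief, likelihoods):
    """likelihoods[o, k] = p(observation o | pose k) for one candidate view."""
    expected_h = 0.0
    for lik in likelihoods:                      # iterate over possible observations
        marginal = np.dot(lik, belief)           # p(o) under the current belief
        if marginal > 0:
            posterior = lik * belief / marginal  # Bayes update for this observation
            expected_h += marginal * entropy(posterior)
    return expected_h

def select_next_view(belief, candidate_views, feasible):
    """Pick the kinematically feasible view with the lowest expected entropy."""
    best_view, best_h = None, np.inf
    for view, likelihoods in candidate_views.items():
        if not feasible(view):                   # respect arm kinematics
            continue
        h = expected_posterior_entropy(belief, likelihoods)
        if h < best_h:
            best_view, best_h = view, h
    return best_view
```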

Are you interested? Then please contact Guilherme Maeda <maeda@ias.tu-darmstadt.de> or Rudolf Lioutikov <lioutikov@ias.tu-darmstadt.de>

Learning in-hand manipulation skills through kinesthetic teaching with sensor gloves

Scope: Master's thesis, Bachelor's thesis, Projects. Start: Oct. 2015

In-hand manipulation skills refer to moving and positioning objects within one hand. Humans have incredible in-hand manipulation skills, e.g., we can flip pencils or tools with ease (even in the dark, without visual feedback). In the European project TACMAN, we aim to endow robots with similar capabilities. The vision is that robots with dexterous in-hand manipulation skills will ring in the next generation of smart human co-workers, manifesting the leading role of Europe in logistics and automation.

In this project, we investigate learning robot manipulation skills from human demonstrations. The key concept is that wearable sensor gloves provide force feedback to the demonstrator, reflecting the contact forces measured by tactile sensors while objects are manipulated in a robot hand. Exploiting this feedback allows for rapid training of complex robot skills. The goal of this thesis is to use and extend probabilistic latent variable models for learning a mapping from temporal sequences of tactile sensor readings to context-dependent motor plans. The success of the approach is evaluated in terms of generalizing the demonstrated skills to novel environments and objects. At international TACMAN meetings, the student will be able to discuss the approach with leading researchers in tactile manipulation and robotics research.
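As a rough illustration of the modeling step, a minimal sketch under our own assumptions (a linear latent space instead of the full probabilistic model to be developed in the thesis) could look as follows:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import BayesianRidge

# Hedged sketch: map temporal tactile sequences to motor plan parameters via
# a linear latent space. tactile: (N, T, D) sensor readings per demonstration;
# plans: (N, P) motor plan parameters (e.g., primitive weights).

def fit_tactile_to_plan(tactile, plans, n_latent=5):
    X = tactile.reshape(len(tactile), -1)        # flatten each sequence
    pca = PCA(n_components=n_latent).fit(X)      # latent tactile representation
    Z = pca.transform(X)
    regs = [BayesianRidge().fit(Z, plans[:, j]) for j in range(plans.shape[1])]
    return pca, regs

def predict_plan(pca, regs, tactile_seq):
    z = pca.transform(tactile_seq.reshape(1, -1))
    return np.array([r.predict(z)[0] for r in regs])
```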

References:

  1. Rueckert, E.; Lioutikov, R.; Calandra, R.; Schmidt, M.; Beckerle, P.; Peters, J. (2015). Low-cost Sensor Glove with Force Feedback for Learning from Demonstrations using Probabilistic Trajectory Representations, Proceedings of the International Conference on Robotics and Automation (ICRA).
  2. Rueckert, E.; Mundo, J.; Paraschos, A.; Peters, J.; Neumann, G. (2015). Extracting Low-Dimensional Control Variables for Movement Primitives, Proceedings of the International Conference on Robotics and Automation (ICRA).

Contact: Dr. Elmar Rueckert - rueckert@ias.tu-darmstadt.de, Prof. Jan Peters - mail@jan-peters.net, Intelligent Autonomous Systems

Stochastic Optimal Control of Humanoid Robots in multi-contact environments

Scope: Master's thesis, Bachelor's thesis, Projects. Start: Oct. 2015

Controlling high-dimensional humanoid robots is challenging: noisy sensory streams have to be processed and high-dimensional control signals need to be generated in real time. Stochastic optimal control methods generate optimal solutions to such motor tasks by explicitly modeling uncertainties about the environment and by exploiting the dynamics of the system [1-2]. Complex task constraints, like maintaining balance through supporting contacts, adapting the movement speed, and avoiding potential obstacles, can be efficiently encoded and combined [3]. However, applications of stochastic optimal control methods to real humanoid robots are rare due to the real-time processing requirements and challenging model-learning problems.
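For intuition, the simplest member of this family is the finite-horizon linear-quadratic regulator, where the dynamic programming recursion [1] fits in a few lines (a textbook sketch, not the probabilistic inference methods targeted in this thesis):

```python
import numpy as np

# Minimal sketch: finite-horizon discrete-time LQR via dynamic programming.
# With additive Gaussian noise, the same feedback gains remain optimal
# (certainty equivalence), which is the textbook entry point to [1].

def lqr_backward_pass(A, B, Q, R, T):
    """Returns time-indexed feedback gains K_t with u_t = -K_t x_t."""
    S = Q.copy()                       # value function Hessian at the horizon
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
        S = Q + A.T @ S @ (A - B @ K)  # Riccati recursion, one step backwards
        gains.append(K)
    return gains[::-1]
```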

The goal of this thesis is to implement powerful probabilistic inference methods [3] on real humanoid robots (e.g., on our Nao or our iCub) and to solve challenging motor tasks. The student will implement and compare real-time control strategies (using Matlab and C++), develop control interfaces to a real robot, and study reaching and manipulation tasks in multi-contact environments. This thesis is supported by the EU project CoDyCo, which finances student exchange programs.

References:

  1. Bertsekas, Dimitri P. (1995). Dynamic Programming and Optimal Control, Vol. 1. Belmont, MA: Athena Scientific.
  2. Toussaint, Marc (2010). A Bayesian view on motor control and planning. In From motor to interaction learning in robots, Springer, 2010.
  3. Rueckert, E.; Mindt, M.; Peters, J.; Neumann, G. (2014). Robust Policy Updates for Stochastic Optimal Control, Proceedings of the International Conference on Humanoid Robots (HUMANOIDS).

Contact: Dr. Elmar Rueckert - rueckert@ias.tu-darmstadt.de, Prof. Jan Peters - mail@jan-peters.net, Intelligent Autonomous Systems

Growing Neural Networks for Movement Coordination

Scope: Master's thesis, Bachelor's thesis, Projects. Start: Oct. 2015

Movement primitives are compact representations of complex movement sequences in robotics and in neuroscience. By varying the parameters of these models, new movements can be generated. As with all models that explain data, a balance between model complexity and expressiveness has to be found, i.e., the number of adjustable parameters is fixed in advance. Currently, this decision is still made by experts.

In this project, we want to determine the model complexity automatically, depending on the difficulty of the task. Such an adaptive representation could be used for a wide variety of tasks. On the Darias robot, a wealth of skills could be learned, ranging from simple touch operations to complex combinations of bimanual activities. This work is part of the Spiking Neural Networks for Motor Control, Planning and Learning project.
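A simple baseline for such automatic complexity selection (a hedged sketch of one possible starting point, not necessarily the method to be developed) scores the number of basis functions of a primitive with the Bayesian information criterion:

```python
import numpy as np

# Hedged sketch: choose the number of RBF basis functions of a movement
# primitive automatically via BIC, instead of fixing it by hand.

def rbf_features(t, n_basis, width=0.05):
    centers = np.linspace(0, 1, n_basis)
    return np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width))

def bic_for_n_basis(t, y, n_basis):
    Phi = rbf_features(t, n_basis)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # fit primitive weights
    resid = y - Phi @ w
    sigma2 = np.mean(resid ** 2) + 1e-12
    n = len(y)
    return n * np.log(sigma2) + n_basis * np.log(n)

def select_n_basis(t, y, candidates=range(3, 31)):
    return min(candidates, key=lambda k: bic_for_n_basis(t, y, k))

# usage: t = np.linspace(0, 1, 200); y = a demonstrated joint trajectory
```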

Contact: Dr. Elmar Rueckert - rueckert@ias.tu-darmstadt.de, Prof. Jan Peters - mail@jan-peters.net, Intelligent Autonomous Systems

Generalizing local feedforward control with machine learning methods

Optimal control and learning methods for control can be very effective in generating and improving control policies locally, in the vicinity of a given reference trajectory or a demonstrated movement. Local methods, however, lack the ability to generalize actions to different regions of the state space. This project aims at exploring the potential benefits of machine learning methods to (a) improve the learning performance of local controllers and (b) generalize local controllers by interpolating (and extrapolating) a collection of individual control actions. In this interdisciplinary project, the student will work with iterative learning control as a local feedforward controller and Gaussian processes as a machine learning method. Experiments will potentially involve the use of the BioRob platform to validate the proposed algorithm on a mechanical system with challenging and uncertain dynamics.
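A minimal sketch of the generalization step (b), assuming (our assumption for illustration) that iterative learning control has already produced feedforward command sequences for a set of training targets:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hedged sketch: generalize ILC-learned feedforward commands to a new target
# by GP regression over the task context (here, a scalar target position).
# contexts: (N, 1) targets for which ILC was run; u_ff: (N, T) learned commands.

def fit_ff_generalizer(contexts, u_ff):
    kernel = RBF(length_scale=0.3) + WhiteKernel(noise_level=1e-4)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(contexts, u_ff)              # one GP output per time step
    return gp

def predict_ff(gp, new_context):
    u_mean = gp.predict(np.atleast_2d(new_context))
    return u_mean[0]                    # predicted command sequence for this target
```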

Are you interested? Then please contact Guilherme Maeda <maeda@ias.tu-darmstadt.de>.

Learning Robot Tactile Sensing for Object Manipulation

Tactile sensing is a vital skill for robots performing manipulation tasks. The sense of touch provides both detailed information about held objects as well as the manipulation actions performed with these objects. For example, humans use the sense of touch to recognize the shape and material properties of held objects, as well as localize the object in their hand. This information can then be used to infer the location of manipulation-relevant features (e.g. the head of a hammer) relative to the hand.

Despite the important role of tactile sensing in human manipulation tasks, the use of tactile sensing in robot applications has not been thoroughly explored. Most previous work has focused only on developing the tactile sensor hardware. Learning to efficiently use the rich information provided by these tactile sensors remains an open problem. Machine learning approaches are particularly promising for robots working in unstructured environments, e.g., at home, as such robots will need to adapt to a wide range of objects and manipulation tasks.

The goal of this thesis is therefore to explore machine learning methods for robot tactile sensing. The project involves first creating a simple testbed for tactile sensing experiments, using low-cost off-the-shelf tactile sensors. This testbed will then be used to evaluate and develop machine learning methods for extracting task-relevant information from the sensors.
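A first baseline on such a testbed could be as simple as the following sketch (our assumption, for illustration: tactile frames arrive as flattened arrays with material labels):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hedged sketch: classify the material of a grasped object from flattened
# tactile-array readings with an SVM, cross-validated on testbed data.
# X: (N, D) tactile frames, y: (N,) material labels -- both from the testbed.

def material_classification_baseline(X, y):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(clf, X, y, cv=5)
    return scores.mean(), scores.std()
```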

Contact: Oliver Kroemer, oli@robot-learning.de

Two Player Table Tennis

Learning to play table tennis is a very challenging task, as we need to be able to react very quickly to the strokes of the opponent. One key ability used by humans is to predict where the opponent will serve the ball (including speed, spin, etc.) just from the movement of the opponent *before* the ball is hit. In this project we want to use this insight for learning a simulated robotic table tennis player. The task is to learn how to play a two-player table tennis game. The agents should not just learn how to return a table tennis ball very accurately, but also learn to predict the impact position of the ball, given the movement of the other robot, such that the own movement can already be prepared earlier. Both agents should maintain a model of the opponent that tells them where the opponent will shoot, but also what the opponent will predict given the movement of the player itself. This ability can ideally be used to feign different strike directions or strike strengths. The goal of this project is to build a basic reinforcement learning algorithm which exploits the knowledge of the opponent as much as possible.

Scope:

Master Thesis, Bachelor Thesis (with simplified requirements)

Research Program for the Thesis:

  • Implement a second player in our table tennis simulation
  • Implement different strike types with the analytical model
  • Implement intention prediction algorithms to predict the impact point of the ball from the opponent's movement before the ball is hit (see the sketch below)
  • Include the predicted goal in the state space of a reinforcement learning agent to choose the strike type and the strike direction.
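A minimal sketch of the intention-prediction step, under our own assumptions (recorded pre-hit racket trajectories and ground-truth impact points from the simulation):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hedged sketch: regress the ball's impact point on the table from the
# opponent's pre-hit racket trajectory.
# traj: (N, T, 3) racket positions before contact; impacts: (N, 2) impact points.

def fit_intention_model(traj, impacts):
    X = traj.reshape(len(traj), -1)        # flatten each pre-hit trajectory
    return Ridge(alpha=1.0).fit(X, impacts)

def predict_impact(model, partial_traj):
    return model.predict(partial_traj.reshape(1, -1))[0]
```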

References:

  1. Muelling, K.; Kober, J.; Peters, J. (2011). A Biomimetic Approach to Robot Table Tennis, Adaptive Behavior Journal, 19, 5.
  2. Wang, Z.; Muelling, K.; Deisenroth, M. P.; Ben Amor, H.; Vogt, D.; Schoelkopf, B.; Peters, J. (2013). Probabilistic Movement Modeling for Intention Inference in Human-Robot Interaction, International Journal of Robotics Research.
  3. Daniel, C.; Neumann, G.; Peters, J. (2012). Hierarchical Relative Entropy Policy Search, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2012).

Contact: Dr. Gerhard Neumann - neumann@ias.tu-darmstadt.de, Alexandros Paraschos - paraschos@ias.tu-darmstadt.de, Intelligent Autonomous Systems

Composition of Movement Primitives (already taken)

Movement primitives (MPs) are a popular tool in robot learning for representing elemental movements, for example, a stroke movement in a tennis game or moving the legs to a desired position during walking. Movement primitives exhibit many beneficial properties: they represent a movement compactly, can easily be obtained from demonstrations and improved with reinforcement learning, can generalize the movement to new situations such as a new desired target position, and can even scale a movement temporally. A promising idea is to combine several primitives in a modular control architecture. To do so, the primitives have to support simultaneous activation and continuous blending between the primitives. However, such functionality has not been supported until recently by most movement primitive representations. We proposed to use probabilistic representations for MPs that allow new operations on movement representations, such as combination by multiplying distributions and modulation of the movement by conditioning. While the probabilistic MP representation can be used to combine MPs, this ability has not yet been used in practice for a challenging application. In this thesis we want to use the probabilistic MP representation in a hierarchical policy that learns to combine several MPs to solve an overall task. As the task, we will choose either a table tennis simulation or a simulation of a planar walking robot.
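Both probabilistic operations are linear-Gaussian and compact; the following hedged sketch illustrates them in the spirit of [1], with variable names of our own choosing:

```python
import numpy as np

# Hedged sketch of the two core operations: combining two primitives by
# multiplying their Gaussian distributions over basis weights, and modulating
# a primitive by conditioning it on a via-point.

def combine_promps(mu1, S1, mu2, S2):
    """Product of two Gaussians (co-activation of two primitives)."""
    P1, P2 = np.linalg.inv(S1), np.linalg.inv(S2)   # precision matrices
    S = np.linalg.inv(P1 + P2)
    mu = S @ (P1 @ mu1 + P2 @ mu2)
    return mu, S

def condition_on_via_point(mu, S, phi_t, y_t, sigma_y=1e-4):
    """Condition the weight distribution on passing through y_t with features phi_t."""
    k = S @ phi_t / (sigma_y + phi_t @ S @ phi_t)   # Kalman-style gain
    mu_new = mu + k * (y_t - phi_t @ mu)
    S_new = S - np.outer(k, phi_t @ S)
    return mu_new, S_new
```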

Scope:

Master Thesis

Research Program for the Thesis:

  • Implement the probabilistic MP framework for the corresponding simulation
  • Learn a single MP with reinforcement learning such that it optimizes a given cost function
  • Combine the MPs by combining the different cost functions
  • Learn the MPs simultaneously, assuming that the decomposition of the cost function into single terms for each MP is known
  • (extra) Try also to infer a good structure of the decomposition of the costs by using latent variable estimation methods

References:

  1. Paraschos, A.; Neumann, G.; Daniel, C.; Peters, J. (2013). Probabilistic Movement Primitives, Advances in Neural Information Processing Systems (NIPS), Cambridge, MA: MIT Press.
  2. Schaal, S.; Peters, J.; Nakanishi, J.; Ijspeert, A. (2004). Learning Movement Primitives, International Symposium on Robotics Research (ISRR2003).

Contact: Dr. Gerhard Neumann - neumann@ias.tu-darmstadt.de, Alexandros Paraschos - paraschos@ias.tu-darmstadt.de, Intelligent Autonomous Systems

Information-Theoretic Dynamic Programming (already taken)

Dynamic programming (DP) is a common way to obtain optimal value functions and, hence, optimal policies in the context of reinforcement learning. One particular application of DP is policy iteration, which consists of a policy evaluation step to obtain the value function of the current policy, followed by a greedy policy update with respect to this value function. Such an approach works well in practice if we assume that the system dynamics are known. Typically, however, this is not the case and we also need to learn the system dynamics. We therefore have to consider that our learned system dynamics are not completely reliable, and a greedy update might actually 'damage' the policy by leading the agent into unexplored regions of the state space where the learned model has low quality. A promising new approach is to formulate the policy update as a constrained optimization problem with an information-theoretic constraint that bounds the divergence between the state-action distributions of the new and the old policy. This constrained optimization problem has several interesting properties: we do not need to impose the existence of a value function, as the value function emerges automatically out of the optimization problem, and it offers a new way of approximating this function by matching feature averages. The task of this thesis is to evaluate the information-theoretic policy update and to compare it to existing approaches such as LSTD.
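The flavor of the bounded update can be seen in the episodic special case (a sketch following the dual formulation of [2]; the thesis targets the full state-action formulation):

```python
import numpy as np
from scipy.optimize import minimize

# Hedged sketch: reweight sampled episodes by exp(R/eta), with the temperature
# eta chosen by minimizing the dual so that the KL divergence to the old
# sample distribution stays below the bound epsilon.

def reps_weights(rewards, epsilon=0.5):
    r_max = rewards.max()
    def dual(eta):
        eta = max(eta, 1e-6)
        adv = rewards - r_max                     # shift for numerical stability
        return eta * epsilon + r_max + eta * np.log(np.mean(np.exp(adv / eta)))
    res = minimize(lambda e: dual(e[0]), x0=[1.0], bounds=[(1e-6, None)])
    eta = res.x[0]
    w = np.exp((rewards - r_max) / eta)
    return w / w.sum()                            # weights for the policy update
```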

Scope:

Master Thesis

Research Program for the Thesis:

  • Implement existing dynamic programming methods in continuous environments, such as LSTD
  • Extend the existing implementation of the information-theoretic policy update to the discounted reward case
  • Learn the expectation operator involved in the optimization problem
  • Compare the approaches on different benchmark tasks with different feature representations

References:

  1. Lagoudakis, M. G.; Parr, R. (2003). Least-Squares Policy Iteration, Journal of Machine Learning Research.
  2. Peters, J.; Muelling, K.; Altun, Y. (2010). Relative Entropy Policy Search, Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI), Physically Grounded AI Track.
  3. Daniel, C.; Neumann, G.; Peters, J. (2012). Hierarchical Relative Entropy Policy Search, Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2012).

Contact: Dr. Gerhard Neumann, Intelligent Autonomous Systems, IAS, neumann@ias.tu-darmstadt.de

How Do Humans Catch Balls? (already taken)

The human body is imperfect, its information processing slow and inaccurate! Nevertheless, humans are far superior to most robots in the agility of nearly all motor skills. This question has fascinated biologists, roboticists, and computational neuroscientists for a long time. In this thesis, we want to get a little closer to an answer by studying a paradox: How do humans catch balls? When a baseball player has to catch a ball, the optimal strategy would be: predict exactly where the ball will land, run forward to that spot as fast as possible, turn around, and wait for the ball. Does any baseball player do this? NO! They keep their eyes on the ball, run backwards at a comfortable pace in the direction of the ball's movement, and arrive at the catching spot almost simultaneously with the ball. Why? This strategy is much more robust, especially because the ball is only slightly faster than the human and the human can react to improved ball estimates [1]. Such a motor strategy is called reactive. However, there are cases where this strategy does not work. If, for example, the ball comes from close range (as in table tennis) and/or at high speed (as in tennis), humans can still catch it! This is truly astonishing: processing the visual information takes about 100-150 ms, and the signal needs at least 80 ms to travel from the brain to the hand. To catch a table tennis ball flying at 30 m/s, a human must know where it is going while it is still more than 5 m away; otherwise, the signal would never arrive in time. Here, an anticipatory, planned motor strategy is required. In this Bachelor's or Master's thesis, this paradox is to be examined. Could it be that both scenarios are explained by a single principle? This will be investigated on a simplified example using the algorithm design method of dynamic programming!
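A minimal sketch of the simplified scenario, under our own assumptions (1D grid, known landing cell as the deterministic base case; the thesis question is what happens when the landing cell is only known through a belief that sharpens over time):

```python
import numpy as np

# Hedged sketch: a catcher on a 1D grid must be at the ball's landing cell at
# the final time step; finite-horizon dynamic programming (backward induction)
# computes the optimal policy for each possible landing cell g.

def catch_dp(n_cells=21, horizon=20, max_step=1):
    actions = list(range(-max_step, max_step + 1))
    # V[t, x, g]: probability of catching when at cell x at time t, landing cell g
    V = np.zeros((horizon + 1, n_cells, n_cells))
    V[horizon] = np.eye(n_cells)                 # success iff x == g at the end
    policy = np.zeros((horizon, n_cells, n_cells), dtype=int)
    for t in range(horizon - 1, -1, -1):         # backward induction
        for x in range(n_cells):
            for g in range(n_cells):
                vals = [V[t + 1, min(max(x + a, 0), n_cells - 1), g] for a in actions]
                best = int(np.argmax(vals))
                policy[t, x, g] = actions[best]
                V[t, x, g] = vals[best]
    return V, policy
```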

Approach:

  • Literature review and modeling in a simplified scenario.
  • Development of the dynamic programming-based algorithm.
  • Application of the algorithm from the second step in the scenario from the first.

References:

  1. J. N. Marewski, W. Gaissmaier, G. Gigerenzer (2010). Good judgments do not require complex cognition, Cognitive Processing.
  2. R. Sutton, A. Barto (1998). Reinforcement Learning. MIT Press.

Contact: Prof. Jan Peters, Intelligent Autonomous Systems, IAS, mail@jan-peters.net

Learning a Friction Hysteresis with MOSAIC

Inspired by results in neuroscience, especially on the cerebellum, Wolpert & Kawato [1] introduced the MOSAIC (modular selection and identification for control) learning architecture. In this architecture, local forward models, i.e., models that predict future states and events, are learned directly from observations. Based on the prediction accuracy of these models, corresponding inverse models can be learned. Despite initial success, this architecture has rarely been used, as it is often easier to directly learn inverse models than a complete MOSAIC. However, this is ONLY the case because researchers have rarely focused on problems where no inverse model exists in closed form. As there is a wide range of such problems (ranging from end-effector control to locomotion), fantastic chances have been missed. Here, we want to focus on the problem of controlling a robot system with a hysteresis in its friction. This problem is impossible to solve with classical approaches and can only be addressed with a MOSAIC-like approach.
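The core mechanism can be sketched in a few lines (our own schematic rendering of [1], not a full implementation):

```python
import numpy as np

# Hedged sketch of the MOSAIC idea: each module pairs a forward model
# (predicts the next state) with an inverse model (computes a control).
# Modules whose forward models predicted the last transition well receive
# high "responsibility" and dominate the blended control signal.

def responsibilities(x_observed, predictions, sigma=0.1):
    errors = np.array([np.sum((x_observed - p) ** 2) for p in predictions])
    w = np.exp(-errors / (2 * sigma ** 2))
    return w / (w.sum() + 1e-12)

def mosaic_control(x, x_des, modules, x_prev, u_prev):
    """modules: list of (forward_model(x, u), inverse_model(x, x_des)) pairs."""
    preds = [f(x_prev, u_prev) for f, _ in modules]  # what each module expected
    lam = responsibilities(x, preds)                  # who predicted best?
    u = sum(l * g(x, x_des) for l, (_, g) in zip(lam, modules))
    return u, lam
```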

Research Program for the Thesis:

  • Implement MOSAIC on a toy example and learn its basic properties, based on [1].
  • Study friction models based on [2], implement one with hysteresis, and use it to explain real physical data of a Barrett WAM robot.
  • Apply MOSAIC to learn to control a system with hysteresis.

References:

  1. D.M.Wolpert, M.Kawato (1998). Multiple paired forward and inverse models for motor control. Neural Networks.
  2. H. Olsson, K.J. Astrom, C. Canudas de Wit, M. Gofvert, P. Lischinsky (1998). Friction Models and Friction Compensation, European Journal of Control.

Contact: Prof. Jan Peters, Intelligent Autonomous Systems, IAS, mail@jan-peters.net

Can Learning Algorithms Interact Like in the Brain? (already taken)

The biggest difference between the information processing of humans and machines lies in the human ability to learn something completely new. To this day, we are neither able to understand this impressive capability nor to reproduce it in intelligent information systems. Early attempts to rebuild the brain with neural networks failed because of the size and complexity of primate brains. However, modern statistical learning emerged from the neural networks movement. Many efficient new learning algorithms for supervised learning, reinforcement learning, and unsupervised learning have been developed and have found new applications. At the same time, the fundamental understanding of the human brain has made progress, and it is conjectured that the brain relies on an interplay of learning types (supervised and unsupervised learning as well as reinforcement learning). Indeed, it has by now become realistic to reproduce such an interplay with statistical learning methods. Perhaps a kind of artificial brain can emerge here?

Approach:

  • Familiarization with statistical learning [1] and selection of individual algorithms to build one's own understanding.
  • Development of a fundamental understanding of the interplay of learning algorithms in the brain, based on [2].
  • Exploration of different interconnections of learning algorithms. For this, the algorithms are plugged together as modules, and we check whether something cool comes out!
  • Application in a more complex system, e.g., in robotics or in games.

References:

  1. C. Bishop (2008). Pattern Recognition and Machine Learning. Springer Verlag.
  2. K. Doya (1999). What are the Computations of the Cerebellum, the Basal Ganglia, and the Cerebral Cortex? Neural Networks.

Contact: Dr. Gerhard Neumann, Intelligent Autonomous Systems, IAS, neumann@ias.tu-darmstadt.de