Imitation Learning

We investigate methods that enable non-expert users to program robots by means of demonstrations. Such imitation learning methods enable the robot to solve a given task in a style similar to that of a human, which makes the robot's behavior more predictable and, thus, safer for human-robot interaction. In addition to imitation learning methods, which directly aim to infer the policy from human demonstrations, we also develop methods for inverse reinforcement learning, which aim to infer a reward function as a concise representation of the demonstrated task. Such a reward function can enable the robot to better react to changes in the environment, or to better predict human behavior.

For a quick introduction to imitation learning in robotics, have a look at our survey An Algorithmic Perspective on Imitation Learning, in which we analyze the two main approaches to imitation learning, namely behavioral cloning and inverse reinforcement learning. In addition to this analysis, we discuss the design decisions a practitioner must make when selecting an imitation learning approach. Application examples, such as robots that play table tennis and programs that play the game of Go, illustrate the properties and motivations behind different forms of imitation learning. We conclude by presenting a set of open questions and pointing towards possible future research directions.

  •     Bib
    Osa, T.; Pajarinen, J.; Neumann, G.; Bagnell, J.A.; Abbeel, P.; Peters, J. (2018). An Algorithmic Perspective on Imitation Learning, Foundations and Trends in Robotics.
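
To make the behavioral cloning side of this distinction concrete, below is a minimal sketch, not taken from the survey, that treats imitation as plain supervised regression on demonstrated state-action pairs; the data and the linear policy are purely illustrative.

    import numpy as np

    # Synthetic stand-in for demonstration data: states (N x d_s) paired with
    # expert actions (N x d_a). A real dataset would come from a human teacher.
    rng = np.random.default_rng(0)
    states = rng.normal(size=(500, 4))
    actions = states @ rng.normal(size=(4, 2)) + 0.01 * rng.normal(size=(500, 2))

    # Behavioral cloning as ridge regression: fit a linear policy a = s @ W by
    # minimizing ||actions - states @ W||^2 + lam * ||W||^2 in closed form.
    lam = 1e-3
    W = np.linalg.solve(states.T @ states + lam * np.eye(4), states.T @ actions)

    def policy(s):
        """Cloned policy: predict the expert action for a given state."""
        return s @ W

    print("training MSE:", np.mean((policy(states) - actions) ** 2))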

For a more up-to-date overview of deep generative models for robotics, please consult our survey Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations. In this work, we review recent approaches and trends in imitation learning, behavioral cloning, and inverse reinforcement learning in the context of multimodal and high-dimensional demonstrations.

  •     Bib
    Urain, J.; Mandlekar, A.; Du, Y.; Shafiullah, M.; Xu, D.; Fragkiadaki, K.; Chalvatzaki, G.; Peters, J. (submitted). Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations, IEEE Transactions on Robotics (T-RO).

Behavioral Cloning & Deep Generative Models

SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion. Multi-objective optimization problems are ubiquitous in robotics; e.g., optimizing a robot manipulation task requires jointly considering grasp pose configurations, collisions, and joint limits. While some objectives, such as the smoothness of a trajectory, can easily be hand-designed, several task-specific objectives need to be learned from data. This work introduces a method for learning data-driven SE(3) cost functions as diffusion models. Diffusion models can represent highly expressive multimodal distributions and exhibit proper gradients over the entire space due to their score-matching training objective. Learning costs as diffusion models allows their seamless integration with other costs into a single differentiable objective function, enabling joint gradient-based motion optimization. In this work, we focus on learning SE(3) diffusion models for 6-DoF grasping, giving rise to a novel framework for joint grasp and motion optimization without needing to decouple grasp selection from trajectory generation. We evaluate the representation power of our SE(3) diffusion models against classical generative models, and we showcase the superior performance of our proposed optimization framework in a series of simulated and real-world robotic manipulation tasks against representative baselines.

  •       Bib
    Urain, J.; Funk, N.; Peters, J.; Chalvatzaki, G. (2023). SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion, International Conference on Robotics and Automation (ICRA).
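
As a rough illustration of the "single differentiable objective" idea, the sketch below combines hand-designed smoothness and goal costs with a data-driven term in one gradient-based trajectory optimization. The untrained MLP merely stands in for the learned SE(3) diffusion cost, and the Euclidean toy state replaces SE(3) poses.

    import torch

    torch.manual_seed(0)
    # Untrained MLP standing in for a learned, differentiable grasp cost; the
    # real method uses the score of a trained SE(3) diffusion model.
    learned_cost = torch.nn.Sequential(
        torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    )

    traj = torch.zeros(32, 3, requires_grad=True)  # waypoints in R^3 (toy state)
    goal = torch.tensor([1.0, 0.5, 0.2])
    opt = torch.optim.Adam([traj], lr=1e-2)

    for _ in range(200):
        opt.zero_grad()
        smoothness = ((traj[1:] - traj[:-1]) ** 2).sum()  # hand-designed cost
        goal_cost = ((traj[-1] - goal) ** 2).sum()        # hand-designed cost
        grasp_cost = learned_cost(traj[-1]).squeeze()     # data-driven term
        total = smoothness + goal_cost + 0.1 * grasp_cost
        total.backward()  # one gradient through all costs jointly
        opt.step()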

Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models. Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior work has proposed several ways of using such a prior to bootstrap the motion planning problem, either by sampling the prior for initializations or by using the prior distribution in a maximum-a-posteriori formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. By leveraging the inverse denoising process of diffusion models, we can then sample directly from the posterior trajectory distribution conditioned on task goals. Furthermore, diffusion models have recently been shown to effectively encode data multimodality in high-dimensional settings, which makes them particularly well-suited for large trajectory datasets. To demonstrate the efficacy of our method, we compare our proposed approach, Motion Planning Diffusion, against several baselines in simulated planar robot and 7-DoF robot arm environments. To assess its generalization capabilities, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors for encoding high-dimensional trajectory distributions of robot motions.

  •       Bib
    Carvalho, J.; Le, A. T.; Baierl, M.; Koert, D.; Peters, J. (2023). Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  •     Bib
    Carvalho, J.; Baierl, M.; Urain, J.; Peters, J. (2022). Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods.
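
The following toy sketch mimics the posterior-sampling idea: an annealed, Langevin-style denoising loop that follows a prior score plus the gradient of a task cost. Here the "prior" score is hand-coded as a Gaussian around a straight line; in the actual method it comes from a diffusion model trained on demonstrated trajectories, and the schedule and guidance weight below are arbitrary choices.

    import torch

    T, d = 64, 2
    start, goal = torch.zeros(d), torch.ones(d)
    # Hand-coded "prior": a Gaussian around the straight line from start to goal.
    line = torch.stack([start + (goal - start) * k / (T - 1) for k in range(T)])

    def prior_score(x, sigma):
        return (line - x) / sigma**2  # score of N(line, sigma^2 I)

    def task_cost(x):
        obstacle = torch.tensor([0.5, 0.5])
        d2 = ((x - obstacle) ** 2).sum(dim=-1)
        return torch.exp(-d2 / 0.02).sum()  # penalize waypoints near the obstacle

    x = torch.randn(T, d)  # start from noise, as in reverse diffusion
    for sigma in torch.linspace(1.0, 0.05, 50):  # annealed noise schedule
        x = x.detach().requires_grad_(True)
        (grad_cost,) = torch.autograd.grad(task_cost(x), x)
        step = 0.5 * sigma**2
        # Langevin-style update: prior score, cost guidance, and fresh noise.
        x = x + step * (prior_score(x, sigma) - 5.0 * grad_cost) \
              + torch.sqrt(step) * torch.randn_like(x)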

Inverse Reinforcement Learning

Least-Squares Inverse Q-Learning. Recent advances in imitation learning have explored learning a Q-function through an implicit reward formulation, eschewing explicit reward functions. However, these methods often necessitate implicit reward regularization for stability and struggle with absorbing states. Prior work has suggested the efficacy of squared-norm regularization on the implicit reward function but lacked a theoretical analysis; our research introduces a novel perspective. By incorporating this regularization within a mixture distribution of the policy and the expert, we elucidate its role in minimizing the squared Bellman error and optimizing a bounded χ²-divergence between the expert and the mixture distribution. This framework addresses instabilities and effectively handles absorbing states. Our proposed approach, Least-Squares Inverse Q-Learning (LS-IQ), outperforms existing algorithms, particularly in environments with absorbing states. Additionally, we propose leveraging an inverse dynamics model to learn solely from observations, maintaining performance in scenarios lacking expert actions.

  •       Bib
    Al-Hafez, F.; Tateo, D.; Arenz, O.; Zhao, G.; Peters, J. (2023). LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning, International Conference on Learning Representations (ICLR).
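
To illustrate the ingredients, here is a schematic sketch, in our own notation rather than the paper's exact loss, of the implicit reward r(s, a) = Q(s, a) - γV(s') and a squared-norm regularizer evaluated on a mixture of expert and policy transitions; the networks and batches are random stand-ins.

    import torch

    gamma, alpha = 0.99, 0.5
    # Random stand-in networks for the Q-function and the value function.
    Q = torch.nn.Sequential(torch.nn.Linear(6, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))
    V = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 1))

    def implicit_reward(s, a, s_next, done):
        # r(s, a) = Q(s, a) - gamma * V(s'); absorbing states get V(s') = 0.
        q = Q(torch.cat([s, a], dim=-1)).squeeze(-1)
        v_next = V(s_next).squeeze(-1) * (1.0 - done)
        return q - gamma * v_next

    # Mixture batch: imagine half expert transitions, half policy transitions.
    s, a = torch.randn(64, 4), torch.randn(64, 2)
    s_next, done = torch.randn(64, 4), torch.zeros(64)
    r = implicit_reward(s, a, s_next, done)

    # Squared-norm regularizer on the implicit reward over the mixture; LS-IQ
    # relates this choice to a squared Bellman error and a bounded
    # chi^2-divergence between the expert and the mixture distribution.
    reg = alpha * (r ** 2).mean()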

Robot Task Planning

Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning. Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, to robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans and thereby learning to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.

  •     Bib
    Chalvatzaki, G.; Younes, A.; Nandha, D.; Le, A. T.; Ribeiro, L.F.R.; Gurevych, I. (2023). Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning, in: Dimitrios Kanoulas (ed.), Frontiers in Robotics and AI.
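
A minimal sketch of the grounding pipeline: a scene graph is serialized into text and concatenated with the human request before being fed to GPT-2. We use the vanilla pretrained model and an invented prompt format here; the paper fine-tunes the model to emit subgoal specifications for a planner.

    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")  # untuned stand-in

    # Serialize a toy scene graph (subject, predicate, object triples) into text.
    scene_graph = [("mug", "on", "table"), ("table", "in", "kitchen")]
    graph_text = " ; ".join(f"{s} {p} {o}" for s, p, o in scene_graph)
    # Illustrative prompt format, not the one used in the paper.
    prompt = f"Scene: {graph_text}. Task: bring the mug to the sink. Subgoals:"

    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0], skip_special_tokens=True))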

Language-Conditioned Manipulation

A Unifying Perspective on Language-Based Task Representations for Robot Control. Natural language is becoming increasingly important in robot control, both for high-level planning and for goal-directed conditioning of motor skills. While a number of solutions have already been proposed, it remains to be seen which architecture will succeed in seamlessly integrating language, vision, and action. To better understand the landscape of existing methods, we propose to view the algorithms from the perspective of “Language-Based Task Representations”, i.e., to categorize methods that condition robot action generation on natural language commands according to their task representation and embedding architecture. Our proposed taxonomy intuitively groups existing algorithms, highlights their commonalities and distinctions, and suggests directions for further investigation.

  •     Bib
    Toelle, M.; Belousov, B.; Peters, J. (2023). A Unifying Perspective on Language-Based Task Representations for Robot Control, CoRL Workshop on Language and Robot Learning: Language as Grounding.

Forceful Imitation Learning

Learning forceful manipulation skills from multi-modal human demonstrations. Learning from Demonstration (LfD) provides an intuitive and fast approach to programming robotic manipulators. Task-parameterized representations allow easy adaptation to new scenes and online observations. However, this approach has been limited to pose-only demonstrations and thus to skills with only spatial and temporal features. We extend the LfD framework to address forceful manipulation skills, which are important for industrial processes such as assembly. For such skills, multi-modal demonstrations, including robot end-effector poses, force and torque readings, and operation scenes, are essential. We aim to reproduce such skills reliably according to the demonstrated pose and force profiles within different scenes.

  •       Bib
    Le, A. T.; Guo, M.; Duijkeren, N.; Rozo, L.; Krug, R.; Kupcsik, A.G.; Buerger, M. (2021). Learning forceful manipulation skills from multi-modal human demonstrations, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
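
As a simplified illustration of reproducing pose and force profiles, and not the paper's task-parameterized machinery, the sketch below averages multi-modal demonstrations per time step and derives impedance-style stiffness from the pose variance, so the robot is stiff where demonstrations agree and compliant where they vary; all recorded data is synthetic.

    import numpy as np

    rng = np.random.default_rng(1)
    T, n_demos = 100, 5
    # Synthetic multi-modal recordings: end-effector positions and wrenches.
    poses = np.linspace(0, 1, T)[None, :, None] \
            + 0.01 * rng.normal(size=(n_demos, T, 3))
    wrenches = 0.1 * rng.normal(size=(n_demos, T, 3))

    pose_mean = poses.mean(axis=0)      # reference pose trajectory
    pose_var = poses.var(axis=0)        # agreement across demonstrations
    force_mean = wrenches.mean(axis=0)  # reference force profile

    # Heuristic stiffness: high where demonstrations agree, low where they vary.
    k_min, k_max = 50.0, 500.0
    stiffness = np.clip(k_max / (1.0 + pose_var / pose_var.mean()), k_min, k_max)

    def control(t, pose, force):
        """Impedance-style command tracking the pose and force references."""
        return stiffness[t] * (pose_mean[t] - pose) + (force_mean[t] - force)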

Probabilistic Movement Primitives and Their Application

Probabilistic Movement Primitives. Movement Primitives (MPs) are a well-established approach for representing modular and reusable robot movement generators. Many state-of-the-art robot learning successes are based on MPs, due to their compact representation of the inherently continuous and high-dimensional robot movements. A major goal in robot learning is to combine multiple MPs as building blocks in a modular control architecture to solve complex tasks. To this end, an MP representation has to allow for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We present a probabilistic formulation of the MP concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all of the aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller that reproduces the given trajectory distribution. We evaluate and compare our approach to existing methods on several simulated as well as real robot scenarios.

  •     Bib
    Paraschos, A.; Daniel, C.; Peters, J.; Neumann, G. (2013). Probabilistic Movement Primitives, Advances in Neural Information Processing Systems (NIPS / NeurIPS), MIT Press.
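
The core ProMP operations are compact enough to sketch: represent a trajectory as y_t = φ(t)ᵀw, fit a Gaussian N(μ_w, Σ_w) over the weights of several demonstrations, and condition that Gaussian on a via-point. The sketch below does this for a single degree of freedom with synthetic demonstrations; basis widths and noise levels are arbitrary.

    import numpy as np

    rng = np.random.default_rng(2)
    T, K, n_demos = 100, 15, 20
    t = np.linspace(0, 1, T)
    centers = np.linspace(0, 1, K)
    # Normalized radial basis features, shape (T, K).
    Phi = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.05) ** 2)
    Phi /= Phi.sum(axis=1, keepdims=True)

    # Fit per-demonstration weights by least squares, then a Gaussian over them.
    demos = np.sin(2 * np.pi * t)[None, :] + 0.05 * rng.normal(size=(n_demos, T))
    W = np.linalg.lstsq(Phi, demos.T, rcond=None)[0].T        # (n_demos, K)
    mu_w, Sigma_w = W.mean(axis=0), np.cov(W.T) + 1e-6 * np.eye(K)

    # Condition the weight distribution on a via-point y* at time index t*.
    t_star, y_star, sigma_y = 50, 0.8, 1e-4
    phi = Phi[t_star][:, None]                                # (K, 1)
    gain = Sigma_w @ phi / (phi.T @ Sigma_w @ phi + sigma_y)
    mu_w_new = mu_w + (gain * (y_star - phi.T @ mu_w)).ravel()
    Sigma_w_new = Sigma_w - gain @ phi.T @ Sigma_w

    mean_traj = Phi @ mu_w_new  # conditioned mean now passes near the via-point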

Learning Movement Primitive Libraries through Probabilistic Segmentation. Movement primitives are a well-established approach for encoding and executing movements. While the primitives themselves have been extensively researched, the concept of movement primitive libraries has not received similar attention. Libraries of movement primitives represent the skill set of an agent. Primitives can be queried and sequenced in order to solve specific tasks. The goal of this work is to segment unlabeled demonstrations into a representative set of primitives. Our proposed method differs from current approaches by taking advantage of the often neglected mutual dependency between the segments contained in the demonstrations and the primitives to be encoded. By exploiting this mutual dependency, we show that we can improve both the segmentation and the movement primitive library. Based on probabilistic inference, our novel approach segments the demonstrations while simultaneously learning a probabilistic representation of movement primitives. We demonstrate our method on two real robot applications. First, the robot segments sequences of different letters into a library, explaining the observed trajectories. Second, the robot segments demonstrations of a chair assembly task into a movement primitive library. The library is subsequently used to assemble the chair in an order not present in the demonstrations.

  •     Bib
    Lioutikov, R.; Neumann, G.; Maeda, G.; Peters, J. (2017). Learning Movement Primitive Libraries through Probabilistic Segmentation, International Journal of Robotics Research (IJRR), 36(8), pp. 879-894.
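
The mutual dependency the paper exploits can be caricatured in a few lines: segment assignments depend on the current library, and the library is refit from the assigned segments. The toy sketch below runs this alternation on fixed, known segment boundaries, which sidesteps the actual probabilistic segmentation; it is meant only to show the coupling.

    import numpy as np

    rng = np.random.default_rng(3)
    # A synthetic demonstration built from three segments of two motion types.
    demo = np.concatenate([np.sin(np.linspace(0, np.pi, 50)),
                           np.linspace(1, 0, 50),
                           np.sin(np.linspace(0, np.pi, 50))])
    windows = demo.reshape(3, 50)  # candidate segments (boundaries known here)

    K = 2  # library size
    protos = windows[rng.choice(len(windows), K, replace=False)].copy()
    for _ in range(10):
        # E-step: assign each segment to the closest library primitive.
        dists = ((windows[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # M-step: refit each primitive to its assigned segments.
        for k in range(K):
            if (labels == k).any():
                protos[k] = windows[labels == k].mean(axis=0)
    print("segment -> primitive:", labels)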

Demonstration Based Trajectory Optimization for Generalizable Robot Motions. Learning motions from human demonstrations provides an intuitive way for non-expert users to teach tasks to robots. In particular, intelligent robotic co-workers should not only mimic human demonstrations but should also be able to adapt them to varying application scenarios. As such, robots must have the ability to generalize motions to different workspaces, e.g., to avoid obstacles not present during the original demonstrations. Towards this goal, our work proposes a unified method to (1) generalize robot motions to different workspaces, using a novel formulation of trajectory optimization that explicitly incorporates human demonstrations, and (2) locally adapt and reuse the optimized solution in the form of a distribution of trajectories. This optimized distribution can be used, online, to quickly satisfy via-points and goals of a specific task. We validate the method using a 7-degrees-of-freedom (DoF) lightweight arm that grasps and places a ball into different boxes while avoiding obstacles that were not present during the original human demonstrations.

  •     Bib
    Koert, D.; Maeda, G.J.; Lioutikov, R.; Neumann, G.; Peters, J. (2016). Demonstration Based Trajectory Optimization for Generalizable Robot Motions, Proceedings of the International Conference on Humanoid Robots (HUMANOIDS).
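
A simplified sketch of the underlying trade-off, under our own toy costs rather than the paper's formulation: optimize a trajectory to avoid an obstacle while penalizing deviation from the demonstrated mean, weighted by the demonstrations' inverse variance so that parts of the motion where all demonstrations agree are preserved.

    import numpy as np

    T = 80
    t = np.linspace(0, 1, T)
    demo_mean = np.stack([t, np.zeros(T)], axis=1)  # straight-line demo mean
    demo_prec = np.full((T, 1), 10.0)               # inverse-variance weights
    obstacle, radius = np.array([0.5, 0.05]), 0.15  # unseen obstacle on the path

    x = demo_mean.copy()
    for _ in range(500):
        # Gradient of the demonstration-deviation term (precision-weighted).
        g = 2.0 * demo_prec * (x - demo_mean)
        # Gradient of a hinge-style cost pushing waypoints out of the disk.
        diff = x - obstacle
        dist = np.linalg.norm(diff, axis=1, keepdims=True)
        g += -40.0 * (dist < radius) * diff / np.maximum(dist, 1e-6)
        # Gradient of a finite-difference smoothness term.
        g[1:-1] += 4.0 * (2 * x[1:-1] - x[:-2] - x[2:])
        x[1:-1] -= 0.01 * g[1:-1]                   # keep the endpoints fixed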