The Computational Learning for Autonomous Systems Group and the Intelligent Autonomous Systems Group are currently partners in more than eight EU projects (as of June 2015). Some of the most exciting results developed within these projects are shown here. The videos are part of our YouTube channel.
Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions about the form of the domain parameter distribution, which prevent them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with an optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to more closely match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.
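NPDR itself relies on neural likelihood-free inference. The snippet below swaps in a far simpler likelihood-free method, rejection approximate Bayesian computation (ABC), purely to illustrate the posterior-update half of the loop: sample parameters from a coarse prior range, run a non-differentiable black-box simulator, and keep only the parameters whose rollouts resemble a real observation. The toy simulator, the summary statistics, the prior range [0, 5], and the tolerance are all hypothetical, and the policy-learning step that NPDR alternates with is omitted.

```python
import numpy as np


def simulate(theta, rng, n_steps=50):
    """Hypothetical black-box simulator: a decaying signal whose rate is theta.

    Only forward sampling is used; no gradients are required."""
    t = np.arange(n_steps)
    return np.exp(-theta * t / n_steps) + 0.005 * rng.standard_normal(n_steps)


def summary(traj):
    # Low-dimensional summary statistics used to compare rollouts.
    return np.array([traj.mean(), traj[-1]])


def abc_rejection(observed, prior_low, prior_high, n_samples, tol, rng):
    """Rejection ABC: keep prior samples whose simulated summaries lie
    within `tol` of the observed summaries."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_samples):
        theta = rng.uniform(prior_low, prior_high)  # coarse uniform prior range
        if np.linalg.norm(summary(simulate(theta, rng)) - s_obs) < tol:
            accepted.append(theta)
    return np.array(accepted)


rng = np.random.default_rng(0)
real_obs = simulate(1.5, rng)  # stands in for a real-world rollout (true theta = 1.5)
posterior = abc_rejection(real_obs, 0.0, 5.0, n_samples=5000, tol=0.03, rng=rng)
print(len(posterior), posterior.mean())
```

Rejection ABC scales poorly to many correlated parameters, which is one motivation for neural posterior estimators, but it shares the key property that the simulator is only ever sampled, never differentiated.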
Despite their abundance in robotics and nature, underactuated systems remain a challenge for control engineering. Trajectory optimization provides a generally applicable solution; however, its efficiency strongly depends on the engineer's skill in framing the problem in an optimizer-friendly way. This paper proposes a procedure that automates such problem reformulation for a class of tasks in which the desired trajectory is specified by a sequence of waypoints. The approach is based on introducing auxiliary optimization variables that represent waypoint activations. To validate the proposed method, a letter drawing task is set up in which the shapes traced by the tip of a rotary inverted pendulum are visualized using long-exposure photography.
When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes that there is prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source domain distribution parameters for those that lead to a policy maximizing the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as in sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer while significantly reducing the required prior knowledge.
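The outer loop of BayRn can be sketched as Bayesian optimization with a Gaussian process surrogate. In the minimal example below, `phi` is a single hypothetical domain distribution parameter (for instance the mean of a randomized mass), and `real_return` is a hypothetical stand-in for the expensive inner step of training a policy in the randomized simulator and evaluating it on the real system. The squared-exponential kernel, its hyperparameters, and the upper confidence bound (UCB) acquisition function are our assumptions, not necessarily the choices made in the paper.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    # Squared-exponential kernel between two sets of 1-D inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Exact GP regression (zero prior mean): posterior mean and std at x_query."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def real_return(phi):
    """Hypothetical stand-in for: train a policy in a simulator randomized with
    distribution parameter phi, then measure its return on the real system."""
    return np.exp(-((phi - 0.7) ** 2) / 0.02)


candidates = np.linspace(0.0, 1.0, 201)
phis = [0.0, 1.0]                          # initial real-world evaluations
returns = [real_return(p) for p in phis]
for _ in range(15):
    mean, std = gp_posterior(np.array(phis), np.array(returns), candidates)
    ucb = mean + 2.0 * std                 # upper confidence bound acquisition
    phi_next = candidates[np.argmax(ucb)]
    phis.append(phi_next)
    returns.append(real_return(phi_next))  # one (expensive) real-world evaluation

best_phi = phis[int(np.argmax(returns))]
print(f"best phi = {best_phi:.3f}")
```

Each iteration costs one real-world evaluation, so the surrogate and the acquisition function, rather than the real system, carry the burden of deciding where to look next.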
Sim-to-Real transfer of policies learned with Bayesian Domain Randomization (BayRn) on the Barrett WAM (ball-in-a-cup task) and the Quanser Qube (swing-up and balance task). When learning from simulations, the optimizer is free to exploit the simulation. Thus, the resulting policies can perform very well in simulation but transfer poorly to their real-world counterpart. For example, both of the following policies yield a return of 1 and thus look equally good to the learner. BayRn uses a Gaussian process to learn how to adapt the randomized simulator solely from the observed real-world returns. BayRn is agnostic toward the policy optimization subroutine; in this work, we used PPO and PoWER. We also evaluated BayRn on an underactuated swing-up and balance task.
Robots that can learn in the physical world will be important to enable robots to escape their stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging, as one must push the limits of the robot and its actuation without harming the system. Therefore, learning these tasks on the physical robot amplifies the need for sample efficiency and safety in robot learning algorithms, making a high-speed task an ideal benchmark for robot learning systems. To achieve learning on the physical system, we propose a learning system that directly incorporates the safety and sample efficiency requirements into the design of the policy representation, initialization, and optimization. This approach contrasts with prior work, which mainly focuses on the details of the learning algorithm but neglects the engineering details. We demonstrate that this system enables the high-speed Barrett WAM to learn to juggle two balls from 56 minutes of experience. The robot learns to juggle consistently based solely on a binary reward signal. The optimal policy is able to juggle for up to 33 minutes, or about 4500 repeated catches.
Learning of the Juggling Task
For learning on the physical Barrett WAM, 20 episodes were performed. During each episode, 25 randomly sampled parameter sets were executed and the episodic reward was evaluated. If the robot successfully juggled for 10 s, the roll-out was stopped. Roll-outs that were corrupted by obvious environment errors were repeated with the same parameters; minor variations caused by the environment initialization were not repeated. After collecting the samples, the policy was updated using eREPS with a KL constraint of 2. The video shows **all** trials executed on the physical system to learn the optimal policy.
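An eREPS update with a KL bound of 2 can be sketched as follows, assuming a Gaussian search distribution over policy parameters. The temperature is found by minimizing the standard episodic REPS dual with a simple golden-section search, and the new distribution is obtained by a weighted maximum-likelihood fit. The 3-dimensional parameter space and the quadratic reward are hypothetical stand-ins (the actual system uses a binary juggling reward); only the 25 samples per episode and the KL bound of 2 come from the description above.

```python
import numpy as np


def reps_weights(returns, epsilon):
    """Episodic REPS: minimise the dual
    g(eta) = eta * epsilon + eta * log(mean(exp(R / eta)))
    over the temperature eta > 0 and return normalised sample weights."""
    r = np.asarray(returns, dtype=float)
    r_max = r.max()

    def dual(log_eta):
        eta = np.exp(log_eta)
        # Shift by r_max inside the exponent for numerical stability.
        return eta * epsilon + r_max + eta * np.log(np.mean(np.exp((r - r_max) / eta)))

    # The dual is convex in eta, hence unimodal in log(eta): golden-section search.
    lo, hi = -12.0, 12.0
    inv_phi = (np.sqrt(5.0) - 1.0) / 2.0
    for _ in range(200):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if dual(a) < dual(b):
            hi = b
        else:
            lo = a
    eta = np.exp(0.5 * (lo + hi))
    w = np.exp((r - r_max) / eta)
    return w / w.sum()


def reps_update(params, returns, epsilon=2.0):
    """Weighted maximum-likelihood fit of the new Gaussian search distribution."""
    w = reps_weights(returns, epsilon)
    mean = w @ params
    diff = params - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov, w


rng = np.random.default_rng(1)
mu, Sigma = np.zeros(3), np.eye(3)
params = rng.multivariate_normal(mu, Sigma, size=25)  # 25 sampled parameter sets
returns = -np.sum((params - 1.0) ** 2, axis=1)        # hypothetical stand-in reward
mu_new, Sigma_new, w = reps_update(params, returns, epsilon=2.0)
kl = np.sum(w * np.log(len(w) * w + 1e-300))          # KL(weights || uniform)
print(mu_new, kl)
```

At the optimal temperature, the KL divergence between the sample weights and the uniform distribution equals the bound, which gives an easy sanity check on the dual solver.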
Highly dynamic robotic tasks require high-speed and reactive robots. These tasks are particularly challenging due to physical constraints, hardware limitations, and the high uncertainty of dynamics and sensor measurements. To face these issues, it is crucial to design robotic agents that generate precise and fast trajectories and react immediately to environmental changes. Air hockey is an example of this kind of task. Due to the environment's characteristics, it is possible to formalize the problem and derive clean mathematical solutions. For these reasons, this environment is well suited for pushing the performance of currently available general-purpose robotic manipulators to the limit. Using two Kuka IIWA 14 robots, we show how to design a policy for general-purpose robotic manipulators for the air hockey game. We demonstrate that a real robot arm can perform fast hitting movements and that the two robots can play against each other on a medium-size air hockey table in simulation.
Depending on the task at hand, learning behavior via reinforcement learning can be challenging or impractical, for example, due to the unsolved problem of targeted exploration. In this work, we take a look at so-called curriculum reinforcement learning, in which the goal is to sidestep challenges of RL algorithms by training them on a sequence of tasks that guides their learning towards a target task (or a set of those). More precisely, we show that an instantiation of self-paced learning in the domain of RL:
a) generates curricula that can drastically improve the learning performance of RL agents, and
b) can be seen as a form of tempering applied to the RL objective.
Reinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially pronounced in many popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods, we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions, and we empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
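A heavily simplified illustration of a nonparametric Bellman equation is shown below. It uses Nadaraya-Watson kernel weights to build a row-stochastic transition matrix between sampled state-action pairs under a fixed deterministic policy, and then solves the resulting linear system for the Q-values in closed form. The 1-D dynamics, reward, policy, kernel, and bandwidth are hypothetical, and the analytic policy gradient of the actual method is not reproduced here.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    # Isotropic Gaussian kernel between every row of x and every row of y.
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / bandwidth**2)


def nonparametric_q(states, actions, rewards, next_states, policy, gamma=0.95, bandwidth=0.2):
    """Closed-form Q-values at the sampled state-action pairs.

    P[i, j] is a normalised kernel weight measuring how similar the successor
    pair (s'_i, pi(s'_i)) is to the sampled pair (s_j, a_j)."""
    sa = np.hstack([states, actions])
    next_sa = np.hstack([next_states, policy(next_states)])
    K = gaussian_kernel(next_sa, sa, bandwidth)
    P = K / K.sum(axis=1, keepdims=True)        # row-stochastic "transition" matrix
    n = len(rewards)
    # Nonparametric Bellman equation q = r + gamma * P q, solved as (I - gamma P) q = r.
    return np.linalg.solve(np.eye(n) - gamma * P, rewards)


def policy(s):
    # Hypothetical deterministic policy that pushes the state towards 0.
    return -np.sign(s)


rng = np.random.default_rng(0)
n = 200
states = rng.uniform(-1.0, 1.0, size=(n, 1))
actions = rng.uniform(-1.0, 1.0, size=(n, 1))
next_states = np.clip(states + 0.1 * actions, -1.0, 1.0)   # hypothetical 1-D dynamics
rewards = -states[:, 0] ** 2                                # penalise distance from 0
q = nonparametric_q(states, actions, rewards, next_states, policy)
print(q[:5])
```

Because the estimate is the solution of a linear system, it is available in closed form from off-policy samples alone, without further environment interaction.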
Quanser CartPole: application of a stochastic policy learned in simulation with NOPG-S using a uniformly randomly sampled dataset.
Learning robot control policies from physics simulations is of great interest to the robotics community, as it may render the learning process faster, cheaper, and safer by alleviating the need for expensive real-world experiments. However, the direct transfer of learned behavior from simulation to reality is a major challenge. Optimizing a policy on a slightly faulty simulator can easily lead to the maximization of the Simulation Optimization Bias (SOB). In this case, the optimizer exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. We tackle this challenge by applying domain randomization, i.e., randomizing the parameters of the physics simulations during learning. We propose an algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA), which uses an estimator of the SOB to formulate a stopping criterion for training. The introduced estimator quantifies the overfitting to the set of domains experienced during training. Our experimental results on two different second-order nonlinear systems show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator, which can be applied directly to real systems without any additional training.
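The stopping idea can be illustrated with a simplified sketch: estimate the optimality gap between reference policies and the candidate policy on the same domains, compute a one-sided bootstrap upper confidence bound on its mean, and stop training once that bound falls below a threshold `beta`. The function names, the percentile bootstrap, the 95% confidence level, and the numbers in the usage example are our assumptions; the actual SPOTA estimator is defined in the paper.

```python
import numpy as np


def bootstrap_ucb(samples, confidence=0.95, n_bootstrap=2000, rng=None):
    """One-sided percentile-bootstrap upper confidence bound on the mean."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    idx = rng.integers(0, len(samples), size=(n_bootstrap, len(samples)))
    boot_means = samples[idx].mean(axis=1)    # mean of each resampled data set
    return np.quantile(boot_means, confidence)


def should_stop(ref_returns, cand_returns, beta, rng=None):
    """Stop once the UCB on the estimated optimality gap drops below beta.

    ref_returns[k]: return of reference policy k on its own evaluation domains.
    cand_returns[k]: return of the candidate policy on those same domains."""
    gap = np.asarray(ref_returns, dtype=float) - np.asarray(cand_returns, dtype=float)
    ucb = bootstrap_ucb(gap, rng=rng)
    return ucb < beta, ucb


rng = np.random.default_rng(0)
ref = rng.normal(10.0, 0.5, size=20)            # hypothetical reference returns
cand = ref - rng.normal(0.05, 0.1, size=20)     # candidate slightly worse on the same domains
stop, ucb = should_stop(ref, cand, beta=0.3, rng=rng)
print(stop, round(float(ucb), 3))
```

Using an upper bound rather than the point estimate makes the criterion conservative: training only stops when the gap is small with high confidence, not merely on average.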
Sim-to-Real transfer of a policy learned by Simulation-based Policy Optimization with Transferability Assessment (SPOTA) on the Ball-Balancer and Cart-Pole platforms from Quanser.
We apply domain randomization, i.e., randomizing the parameters of the physics simulations during learning. We propose an algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA), which uses an estimator of the SOB to formulate a stopping criterion for training. The introduced estimator quantifies the overfitting to the set of domains experienced during training. Supplementary video to "Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment" (CoRL 2018), comparing against LQR, (vanilla) TRPO, and EPOpt with synchronized random seeds and 4 different initial positions. In this setup, we both train and test in Vortex.
We apply domain randomization, i.e., randomizing the parameters of the physics simulations during learning. We propose an algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA), which uses an estimator of the SOB to formulate a stopping criterion for training. The introduced estimator quantifies the overfitting to the set of domains experienced during training. Supplementary video to "Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment" (CoRL 2018), comparing against LQR, (vanilla) TRPO, and EPOpt with synchronized random seeds and 4 different initial positions. In this setup, we run in an environment with the nominal parameters.
Teaching motor skills to robots through human demonstrations, an approach called “imitation learning”, is an alternative to hand-coding each new robot behavior. Imitation learning is relatively cheap in terms of time and labor and is a promising route to give robots the necessary functionalities for widespread use in households, stores, hospitals, etc. However, current imitation learning techniques struggle with a number of challenges that prevent their wide usability. For instance, robots might not be able to accurately reproduce every human demonstration, and it is not always clear how robots should generalize a movement to new contexts. This paper addresses those challenges by presenting a method to incrementally teach context-dependent motor skills to robots. The human demonstrates trajectories for different contexts by moving the links of the robot and partially or fully refines those trajectories by disturbing the movements of the robot while it executes the behavior it has learned so far. A joint probability distribution over trajectories and contexts can then be built based on those demonstrations and refinements. Given a new context, the robot computes the most probable trajectory, which can also be refined by the human. The joint probability distribution is incrementally updated with the refined trajectories. We have evaluated our method with experiments in which an elastically actuated robot arm with four degrees of freedom learns how to reach a ball at different positions.
Robots that can learn over time by interacting with non-technical users must be capable of acquiring new motor skills incrementally. The problem then is deciding when to teach the robot a new skill and when to rely on the robot generalizing its actions. This decision can be made by the robot if it is provided with a means to quantify the suitability of its own skill for an unseen task. To this end, we present an algorithm that allows a robot to make active requests to incrementally learn movement primitives. A movement primitive is learned on a trajectory output by a Gaussian Process. The latter is used as a library of demonstrations that can be extrapolated with confidence margins. This combination not only allows the robot to generalize using as few as a single demonstration but, more importantly, to indicate whether such generalization can be executed with confidence. In experiments, a real robot arm indicates to the user which demonstrations should be provided to increase its repertoire of reaching skills. The experiments also show that the robot becomes confident in reaching objects for which demonstrations were never provided, by incrementally learning from the neighboring demonstrations.
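The confidence-margin idea can be sketched with a Gaussian process that maps a task context (here a hypothetical 2-D target position) to a flattened demonstrated trajectory; the robot requests a new demonstration whenever the predictive standard deviation exceeds a threshold. The class name, kernel length scale, noise level, and threshold of 0.3 are hypothetical, and the movement primitive fitted on top of the GP output is omitted.

```python
import numpy as np


def rbf(a, b, length_scale=0.1):
    # Squared-exponential kernel between rows of a and rows of b.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


class DemoLibrary:
    """Hypothetical GP library mapping a task context to a demonstrated trajectory."""

    def __init__(self, noise=1e-4, std_threshold=0.3):
        self.contexts, self.trajs = [], []
        self.noise, self.std_threshold = noise, std_threshold

    def add_demo(self, context, trajectory):
        self.contexts.append(np.asarray(context, dtype=float))
        self.trajs.append(np.asarray(trajectory, dtype=float))

    def predict(self, context):
        """GP posterior mean trajectory (zero prior mean) and predictive std."""
        X = np.array(self.contexts)
        Y = np.array(self.trajs)              # one flattened trajectory per row
        x = np.asarray(context, dtype=float)[None, :]
        K = rbf(X, X) + self.noise * np.eye(len(X))
        k = rbf(X, x)[:, 0]
        mean = k @ np.linalg.solve(K, Y)
        std = np.sqrt(max(1.0 - k @ np.linalg.solve(K, k), 0.0))
        return mean, std

    def needs_demonstration(self, context):
        # Request a new demonstration when the GP is not confident enough.
        _, std = self.predict(context)
        return std > self.std_threshold


lib = DemoLibrary()
lib.add_demo([0.0, 0.0], np.linspace(0.0, 1.0, 10))
lib.add_demo([0.1, 0.0], np.linspace(0.0, 1.2, 10))
print(lib.needs_demonstration([0.05, 0.0]))  # False: close to both demonstrations
print(lib.needs_demonstration([0.6, 0.6]))   # True: far from every demonstration
```

In this sketch, a single demonstration already yields a usable prediction in its neighborhood, while the predictive standard deviation grows towards the prior value of 1 far away from all demonstrations, which is what triggers a request.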
Robot imitation based on observations of human movement is a challenging problem, as the structures of the human demonstrator and the robot learner are usually different. A movement that can be demonstrated well by a human may not be kinematically feasible for robot reproduction. A common approach to solving this kinematic mapping is to retarget pre-defined corresponding parts of the human and the robot kinematic structures. When such a correspondence is not available, manual scaling of the movement amplitude and positioning of the demonstration relative to the robot's reference frame may be required. This paper's contribution is a method that eliminates the need for human-robot structural associations (and is therefore less sensitive to the type of robot kinematics) and searches for the optimal location and adaptation of the human demonstration, such that the robot can accurately execute the optimized solution. The method defines a cost that quantifies the quality of the kinematic mapping and minimizes it jointly with task-specific costs such as via-points and obstacles. We demonstrate the method experimentally, where a real golf swing recorded via marker tracking is generalized to different speeds on the embodiment of a 7 degree-of-freedom (DoF) arm. In simulation, we compare solutions for robots with different kinematic structures.
Learning motions from human demonstrations provides an intuitive way for non-expert users to teach tasks to robots. In particular, intelligent robotic co-workers should not only mimic human demonstrations but also be able to adapt them to varying application scenarios. As such, robots must have the ability to generalize motions to different workspaces, e.g., to avoid obstacles not present during the original demonstrations. Towards this goal, our work proposes a unified method to (1) generalize robot motions to different workspaces, using a novel formulation of trajectory optimization that explicitly incorporates human demonstrations, and (2) locally adapt and reuse the optimized solution in the form of a distribution of trajectories. This optimized distribution can be used online to quickly satisfy the via-points and goals of a specific task. We validate the method using a 7 degree-of-freedom (DoF) lightweight arm that grasps and places a ball into different boxes while avoiding obstacles that were not present during the original human demonstrations.
This paper proposes a method to achieve fast and fluid human-robot interaction by estimating the progress of the movement of the human. The method allows the progress, also referred to as the phase of the movement, to be estimated even when observations of the human are partial and occluded, a problem typically found when using motion capture systems in cluttered environments. By leveraging the framework of Interaction Probabilistic Movement Primitives, phase estimation makes it possible to classify the human action and to generate a corresponding robot trajectory before the human finishes his/her movement. The method is therefore suited for semi-autonomous robots acting as assistants and coworkers. Since observations may be sparse, our method is based on computing the probability of different phase candidates to find the phase that best aligns the Interaction Probabilistic Movement Primitives with the current observations. The method is fundamentally different from approaches based on Dynamic Time Warping, which must rely on a consistent stream of measurements at runtime. The resulting framework can achieve phase estimation, action recognition, and robot trajectory coordination using a single probabilistic representation. We evaluated the method using a seven-degree-of-freedom lightweight robot arm equipped with a five-finger hand in single and multi-task collaborative experiments. We compare the accuracy achieved by phase estimation with that of our previous method based on dynamic time warping.
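A simplified version of scoring phase candidates is shown below. Instead of a full Interaction ProMP, it assumes a fixed, hypothetical mean trajectory as a function of the phase z in [0, 1], models the phase as z = alpha * t with an unknown speed alpha, and evaluates the Gaussian likelihood of a few sparse, irregularly timed observations under each candidate speed. The template, noise level, and candidate grid are assumptions.

```python
import numpy as np


def template(z):
    """Hypothetical mean trajectory of a learned primitive over phase z in [0, 1]."""
    return np.sin(np.pi * np.clip(z, 0.0, 1.0))


def estimate_phase_speed(t_obs, y_obs, candidates, obs_std=0.05):
    """Score each candidate phase speed alpha (phase z = alpha * t) by the Gaussian
    log-likelihood of the sparse observations; return the best speed and the
    normalised probabilities under a uniform prior over candidates."""
    log_liks = np.array([
        -0.5 * np.sum((y_obs - template(alpha * t_obs)) ** 2) / obs_std**2
        for alpha in candidates
    ])
    probs = np.exp(log_liks - log_liks.max())
    probs /= probs.sum()
    return candidates[np.argmax(probs)], probs


rng = np.random.default_rng(0)
t_obs = np.array([0.1, 0.35, 0.9, 1.2])  # sparse, irregular samples (e.g. occlusions)
y_obs = template(0.5 * t_obs) + 0.01 * rng.standard_normal(t_obs.size)
candidates = np.linspace(0.2, 1.0, 81)
alpha_hat, probs = estimate_phase_speed(t_obs, y_obs, candidates)
current_phase = alpha_hat * t_obs[-1]    # estimated phase at the latest observation
print(alpha_hat, current_phase)
```

Only the observed time stamps enter the likelihood, so missing frames never need to be interpolated, which mirrors the contrast with Dynamic Time Warping drawn above.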
Grasping is an essential component of robotic manipulation and has been investigated for decades. Prior work on grasping often assumes that a sufficient amount of training data is available for learning and planning robotic grasps. However, constructing such an exhaustive training dataset is very challenging in practice, and it is desirable that a robotic system can autonomously learn and improve its grasping strategy. Although recent work has presented autonomous data collection through trial and error, such methods are often limited to a single grasp type, e.g., vertical pinch grasp. To address these issues, we present a hierarchical policy search approach for learning multiple grasping strategies. To leverage human knowledge, multiple grasping strategies are initialized with human demonstrations. In addition, a database of grasping motions and point clouds of objects is autonomously built from a set of grasps given by a user. The problem of selecting the grasp location and grasp policy is formulated as a bandit problem in our framework. We applied our reinforcement learning framework to grasping both rigid and deformable objects. The experimental results show that our framework autonomously learns and improves its performance through trial and error and can grasp previously unseen objects with high accuracy.
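The grasp selection step can be illustrated with a standard bandit algorithm. The sketch below uses UCB1, which is our choice (the text does not state which bandit algorithm is used); each arm stands for one hypothetical (grasp location, grasp strategy) option with an unknown success probability, and each pull returns a binary grasp outcome.

```python
import numpy as np


def ucb1(success_probs, n_rounds, rng):
    """UCB1 bandit: each arm is one (grasp location, grasp strategy) option."""
    n_arms = len(success_probs)
    counts = np.zeros(n_arms)
    values = np.zeros(n_arms)                 # empirical success rate per arm
    for t in range(n_rounds):
        if t < n_arms:
            arm = t                           # try every option once first
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(values + bonus))
        reward = float(rng.random() < success_probs[arm])    # binary grasp outcome
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return counts, values


rng = np.random.default_rng(0)
probs = [0.2, 0.5, 0.8]   # hypothetical success rates of three grasp options
counts, values = ucb1(probs, n_rounds=2000, rng=rng)
print(counts, values.round(2))
```

The exploration bonus shrinks as an option is tried more often, so trials concentrate on the most successful grasp option while weaker options are still revisited occasionally.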
This work is supported by H2020 RoMaNS (Robotic Manipulation for Nuclear Sort and Segregation) http://www.h2020romans.eu/
Trajectory optimization is an essential tool for motion planning of robotic manipulators under multiple constraints. Optimization-based methods can explicitly optimize a trajectory by leveraging prior knowledge of the system and have been used in various applications such as collision avoidance. However, these methods often require a hand-coded cost function in order to achieve the desired behavior. Specifying such a cost function for a complex desired behavior, e.g., disentangling a rope, is a nontrivial task that is often even infeasible. Learning from demonstration (LfD) methods offer an alternative way to program robot motion. LfD methods are less dependent on analytical models and instead learn the behavior of experts implicitly from the demonstrated trajectories. However, the problem of adapting the demonstrations to new situations, e.g., avoiding newly introduced obstacles, has not been fully investigated in the literature. In this letter, we present a motion planning framework that combines the advantages of optimization-based and demonstration-based methods. We learn a distribution of trajectories demonstrated by human experts and use it to guide the trajectory optimization process. The resulting trajectory maintains the demonstrated behaviors, which are essential for performing the task successfully, while adapting the trajectory to avoid obstacles. In simulated experiments and with a real robotic system, we verify that our approach optimizes the trajectory to avoid obstacles and encodes the demonstrated behavior in the resulting trajectory.
This paper introduces our initial investigation into the problem of providing a semi-autonomous robot collaborator with anticipative capabilities to predict human actions. Anticipative robot behavior is a desired characteristic of robot collaborators that leads to fluid, proactive interactions. We are particularly interested in improving reactive methods that rely on human action recognition to activate the corresponding robot action. Action recognition invariably causes a delay in the robot’s response, and the goal of our method is to eliminate this delay by predicting the next human action. Prediction is achieved by using a lookup table containing variations of assembly sequences previously demonstrated by different users. The method uses the nearest-neighbor sequence in the table that matches the actual sequence of human actions. At the movement level, our method uses a probabilistic representation of interaction primitives to generate robot trajectories. The method is demonstrated using a 7-degree-of-freedom lightweight arm equipped with a 5-finger hand on an assembly task consisting of 17 steps.
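The lookup-table prediction can be sketched in a few lines. The mismatch count on the observed prefix, used here as the nearest-neighbor distance, as well as the action labels and the example table, are hypothetical.

```python
def predict_next_action(observed, lookup_table):
    """Return the next action of the demonstrated sequence whose prefix best
    matches the actions observed so far (fewest position-wise mismatches)."""
    best_seq, best_cost = None, float("inf")
    for seq in lookup_table:
        if len(seq) <= len(observed):
            continue                          # no "next" action left in this sequence
        cost = sum(a != b for a, b in zip(observed, seq))
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return None if best_seq is None else best_seq[len(observed)]


table = [  # hypothetical demonstrated assembly sequences
    ["base", "leg", "leg", "screw", "top"],
    ["base", "screw", "leg", "leg", "top"],
    ["leg", "leg", "base", "screw", "top"],
]
print(predict_next_action(["base", "screw"], table))  # prints: leg
```

The predicted action can then trigger the corresponding robot movement before the human has actually started it, which is where the anticipation removes the recognition delay.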
In this video, the robot tries to predict the next human action. The robot preemptively moves its hand towards the closest object that it "thinks" the human will need, and then waits until the human moves to confirm his/her action. This requires a plan in the form of a sequence of actions, which is given by a lookup table containing previous demonstrations of the assembly. This is the result of joint work between IAS-TU Darmstadt and IIT Madras.
Robot imitation based on observations of human movement is a challenging problem, as the structures of the human demonstrator and the robot learner are usually different. A movement that can be demonstrated well by a human may not be kinematically feasible for robot reproduction. A common approach to solving this kinematic mapping is to retarget predefined corresponding parts of the human and the robot kinematic structures. When such a correspondence is not available, manual scaling of the movement amplitude and positioning of the demonstration relative to the robot's reference frame may be required. This letter's contribution is a method that eliminates the need for human-robot structural associations (and is therefore less sensitive to the type of robot kinematics) and searches for the optimal location and adaptation of the human demonstration, such that the robot can accurately execute the optimized solution. The method defines a cost that quantifies the quality of the kinematic mapping and minimizes it jointly with task-specific costs such as via-points and obstacles. We demonstrate the method experimentally, where a real golf swing recorded via marker tracking is generalized to different speeds on the embodiment of a 7 degree-of-freedom (DoF) arm. In simulation, we compare solutions for robots with different kinematic structures.
Remote control of robots is often necessary to complete complex unstructured tasks in environments that are inaccessible (e.g., dangerous) for humans. Teleoperation of humanoid robots is often performed through motion tracking to reduce the complexity arising from manually controlling a high number of DoF. However, most commercial motion tracking systems are expensive and often uncomfortable. Moreover, a limitation of this approach is the need to maintain visual contact with the operated robot, or to employ a second human operator to independently maneuver a camera. As a result, even performing simple tasks depends heavily on the skill and synchronization of the two operators. To alleviate this problem, we propose to use augmented reality to provide the operator with first-person vision and a natural interface to directly control both the camera and the robot at the same time. By integrating recent off-the-shelf technologies, we provide an affordable and intuitive environment composed of Microsoft Kinect, Oculus Rift, and a haptic SensorGlove to teleoperate humanoid robots in first person. We demonstrate on the humanoid robot iCub that this setup allows the operator to accomplish complex tasks quickly and naturally.
Sensor gloves are popular input devices for a large variety of applications, including health monitoring, control of musical instruments, learning sign language, dexterous computer interfaces, and teleoperation of robot hands. Many commercial products, as well as low-cost open-source projects, have been developed. We discuss here how low-cost (approx. 250 EUR) sensor gloves with force feedback can be built, provide an open-source software interface for Matlab, and present first results in learning object manipulation skills through imitation learning on the humanoid robot iCub.
The movie shows how the humanoid robot iCub can be trained to stack cups. A probabilistic trajectory model of the behavior is learned from two human demonstrations (the second demo runs at double speed). After learning, the robot can reproduce the cup-stacking motions.
Learning motor skills from multiple demonstrations presents a number of challenges. One of those challenges is the occurrence of occlusions and lack of sensor coverage, which may corrupt part of the recorded data. Another issue is the variability in speed of execution of the demonstrations, which may require a way of finding the correspondence between the time steps of the different demonstrations. In this paper, an approach to learn motor skills is proposed that accounts both for spatial and temporal variability of movements. This approach, based on an Expectation-Maximization algorithm to learn Probabilistic Movement Primitives, also allows for learning motor skills from partially observed demonstrations, which may result from occlusion or lack of sensor coverage. An application of the algorithm proposed in this work lies in the field of Human-Robot Interaction when the robot has to react to human movements executed at different speeds. Experiments in which a robotic arm receives a cup handed over by a human illustrate this application. The capabilities of the algorithm in learning and predicting movements are also evaluated in experiments using a data set of letters and a data set of golf putting movements.
In this video, the robot uses the estimated phase to react at a corresponding speed such that the interaction looks more natural. This work allows our Interaction Probabilistic Movement Primitives framework to infer not only the position to which the human will bring the cup but also the speed at which the human is approaching, so that the robot can react faster or slower accordingly.
This paper proposes an interaction learning method suited for semi-autonomous robots that work with or assist a human partner. The method aims at generating a collaborative trajectory of the robot as a function of the current action of the human. The trajectory generation is based on action recognition and prediction of the human movement given intermittent observations of his/her positions under unknown speeds of execution, a problem typically found when using motion capture systems in occluded scenarios. Of particular interest is the ability to predict the human movement from only the initial part of the trajectory, which allows for faster robot reactions. The method is based on probabilistically modelling the coupling between human-robot movement primitives; it eliminates the need for time alignment of the training data while being scalable with respect to the number of tasks. We evaluated the method using a 7-DoF lightweight robot arm equipped with a 5-finger hand in a multi-task collaborative assembly experiment, also comparing results with our previous method based on time-aligned trajectories.
Movement primitives (MPs) provide a powerful framework for data-driven movement generation that has been successfully applied to learning from demonstrations and robot reinforcement learning. In robotics, we often want to solve a multitude of different but related tasks. As the parameters of the primitives are typically high-dimensional, a common practice for generalizing movement primitives to new tasks is to adapt only a small set of control variables of the primitive, also called meta-parameters. Yet, for most MP representations, the encoding of these control variables is pre-coded in the representation and cannot be adapted to the considered tasks. In this paper, we learn the encoding of task-specific control variables from data instead of relying on fixed meta-parameter representations. We use hierarchical Bayesian models (HBMs) to estimate a low-dimensional latent variable model for probabilistic movement primitives (ProMPs), a recent movement primitive representation. We show on two real robot datasets that ProMPs based on HBMs outperform standard ProMPs in terms of generalization and learning from a small amount of data, and also allow for an intuitive analysis of the movement. We also extend our HBM with a mixture model, such that we can model different movement types in the same dataset.
The movie shows an example of target-reaching movements on a KUKA robot arm generated with a probabilistic movement representation. In the representation, latent variables are extracted from human demonstrations that can be used to generate new movements for unseen tasks, as well as to model different types of movements.
This paper proposes an interaction learning method for collaborative and assistive robots based on movement primitives. The method allows for both action recognition and human-robot movement coordination. It uses imitation learning to construct a mixture model of human-robot interaction primitives. This probabilistic model allows the assistive trajectory of the robot to be inferred from human observations. The method is scalable in relation to the number of tasks and can learn nonlinear correlations between the trajectories that describe the human-robot interaction. We evaluated the method experimentally with a lightweight.
This video shows a 7DoF compliant arm being used as an assistive robot. The algorithm is based on Interaction Probabilistic Movement Primitives.
Robots that interact with humans must learn to adapt not only to different human partners but also to new interactions. Such learning can be achieved through demonstration and imitation. A recently introduced method for learning interactions from demonstrations is the Interaction Primitives framework. However, this framework can only represent and generalize a single interaction pattern. In practice, interactions between a human and a robot can consist of many different patterns. To overcome this limitation, this paper proposes a Mixture of Interaction Primitives that learns multiple interaction patterns from unlabeled demonstrations. Specifically, the proposed method uses Gaussian Mixture Models of Interaction Primitives to model nonlinear correlations between the movements of the different agents. We validate our algorithm with two experiments involving interactive tasks between a human and a lightweight robotic arm. In the first, we compare the proposed method with conventional Interaction Primitives in a toy problem where the movements of the robot and the human are not linearly correlated. In the second, we present a proof-of-concept experiment in which the robot assists a human in assembling a box.
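A Gaussian mixture over the joint human-robot variables can capture nonlinear correlations that a single Gaussian cannot. The sketch below illustrates the idea under several assumptions and is not the paper's implementation:

- It uses scalar human and robot variables instead of full weight vectors.
- It uses a hypothetical relation r = h², for which a single Gaussian would find no linear correlation.
- It fits the mixture with expectation maximization on unlabeled samples.
- It predicts the robot variable with Gaussian mixture regression.

```python
import numpy as np

def gauss_pdf(X, mean, cov):
    """Multivariate normal density evaluated at each row of X."""
    d = X - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt((2.0 * np.pi) ** X.shape[1] * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ij,jk,ik->i", d, inv, d)) / norm

def fit_gmm(X, k, n_iter=100, reg=1e-6):
    """EM for a full-covariance GMM; no labels are needed. Init from chunks sorted by X[:, 0]."""
    n, dim = X.shape
    chunks = np.array_split(X[np.argsort(X[:, 0])], k)
    pis = np.full(k, 1.0 / k)
    means = np.array([c.mean(axis=0) for c in chunks])
    covs = np.array([np.cov(c, rowvar=False) + reg * np.eye(dim) for c in chunks])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = np.stack([p * gauss_pdf(X, m, c) for p, m, c in zip(pis, means, covs)], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and covariances
        nk = resp.sum(axis=0)
        pis = nk / n
        means = (resp.T @ X) / nk[:, None]
        covs = np.array([((resp[:, j, None] * (X - means[j])).T @ (X - means[j])) / nk[j]
                         + reg * np.eye(dim) for j in range(k)])
    return pis, means, covs

def gmr_predict(h, pis, means, covs):
    """Gaussian mixture regression: E[r | h] for a 2-D mixture over (h, r)."""
    # Weight each component by how well its marginal over h explains the observation.
    w = np.array([p * np.exp(-0.5 * (h - m[0]) ** 2 / c[0, 0]) / np.sqrt(2 * np.pi * c[0, 0])
                  for p, m, c in zip(pis, means, covs)])
    w /= w.sum()
    cond = [m[1] + c[1, 0] / c[0, 0] * (h - m[0]) for m, c in zip(means, covs)]
    return float(np.dot(w, cond))

# Hypothetical unlabeled demonstrations with a nonlinear human-robot relation.
rng = np.random.default_rng(0)
h = rng.uniform(-2.0, 2.0, 400)
r = h ** 2 + rng.normal(0.0, 0.05, 400)
pis, means, covs = fit_gmm(np.column_stack([h, r]), k=4)
preds = {x: gmr_predict(x, pis, means, covs) for x in (-1.5, 0.0, 1.5)}
```

Each component contributes a locally linear Gaussian conditional. Blending them by responsibility yields a nonlinear mapping from the human variable to the robot variable.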
This paper proposes a probabilistic framework based on movement primitives for robots that work in collaboration with a human coworker. Since the human coworker can execute a variety of unforeseen tasks, our system requires that the robot assistant be able to adapt and learn new skills on demand, without the need for an expert programmer. Thus, this paper leverages the framework of imitation learning and its application to human-robot interaction through the concept of Interaction Primitives (IPs). We introduce the use of Probabilistic Movement Primitives (ProMPs) to devise an interaction method that both recognizes the action of a human and generates the appropriate movement primitive for the robot assistant. We evaluate our method in experiments with a lightweight arm interacting with a human partner, and on motion capture trajectories of two humans assembling a box. We discuss the advantages of ProMPs relative to the original Interaction Primitives formulation and compare the two.
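The core ProMP operation behind this interaction method is conditioning. A joint Gaussian over human and robot weights is conditioned on the first part of an observed human trajectory. The Python sketch below shows that update. Several parts are assumptions made for illustration: the basis functions, the observation noise, and the hypothetical demonstrations, in which human and robot movements both scale with one task parameter.

```python
import numpy as np

def basis(t, n_basis=10, width=0.02):
    """Normalized Gaussian basis functions at phases t in [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def fit_joint(demos_h, demos_r, phi, reg=1e-6):
    """Ridge-fit weights per demo and return the joint Gaussian over [w_h, w_r]."""
    A = phi.T @ phi + reg * np.eye(phi.shape[1])
    W = np.array([np.concatenate([np.linalg.solve(A, phi.T @ yh), np.linalg.solve(A, phi.T @ yr)])
                  for yh, yr in zip(demos_h, demos_r)])
    return W.mean(axis=0), np.cov(W, rowvar=False) + reg * np.eye(W.shape[1])

def condition_on_human(mean, cov, phi_obs, y_obs, noise=1e-4):
    """Condition the joint weight Gaussian on observed human positions; return the robot part."""
    n_b = phi_obs.shape[1]
    H = np.hstack([phi_obs, np.zeros_like(phi_obs)])  # only human weights are observed
    S = H @ cov @ H.T + noise * np.eye(len(y_obs))
    K = np.linalg.solve(S, H @ cov).T  # gain cov H^T S^{-1} (S and cov are symmetric)
    new_mean = mean + K @ (y_obs - H @ mean)
    new_cov = cov - K @ H @ cov
    return new_mean[n_b:], new_cov[n_b:, n_b:]

t = np.linspace(0.0, 1.0, 100)
phi = basis(t)
gains = np.linspace(0.5, 2.0, 20)  # hypothetical task parameter, e.g. a target height
demos_h = [g * np.sin(0.5 * np.pi * t) for g in gains]
demos_r = [0.5 * g * t for g in gains]
mean, cov = fit_joint(demos_h, demos_r, phi)

# Observe only the first 30% of a new human movement and infer the robot movement.
n_obs = 30
y_obs = 1.7 * np.sin(0.5 * np.pi * t[:n_obs])
w_r, cov_r = condition_on_human(mean, cov, phi[:n_obs], y_obs)
robot_traj = phi @ w_r
```

The returned covariance `cov_r` also quantifies how uncertain the robot's inferred movement still is after the partial observation.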
Contacts between objects play an important role in manipulation tasks. Depending on the locations of contacts, different manipulations or interactions can be performed with the object. By observing the contacts between two objects, a robot can learn to detect potential interactions between them.
Rather than defining a set of features for modeling the contact distributions, we propose a kernel-based approach. The contact points are first modeled using a Gaussian distribution. The similarity between these distributions is computed using a kernel function. The contact distributions are then classified using kernel logistic regression. The proposed approach was used to predict stable grasps of an elongated object, as well as to construct towers out of assorted toy blocks.
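The pipeline can be sketched in Python in three parts: Gaussian contact models, a kernel between them, and kernel logistic regression. The text does not specify which kernel is used. This example therefore assumes a normalized probability product (expected likelihood) kernel and trains by plain gradient descent. The data are hypothetical two-finger contacts on an elongated object, labeled stable when they lie near its middle.

```python
import numpy as np

def fit_gaussian(points, reg=1e-4):
    """Model a set of contact points (n x d) as a Gaussian (mean, covariance)."""
    return points.mean(axis=0), np.cov(points, rowvar=False) + reg * np.eye(points.shape[1])

def log_overlap(g1, g2):
    """log of the integral of N(x; m1, S1) N(x; m2, S2) dx, i.e. log N(m1; m2, S1 + S2)."""
    (m1, s1), (m2, s2) = g1, g2
    s, d = s1 + s2, m1 - m2
    _, logdet = np.linalg.slogdet(2.0 * np.pi * s)
    return -0.5 * (d @ np.linalg.solve(s, d) + logdet)

def kernel(g1, g2):
    """Normalized probability product kernel between two Gaussians, in (0, 1]."""
    return np.exp(log_overlap(g1, g2) - 0.5 * (log_overlap(g1, g1) + log_overlap(g2, g2)))

def gram(A, B):
    return np.array([[kernel(a, b) for b in B] for a in A])

def train_klr(K, y, lam=1e-3, n_iter=2000):
    """Kernel logistic regression: gradient descent on the dual coefficients alpha."""
    n = len(y)
    alpha = np.zeros(n)
    k_norm = np.linalg.norm(K, 2)
    lr = 1.0 / (0.25 * k_norm ** 2 / n + lam * k_norm)  # step size from a Lipschitz bound
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-K @ alpha))
        alpha -= lr * (K @ (p - y) / n + lam * K @ alpha)
    return alpha

def predict_proba(K_test, alpha):
    return 1.0 / (1.0 + np.exp(-K_test @ alpha))

rng = np.random.default_rng(1)

def sample_grasp(center, n=10):
    """Hypothetical two-finger contacts around position `center` along the object axis."""
    x = center + rng.normal(0.0, 0.05, n)
    y = np.where(np.arange(n) % 2 == 0, 0.1, -0.1) + rng.normal(0.0, 0.01, n)
    return np.column_stack([x, y])

centers = np.linspace(-1.0, 1.0, 40)
train = [fit_gaussian(sample_grasp(c)) for c in centers]
labels = (np.abs(centers) < 0.3).astype(float)  # hypothetical rule: stable near the middle
alpha = train_klr(gram(train, train), labels)
test = [fit_gaussian(sample_grasp(0.0)), fit_gaussian(sample_grasp(0.8))]
probs = predict_proba(gram(test, train), alpha)
```

Because the kernel compares whole distributions, no hand-designed contact features are required. Any positive semidefinite kernel between Gaussians could be swapped in.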
Creating robots that can act autonomously in dynamic, unstructured environments requires dealing with novel objects. Thus, an offline learning phase is not sufficient for recognizing and manipulating such objects. Rather, an autonomous robot needs to acquire knowledge through its own interaction with its environment, without using heuristics that encode human insights about the domain. Interaction also elicits information that is not present in static images of a scene. Out of a potentially large set of possible interactions, a robot must select the actions expected to have the most informative outcomes in order to learn efficiently. In the proposed bottom-up probabilistic approach, the robot achieves this goal by quantifying the expected informativeness of its own actions in information-theoretic terms. We use this approach to segment a scene into its constituent objects while retaining a probability distribution over segmentations. We show in real-world experiments that this approach is robust to noise and uncertainty. Evaluations show that the proposed information-theoretic approach allows a robot to efficiently determine the composite structure of its environment. We also show that our probabilistic model allows straightforward integration of multiple modalities, such as movement data and static scene features. Learned static scene features allow experience from similar environments to speed up learning for new scenes.
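The information-theoretic action selection can be illustrated on a tiny scene in which every segmentation hypothesis can be enumerated. In this Python sketch, the scene elements, the push actions, and the movement probabilities are hypothetical. The robot pushes the element whose outcome has the largest expected information gain, defined as prior entropy minus expected posterior entropy. It then updates its distribution over segmentations.

```python
import itertools
import math

ELEMENTS = ["a", "b", "c"]  # hypothetical scene elements, e.g. clusters of image features
P_MOVE_SAME, P_MOVE_OTHER = 0.9, 0.05  # assumed probabilities of observing motion

def partitions(items):
    """Yield every set partition of `items` (each one is a candidate segmentation)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):  # add `first` to an existing segment
            yield part[:i] + [part[i] | {first}] + part[i + 1:]
        yield part + [frozenset({first})]  # or give it a segment of its own

def likelihood(moved, seg, pushed):
    """P(observed set of moving elements | segmentation, pushed element)."""
    obj = next(s for s in seg if pushed in s)
    p = 1.0
    for e in ELEMENTS:
        q = P_MOVE_SAME if e in obj else P_MOVE_OTHER
        p *= q if e in moved else 1.0 - q
    return p

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def posterior(prior, moved, pushed):
    unnorm = {s: p * likelihood(moved, s, pushed) for s, p in prior.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

OUTCOMES = [frozenset(c) for r in range(len(ELEMENTS) + 1)
            for c in itertools.combinations(ELEMENTS, r)]

def expected_info_gain(prior, pushed):
    """Prior entropy minus the expected posterior entropy after pushing `pushed`."""
    gain = entropy(prior)
    for moved in OUTCOMES:
        p_o = sum(p * likelihood(moved, s, pushed) for s, p in prior.items())
        if p_o > 0:
            gain -= p_o * entropy(posterior(prior, moved, pushed))
    return gain

segs = [frozenset(p) for p in partitions(ELEMENTS)]
prior = {s: 1.0 / len(segs) for s in segs}
action = max(ELEMENTS, key=lambda e: expected_info_gain(prior, e))
post = posterior(prior, frozenset({"a", "b"}), "a")  # pushing "a" moved "a" and "b"
best_seg = max(post, key=post.get)
```

The number of set partitions grows with the Bell numbers, so this exhaustive form only shows the computation. A practical system needs a more compact representation of the distribution over segmentations.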
One of the key challenges in robotic bipedal locomotion is finding gait parameters that optimize a desired performance criterion, such as speed, robustness or energy efficiency. Typically, gait optimization requires extensive robot experiments and specific expert knowledge. We propose to apply data-driven machine learning to automate and speed up the process of gait optimization. In particular, we use Bayesian optimization to efficiently find gait parameters that optimize the desired performance metric. As a proof of concept, we demonstrate that Bayesian optimization is near-optimal in a classical stochastic optimal control framework. Moreover, we validate our approach to Bayesian gait optimization on a fragile, low-cost real bipedal walker and show that Bayesian optimization can efficiently find good walking gaits.
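A minimal Bayesian optimization loop is sketched below in Python. It uses a Gaussian-process surrogate and the expected-improvement acquisition function. Several parts are assumptions: the one-dimensional normalized gait parameter, the RBF kernel length scale, the grid search over the acquisition function, and the toy objective standing in for measured walking performance. Real robot evaluations would be noisy and typically multidimensional.

```python
import math
import numpy as np

def rbf_kernel(a, b, length=0.2, var=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """GP posterior mean and standard deviation (zero prior mean, unit prior variance)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, k_star)
    var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 1e-12)
    return k_star.T @ alpha, np.sqrt(var)

def expected_improvement(mean, std, best):
    """EI for maximization: (mu - best) * Phi(z) + sigma * phi(z)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, n_init=3, n_iter=15, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 501)  # candidate parameter values
    x = rng.uniform(0.0, 1.0, n_init)
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y.max()))]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))  # one (simulated) robot experiment
    return x[np.argmax(y)], y.max()

def walking_speed(x):
    """Hypothetical performance of a gait with normalized parameter x (peak at 0.63)."""
    return float(np.exp(-((x - 0.63) ** 2) / 0.02))

x_best, y_best = bayes_opt(walking_speed)
```

Expected improvement trades off exploiting parameters with a high predicted performance against exploring parameters with high uncertainty. This trade-off is what keeps the number of robot experiments small.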