Creating autonomous robots that can learn to assist humans in situations of daily life is a fascinating challenge for machine learning. While this aim has been a long-standing vision of artificial intelligence and the cognitive sciences, we have yet to achieve the first step of creating robots that can learn to accomplish many different tasks triggered by environmental context or higher-level instruction. The goal of our robot learning laboratory is the investigation of the ingredients for such a general approach to motor skill learning, to get closer towards human-like performance in robotics. We thus focus on the solution of basic problems in robotics while developing domain-appropriate machine-learning methods. Starting from theoretically well-founded approaches to representing the required control structures for task representation and execution, we replace the analytically derived modules by more flexible, learned ones. Our research builds on, extends and connects different disciplines, in particular the fields of Robotics and Control, Machine Learning, and Cognitive Science.
Research in robotics and artificial intelligence has lead to the development of complex robots such as humanoids and androids. In order to be meaningfully applied in human-inhabited environments, robots need to possess a variety of physical abilities and skills. However, programming such skills is a labor- and time intensive task which requires a large amount of expert knowledge. In order to make robot programming amenable to non-experts we devise new methods for imitation learning and investigate new ways for representing robot movements.
To overcome such difficulties, we use imitation learning to teach robots new motor skills. A human demonstrator first provides one or several examples of the skill. Information recorded through motion capture or physical interaction is used by the robot to automatically generate a controller that can replicate the seen movements. This is done using modern machine learning techniques. Imitation learning also allows robots to improve the observed behavior. This so-called self-improvement of the task can help the robot to adapt the learned movement to the characteristics of its own body or the requirements of the current context. Hence, even if the examples presented by the human are not optimal, the robot can still use them to bootstrap its behavior.
At IAS, imitation learning has already been used to teach complex motor skills to various kinds of robots. This includes skills such as locomotion, grasping of novel objects , ping-pong, ball-in-the-cup and tetherball. New machine learning methods that reduce the time needed to acquire a motor skill are developed. The goal of this research is to have intelligent robots that can autonomously enlarge their repertoire of skills by observing or interacting with human teachers.
Movement Primitives are movement representations that can be combined to generate different or more complex behaviors. In general, they present generalization capabilities, such as changing the speed of execution, changing start position, goal position and via points.
We developed a novel movement primitive representation, capable for imitation learning. The new approach is called Probabilistic Movement Primitives (ProMPs) and allows for blending between motions, adapting to altered task variables, and co-activating multiple MPs in parallel. We based our approach on a probabilistic formulation of the movement primitive concept that maintains a distribution over trajectories. Our probabilistic approach allows for the derivation of new operations which are essential for implementing all aforementioned properties in one framework. In order to use such a trajectory distribution for robot movement control, we analytically derive a stochastic feedback controller which reproduces the given trajectory distribution. We have evaluated and compared our approach to existing methods on several simulated as well as real robot scenarios.
An essential problem in robotics is the accurate execution of desired movements using only low-gain control such that the robot will accomplish the desired task while not harming human beings in its environment. Following a trajectory with little feedback requires the accurate prediction of the needed torques, which cannot be achieved using classical methods for sufficiently complex robots. However, learning such models is hard as the joint-space can never be fully explored and the learning algorithm has to cope with a never-ending data stream in real time. We have developed learning methods both for accomplishing tasks represented in operational space as well as in joint-space. For more information on learning to execute see:
While learning to execute tasks is a component essential to a framework for motor skill learning, learning the actual task is of even higher importance as discussed in here. We focus on the learning of elementary tasks or movement primitives, which are parameterized task representations based on nonlinear differential equations with desired attractor properties. We mimic how children learn new motor tasks using imitation learning for initializing these movement primitives while employing reinforcement learning to subsequently improve the task performance. We have learned tasks such as Ball-in-a-Cup or bouncing a ball on a string using this approach. In hitting and batting tasks, movement templates with a learned global shape need to be adapted during the execution so that the racket reaches a target position and velocity that will return the ball over to the other side of the net or court. This requires a reformulation of motor primitives to hitting primitives. A key motor skill for manipulating the environment is grasping, which is why one of our research goals is adapting machine learning algorithms to make them applicable in the robot grasping task domain. For grasping or hitting, several alternative motor primitives might be available.
Most complex tasks require several motor primitives to be executed in parallel or in sequence. The selection and composition of motor primitives requires a perceptuo-motor perspective and is necessary for learning complex tasks. An example of a complex task which requires motor primitive selection and hitting primitives is the task of learning to play ping-pong. Moving towards learning complex tasks requires the solution of a variety of hard problems. Among these are the decomposition of large tasks into movement primitives (MP), the acquisition and self-improvement of MPs, the determination of the number of MPs in a data set, the determination of the relevant task-space, perceptual context estimation and goal learning for MPs, as well as the composition of MPs for new complex tasks. These questions are tackled in order to make progress towards fast and general motor skill learning for robotics.
Selected Related Research Topics: learning models for control, learning operational space control, reinforcement learning, learning motor primitives, learning complex tasks, learning to grasp, learning to play ping-pong, motor primitives for hitting, brain-robot interfaces
We use machine learning in order to enable robots to learn complex skills from data. Most importantly, we focus on reinforcement learning in order to directly learn the desired behavior from experience. However, we also investigate other sub-fields of machine learning in order to meet the particular requirements for robot skill learning. It is, for example, usually very costly to collect data on a robot and it is therefore important to make good use of limited amounts of data. This scarcity of data requires high sample efficiency not only for reinforcement learning, but often also during optimization or probabilistic inference.
Supervised learning is not always sufficient for motor learning problems, partly because often an expert teacher or idealized version of the behavior is not available. Because of that, one of our goals is the development reinforcement learning methods which scale into the dimensionality of humanoid robots and can generate actions for seven or more degrees of freedom.
Efficient reinforcement learning for continuous states and actions is essential for robotics and control. We follow two approaches depending on the dimensionality of the domain. For high-dimensional state and action spaces, it is often easier to directly learn policies without estimating accurate system models. The resulting algorithms are parametric policy search algorithms inspired by expectation-maximization methods or information-theoretic policy updates and can be employed for motor primitive learning. Information-theoretic policy updates seem to be particularly suited for the high-dimensional problems that occur in robotics as the update rules result in a smooth and robust policy update that always `stays close to the data'. If we can learn a model of the robot and its environment, we can employ model-based reinforcement learning to drastically improve the sampling efficiency of policy search algorithms. As a result, these methods can learn good policies at a rapid pace based on only little interaction of the system.
Our general goal in reinforcement learning is the development of methods which scale into the dimensionality of humanoid robots. Such high dimensionalities is a tremendous challenge for reinforcement learning as a complete exploration of the underlying state-action spaces is impossible and few existing techniques scale into this domain. Therefore we rely upon a combination of both, watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation and then improved using reinforcement learning.
Parametrized policies allow an efficient abstraction of the high-dimensional continuous action spaces which is often needed in robotics. We can directly optimize the parameters of the primitive by the use of policy search methods. Members of IAS have developed a variety of novel algorithms for this context which have been applied for learning to play table tennis, tetherball, the game 'ball in the cup' and darts.
The NAC is currently considered the most efficient policy gradient method. It makes use of the fact, that a natural gradient usually beats a vanilla gradient. For more information read:
We formulated policy search as an inference problem. This has led to efficient algorithms like reward-weighted regression and PoWER. For more information read:
The optimization in policy search can rapidly change the control policy, which might lead to jumps in the policy as well as in the resulting trajectory distribution. While this behavior might already be dangerous for a real robot, it might also lead to premature convergence to suboptimal solutions or even oscillations. Information-theoretic policy search solves this problem by bounding the Relative Entropy between two subsequent policies. Hence, the new policy always tries to stay close to the `data' that has been generated by the old policy, while maximizing the reward locally. Such policy updates are mathematically sound and allow the derivation of a whole range of new algorithms, including contextual policy search and learning hierarchical policies. For more information read:
For an overview of these approaches, please also consult the following survey papers from IAS members:
Learning new control policies on real robot systems is generically time-intensive and may lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search algorithms on a slightly faulty simulator can easily lead to the exploitation of modeling errors, such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. This lack of using simulators, unfortunately, results in an extensive need for robot time during training and imposes severe constraints on the applicable methods.
The aim of our research is to fundamentally speed-up robot learning by making the control policies resulting from the physics simulations directly deployable in the real world. To do so, we explore how policy optimization can be made more robust by perturbing the simulator’s parameters and training from ensembles of models instead of just one nominal model.
Many real-world applications are characterized by multiple conflicting objectives. In such problems optimality is replaced by Pareto optimality and the goal is to find the Pareto frontier, a set of solutions representing different compromises among the objectives. Despite recent advances in multi-objective optimization, achieving an accurate representation of the Pareto frontier is still an important challenge.
We formulate the problem of approximating Pareto frontiers as MOMDPs and solve it with a manifold-based approach. We combine episodic exploration strategies and importance sampling to efficiently learn a manifold in the policy parameter space such that its image in the objective space accurately approximates the Pareto frontier.
Intrinsic motivation defines a set of objectives the robot can pursue even in the absence of a task related reward. We are especially interested in combining intrinsic motivation and task-specific objectives. For instance, a robot can decide to maximize the entropy of its sensory-motor stream, resulting in an exploratory behavior. When combined with a task reward, this can ensure that exploration of diverse behaviors is restricted to the area of interest of the task.
Learning a diverse set of good performing behaviors can have several advantages. In an adversarial setting such as robot table tennis, diversity of behaviors renders the robot harder to predict and hence harder to counter for the opponent. In a collaborative task with a human, an emphasis on diversity gives the human more opportunity to guide the robot and to avoid the latter to be stuck in local optima.
Many machine learning techniques that we use for robot skill learning are based on probabilistic models that are often complex and thus not amenable for exact statistical inference. Hence, we often need to rely on approximate inference, e.g. by using Monte Carlo estimates. However, standard techniques for approximate inference, such as Markov-chain Monte Carlo, may not provide sufficient sample efficiency for robot applications. By learning a tractable approximation of the complex probabilistic model Variational Inference frames statistical inference as an optimization problem and allows us to make better use of the available data.
Interestingly, the problem of learning tractable approximations of probability distributions is very similar to the problem of Reinforcement Learning. This connection even goes so far that some Policy Search methods essentially solve the same optimization problem as some Variational Inference methods. Building on our expertise in Reinforcement Learning, we devised a novel method for Variational Inference that is capable of learning accurate Gaussian Mixture Model approximations of complex probability distributions. Our method is based on a lower bound on the objective function that enables us to decompose the optimization problem into simpler sub-problem while an Expectation-Maximization procedure ensures that we converge to an optimum of the true objective. We leverage insights from Reinforcement Learning by establishing information-theoretic trust region while optimizing the sub-problems. These trust-regions are crucial for exploration and stability, enabling us to perform approximate inference with similar quality to Markov-chain Monte Carlo while requiring several orders of magnitudes less function evaluations.
Optimizing parameters in robotics is a necessary but often hard task. The challenges arise from the presence of multiple local minima, noise in the measurements, lack of analytical gradients and the limited number of experiments possible on a real robot. Hence, typical approaches such as grid search, random search, gradient descent and genetic algorithms often perform poorly.
Bayesian optimization is designed to naturally deal with these challenges of optimization in robotics. Thanks to the use of a response surface model, Bayesian optimization drastically reduce the number of experiments needed. Hence, it can be successfully employed in a real system where other approaches would fail.
The global nature of classical Bayesian optimization algorithms makes them ill-suited for high dimensional robotic tasks such as tuning a movement primitive's forcing function. Not only can the search for a global optimizer be sample inefficient, but evaluating parameters far from a known safe solution (such as provided by a human teacher) can be dangerous for the robot. For this purpose, we propose to retain the key principles of Bayesian optimization (i.e. a probabilistic modeling of the response surface and the use thereof in resolving the exploration-exploitation dilemma) and apply them in a local context. The resulting local Bayesian optimization algorithm is both sample efficient and safe by restraining the drift to initial human data.
In order to ensure the practicability of our devised methods for learning robot control it is inevitable to evaluate them on real systems and to compare them against more traditional engineered solutions. For this purpose, we regularly make use of our robotic laboratory that includes a variety of advanced robots. For safe human-robot interactions we operate several compliant robot platforms including two bimanual manipulation platforms, based on two Kuka LBR 4 and two Kuka IIWA R820 respectively, as well as a tetherball setup based on two BioRob robotic arms. For tasks that require fast and precise movements (such as table tennis and ball catching) we can use our Barrett WAM. Furthermore, we have several robotic hands for dexterous manipulation, mobile platforms (including the humanoid iCup) for locomotion and one hundred kilobots for swarm robotics.
A key long-term goal of robotics is to create autonomous robots that can perform a wide range of tasks to help humans in daily life. One of the main requirements of such domestic and service robots is the ability to manipulate a wide range of objects in their surroundings.
Modern industrial robots usually only have to manipulate a single set of identical objects, using preprogrammed actions. However, future service robots will need to operate in unstructured environments and perform tasks with novel objects. These robots will, therefore, need to learn to optimize their actions to specific objects, as well as generalize their actions between objects.
The ability to grasp objects is an important prerequisite for performing various manipulation tasks. Using trial-and-error, a robot can autonomously optimize its grasps of objects. In particular, the grasp selection process can be framed as a continuum-armed bandit reinforcement learning problem. Thus, the robot can actively balance executing grasps that are known to be good and exploring new grasps that may be better.
An object's affordances are the actions that the robot can perform using the object. The affordances of basic objects, such as tools, are usually defined by their surface structures. By finding similar surface structures in different objects, the robot can transfer its knowledge of afforded actions between objects. In this manner, the robot can predict whether a novel object affords a specific action, as well as adapt the action to this object.
The robot can learn an initial action from a human demonstration. Adapting this action to new objects is achieved autonomously by the robot, using a trial-and-error learning approach.
Manipulation tasks can be decomposed into phases, wherein the robot's actions have distinct effects depending on the current phase. In order to perform a task, the robot will first need to reach a phase that affords the desired manipulation, which may require transitioning through other phases first. The phases thus define a sequence of subtasks for the robot to complete in order to manipulate different parts of the environment.
Our research has focused on developing methods for learning the phase structure of tasks, as well as learning manipulation skills for transitioning between the phases. The robot first learns the conditions for transitioning between phases and then optimizes its motor skills accordingly. In order to learn versatile manipulation skills, we have also investigated representations for generalizing phase transitions between different objects. As part of our research, the Darias robot has learned the phases of a variety of tasks, including stacking objects, pouring, two-handed grasping, and turning a pepper mill.
Many current robots lack fine manipulation skills. A major reason for this is that most industrial robotic arm-hand systems do not receive sufficient feedback about the contact with the object. Neuroscience has shown that such feedback from tactile sensors is a critical component in the human ability to perform such tasks. Therefore, we aim to equip our robots with tactile sensors and use the sensor feedback for robot control. Research issues include how to interpret the signals from the sensors (that are often noisy and high-dimensional) and how to include such signals in a robot control loop. Among others, we aim to address such issues by using supervised, unsupervised, and reinforcement learning techniques.
Machine learning and artificial intelligence have made important and substantial progress in recent years. By now, they are reaching large-scale applications in industrial environments. Such machine learning methods increasingly enable unforeseen development of new applications of technical systems and robots. Future robots will better recognize and understand their surrounding environment. Perception, however, is just the first step as the robot must derive meaningful actions from such perception. For complex tasks, it will not be possible to manually program all the handling rules which are required for the robot to succeed. This problem is particularly severe when the robot directly interacts with humans. For such tasks, learning robots are a promising alternative. The robot receives feedback from human beings in its environment and performance, e.g., in form of grades rating its behavior as positive or negative. The robot should subsequently adapt its behavior accordingly. Methods for learning through interaction are best addressed in the framework of ``Reinforcement Learning,'' a particularly hot sub-field of machine learning. Instead of being manually programmed to carry out the same task millions of times, future robots would be enabled to adapt to hundreds of different tasks autonomously and could become useful for small series production. The proposed research will develop reinforcement learning methods for more complex robot activities enabling as well as interaction with humans.
Due to the inherent stochasticity and diversity of human behavior, robots need learning and adaptation capabilities to successfully interact with humans in various scenarios. We have developed Machine Learning algorithms that enable robots to learn interactions from demonstrations. Those algorithms cope with the variability in space and time that is inherent in human behavior. Moreover, our group has investigated how robots can support the practice and execution of movements by humans. Applications of our methods have been demonstrated in assembly tasks involving humans and robots working in partnership, in scenarios where the human tries to learn a new motor skill such as writing characters, in teleoperation scenarios with shared autonomy between the human and the robot, among other situations.
The overlap between the workspaces of humans and robots has been increasing in the last decades. Situations in which people interact with robots are becoming more frequent. Such situations pose a number of research questions, such as:
Our research aims at determining efficient and effective interaction configurations in order to allow humans and robots to acquire new movements or improve their performance at tasks involving their sensorial and motor capabilities.
The search for configurations that allow humans and robots to jointly learn movements may result in a large number of applications: industrial robots may be retrained through the physical interaction with humans, instead of through the rewriting of lines of code; new devices and training programs for the rehabilitation of motor-impaired patients may be designed.
Semi-autonomous robots are robots that physically interact with a human partner in order to achieve a task in a collaborative manner. Fundamental research in semi-autonomous robotics has potential applications in a variety of scenarios where humans need assistance: assembly of products in factories, the aid of the elderly at home, control of actuated prosthetics, shared control in repetitive teleoperated processes.
Research for semi-autonomous robots needs to go beyond the usual sense-plan-act paradigm to account for the interaction with humans. Sensing and perception must involve the observation and interpretation of the human movement. Planning has to take into account the inferred human intent and, from a possibly limited repertoire of robot skills, search for an appropriate combination of actions. Acting for collaboration requires algorithms that model and predict human trajectories and generate safe and appropriate commands for both human and robot.
One of the challenges that we are currently addressing is related to the fact that humans usually execute a variety of unforeseen tasks; hence pre-programming a robot for all possible tasks is infeasible. We approach this problem by investigating new algorithms that allow a robot: (1) to learn and maintain a dynamic repertoire of skills and to assess its own ability to assist the human in accomplishing a given task, and (2) to make requests for human demonstration in order to acquire a new skill via imitation and reinforcement learning.
We also investigate ways to model collaborative interaction. Such a model is used to predict the intention and movement of the human while simultaneously generating collaborative control actions for the robot. Our current approach leverages on probabilistic tools for the realization of interaction primitives. Basically, a prior model of the interaction is encoded as a joint distribution of the human and robot movements, from which a posterior distribution of the interaction can be inferred by observing only the human.
We validate and demonstrate our algorithms using real robotic systems which also involves the milestones of the 3rd Hand Project (visit the website here, download the brochure here) as one of our driving applications. You can watch our latest results in here.
Dealing with whole body movement of humanoid robot is a highly challenging domain involving various different problems ranging from stabilizing behaviors (e.g., balancing, support oneself against objects) over static motions (e.g., getting up from a chair, push-ups) up to locomotion (e.g., walking, running). As of now, no robot exhibits the same dexterity, efficiency, speed and robustness as a human due to the difficulty of controlling and planning with a high number of degrees of freedom as well as due to the complexity of modeling and estimating physical contacts.
We aim to develop new approaches to improve the current state-of-the-art for whole body movement of humanoid robots in general, and, especially, for robot locomotion.
A robot needs to be aware of the properties of objects to efficiently perform tasks with them. Currently, most robots are provided with this object information by a programmer. However, autonomous robots working in service industries and domestic settings will need to perform tasks with novel objects. Hence, relying on predefined object knowledge will not be an option.
Instead, robots will have to learn about objects by exploring their properties through physical interactions, such as pushing, stroking, and lifting. As random exploration is an inefficient approach, we develop methods for efficiently gathering information.
The fundamental knowledge learned about objects and primitive actions will later on form the basis for learning complex behaviors and predicting the properties of novel objects. By discovering accurate representations of objects, the robot will be able to plan and execute manipulations more precisely.
The textures of object surfaces can be observed both by visual inspection and by sliding a dynamic tactile sensor across the surface. The robot can combine these types of sensor readings to determine which components of the data contain information pertaining to the texture.
In particular, the robot can find the components that are maximally correlated between the two sensor modalities. Given the relevant data components, the robot can create a compact representation of object textures, which allows it to distinguish between surfaces more accurately.
Learning about objects is not a passive perceptual process: its embodiment allows a robot to discover object properties by actively changing its point of view and by interacting with objects. At the same time, the robot observes the effect of its actions and learns how it can bring about desired effects.
To learn efficiently, the robot should select the exploratory actions that yield the highest information gain out of all possible actions. By efficiently exploring its environment, a robot develops object knowledge grounded in its sensorimotor experience.
The developed object knowledge can be used to manipulate previously unknown objects in unstructured environments. For example, the robot could teach itself to perform tasks in domestic environments.
Our research on robot skill learning is also connected with the study of the human mind. By increasing our understanding of the way humans learn to perform novel tasks we can discover novel ways for robot learning. Conversely, insights from robot learning may increase our understanding of the human mind. Furthermore, physically connecting the human brain with robots is an important field of research, because it can enable ALS patients and amputees to retrieve physical capabilities.
Life on earth dates back billions of years and evolved into astonishingly complex organisms by means of natural selection. By understanding how humans and animals are capable of learning how to manipulate their environment we can find inspiration for devising intelligent robot systems.
The human central nervous system has an incredible ability for learning new motor skills. For example, when learning to control a novel sport device, new movement and balancing strategies may be discovered within seconds. Such rapid learning processes are based on learned abstract concepts, current beliefs of the environment and future expectations of desired postures, reached targets or achieved rewards (among others). Little is known about how these underlying processes develop and how they interplay during motor learning.
In our research, we develop probabilistic models and inference techniques to gain a better understanding of the amazing human learning abilities. Our models reproduce characteristic features like motor variability, continuous exploration, stochastic decisions and the ability to learn task abstractions. Applications range from medical diagnoses and rehabilitation research to smart robot learning and control frameworks.
Spiking neural networks are powerful computational models of brain functions and are a key technology in, e.g., the Human Brain Project with estimated total costs of 1.19 billion Euros. A second driving force for intensive research efforts are the massive parallel computing abilities. Companies like IBM, SPS or HP develop 'brain-like' memory and computing chips with the long-term goal of developing neuromorphic computers.
We investigate how spiking neural network models for neuromorphic chips can be used for robot motor control and learning. A strong focus is put on the real-time processing of huge sensory streams from tactile, visual and other sensors and on learning from rewards. The implemented computational principles are based on powerful machine learning algorithms like probabilistic inference, contrastive divergence or stochastic policy search that are used in a large variety of other applications like visual scene understanding, speech processing and cognitive reasoning.
Insights from control engineering and statistics improve our understanding of how humans work and vice versa. The notions of feedback and information became deeply ingrained in social sciences, following the pioneering work of Norbert Wiener on cybernetics. Creating models of human behaviour based on statistics and control theory that capture certain aspects of cognition and intelligence helps in answering fundamental questions about the subconscious processes taking place in a human being. How do we decide which muscles to activate to move an arm? What makes eyes move to track a moving target? Many such questions can be addressed by means of simplified mathematical models.
Brain-Computer Interfaces (BCI) can be used to control robots using one's thoughts which can be especially useful for amputees and ALS patients as it allows to control assistive robots or robot prostheses without requiring muscle control. However, classifying the intended actions from BCI sensor data is very challenging, because it is difficult to identify meaningful features of the high-dimensional sensor data. Furthermore, such features are different among patients and typically even vary across different days.