While robotics has been around for at least five decades, starting with Joseph Engelberger and George Devol's first industrial manipulator, and robots populate factory floors all around the world, the state of the art in motor skills for robots has changed little over the last decades. To date, most tasks accomplished by robots consist of following manually created trajectories, often tracked by linear controllers with high gains. Humanoid robots, on the other hand, will require a very different state of the art: motor skills have to be acquired automatically from a mixture of supervised and reinforcement learning, need re-usable basic building blocks such as motor primitives, and require compliant control in the respective task spaces. The different tasks also need to be composable in a hierarchical fashion.
One of the major challenges both in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in our group, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, self-improvement through interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. To pursue this goal, we investigate appropriate imitation and reinforcement learning methods. Our current setup consists of imitation learning with locally weighted regression and subsequent reinforcement learning with policy gradient methods. We show that our most recent algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method by learning to hit a baseball with an anthropomorphic robot arm. These motor primitives serve as the building blocks of larger-scale skill libraries.
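To make the notion of a parameterized motor primitive concrete, the following is a minimal sketch of a point-attractor policy with a learnable forcing term, in the spirit of the nonlinear differential equations with attractor properties mentioned above. The specific gains, basis functions, and phase dynamics are illustrative choices, not the exact formulation used in the experiments; the weight vector `w` plays the role of the parameters that imitation and reinforcement learning would adjust.

```python
import numpy as np

def rollout(w, y0=0.0, g=1.0, tau=1.0, dt=0.01, steps=100):
    """Integrate a point-attractor motor primitive with a learned forcing term.

    w    : weights of the forcing term (the learnable parameters)
    y0,g : start position and goal (attractor) of the movement
    Gains and basis shapes below are illustrative, not the paper's values.
    """
    alpha, beta = 25.0, 25.0 / 4.0       # critically damped spring-damper
    y, yd, x = y0, 0.0, 1.0              # position, velocity, phase variable
    centers = np.linspace(0.0, 1.0, len(w))  # basis-function centers in phase
    traj = []
    for _ in range(steps):
        psi = np.exp(-50.0 * (x - centers) ** 2)     # Gaussian basis functions
        f = (psi @ w) / (psi.sum() + 1e-10) * x      # forcing term, gated by phase
        ydd = (alpha * (beta * (g - y) - yd) + f) / tau ** 2
        yd += ydd * dt                               # Euler integration
        y += yd * dt
        x += (-2.0 * x / tau) * dt                   # phase decays toward zero
        traj.append(y)
    return np.array(traj)
```

Because the forcing term is gated by the decaying phase variable, its influence vanishes over time and the trajectory converges to the goal `g` for any weight setting, which is the desired attractor property: learning shapes the transient movement without endangering stability.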
We have discussed how single behaviours in a certain task space can be learned using parameterized motor primitives, and a discussion of how to learn the execution of a motor task in its appropriate operational space can be found here. It is quite clear that humans operate in multiple task spaces depending on the motor skill being accomplished, e.g., in body-centric or retinal coordinates. However, it appears that the motor primitives used by humans for hand-writing or hand-zig-zagging are exactly the same as the ones the same individual would use with a toe (see, e.g., Wing 2000). Thus, motor primitive programs seem to be invariant under motor command generation, and in the presented framework, such invariance should also be easy to achieve by establishing skill libraries. Such skill libraries would contain both motor primitives and motor command transformations, so that many combinations of the two become possible. In this case, a particular skill consists of the combination of a transformation of motor commands into the respective task space and a primitive motor program which prescribes the behaviour. A motor skill is triggered by a perceptual system which selects the appropriate skill for the task. The learning of such skills will largely rely on the methods presented in this thesis. Observed movements will be used to learn the transformations from task-space coordinates to motor commands, even if the performed task was in a different coordinate system. Separately from the task-space to motor command transformation, we will learn motor primitives. For this, observed tasks are compared to existing primitives. If an observed task is equivalent to an existing one, it is used for refining that primitive; otherwise it is added to the skill library. Subsequently, the skill library manager needs to decide whether to practice the skill using the reinforcement learning methods presented here.
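The compare-refine-or-add logic of the skill library manager can be sketched as follows. The class name, the trajectory-distance equivalence test, the similarity threshold, and the averaging-based refinement are all hypothetical placeholders; the actual system would compare tasks in primitive parameter space and refine via imitation learning.

```python
import numpy as np

class SkillLibrary:
    """Hypothetical skill-library manager: an observed demonstration is
    compared to stored primitives; the closest equivalent primitive is
    refined, otherwise the demonstration founds a new primitive."""

    def __init__(self, threshold=0.1):
        self.primitives = []        # stored primitives (here: mean trajectories)
        self.threshold = threshold  # per-sample distance for task equivalence

    def observe(self, demo):
        """Insert one demonstration; return the index of the matched
        or newly created primitive."""
        demo = np.asarray(demo, dtype=float)
        for i, prim in enumerate(self.primitives):
            if np.linalg.norm(prim - demo) / len(demo) < self.threshold:
                # equivalent task: refine the primitive toward the new demo
                self.primitives[i] = 0.9 * prim + 0.1 * demo
                return i
        self.primitives.append(demo)  # novel task: add it to the library
        return len(self.primitives) - 1
```

A decision layer on top of `observe` would then schedule self-improvement of the matched primitive with reinforcement learning, as described above.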
The selection of skills shifts the focus away from pure motor control towards a perceptuo-motor perspective. In this case, a general task is given, e.g., grasp a specified object and pick it up, move through the room along a global trajectory, or hit a ball with a tennis racket. Here, perceptual variables allow us to choose the right motor primitives, e.g., whether to select a power grasp vs. a precision pinch for a particular object, which foot trajectories to use for moving from one foothold to another in the presence of obstacles, or whether to select a tennis forehand vs. a backhand. Similarly, perceptual variables need to be used to set the goal parameters of a motor primitive, e.g., the contact points where we intend to hold the object, the selected next foothold, or where and when to hit the ball. Each of these tasks is associated with the appropriate effector. However, it is quite obvious that some tasks do transfer between end-effectors, e.g., we could use two fingers or two hands to generate a precision pinch for grasping and lifting a particular object. Clearly, the next higher system above the skill selection system needs some form of higher-level intelligence which determines the general task. This layer could close the gap between artificial intelligence systems and robotics.
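The two roles of perceptual variables described above, choosing a primitive and setting its goal parameters, can be sketched for the grasping example as follows. The function, the width threshold, and all field names are invented for illustration; a real system would learn this perceptuo-motor mapping rather than hand-code it.

```python
def select_skill(percept):
    """Illustrative perceptuo-motor mapping for grasping.

    percept: dict of perceptual variables (names are hypothetical)
    returns: (name of selected primitive, goal parameters for it)
    """
    # perceptual variable selects the primitive: small objects are
    # pinched with two fingers, large ones grasped with the whole hand
    if percept["object_width"] < 0.03:       # illustrative 3 cm threshold
        primitive = "precision_pinch"
    else:
        primitive = "power_grasp"
    # perceptual variables also set the primitive's goal parameters,
    # here the intended contact location on the object
    goal = {"contact_point": percept["object_center"]}
    return primitive, goal
```

The analogous mappings for foothold selection or forehand/backhand choice would have the same structure: percept in, primitive identity plus goal parameters out.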
Another key issue for research is the parallelization and sequencing of motor primitives. Such issues automatically arise in tasks of higher complexity, e.g., assembling a modular system such as an IKEA shelf. Such a task requires a sequence of subtasks, e.g., first several peg-in-the-hole tasks (see, e.g., Gullapalli et al., 1994) and subsequently dropping a shelf on top of the four pegs. It also requires holding the two sides of the shelf in parallel so that they do not fall over before being assembled. In order to learn such tasks, we require a hybrid control architecture consisting of lower-level components such as motor primitives and task execution as well as a higher, discrete layer. The state of this discrete layer is given by the currently active primitives, which together form a macro state or option. Such approaches will require a fusion of previous work on hybrid control, hierarchical reinforcement learning, and imitation learning, similar to the combination discussed in my thesis proposal. Working towards such complex task compositions is of essential importance for the future of motor skills in robotics.
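The discrete layer can be sketched for the shelf-assembly example as follows. Each macro state (option) is the set of primitives active in parallel, and the plan is a sequence of such macro states. The primitive names, their stub implementations, and the simple sequential executor are illustrative stand-ins for the actual hybrid control architecture.

```python
def execute_plan(plan, primitives):
    """Run a sequence of macro states; each macro state lists the
    primitives that are active in parallel during that phase."""
    log = []
    for option in plan:                 # sequencing of macro states
        for name in option:             # primitives active in parallel
            log.append(primitives[name]())
    return log

# hypothetical stub primitives for the shelf-assembly example
primitives = {
    "hold_left":   lambda: "left side held",
    "hold_right":  lambda: "right side held",
    "peg_in_hole": lambda: "peg inserted",
    "drop_shelf":  lambda: "shelf dropped on pegs",
}

# the discrete-layer plan: hold both sides, insert four pegs, drop the shelf
plan = [
    ["hold_left", "hold_right"],
    ["peg_in_hole"] * 4,
    ["drop_shelf"],
]
```

In the full architecture, hierarchical reinforcement learning would operate over these options rather than over a fixed hand-written plan, and the parallel primitives would run concurrently instead of being iterated in order.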