One of the key purposes of motor skills is to give an agent the ability to alter its surroundings. The most fundamental of these manipulation skills is grasping objects. Robots will need to be capable of grasping complex objects to complete most tasks that may be assigned to them in the future. Unfortunately, no robot end-effector to date matches the wide range of applicability of the human hand. This gives rise to a dichotomy in how to progress on the problem of robot grasping: develop better hardware, or advance the state of the art in software. Our research focuses on the latter, and particularly on adapting machine learning algorithms to make them applicable in the robot grasping task domain. Due to the close proximity a robot needs to its environment when manipulating objects, the robot must rely heavily on its senses to estimate the state of its surroundings and learn to adapt its grasping skills accordingly. While robots can be augmented with a wide range of different sensors, limiting the system to a basic hand, arm, and stereo vision makes the results applicable to a larger range of robots.
The general approach to learning to grasp an object is to initialize the robot's knowledge using imitation learning and then refine it by applying reinforcement learning. However, imitation learning is severely handicapped by the correspondence problem caused by the differences between the human teacher's and the robot's hands. While a human can demonstrate general grasp locations on a given object, the burden of deciding details such as exact finger placements, and of determining which grasps are appropriate for the robot, falls mainly on the reinforcement learning.
Our first contribution to the field of robot grasping has been the development of an active learning system that allows the robot to find good grasps. The grasping task was approached in a continuum-armed bandit framework, in which the hand's pose in the object's reference frame represents the chosen action, and the success of the grasp yields the corresponding reward. The value function was approximated using Gaussian process regression (GPR), and local value maxima were determined using a novel method based on Mean-Shift mode detection. A Gibbs policy as well as an Upper-Confidence-Bound (UCB) policy, which incorporates the standard deviation of the GPR, have been implemented to select the grasp that will be attempted. The UCB system has been experimentally shown to be very effective at finding suitable grasp locations.
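To make the pipeline above concrete, the following is a minimal sketch, not the actual implementation: a toy GP regression over a one-dimensional grasp parameter (standing in for the full hand pose), a weighted Mean-Shift routine that seeks modes of the estimated value function, and a UCB rule that scores candidate grasps by posterior mean plus a multiple of the posterior standard deviation. The kernel choice, length scales, and the `kappa` exploration weight are illustrative assumptions, not values from the original system.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    # Squared-exponential kernel between pose sets A (n, d) and B (m, d).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gpr_posterior(X, y, Xq, noise=1e-2):
    # GP regression: posterior mean and std of the grasp value at query poses Xq,
    # given attempted poses X with observed rewards y.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks.T)
    var = np.clip(1.0 - np.sum(Ks * v.T, axis=1), 0.0, None)
    return mu, np.sqrt(var)

def mean_shift_modes(samples, weights, bandwidth=0.1, iters=50):
    # Weighted Mean-Shift: iteratively move points toward local maxima
    # (modes) of the value estimate; converged points cluster at the modes.
    pts = samples.copy()
    for _ in range(iters):
        w = rbf_kernel(pts, samples, bandwidth) * weights
        pts = (w @ samples) / w.sum(axis=1, keepdims=True)
    return pts

def ucb_select(X, y, candidates, kappa=2.0):
    # Upper-Confidence-Bound action selection: prefer poses whose value is
    # either estimated high (mean) or still uncertain (std).
    mu, sigma = gpr_posterior(X, y, candidates)
    return candidates[np.argmax(mu + kappa * sigma)]

# Toy usage: four attempted grasps; reward peaks near parameter 0.7.
X = np.array([[0.1], [0.4], [0.7], [0.9]])
y = np.array([0.0, 0.3, 1.0, 0.2])
cand = np.linspace(0.0, 1.0, 101)[:, None]
best = ucb_select(X, y, cand)
mu, _ = gpr_posterior(X, y, cand)
modes = mean_shift_modes(cand, np.clip(mu, 1e-6, None))
```

In this sketch the pose is a scalar for readability; the described system works in the full hand-pose space, where Mean-Shift over the GPR surface plays the role of extracting the discrete set of locally optimal grasps from a continuous value landscape.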