Workshop Proposal: Autonomously Learning Robots

Quick Facts

Organizers: Gerhard Neumann, Joelle Pineau, Peter Auer, Marc Toussaint
Conference: NIPS 2014
Location: Montreal, Canada

Abstract:

To autonomously assist human beings, future robots have to autonomously learn a rich set of complex behaviours. So far, the role of machine learning in robotics has been limited to solving pre-specified sub-problems, in many cases with off-the-shelf machine learning methods. The problems addressed are mostly homogeneous, e.g., learning a single type of movement suffices to solve the task, and they do not reflect the complexities involved in solving real-world tasks.

In a real-world environment, learning is much more challenging than solving such homogeneous problems. The agent has to autonomously explore its environment and discover versatile behaviours that can be used to solve a multitude of different tasks throughout its future learning process. It needs to determine when to reuse already known skills, by adapting, sequencing or combining the learned behaviours, and when to learn new behaviours. To do so, it needs to autonomously decompose complex real-world tasks into simpler sub-tasks such that the learned solutions for these sub-tasks can be reused in new situations. It needs to form internal representations of its environment, which may contain a large variety of different objects as well as other agents, such as other robots or humans. Such internal representations also need to shape the structure of the policy and/or the value function used by the algorithm, which must be flexible enough to capture the huge variability of tasks that can be encountered in the real world. Due to the multitude of possible tasks, the agent also cannot rely on a manually tuned reward function for each task and hence needs to find a more general representation for the reward function.

Moreover, an autonomous robot is likely to interact with one or more human operators who are typically experts in a certain task, but not necessarily experts in robotics. Hence, an autonomously learning robot should also make effective use of feedback that can be acquired from a human operator. Typically, different types of instructions from the human are available, such as demonstrations and evaluative feedback in the form of a continuous quality rating, a ranking between solutions or a set of preferences. To facilitate learning, such additional human instructions should be used autonomously whenever available. At the same time, the robot needs to be able to reason about its competence to solve a task: if its estimated competence is poor, or the uncertainty about that competence is high, the robot should request more instructions from the human expert.
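
As a minimal sketch of this last requirement, the class below tracks a Beta posterior over a task's success rate and asks for human instructions when the estimated competence is low or still too uncertain. The class name, interface and thresholds are hypothetical illustrations, not an existing system.

    import math

    class CompetenceMonitor:
        """Sketch: request human instructions when a task's estimated
        competence is poor or highly uncertain (all thresholds illustrative)."""

        def __init__(self, min_competence=0.7, max_uncertainty=0.15):
            self.successes = 0
            self.trials = 0
            self.min_competence = min_competence
            self.max_uncertainty = max_uncertainty

        def update(self, success):
            self.trials += 1
            self.successes += int(success)

        def should_request_instruction(self):
            if self.trials == 0:
                return True  # no experience yet: ask for a demonstration
            # Beta(1 + successes, 1 + failures) posterior with a uniform prior
            a = 1 + self.successes
            b = 1 + self.trials - self.successes
            mean = a / (a + b)
            std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
            return mean < self.min_competence or std > self.max_uncertainty

The Beta posterior is one simple way to capture both the competence estimate and the uncertainty about it in a single statistic each.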

Most machine learning algorithms still lack these types of autonomy. They rely on a large amount of engineering and fine-tuning by a human expert. The human typically needs to specify the representation of the reward function, the state, the policy, and other internal representations used by the learning algorithm. Typically, the decomposition of complex tasks into sub-tasks is performed by the human expert, and the parameters of such algorithms are fine-tuned by hand. The algorithms typically learn from a pre-specified source of feedback and cannot autonomously request additional instructions such as demonstrations, evaluative feedback or corrective actions. We believe that this lack of autonomy is one of the key reasons why robot learning has not yet been scaled to more complex, real-world tasks: learning such tasks would require a huge amount of fine-tuning, which is very costly on real robot systems.

Goal:

In this workshop, we want to bring together people from the fields of robotics, reinforcement learning, active learning, representation learning and motor control. The goal of this multi-disciplinary workshop is to develop new ideas that increase the autonomy of current robot learning algorithms and make their usage more practical for real-world applications. In this context, the questions we intend to tackle include the following:

More Autonomous Reinforcement Learning

  • How can we automatically tune hyper-parameters of reinforcement learning algorithms such as learning and exploration rates?
  • Can we find reinforcement learning algorithms that are less sensitive to the settings of their hyper-parameters and can therefore be used for a multitude of tasks with the same parameter values?
  • How can we efficiently generalize learned skills to new situations?
  • Can we transfer the success of deep learning methods to robot learning?
  • How do we learn on several levels of abstraction, and how do we identify useful abstractions?
  • How can we identify useful elemental behaviours that can be used for a multitude of tasks?
  • How do we use RL on raw sensory input, without a hand-coded representation of the state?
  • Can we learn forward models of the robot and its environment from high-dimensional sensory data? How can these forward models be used effectively for model-based reinforcement learning? (See the sketch following this list.)
  • Can we autonomously decide when to learn value functions and when to use direct policy search?
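
Regarding forward models, the following sketch illustrates one simple pipeline: fit a linear forward model from observed transitions by least squares, then use it for random-shooting model-predictive control. The linear model, the toy quadratic cost and all parameter values are illustrative assumptions, not a specific published method.

    import numpy as np

    def fit_forward_model(states, actions, next_states):
        """Least-squares fit of next_state ~= [state, action, 1] @ W."""
        X = np.hstack([states, actions, np.ones((len(states), 1))])
        W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
        return W

    def plan_action(W, state, action_dim, horizon=10, n_candidates=100, seed=0):
        """Random-shooting planning: simulate sampled action sequences through
        the learned model and return the first action of the cheapest one."""
        rng = np.random.default_rng(seed)
        candidates = rng.uniform(-1.0, 1.0, (n_candidates, horizon, action_dim))
        costs = np.zeros(n_candidates)
        for i, sequence in enumerate(candidates):
            s = state
            for a in sequence:
                s = np.hstack([s, a, 1.0]) @ W  # predicted next state
                costs[i] += np.sum(s ** 2)      # toy cost: drive the state to zero
        return candidates[np.argmin(costs), 0]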

Autonomous Exploration and Active Learning

  • How can we autonomously explore the state space of the robot without the risk of breaking it?
  • Can we use strategies for intrinsic motivation, such as artificial curiosity or empowerment, to autonomously acquire a rich set of behaviours that can be reused in the future learning process? (See the sketch following this list.)
  • How can we measure the competence of the agent as well as our certainty in this competence?
  • Can we use active learning to improve the quality of learned forward models, as well as to probe the environment to gain more information about its state?
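
As one concrete instance of artificial curiosity, the sketch below rewards the agent for transitions its forward model predicts badly; as the online model improves, the bonus for familiar transitions decays. The online linear model and the scale factor are illustrative assumptions, not a specific intrinsic-motivation method.

    import numpy as np

    class CuriosityBonus:
        """Sketch of a prediction-error curiosity signal: transitions the
        forward model predicts badly yield a high intrinsic reward, and an
        online update makes repeatedly visited transitions less 'interesting'."""

        def __init__(self, state_dim, action_dim, lr=0.01, scale=1.0):
            self.W = np.zeros((state_dim + action_dim + 1, state_dim))
            self.lr = lr
            self.scale = scale

        def __call__(self, state, action, next_state):
            x = np.hstack([state, action, 1.0])
            error = next_state - x @ self.W         # forward-model prediction error
            self.W += self.lr * np.outer(x, error)  # online model improvement
            return self.scale * np.sum(error ** 2)  # intrinsic reward

In practice, such a bonus would be added to the external reward so that the agent seeks out poorly understood regions of the state space.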

Autonomous Learning from Instructions

  • Can we combine learning from demonstrations, inverse reinforcement learning and preference learning to make more effective use of human instructions? (See the sketch following this list.)
  • How can we decide when to request new instructions from a human expert?
  • How can we scale inverse reinforcement learning and preference learning to high-dimensional continuous spaces?
  • Can we use demonstrations and human preferences to identify relevant features from the high-dimensional sensory input of the robot?
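
As a minimal instance of preference learning, the sketch below fits a linear reward function to pairwise trajectory preferences using the Bradley-Terry choice model, via gradient ascent on the log-likelihood. The trajectory features are assumed to be given, and the linear reward form is an illustrative simplification.

    import numpy as np

    def fit_reward_from_preferences(preferred, rejected, lr=0.1, iters=500):
        """Fit r(trajectory) = w . phi(trajectory) from pairwise preferences.
        preferred[i] and rejected[i] are feature vectors of two trajectories,
        where a human judged the first to be better than the second."""
        diff = preferred - rejected
        w = np.zeros(preferred.shape[1])
        for _ in range(iters):
            # Bradley-Terry: P(preferred beats rejected) = sigmoid(w . diff)
            p = 1.0 / (1.0 + np.exp(-diff @ w))
            # gradient ascent on the log-likelihood of the observed choices
            w += lr * diff.T @ (1.0 - p) / len(diff)
        return w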

Autonomous Feature Extraction

  • Can we use feature extraction techniques, such as deep learning, to find a general-purpose feature representation that can be used for a multitude of tasks? (See the sketch following this list.)
  • Can recent advances in kernel-based methods be scaled to reinforcement learning and policy search in high-dimensional spaces?
  • What are good priors to simplify the feature extraction problem?
  • What are good features to represent the policy, the value function or the reward function? Can we find algorithms that extract features specialized for these representations?
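
As a minimal instance of the first question, the sketch below trains a single-hidden-layer autoencoder with plain gradient descent; the hidden activations serve as a learned feature representation of raw sensory data. The layer size, learning rate and tanh nonlinearity are arbitrary illustrative choices, far from a full deep learning pipeline.

    import numpy as np

    def train_autoencoder(X, n_features=16, lr=0.01, epochs=200, seed=0):
        """Learn encoder weights W1 such that np.tanh(X @ W1) is a compact
        feature representation of the raw sensory data X (one row per sample)."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W1 = rng.normal(0.0, 0.1, (d, n_features))  # encoder weights
        W2 = rng.normal(0.0, 0.1, (n_features, d))  # decoder weights
        for _ in range(epochs):
            H = np.tanh(X @ W1)              # learned features
            err = H @ W2 - X                 # reconstruction error
            dW2 = H.T @ err / len(X)
            dH = err @ W2.T * (1.0 - H ** 2)  # backprop through tanh
            dW1 = X.T @ dH / len(X)
            W1 -= lr * dW1
            W2 -= lr * dW2
        return W1  # encode new data with np.tanh(X_new @ W1)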