Publication Details

You can download the complete BibTeX reference list as all-ias-publications.bib.

Reference Type: Journal Article
Author(s): Lutter, M.; Silberbauer, J.; Watson, J.; Peters, J.
Year: submitted
Title: A Differentiable Newton-Euler Algorithm for Real-World Robotics
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Robotics (T-RO)
Reference Type: Journal Article
Author(s): Dam, T.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year: submitted
Title: A Unified Perspective on Value Backup and Exploration in Monte-Carlo Tree Search
Journal/Conference/Book Title: Submitted to the Journal of Artificial Intelligence Research (JAIR)
Link to PDF: https://arxiv.org/pdf/2202.07071
Reference Type: Journal Article
Author(s): Klink, P.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year: submitted
Title: On the Benefit of Optimal Transport for Curriculum Reinforcement Learning
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
URL(s): https://arxiv.org/abs/2309.14091
Link to PDF: https://arxiv.org/pdf/2309.14091.pdf
Reference Type: Journal Article
Author(s): Look, A.; Rakitsch, B.; Kandemir, M.; Peters, J.
Year: submitted
Title: Sampling-Free Probabilistic Deep State-Space Models
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Link to PDF: https://arxiv.org/pdf/2309.08256.pdf
Reference Type: Journal Article
Author(s): Klink, P.; Wolf, F.; Ploeger, K.; Peters, J.; Pajarinen, J.
Year: submitted
Title: Tracking Control for a Spherical Pendulum via Curriculum Reinforcement Learning
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Robotics (T-RO)
URL(s): https://arxiv.org/abs/2309.14096
Link to PDF: https://arxiv.org/pdf/2309.14096.pdf
Reference Type: Journal Article
Author(s): Funk, N.; Helmut, E.; Chalvatzaki, G.; Calandra, R.; Peters, J.
Year: submitted
Title: Evetac: An Event-based Optical Tactile Sensor for Robotic Manipulation
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Robotics (T-RO)
URL(s): https://sites.google.com/view/evetac
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/NiklasFunk/evetac_paper.pdf
Reference Type: Journal Article
Author(s): Prasad, V.; Heitlinger, L.; Koert, D.; Stock-Homburg, R.; Peters, J.; Chalvatzaki, G.
Year: submitted
Title: Learning Multimodal Latent Dynamics for Human-Robot Interaction
Journal/Conference/Book Title: Submitted to the IEEE Transactions on Robotics (T-RO)
Link to PDF: https://arxiv.org/abs/2311.16380
Reference Type: Journal Article
Author(s): Prasad, V.; Kshirsagar, A.; Koert, D.; Stock-Homburg, R.; Peters, J.; Chalvatzaki, G.
Year: submitted
Title: MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations
Journal/Conference/Book Title: Submitted to the IEEE Robotics and Automation Letters (RA-L)
Reference Type: Journal Article
Author(s): Weng, Y.; Matsuda, T.; Sekimori, Y.; Pajarinen, J.; Peters, J.; Maki, T.
Year: in press
Title: Establishment of Line-of-Sight Optical Links Between Autonomous Underwater Vehicles: Field Experiment and Performance Validation
Journal/Conference/Book Title: Applied Ocean Research
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Research/Overview/Establishment_of_Line_of_Sight_Optical_Links.pdf
Reference Type: Journal Article
Author(s): Abdulsamad, H.; Peters, J.
Year: in press
Title: Model-Based Reinforcement Learning via Stochastic Hybrid Models
Journal/Conference/Book Title: IEEE Open Journal of Control Systems, Special Section: Intersection of Machine Learning with Control
Link to PDF: https://arxiv.org/pdf/2111.06211.pdf
Reference Type: Journal Article
Author(s): Flynn, H.; Reeb, D.; Kandemir, M.; Peters, J.
Year: in press
Title: PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison
Journal/Conference/Book Title: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Link to PDF: https://arxiv.org/pdf/2211.16110.pdf
Reference Type: Journal Article
Author(s): Abdulsamad, H.; Nickl, P.; Klink, P.; Peters, J.
Year: in press
Title: Variational Hierarchical Mixtures for Probabilistic Learning of Inverse Dynamics
Journal/Conference/Book Title: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Link to PDF: https://arxiv.org/pdf/2211.01120.pdf
Reference Type: Journal Article
Author(s): Kicki, P.; Liu, P.; Tateo, D.; Bou Ammar, H.; Walas, K.; Skrzypczynski, P.; Peters, J.
Year: 2024
Title: Fast Kinodynamic Planning on the Constraint Manifold with Deep Neural Networks
Journal/Conference/Book Title: IEEE Transactions on Robotics (T-RO), and Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Volume: 40
Pages: 277-297
Link to PDF: https://arxiv.org/pdf/2301.04330.pdf
Reference Type: Conference Paper
Author(s): Bhatt, A.; Palenicek, D.; Belousov, B.; Argus, M.; Amiranashvili, A.; Brox, T.; Peters, J.
Year: 2024
Title: CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Abstract: Sample efficiency is a crucial problem in deep reinforcement learning. Recent algorithms, such as REDQ and DroQ, found a way to improve the sample efficiency by increasing the update-to-data (UTD) ratio to 20 gradient update steps on the critic per environment sample. However, this comes at the expense of a greatly increased computational cost. To reduce this computational burden, we introduce CrossQ: a lightweight algorithm that makes careful use of Batch Normalization and removes target networks to surpass the state-of-the-art in sample efficiency while maintaining a low UTD ratio of 1. Notably, CrossQ does not rely on advanced bias-reduction schemes used in current methods. CrossQ's contributions are thus threefold: (1) state-of-the-art sample efficiency, (2) substantial reduction in computational cost compared to REDQ and DroQ, and (3) ease of implementation, requiring just a few lines of code on top of SAC.
Volume: Spotlight
Link to PDF: https://openreview.net/pdf?id=PczQtTsTIX
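The core trick summarized in the abstract above, replacing target networks with batch normalization, hinges on evaluating current and next state-action pairs in one critic forward pass, so the normalization statistics cover both input distributions. A minimal sketch of that joint pass under toy assumptions (a one-layer critic in plain NumPy; all names and shapes are illustrative, not the authors' code):

```python
import numpy as np

def batchnorm(x, eps=1e-5):
    # Normalize each feature across the batch dimension.
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def joint_critic_pass(sa, next_sa, W):
    # Run current and next state-action pairs through the critic in ONE
    # batch, so batch-norm statistics cover both distributions and no
    # separate target network is needed (the CrossQ idea, heavily simplified).
    joint = np.concatenate([sa, next_sa], axis=0)
    h = np.tanh(batchnorm(joint @ W))
    q = h.sum(axis=1)  # toy linear readout
    q_current, q_next = np.split(q, 2)
    return q_current, q_next
```

Here `W` stands in for the critic's parameters; a real implementation would be a deep network trained with the SAC objective.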
Reference Type: Conference Proceedings
Author(s): Derstroff, C.; Brugger, J.; Cerrato, M.; Peters, J.; Kramer, S.
Year: 2024
Title: Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations
Journal/Conference/Book Title: Proceedings of the National Conference on Artificial Intelligence (AAAI)
Link to PDF: https://arxiv.org/pdf/2312.09950.pdf
Reference Type: Conference Proceedings
Author(s): Vincent, T.; Metelli, A.; Belousov, B.; Peters, J.; Restelli, M.; D'Eramo, C.
Year: 2024
Title: Parameterized Projected Bellman Operator
Journal/Conference/Book Title: Proceedings of the National Conference on Artificial Intelligence (AAAI)
Link to PDF: https://arxiv.org/pdf/2312.12869.pdf
Reference Type: Conference Proceedings
Author(s): Tiboni, G.; Klink, P.; Peters, J.; Tommasi, T.; D'Eramo, C.; Chalvatzaki, G.
Year: 2024
Title: Domain Randomization via Entropy Maximization
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Link to PDF: https://arxiv.org/pdf/2311.01885.pdf
Reference Type: Conference Proceedings
Author(s): Hahne, F.; Prasad, V.; Kshirsagar, A.; Koert, D.; Stock-Homburg, R. M.; Peters, J.; Chalvatzaki, G.
Year: 2024
Title: Transition State Clustering for Interaction Segmentation and Learning
Journal/Conference/Book Title: ACM/IEEE International Conference on Human Robot Interaction (HRI), Late Breaking Report
Link to PDF: https://arxiv.org/pdf/2402.14548.pdf
Reference Type: Conference Proceedings
Author(s): Goeksu, Y.; Almeida-Correia, A.; Prasad, V.; Kshirsagar, A.; Koert, D.; Peters, J.; Chalvatzaki, G.
Year: 2024
Title: Kinematically Constrained Human-like Bimanual Robot-to-Human Handovers
Journal/Conference/Book Title: ACM/IEEE International Conference on Human Robot Interaction (HRI), Late Breaking Report
Link to PDF: https://arxiv.org/pdf/2402.14525.pdf
Reference Type: Conference Proceedings
Author(s): Hendawy, A.; Peters, J.; D'Eramo, C.
Year: 2024
Title: Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Abstract: Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties while leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
URL(s): https://arxiv.org/abs/2311.11385
Link to PDF: https://arxiv.org/pdf/2311.11385.pdf
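The Gram-Schmidt step mentioned in the abstract above can be illustrated in isolation: given a set of expert representation vectors, it yields an orthonormal basis spanning the same subspace. A sketch in plain NumPy (the function name and tolerance are illustrative, not MOORE's actual implementation):

```python
import numpy as np

def gram_schmidt(experts):
    # Orthonormalize a stack of expert representation vectors (one per row),
    # mirroring how a shared subspace can be shaped from a mixture of experts.
    basis = []
    for v in experts:
        # Subtract the projections onto the basis built so far.
        w = v - sum((np.dot(v, b) * b for b in basis), np.zeros_like(v))
        norm = np.linalg.norm(w)
        if norm > 1e-10:  # drop (numerically) linearly dependent directions
            basis.append(w / norm)
    return np.stack(basis)
```

The resulting rows are mutually orthogonal unit vectors, which is what enforces diversity among the experts' representations.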
Reference Type: Conference Proceedings
Author(s): Reddi, A.; Toelle, M.; Peters, J.; Chalvatzaki, G.; D'Eramo, C.
Year: 2024
Title: Robust Adversarial Reinforcement Learning via Bounded Rationality Curricula
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Abstract: Robustness against adversarial attacks and distribution shifts is a long-standing goal of Reinforcement Learning (RL). To this end, Robust Adversarial Reinforcement Learning (RARL) trains a protagonist against destabilizing forces exercised by an adversary in a competitive zero-sum Markov game, whose optimal solution, i.e., rational strategy, corresponds to a Nash equilibrium. However, finding Nash equilibria requires facing complex saddle point optimization problems, which can be prohibitive to solve, especially for high-dimensional control. In this paper, we propose a novel approach for adversarial RL based on entropy regularization to ease the complexity of the saddle point optimization problem. We show that the solution of this entropy-regularized problem corresponds to a Quantal Response Equilibrium (QRE), a generalization of Nash equilibria that accounts for bounded rationality, i.e., agents sometimes play random actions instead of optimal ones. Crucially, the connection between the entropy-regularized objective and QRE enables free modulation of the rationality of the agents by simply tuning the temperature coefficient. We leverage this insight to propose our novel algorithm, Quantal Adversarial RL (QARL), which gradually increases the rationality of the adversary in a curriculum fashion until it is fully rational, easing the complexity of the optimization problem while retaining robustness. We provide extensive evidence of QARL outperforming RARL and recent baselines across several MuJoCo locomotion and navigation problems in overall performance and robustness.
URL(s): https://arxiv.org/abs/2311.01642
Link to PDF: https://arxiv.org/pdf/2311.01642.pdf
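The bounded-rationality idea in the abstract above, agents playing a quantal response rather than a best response, reduces for a single decision to a softmax over action values whose temperature sets the rationality level. A small illustrative sketch (not QARL itself; the function and argument names are assumptions):

```python
import numpy as np

def quantal_response(q_values, temperature):
    # Bounded-rational policy: softmax over action values. A high temperature
    # yields near-uniform (random) play; temperature -> 0 recovers the fully
    # rational best response. Shift by the max for numerical stability.
    z = q_values / max(temperature, 1e-8)
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()
```

Annealing `temperature` from high to low mimics the curriculum described in the abstract: the adversary starts near-random and gradually becomes fully rational.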
Reference Type: Conference Proceedings
Author(s): Al-Hafez, F.; Zhao, G.; Peters, J.; Tateo, D.
Year: 2024
Title: Time-Efficient Reinforcement Learning with Stochastic Stateful Policies
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Keywords: reinforcement learning, imitation, stateful policies
Link to PDF: https://arxiv.org/pdf/2311.04082.pdf
Reference Type: Conference Proceedings
Author(s): Scherf, L.; Gasche, L. A.; Chemangui, E.; Koert, D.
Year: 2024
Title: Are You Sure? - Multi-Modal Human Decision Uncertainty Detection in Human-Robot Interaction
Journal/Conference/Book Title: 2024 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’24)
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/LisaScherf/Scherf_HRI2024.pdf
Reference Type: Conference Proceedings
Author(s): Boehm, A.; Schneider, T.; Belousov, B.; Kshirsagar, A.; Lin, L.; Doerschner, K.; Drewing, K.; Rothkopf, C.A.; Peters, J.
Year: 2024
Title: What Matters for Active Texture Recognition With Vision-Based Tactile Sensors
Journal/Conference/Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/boehm24_tart.pdf
Reference Type: Conference Proceedings
Author(s): Lach, L.; Haschke, R.; Tateo, D.; Peters, J.; Ritter, H.; Sol, J.; Torras, C.
Year: 2024
Title: Transferring Tactile-based Continuous Force Control Policies from Simulation to Robot
Journal/Conference/Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDF: https://arxiv.org/pdf/2311.07245.pdf
Reference Type: Journal Article
Author(s): Buechler, D.; Calandra, R.; Peters, J.
Year: 2023
Title: Learning to Control Highly Accelerated Ballistic Movements on Muscular Robots
Journal/Conference/Book Title: Robotics and Autonomous Systems
Volume: 159
Number: 104230
Link to PDF: https://arxiv.org/pdf/1904.03665.pdf
Reference Type: Conference Paper
Author(s): Toelle, M.; Belousov, B.; Peters, J.
Year: 2023
Title: A Unifying Perspective on Language-Based Task Representations for Robot Control
Journal/Conference/Book Title: CoRL Workshop on Language and Robot Learning: Language as Grounding
Keywords: Language-Based Task Representations, Robot Control
Abstract: Natural language is becoming increasingly important in robot control for both high-level planning and goal-directed conditioning of motor skills. While a number of solutions have been proposed already, it is yet to be seen what architecture will succeed in seamlessly integrating language, vision, and action. To better understand the landscape of existing methods, we propose to view the algorithms from the perspective of “Language-Based Task Representations”, i.e., categorizing the methods that condition robot action generation on natural language commands according to their task representation and embedding architecture. Our proposed taxonomy intuitively groups existing algorithms, highlights their commonalities and distinctions, and suggests directions for further investigation.
URL(s): https://openreview.net/pdf?id=X128TIOXXw
Reference Type: Journal Article
Author(s): Lutter, M.; Peters, J.
Year: 2023
Title: Combining Physics and Deep Learning to learn Continuous-Time Dynamics Models
Journal/Conference/Book Title: International Journal of Robotics Research (IJRR)
Volume: 42
Number: 3
Link to PDF: https://arxiv.org/pdf/2110.01894.pdf
Reference Type: Journal Article
Author(s): Lutter, M.; Belousov, B.; Mannor, S.; Fox, D.; Garg, A.; Peters, J.
Year: 2023
Title: Continuous-Time Fitted Value Iteration for Robust Policies
Journal/Conference/Book Title: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
URL(s): https://ieeexplore.ieee.org/document/9925102
Link to PDF: https://arxiv.org/pdf/2110.01954.pdf
Reference Type: Journal Article
Author(s): Loeckel, S.; Ju, S.; Schaller, M.; van Vliet, P.; Peters, J.
Year: 2023
Title: An Adaptive Human Driver Model for Realistic Race Car Simulations
Journal/Conference/Book Title: IEEE Transactions on Systems, Man and Cybernetics: Systems (TSMC)
Volume: 53
Number: 11
Pages: 6718-6730
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/Loeckel_TSMC_2023.pdf
Reference Type: Journal Article
Author(s): Look, A.; Kandemir, M.; Rakitsch, B.; Peters, J.
Year: 2023
Title: A Deterministic Approximation to Neural SDEs
Journal/Conference/Book Title: IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Volume: 45
Number: 4
Pages: 4023-4037
Link to PDF: https://arxiv.org/pdf/2006.08973.pdf
Reference Type: Conference Proceedings
Author(s): Liu, Y.; Belousov, B.; Funk, N.; Chalvatzaki, G.; Peters, J.; Tessman, O.
Year: 2023
Title: Auto(mated)nomous Assembly
Journal/Conference/Book Title: International Conference on Trends on Construction in the Post-Digital Era
Publisher: Springer, Cham
Pages: 167-181
URL(s): https://link.springer.com/chapter/10.1007/978-3-031-20241-4_12
Reference Type: Conference Proceedings
Author(s): Liu, P.; Zhang, K.; Tateo, D.; Jauhri, S.; Hu, Z.; Peters, J.; Chalvatzaki, G.
Year: 2023
Title: Safe Reinforcement Learning of Dynamic High-Dimensional Robotic Tasks: Navigation, Manipulation, Interaction
Journal/Conference/Book Title: 2023 IEEE International Conference on Robotics and Automation (ICRA)
Abstract: Safety is a crucial property of every robotic platform: any control policy should always comply with actuator limits and avoid collisions with the environment and humans. In reinforcement learning, safety is even more fundamental for exploring an environment without causing any damage. While there are many proposed solutions to the safe exploration problem, only a few of them can deal with the complexity of the real world. This paper introduces a new formulation of safe exploration for reinforcement learning of various robotic tasks. Our approach applies to a wide class of robotic platforms and enforces safety even under complex collision constraints learned from data by exploring the tangent space of the constraint manifold. Our proposed approach achieves state-of-the-art performance in simulated high-dimensional and dynamic tasks while avoiding collisions with the environment. We show safe real-world deployment of our learned controller on a TIAGo++ robot, achieving remarkable performance in manipulation and human-robot interaction tasks.
Publisher: IEEE
URL(s): https://arxiv.org/abs/2209.13308
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/PuzeLiu/ICRA_2023_ATACOM.pdf
Reference Type: Conference Paper
Author(s): Zhu, Y.; Nazirjonov, S.; Jiang, B.; Colan, J.; Aoyama, T.; Hasegawa, Y.; Belousov, B.; Hansel, K.; Peters, J.
Year: 2023
Title: Visual Tactile Sensor Based Force Estimation for Position-Force Teleoperation
Journal/Conference/Book Title: IEEE International Conference on Cyborg and Bionic Systems (CBS)
Pages: 49-52
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/KayHansel/2022__visual_tactile_sensor_based_force_estimation_for_position_force_teleoperation.pdf
Reference Type: Conference Proceedings
Author(s): Zelch, C.; Peters, J.; von Stryk, O.
Year: 2023
Title: Start State Selection for Control Policy Learning from Optimal Trajectories
Journal/Conference/Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Research/Overview/acceptedversion_Zelch_ICRA23.pdf
Reference Type: Conference Proceedings
Author(s): Urain, J.; Funk, N.; Peters, J.; Chalvatzaki, G.
Year: 2023
Title: SE(3)-DiffusionFields: Learning smooth cost functions for joint grasp and motion optimization through diffusion
Journal/Conference/Book Title: International Conference on Robotics and Automation (ICRA)
Keywords: SE(3), 6D grasping, Robotics, Diffusion Models
URL(s): https://sites.google.com/view/se3dif/home
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/2023se3graspurain.pdf
Reference Type: Conference Proceedings
Author(s): Hansel, K.; Urain, J.; Peters, J.; Chalvatzaki, G.
Year: 2023
Title: Hierarchical Policy Blending as Inference for Reactive Robot Control
Journal/Conference/Book Title: 2023 IEEE International Conference on Robotics and Automation (ICRA)
Abstract: Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade-off between safety and performance. On the one hand, reactive policies guarantee fast response to environmental changes at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method. Moreover, we adopt probabilistic inference methods to formalize the hierarchical model and stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
Publisher: IEEE
URL(s): https://sites.google.com/view/hipbi
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Team/KayHansel/hierarchical_policy_blending_as_inference_icra_2023.pdf
Reference Type: Conference Proceedings
Author(s): Le, A. T.; Hansel, K.; Peters, J.; Chalvatzaki, G.
Year: 2023
Title: Hierarchical Policy Blending As Optimal Transport
Journal/Conference/Book Title: 5th Annual Learning for Dynamics & Control Conference (L4DC)
Publisher: PMLR
URL(s): https://sites.google.com/view/hipobot
Link to PDF: https://proceedings.mlr.press/v211/le23a/le23a.pdf
Reference Type: Conference Proceedings
Author(s): Luis, C.; Bottero, A.G.; Vinogradska, J.; Berkenkamp, F.; Peters, J.
Year: 2023
Title: Model-Based Uncertainty in Value Functions
Journal/Conference/Book Title: Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)
Link to PDF: https://arxiv.org/pdf/2302.12526.pdf
Reference Type: Conference Proceedings
Author(s): Al-Hafez, F.; Tateo, D.; Arenz, O.; Zhao, G.; Peters, J.
Year: 2023
Title: LS-IQ: Implicit Reward Regularization for Inverse Reinforcement Learning
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Keywords: Inverse Reinforcement Learning, Inverse Q-Learning, Implicit Reward Regularization, Imitation Learning, Locomotion
Abstract: Recent methods for imitation learning directly learn a Q-function using an implicit reward formulation, rather than an explicit reward function. However, these methods generally require implicit reward regularization for improving stability, mistreating or even neglecting absorbing states. Previous works show that a squared norm regularization on the implicit reward function is effective, but do not provide a theoretical analysis of the resulting properties of the algorithms. In this work, we show that using this regularizer under a mixture distribution of the policy and the expert provides a particularly illuminating perspective: the original objective can be understood as squared Bellman error minimization, and the corresponding optimization problem minimizes the χ²-divergence between the expert and the mixture distribution. This perspective allows us to address instabilities and properly treat absorbing states. We show that our method, Least Squares Inverse Q-Learning (LS-IQ), outperforms state-of-the-art algorithms, particularly in environments with absorbing states. Finally, we propose to use an inverse dynamics model to learn from observations only. Using this approach, we retain performance in settings where no expert actions are available.
URL(s): https://openreview.net/forum?id=o3Q4m8jg4BR
Link to PDF: https://openreview.net/pdf?id=o3Q4m8jg4BR
Reference Type: Conference Proceedings
Author(s): Palenicek, D.; Lutter, M.; Carvalho, J.; Peters, J.
Year: 2023
Title: Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning
Journal/Conference/Book Title: International Conference on Learning Representations (ICLR)
Link to PDF: https://openreview.net/pdf?id=H4Ncs5jhTCu
Reference Type: Conference Proceedings
Author(s): Buechler, D.; Guist, S.; Calandra, R.; Berenz, V.; Schoelkopf, B.; Peters, J.
Year: 2023
Title: Learning to Play Table Tennis From Scratch using Muscular Robots
Journal/Conference/Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), IEEE T-RO Track
Link to PDF: https://arxiv.org/pdf/2006.05935.pdf
Reference Type: Conference Proceedings
Author(s): Urain, J.; Tateo, D.; Peters, J.
Year: 2023
Title: Learning Stable Vector Fields on Lie Groups
Journal/Conference/Book Title: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), IEEE RA-L Track
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/urain_2023_liesvf.pdf
Reference Type: Journal Article
Author(s): Bjelonic, F.; Lee, J.; Arm, P.; Sako, D.; Tateo, D.; Peters, J.; Hutter, M.
Year: 2023
Title: Learning-Based Design and Control for Quadrupedal Robots With Parallel-Elastic Actuators
Journal/Conference/Book Title: IEEE Robotics and Automation Letters (RA-L)
Volume: 8
Number: 3
Pages: 1611-1618
Link to PDF: https://arxiv.org/pdf/2301.03509
Reference Type: Conference Proceedings
Author(s): Bethge, J.; Pfefferkorn, M.; Rose, A.; Peters, J.; Findeisen, R.
Year: 2023
Title: Model predictive control with Gaussian-process-supported dynamical constraints for autonomous vehicles
Journal/Conference/Book Title: Proceedings of the 22nd World Congress of the International Federation of Automatic Control
Link to PDF: https://arxiv.org/pdf/2303.04725.pdf
Reference Type: Journal Article
Author(s): Urain, J.; Li, A.; Liu, P.; D'Eramo, C.; Peters, J.
Year: 2023
Title: Composable energy policies for reactive motion generation and reinforcement learning
Journal/Conference/Book Title: International Journal of Robotics Research (IJRR)
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/urain_2023_cep_ijrr.pdf
Reference Type: Journal Article
Author(s): Ju, S.; van Vliet, P.; Arenz, O.; Peters, J.
Year: 2023
Title: Digital Twin of a Driver-in-the-Loop Race Car Simulation with Contextual Reinforcement Learning
Journal/Conference/Book Title: IEEE Robotics and Automation Letters (RA-L)
Volume: 8
Number: 7
Pages: 4107-4114
Link to PDF: https://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/RAL_Siwei_Ju.pdf
Reference Type: Journal Article
Author(s): Peters, S.; Peters, J.; Findeisen, R.
Year: 2023
Title: Quantifying Uncertainties along the Automated Driving Stack
Journal/Conference/Book Title: ATZ worldwide
Volume: 125
Pages: 62-65
URL(s): https://link.springer.com/article/10.1007/s38311-023-1489-8
Reference Type: Journal Article
Author(s): Arenz, O.; Dahlinger, P.; Ye, Z.; Volpp, M.; Neumann, G.
Year: 2023
Title: A Unified Perspective on Natural Gradient Variational Inference with Gaussian Mixture Models
Journal/Conference/Book Title: Transactions on Machine Learning Research (TMLR)
Abstract: Variational inference with Gaussian mixture models (GMMs) enables learning of highly tractable yet multi-modal approximations of intractable target distributions with up to a few hundred dimensions. The two currently most effective methods for GMM-based variational inference, VIPS and iBayes-GMM, both employ independent natural gradient updates for the individual components and their weights. We show for the first time, that their derived updates are equivalent, although their practical implementations and theoretical guarantees differ. We identify several design choices that distinguish both approaches, namely with respect to sample selection, natural gradient estimation, stepsize adaptation, and whether trust regions are enforced or the number of components adapted. We argue that for both approaches, the quality of the learned approximations can heavily suffer from the respective design choices: By updating the individual components using samples from the mixture model, iBayes-GMM often fails to produce meaningful updates to low-weight components, and by using a zero-order method for estimating the natural gradient, VIPS scales badly to higher-dimensional problems. Furthermore, we show that information-geometric trust-regions (used by VIPS) are effective even when using first-order natural gradient estimates, and often outperform the improved Bayesian learning rule (iBLR) update used by iBayes-GMM. We systematically evaluate the effects of design choices and show that a hybrid approach significantly outperforms both prior works. Along with this work, we publish our highly modular and efficient implementation for natural gradient variational inference with Gaussian mixture models, which supports 432 different combinations of design choices, facilitates the reproduction of all our experiments, and may prove valuable for the practitioner.
Link to PDF: /uploads/Team/PubOlegArenz/gmmvi.pdf
Reference Type: Conference Proceedings
Author(s): Carvalho, J.; Le, A. T.; Baierl, M.; Koert, D.; Peters, J.
Year: 2023
Title: Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models
Journal/Conference/Book Title: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Keywords: Motion Planning, Diffusion Models
URL(s): https://sites.google.com/view/mp-diffusion
Link to PDF: https://arxiv.org/abs/2308.01557
Reference Type: Conference Paper
Author(s): Funk, N.; Mueller, P.-O.; Belousov, B.; Savchenko, A.; Findeisen, R.; Peters, J.
Year: 2023
Title: High-Resolution Pixelwise Contact Area and Normal Force Estimation for the GelSight Mini Visuotactile Sensor Using Neural Networks
Journal/Conference/Book Title: Embracing Contacts Workshop at ICRA 2023
URL(s): https://openreview.net/forum?id=dUO0QQw4FW
Link to PDF: https://openreview.net/pdf?id=dUO0QQw4FW
Reference Type: Journal Article
Author(s): Scherf, L.; Schmidt, A.; Pal, S.; Koert, D.
Year: 2023
Title: Interactively learning behavior trees from imperfect human demonstrations
Journal/Conference/Book Title: Frontiers in Robotics and AI
Volume: 10
Link to PDF: https://www.frontiersin.org/articles/10.3389/frobt.2023.1152595/pdf
Reference Type: Conference Paper
Author(s): Vincent, T.; Belousov, B.; D'Eramo, C.; Peters, J.
Year: 2023
Title: Iterated Deep Q-Network: Efficient Learning of Bellman Iterations for Deep Reinforcement Learning
Journal/Conference/Book Title: European Workshop on Reinforcement Learning (EWRL)
Link to PDF: https://openreview.net/pdf?id=6dJGuVyR7K
Reference Type: Conference Proceedings
Author(s): Al-Hafez, F.; Tateo, D.; Arenz, O.; Zhao, G.; Peters, J.
Year: 2023
Title: Least Squares Inverse Q-Learning
Journal/Conference/Book Title: European Workshop on Reinforcement Learning (EWRL)
URL(s): https://openreview.net/forum?id=BcHDYvNg-W4
Link to PDF: https://openreview.net/pdf?id=BcHDYvNg-W4
Reference Type: Journal Article
Author(s): Look, A.; Kandemir, M.; Rakitsch, B.; Peters, J.
Year: 2023
Title: Cheap and Deterministic Inference for Deep State-Space Models of Interacting Dynamical Systems
Journal/Conference/Book Title: Transactions on Machine Learning Research (TMLR)
Abstract: Graph neural networks are often used to model interacting dynamical systems since they gracefully scale to systems with a varying and high number of agents. While there has been much progress made for deterministic interacting systems, modeling is much more challenging for stochastic systems in which one is interested in obtaining a predictive distribution over future trajectories. Existing methods are either computationally slow since they rely on Monte Carlo sampling or make simplifying assumptions such that the predictive distribution is unimodal. In this work, we present a deep state-space model which employs graph neural networks in order to model the underlying interacting dynamical system. The predictive distribution is multimodal and has the form of a Gaussian mixture model, where the moments of the Gaussian components can be computed via deterministic moment matching rules. Our moment matching scheme can be exploited for sample-free inference, leading to more efficient and stable training compared to Monte Carlo alternatives. Furthermore, we propose structured approximations to the covariance matrices of the Gaussian components in order to scale up to systems with many agents. We benchmark our novel framework on two challenging autonomous driving datasets. Both confirm the benefits of our method compared to state-of-the-art methods. We further demonstrate the usefulness of our individual contributions in a carefully designed ablation study and provide a detailed runtime analysis of our proposed covariance approximations. Finally, we empirically demonstrate the generalization ability of our method by evaluating its performance on unseen scenarios.
Link to PDF: https://arxiv.org/abs/2305.01773
Reference TypeConference Proceedings
Author(s)Flynn, H.; Reeb, D.; Kandemir, M.; Peters, J.
Year2023
TitleImproved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
Link to PDF/uploads/Team/HamishFlynn/mmucb2.pdf
Reference TypeConference Proceedings
Author(s)Gruner, T.; Belousov, B.; Muratore, F.; Palenicek, D.; Peters, J.
Year2023
TitlePseudo-Likelihood Inference
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
Link to PDF/uploads/Team/TheoGruner/pli_final
Reference TypeConference Proceedings
Author(s)Le, A. T.; Chalvatzaki, G.; Biess, A.; Peters, J.
Year2023
TitleAccelerating Motion Planning via Optimal Transport
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
URL(s) https://sites.google.com/view/sinkhorn-step/
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/AnThaiLe/mpot_preprint.pdf
Reference TypeConference Proceedings
Author(s)Le, A. T.; Chalvatzaki, G.; Biess, A.; Peters, J.
Year2023
TitleAccelerating Motion Planning via Optimal Transport
Journal/Conference/Book TitleIROS 2023 Workshop on Differentiable Probabilistic Robotics: Emerging Perspectives on Robot Learning
KeywordsMotion Planning, Trajectory Optimization, Optimal Transport
Volume[Oral]
Link to PDFhttps://openreview.net/pdf?id=Gx62uPXEul
Reference TypeConference Paper
Author(s)Rother, D.; Weisswange, T.H.; Peters, J.
Year2023
TitleDisentangling Interaction using Maximum Entropy Reinforcement Learning in Multi-Agent Systems
Journal/Conference/Book TitleEuropean Conference on Artificial Intelligence (ECAI)
KeywordsAd-Hoc Teamwork, Maximum Entropy Reinforcement Learning, Coexistence, Mixed Motive
AbstractResearch on multi-agent interaction involving both multiple artificial agents and humans is still in its infancy. Most recent approaches have focused on environments with collaboration-focused human behavior, or on providing only a small, defined set of situations. When deploying robots in human-inhabited environments in the future, it is unlikely that all interactions will fit a predefined model of collaboration, even though collaborative behavior is still expected from the robot. Existing approaches are unlikely to effectively create such behaviors in these "coexistence" environments. To tackle this issue, we introduce a novel framework that decomposes interaction and task-solving into separate learning problems and blends the resulting policies at inference time. Policies are learned with maximum entropy reinforcement learning, allowing us to create interaction-impact-aware agents and to scale the cost of training agents linearly with the number of agents and available tasks. We propose a weighting function covering the alignment of interaction distributions with the original task. We demonstrate that our framework addresses the scaling problem while solving a given task and considering collaboration opportunities in a coexistence particle environment and a new cooking environment. Our work introduces a new learning paradigm that opens the path to more complex multi-robot, multi-human interactions.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DavidRother/ecai_2023.pdf
Reference TypeConference Proceedings
Author(s)Rother, D.
Year2023
TitleImplicitly Cooperative Agents through Impact-Aware Learning
Journal/Conference/Book TitleEuropean Conference on Artificial Intelligence (ECAI)
AbstractThis research explores how autonomous agents learn and interact in shared environments, emphasizing the understanding of others as explicit agents rather than simple dynamic obstacles. When deploying robots in human-inhabited environments in the future, it is unlikely that all interactions will fit a predefined model of collaboration, even though collaborative behavior is still expected from the robot. Utilizing the "theory of mind" concept, the research aims to infer the beliefs, policies, intentions, and goals of other agents, enabling the evaluation of our agent's impact on them. The study aims to create a multi-agent system capable of promoting inherent cooperation even with mixed objectives and adapting to various applications. Using Reinforcement Learning, we develop a modular system that is capable of adapting to changing team sizes and motives for different agents. The developed method is trialed in a real-world assistant robot setup, testing cooperative actions without explicit initiation. Further evaluations occur in simulated environments, e.g., a cooking environment, to manage the policies of other agents and action recognition issues. We can measure the success of our method through the increased utility of either the population or single agents. Additionally, user studies can be conducted in which we directly measure the satisfaction of humans working alongside our agents and compare it with other methods.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DavidRother/doctoral_consortium.pdf
Reference TypeConference Paper
Author(s)Vincent, T.; Metelli, A.; Peters, J.; Restelli, M.; D'Eramo, C.
Year2023
TitleParameterized projected Bellman operator
Journal/Conference/Book TitleICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems
Link to PDFhttps://openreview.net/pdf?id=UnNdjopNeW
Reference TypeConference Paper
Author(s)Chen, Q.; Zhu, Y.; Hansel, K.; Aoyama, T.; Hasegawa, Y.
Year2023
TitleHuman Preferences and Robot Constraints Aware Shared Control for Smooth Follower Motion Execution
Journal/Conference/Book TitleIEEE International Symposium on Micro-NanoMechatronics and Human Science (MHS)
PublisherIEEE
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/KayHansel/human_preferences_and_robot_constraints_aware_shared_control.pdf
Reference TypeJournal Article
Author(s)Gu, S.; Kshirsagar, A.; Du, Y.; Chen, G.; Peters, J.; Knoll, A.
Year2023
TitleA Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors
Journal/Conference/Book TitleFrontiers in Neurorobotics
Volume17
Number1280341
URL(s) https://www.frontiersin.org/articles/10.3389/fnbot.2023.1280341/full
Link to PDFhttps://www.frontiersin.org/articles/10.3389/fnbot.2023.1280341/full
Reference TypeConference Paper
Author(s)Mittenbuehler, M.; Hendawy, A.; D'Eramo, C.; Chalvatzaki, G.
Year2023
TitleParameter-efficient Tuning of Pretrained Visual-Language Models in Multitask Robot Learning
Journal/Conference/Book TitleCoRL 2023 Workshop on Learning Effective Abstractions for Planning (LEAP)
Keywordspretrained visual-language models, multitask robot learning, adapters
AbstractMultimodal pretrained visual-language models (pVLMs) have showcased excellence across several applications, such as visual question answering. Their recent application to policy learning has opened promising avenues for augmenting robotic capabilities in the real world. This paper delves into the problem of parameter-efficient tuning of pVLMs for adapting them to robotic manipulation tasks with low-resource data. We showcase how Low-Rank Adapters (LoRA) can be injected into behavioral cloning temporal transformers to fuse language, multi-view images, and proprioception for multitask robot learning, even for long-horizon tasks. Preliminary results indicate that our approach vastly outperforms baseline architectures and tuning methods, paving the way toward parameter-efficient adaptation of pretrained large multimodal transformers for robot learning with only a handful of demonstrations.
Reference TypeConference Paper
Author(s)Metternich, H.; Hendawy, A.; Klink, P.; Peters, J.; D'Eramo, C.
Year2023
TitleUsing Proto-Value Functions for Curriculum Generation in Goal-Conditioned RL
Journal/Conference/Book TitleNeurIPS 2023 Workshop on Goal-Conditioned Reinforcement Learning
KeywordsReinforcement Learning, Curriculum Learning, Graph Laplacian
AbstractIn this paper, we investigate the use of Proto-Value Functions (PVFs) for measuring the similarity between tasks in the context of Curriculum Learning (CL). PVFs serve as a mathematical framework for generating basis functions for the state space of a Markov Decision Process (MDP). They capture the structure of the state space manifold and have been shown to be useful for value function approximation in Reinforcement Learning (RL). We show that even a few PVFs allow us to estimate the similarity between tasks. Based on this observation, we introduce a new algorithm called Curriculum Representation Policy Iteration (CRPI) that uses PVFs for CL, and we provide a proof of concept in a Goal-Conditioned Reinforcement Learning (GCRL) setting.
Reference TypeConference Proceedings
Author(s)Le, A. T.; Chalvatzaki, G.; Biess, A.; Peters, J.
Year2023
TitleAccelerating Motion Planning via Optimal Transport
Journal/Conference/Book TitleNeurIPS 2023 Workshop Optimal Transport and Machine Learning
Volume[Oral]
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/AnThaiLe/mpot_preprint.pdf
Reference TypeConference Paper
Author(s)Boehm, A.; Schneider, T.; Belousov, B.; Kshirsagar, A.; Lin, L.; Doerschner, K.; Drewing, K.; Rothkopf, C.A.; Peters, J.
Year2023
TitleTactile Active Texture Recognition With Vision-Based Tactile Sensors
Journal/Conference/Book TitleNeurIPS Workshop on Touch Processing: a new Sensing Modality for AI
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/boehm24_tart.pdf
Reference TypeConference Proceedings
Author(s)Watson, J.; Peters, J.
Year2023
TitleSample-Efficient Online Imitation Learning using Pretrained Behavioural Cloning Policies
Journal/Conference/Book TitleNeurIPS 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/watson23rlws.pdf
Reference TypeConference Proceedings
Author(s)Al-Hafez, F.; Zhao, G.; Peters, J.; Tateo, D.
Year2023
TitleLocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion
Journal/Conference/Book TitleRobot Learning Workshop, Conference on Neural Information Processing Systems (NeurIPS)
AbstractImitation Learning (IL) holds great promise for enabling agile locomotion in embodied agents. However, many existing locomotion benchmarks primarily focus on simplified toy tasks, often failing to capture the complexity of real-world scenarios and steering research toward unrealistic domains. To advance research in IL for locomotion, we present a novel benchmark designed to facilitate rigorous evaluation and comparison of IL algorithms. This benchmark encompasses a diverse set of environments, including quadrupeds, bipeds, and musculoskeletal human models, each accompanied by comprehensive datasets, such as real noisy motion capture data, ground truth expert data, and ground truth sub-optimal data, enabling evaluation across a spectrum of difficulty levels. To increase the robustness of learned agents, we provide an easy interface for dynamics randomization and offer a wide range of partially observable tasks to train agents across different embodiments. Finally, we provide handcrafted metrics for each task and ship our benchmark with state-of-the-art baseline algorithms to ease evaluation and enable fast benchmarking.
Link to PDFhttps://arxiv.org/pdf/2311.02496.pdf
Reference TypeJournal Article
Author(s)Parisi, S.; Tateo, D.; Hensel, M.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2022
TitleLong-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning
Journal/Conference/Book TitleAlgorithms
Volume15
Number3
Pages81
Link to PDFhttps://arxiv.org/abs/2001.00119
Reference TypeJournal Article
Author(s)Akrour, R.; Tateo, D.; Peters, J.
Year2022
TitleContinuous Action Reinforcement Learning from a Mixture of Interpretable Experts
Journal/Conference/Book TitleIEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Volume44
Number10
Pages6795-6806
Link to PDFhttps://arxiv.org/pdf/2006.05911.pdf
Reference TypeJournal Article
Author(s)Loeckel, S.; Kretschi, A.; van Vliet, P.; Peters, J.
Year2022
TitleIdentification and modelling of race driving styles
Journal/Conference/Book TitleVehicle System Dynamics
Volume60
Number8
Pages2890--2918
Link to PDFhttps://doi.org/10.1080/00423114.2021.1930070
Reference TypeJournal Article
Author(s)Tosatto, S.; Carvalho, J.; Peters, J.
Year2022
TitleBatch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient
Journal/Conference/Book TitleIEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Volume44
Number10
Pages5996--6010
URL(s) https://ieeexplore.ieee.org/document/9449972
Reference TypeJournal Article
Author(s)Belousov, B.; Wibranek, B.; Schneider, J.; Schneider, T.; Chalvatzaki, G.; Peters, J.; Tessmann, O.
Year2022
TitleRobotic Architectural Assembly with Tactile Skills: Simulation and Optimization
Journal/Conference/Book TitleAutomation in Construction
Volume133
Pages104006
URL(s) https://doi.org/10.1016/j.autcon.2021.104006
Link to PDFhttps://www.sciencedirect.com/science/article/pii/S092658052100457X/pdfft?md5=d34f2f24487d3e8e4c84d3d8a60a9f28&pid=1-s2.0-S092658052100457X-main.pdf
Reference TypeJournal Article
Author(s)Funk, N.; Schaff, C.; Madan, R.; Yoneda, T.; Urain, J.; Watson, J.; Gordon, E.; Widmaier, F.; Bauer, S.; Srinivasa, S.; Bhattacharjee, T.; Walter, M.; Peters, J.
Year2022
TitleBenchmarking Structured Policies and Policy Optimization for Real-World Dexterous Object Manipulation
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (R-AL)
AbstractDexterous manipulation is a challenging and important problem in robotics. While data-driven methods are a promising approach, current benchmarks require simulation or extensive engineering support due to the sample inefficiency of popular methods. We present benchmarks for the TriFinger system, an open-source robotic platform for dexterous manipulation and the focus of the 2020 Real Robot Challenge. The benchmarked methods, which were successful in the challenge, can be generally described as structured policies, as they combine elements of classical robotics and modern policy optimization. This inclusion of inductive biases facilitates sample efficiency, interpretability, reliability, and high performance. The key aspects of this benchmarking are the validation of the baselines across both simulation and the real system, a thorough ablation study over the core features of each solution, and a retrospective analysis of the challenge as a manipulation benchmark. The code and demo videos for this work can be found on our website (https://sites.google.com/view/benchmark-rrc).
URL(s) https://sites.google.com/view/benchmark-rrc
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/NiklasFunk/RRC_v1.pdf
Reference TypeJournal Article
Author(s)Muratore, F.; Ramos, F.; Turk, G.; Yu, W.; Gienger, M.; Peters, J.
Year2022
TitleRobot Learning from Randomized Simulations: A Review
Journal/Conference/Book TitleFrontiers in Robotics and AI
Keywordsrobotics, simulation, reality gap, simulation optimization bias, reinforcement learning, domain randomization, sim-to-real
AbstractThe rise of deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data. It is prohibitively expensive to generate such data sets on a physical platform. Therefore, state-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive and subsequently transfer the knowledge to the real robot (sim-to-real). Despite becoming increasingly realistic, all simulators are by construction based on models, hence inevitably imperfect. This raises the question of how simulators can be modified to facilitate learning robot control policies and overcome the mismatch between simulation and reality, often called the ‘reality gap’. We provide a comprehensive review of sim-to-real research for robotics, focusing on a technique named ‘domain randomization’ which is a method for learning from randomized simulations.
Volume9
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_RTYGP--RobotLearningFromRandomizedSimulations-AReview.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_RTYGP--RobotLearningFromRandomizedSimulations-AReview.pdf
LanguageEnglish
Last Modified Date2022-01-22
Reference TypeJournal Article
Author(s)Cowen-Rivers, A.I.; Palenicek, D.; Moens, V.; Abdullah, M.A.; Sootla, A.; Wang, J.; Bou-Ammar, H.
Year2022
TitleSAMBA: safe model-based & active reinforcement learning
Journal/Conference/Book TitleMachine Learning
Link to PDFhttps://link.springer.com/article/10.1007/s10994-021-06103-6
Reference TypeJournal Article
Author(s)You, B.; Arenz, O.; Chen, Y.; Peters, J.
Year2022
TitleIntegrating Contrastive Learning with Dynamic Models for Reinforcement Learning from Images
Journal/Conference/Book TitleNeurocomputing
URL(s) https://doi.org/10.1016/j.neucom.2021.12.094
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/Integrating_Contrastive_Learning_with_Dynamic_Models_for_Reinforcement_Learning_from_Images.pdf
Reference TypeConference Proceedings
Author(s)Klink, P.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2022
TitleBoosted Curriculum Reinforcement Learning
Journal/Conference/Book TitleInternational Conference on Learning Representations (ICLR)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PascalKlink/boosted-crl.pdf
Reference TypeConference Proceedings
Author(s)Memmel, M.; Liu, P.; Tateo, D.; Peters, J.
Year2022
TitleDimensionality Reduction and Prioritized Exploration for Policy Search
Journal/Conference/Book Title25th International Conference on Artificial Intelligence and Statistics (AISTATS)
KeywordsPolicy Search, Dimensionality Reduction, Exploration
AbstractBlack-box policy optimization, a class of reinforcement learning algorithms, explores and updates policies at the parameter level. These algorithms are widely applied in robotics applications with movement primitives and non-differentiable policies. Such methods are particularly relevant where exploration at the action level could lead to actuator damage or other safety issues. However, this class of algorithms does not scale well with the increasing dimensionality of the policy, leading to a high demand for samples that are expensive to obtain on real-world systems. In most systems, policy parameters do not contribute equally to the return. Thus, identifying the parameters that contribute most allows us to narrow the exploration and speed up learning. Updating only the effective parameters requires fewer samples, solving the scalability issue. We present a novel method to prioritize the exploration of effective parameters, coping with full covariance matrix updates. Our algorithm learns faster than recent approaches and requires fewer samples to achieve state-of-the-art results. To select these effective parameters, we consider both the Pearson correlation coefficient and the mutual information. We showcase the capabilities of our approach using the Relative Entropy Policy Search algorithm in several simulated environments, including robotics simulations.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PuzeLiu/AISTATS2022_DR-CREPS.pdf
Reference TypeJournal Article
Author(s)Flynn, H.; Reeb, D.; Kandemir, M.; Peters, J.
Year2022
TitlePAC-Bayesian Lifelong Learning For Multi-Armed Bandits
Journal/Conference/Book TitleData Mining and Knowledge Discovery
Volume36
Number2
Pages841-876
Link to PDFhttps://arxiv.org/pdf/2203.03303.pdf
Reference TypeJournal Article
Author(s)Prasad, V.; Stock-Homburg, R.; Peters, J.
Year2022
TitleHuman-Robot Handshaking: A Review
Journal/Conference/Book TitleInternational Journal of Social Robotics (IJSR)
KeywordsHandshaking, Physical HRI, Social Robotics
AbstractFor some years now, the use of social, anthropomorphic robots in various situations has been on the rise. These are robots developed to interact with humans and are equipped with corresponding extremities. They already support human users in various industries, such as retail, gastronomy, hotels, education, and healthcare. During such Human-Robot Interaction (HRI) scenarios, physical touch plays a central role in the various applications of social robots, as interactive non-verbal behaviour is a key factor in making the interaction more natural. Shaking hands is a simple, natural interaction used commonly in many social contexts and is seen as a symbol of greeting, farewell, and congratulations. In this paper, we take a look at the existing state of Human-Robot Handshaking research, categorise the works based on their focus areas, and draw out the major findings of these areas while analysing their pitfalls. We mainly see that some form of synchronisation exists during the different phases of the interaction. In addition to this, we also find that additional factors like gaze, voice, facial expressions, etc. can affect the perception of a robotic handshake, and that internal factors like personality and mood can affect the way in which handshaking behaviours are executed by humans. Based on these findings and insights, we finally discuss possible ways forward for research on such physically interactive behaviours.
Volume14
Number1
Pages277-293
URL(s) https://link.springer.com/content/pdf/10.1007/s12369-021-00763-z.pdf
Link to PDFhttps://link.springer.com/content/pdf/10.1007/s12369-021-00763-z.pdf
Reference TypeJournal Article
Author(s)Zheng, Y.; Veiga, F.F.; Peters, J.; Santos, V.J.
Year2022
TitleAutonomous Learning of Page Flipping Movements via Tactile Feedback
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume38
Number5
Pages2734 - 2749
URL(s) https://ieeexplore.ieee.org/document/9786532
Reference TypeConference Paper
Author(s)Palenicek, D.; Lutter, M.; Peters, J.
Year2022
TitleRevisiting Model-based Value Expansion
Journal/Conference/Book TitleMulti-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Link to PDFhttps://arxiv.org/pdf/2203.14660.pdf
Reference TypeJournal Article
Author(s)Hansel, K.; Moos, J.; Abdulsamad, H.; Stark, S.; Clever, D.; Peters, J.
Year2022
TitleRobust Reinforcement Learning: A Review of Foundations and Recent Advances
Journal/Conference/Book TitleMachine Learning and Knowledge Extraction (MAKE)
PublisherMDPI
Volume4
Number1
Pages276--315
ISBN/ISSN2504-4990
URL(s) https://www.mdpi.com/2504-4990/4/1/13
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/KayHansel/robustRLsurvey22_hansel.pdf
Reference TypeUnpublished Work
Author(s)Carvalho, J.; Peters, J.
Year2022
TitleAn Analysis of Measure-Valued Derivatives for Policy Gradients
Journal/Conference/Book TitleMulti-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
URL(s) https://arxiv.org/pdf/2203.03917.pdf
Reference TypeJournal Article
Author(s)Weng, Y.; Pajarinen, J.; Akrour, R.; Matsuda, T.; Peters, J.; Maki, T.
Year2022
TitleReinforcement Learning Based Underwater Wireless Optical Communication Alignment for Multiple Autonomous Underwater Vehicles
Journal/Conference/Book TitleIEEE Journal of Oceanic Engineering
Volume47
Number4
Pages1231-1245
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Research/Overview/FinalSubmission-1.pdf
Reference TypeJournal Article
Author(s)Buechler, D.; Guist, S.; Calandra, R.; Berenz, V.; Schoelkopf, B.; Peters, J.
Year2022
TitleLearning to Play Table Tennis From Scratch using Muscular Robots
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume38
Number6
Pages3850-3860
Link to PDFhttps://arxiv.org/pdf/2006.05935.pdf
Reference TypeConference Proceedings
Author(s)Klink, P.; Yang, H.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2022
TitleCurriculum Reinforcement Learning via Constrained Optimal Transport
Journal/Conference/Book TitleInternational Conference on Machine Learning (ICML)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PascalKlink/crrt.pdf
Reference TypeJournal Article
Author(s)Weng, Y.; Matsuda, T.; Sekimuri, Y.; Pajarinen, J.; Peters, J.; Maki, T.
Year2022
TitlePointing Error Control of Underwater Wireless Optical Communication on Mobile Platform
Journal/Conference/Book TitleIEEE Photonics Technology Letters
Volume34
Number13
Pages699-702
URL(s) https://ieeexplore.ieee.org/document/9791364
Reference TypeJournal Article
Author(s)Cowen-Rivers, A.; Lyu, W.; Tutunov, R.; Wang, Z.; Grosnit, A.; Griffiths, R.R.; Maraval, A.; Jianye, H.; Wang, J.; Peters, J.; Bou Ammar, H.
Year2022
TitleHEBO: An Empirical Study of Assumptions in Bayesian Optimisation
Journal/Conference/Book TitleJournal of Artificial Intelligence Research
Volume74
Pages1269-1349
Link to PDFhttps://arxiv.org/pdf/2012.03826.pdf
Reference TypeConference Proceedings
Author(s)Urain, J.; Le, A. T.; Lambert, A.; Chalvatzaki, G.; Boots, B.; Peters, J.
Year2022
TitleLearning Implicit Priors for Motion Optimization
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsMotion Planning, Energy-Based Models
URL(s) https://sites.google.com/view/implicit-priors
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/AnThaiLe/iros2022_ebmtrajopt.pdf
Reference TypeConference Proceedings
Author(s)Liu, P.; Zhang, K.; Tateo, D.; Jauhri, S.; Peters, J.; Chalvatzaki, G.
Year2022
TitleRegularized Deep Signed Distance Fields for Reactive Motion Generation
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PuzeLiu/IROS_2022_ReDSDF.pdf
Reference TypeConference Proceedings
Author(s)Zheng, Y.; Veiga, F.F.; Peters, J.; Santos, V.J.
Year2022
TitleAutonomous Learning of Page Flipping Movements via Tactile Feedback
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Member/FilipeVeiga/IROS22_3682_MS.pdf
Reference TypeConference Proceedings
Author(s)Funk, N.; Menzenbach, S.; Chalvatzaki, G.; Peters, J.
Year2022
TitleGraph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) https://sites.google.com/view/rl-meets-milp
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/NiklasFunk/GNN_meets_MILP_v1.pdf
Reference TypeConference Proceedings
Author(s)Ploeger, K.; Peters, J.
Year2022
TitleControlling the Cascade: Kinematic Planning for N-ball Toss Juggling
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractDynamic movements are ubiquitous in human motor behavior as they tend to be more efficient and can solve a broader range of skill domains than their quasi-static counterparts. For decades, robotic juggling tasks have been among the most frequently studied dynamic manipulation problems since the required dynamic dexterity can be scaled to arbitrarily high difficulty. However, successful approaches have been limited to basic juggling skills, indicating a lack of understanding of the required constraints for dexterous toss juggling. We present a detailed analysis of the toss juggling task, identifying the key challenges and formalizing it as a trajectory optimization problem. Building on our state-of-the-art, real-world toss juggling platform, we reach the theoretical limits of toss juggling in simulation, evaluate a resulting real-time controller in environments of varying difficulty and achieve robust toss juggling of up to 17 balls on two anthropomorphic manipulators.
Link to PDFhttps://arxiv.org/abs/2207.01414
Reference TypeConference Proceedings
Author(s)Schneider, T.; Belousov, B.; Chalvatzaki, G.; Romeres, D.; Jha, D.K.; Peters, J.
Year2022
TitleActive Exploration for Robotic Manipulation
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) https://sites.google.com/view/aerm
Link to PDFhttps://arxiv.org/pdf/2210.12806.pdf
Reference TypeConference Proceedings
Author(s)Schneider, T.; Belousov, B.; Abdulsamad, H.; Peters, J.
Year2022
TitleActive Inference for Robotic Manipulation
Journal/Conference/Book Title5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Keywordsrobotic manipulation, model-based reinforcement learning, information gain
AbstractRobotic manipulation stands as a largely unsolved problem despite significant advances in robotics and machine learning in the last decades. One of the central challenges of manipulation is partial observability, as the agent usually does not know in advance all physical properties of the environment and the objects it is manipulating. A recently emerging theory that deals with partial observability in an explicit manner is Active Inference. It does so by driving the agent to act in a way that is not only goal-directed but also informative about the environment. In this work, we apply Active Inference to a hard-to-explore simulated robotic manipulation task, in which the agent has to balance a ball into a target zone. Since the reward of this task is sparse, the agent has to learn to balance the ball without any extrinsic feedback, purely driven by its own curiosity. We show that the information-seeking behavior induced by Active Inference allows the agent to explore these challenging, sparse environments systematically. Finally, we conclude that using an information-seeking objective is beneficial in sparse environments and allows the agent to solve tasks in which methods that do not exhibit directed exploration fail.
URL(s) https://arxiv.org/abs/2206.10313
Link to PDFhttps://arxiv.org/pdf/2206.10313.pdf
LanguageEnglish
Last Modified Date01.06.2022
Reference TypeConference Proceedings
Author(s)Galljamov, R.; Zhao, G.; Belousov, B.; Seyfarth, A.; Peters, J.
Year2022
TitleImproving Sample Efficiency of Example-Guided Deep Reinforcement Learning for Bipedal Walking
Journal/Conference/Book Title2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
URL(s) https://ieeexplore.ieee.org/document/10000068
Reference TypeConference Proceedings
Author(s)Watson, J.; Peters, J.
Year2022
TitleInferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes
Journal/Conference/Book TitleConference on Robot Learning (CoRL)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/watson22corl.pdf
Reference TypeConference Proceedings
Author(s)Carvalho, J.; Koert, D.; Daniv, M.; Peters, J.
Year2022
TitleAdapting Object-Centric Probabilistic Movement Primitives with Residual Reinforcement Learning
Journal/Conference/Book Title2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
AbstractIt is desirable for future robots to quickly learn new tasks and adapt learned skills to constantly changing environments. To this end, Probabilistic Movement Primitives (ProMPs) have been shown to be a promising framework for learning generalizable trajectory generators from distributions over demonstrated trajectories. However, in practical applications that require high precision in the manipulation of objects, the accuracy of ProMPs is often insufficient, in particular when they are learned in Cartesian space from external observations and executed with limited controller gains. Therefore, we propose to combine ProMPs with the Residual Reinforcement Learning (RRL) framework to account for corrections in both position and orientation during task execution. In particular, we learn a residual on top of a nominal ProMP trajectory with Soft Actor-Critic and incorporate the variability in the demonstrations as a decision variable to reduce the search space for RRL. As a proof of concept, we evaluate our proposed method on a 3D block insertion task with a 7-DoF Franka Emika Panda robot. Experimental results show that the robot successfully learns to complete the insertion, which was not possible before using basic ProMPs.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoaoCarvalho/Adapting_Object_Centric_Probabilistic_Movement_Primitives_with_Residual_Reinforcement_Learning___compressed.pdf
Reference TypeConference Proceedings
Author(s)Vorndamme, J.; Carvalho, J.; Laha, R.; Koert, D.; Figueredo, L.; Peters, J.; Haddadin, S.
Year2022
TitleIntegrated Bi-Manual Motion Generation and Control shaped for Probabilistic Movement Primitives
Journal/Conference/Book Title2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
AbstractThis work introduces a novel cooperative control framework that allows for real-time reactiveness and adaptation whilst satisfying implicit constraints stemming from probabilistic/stochastic trajectories. Stemming from task-oriented sampling and/or task-oriented demonstrations, e.g., learning based on motion primitives, such trajectories carry additional information often neglected during real-time control deployment. In particular, methods such as probabilistic movement primitives offer the advantage of capturing the inherent stochasticity in human demonstrations – which in turn reflects humans' understanding of task variability and adaptation possibilities. This information, however, is often poorly exploited and mostly used during the offline trajectory-planning stage. Our work instead introduces a novel real-time motion-generation strategy that explicitly exploits such information to improve trajectories according to changes in the environmental conditions and robot task-space topology. The proposed solution is particularly well suited for bimanual and coordinated systems, where the increased kinematic complexity, tightly-coupled constraints and reduced workspace have detrimental effects on manipulability and joint limits, and can even cause unstable behavior and task failure. Our methodology addresses these challenges and improves performance and task execution by taking the confidence range region explicitly into account whilst maneuvering towards better configurations. Furthermore, it can directly cope with different closed-chain kinematics and task-space topologies, resulting for instance from different grasps. Experimental evaluations on a bimanual Franka Panda robot show that the proposed method can run in the inner control loop of the robot and enables successful execution of highly constrained tasks.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoaoCarvalho/IntegratedBiManualMotionGenerationandControlshapedforProMPs.pdf
Reference TypeConference Proceedings
Author(s)Scherf, L.; Turan, C.; Koert, D.
Year2022
TitleLearning from Unreliable Human Action Advice in Interactive Reinforcement Learning
Journal/Conference/Book Title2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids)
AbstractInteractive Reinforcement Learning (IRL) uses human input to improve learning speed and enable learning in more complex environments. Human action advice is one of the input channels preferred by human users. However, many existing IRL approaches do not explicitly consider the possibility of inaccurate human action advice. Moreover, most approaches that account for inaccurate advice compute trust in human action advice independent of the state. This can lead to problems in practical cases, where human input might be inaccurate only in some states while it is still useful in others. To this end, we propose a novel algorithm that can handle state-dependent unreliable human action advice in IRL. Here, we combine three potential indicator signals for unreliable advice, i.e. consistency of advice, retrospective optimality of advice, and behavioral cues that hint at human uncertainty. We evaluate our method in a simulated gridworld and in robotic sorting tasks with 28 subjects. We show that our method outperforms a state-independent baseline and analyze occurrences of behavioral cues related to unreliable advice.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/LisaScherf/Scherf_Humanoids_2022_final.pdf
Reference TypeConference Proceedings
Author(s)Prasad, V.; Koert, D.; Stock-Homburg, R.; Peters, J.; Chalvatzaki, G.
Year2022
TitleMILD: Multimodal Interactive Latent Dynamics for Learning Human-Robot Interaction
Journal/Conference/Book TitleIEEE-RAS International Conference on Humanoid Robots (Humanoids)
AbstractModeling interaction dynamics to generate robot trajectories that enable a robot to adapt and react to a human’s actions and intentions is critical for efficient and effective collaborative Human-Robot Interactions (HRI). Learning from Demonstration (LfD) methods from Human-Human Interactions (HHI) have shown promising results, especially when coupled with representation learning techniques. However, such methods for learning HRI either do not scale well to high dimensional data or cannot accurately adapt to changing via-poses of the interacting partner. We propose Multimodal Interactive Latent Dynamics (MILD), a method that couples deep representation learning and probabilistic machine learning to address the problem of two-party physical HRIs. We learn the interaction dynamics from demonstrations, using Hidden Semi-Markov Models (HSMMs) to model the joint distribution of the interacting agents in the latent space of a Variational Autoencoder (VAE). Our experimental evaluations for learning HRI from HHI demonstrations show that MILD effectively captures the multimodality in the latent representations of HRI tasks, allowing us to decode the varying dynamics occurring in such tasks. Compared to related work, MILD generates more accurate trajectories for the controlled agent (robot) when conditioned on the observed agent’s (human) trajectory. Notably, MILD can learn directly from camera-based pose estimations to generate trajectories, which we then map to a humanoid robot without the need for any additional training. Supplementary Material: https://bit.ly/MILD-HRI
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/VigneshPrasad/HUMANOIDS22-MILD-HRI.pdf
Reference TypeUnpublished Work
Author(s)Carvalho, J.; Baierl, M.; Urain, J.; Peters, J.
Year2022
TitleConditioned Score-Based Models for Learning Collision-Free Trajectory Generation
Journal/Conference/Book TitleNeurIPS 2022 Workshop on Score-Based Methods
AbstractPlanning a motion in a cluttered environment is a recurring task autonomous agents need to solve. This paper presents a first attempt to learn generative models for collision-free trajectory generation based on conditioned score-based models. Given multiple navigation tasks, environment maps and collision-free trajectories pre-computed with a sample-based planner, we use a signed distance function loss to learn a vision encoder of the map and use its embedding to learn a conditioned score-based model for trajectory generation. A novelty of our method is to integrate conditioning variables, such as the latent representation of the environment and task features, into a temporal U-net architecture using a cross-attention mechanism. We validate our approach in a simulated 2D planar navigation toy task, where a robot needs to plan a path that avoids obstacles in a scene.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoaoCarvalho/Conditioned_Score_Based_Models_for_Learning_Collision_Free_Trajectory_Generation.pdf
Reference TypeConference Paper
Author(s)Siebenborn, M.; Belousov, B.; Huang, J.; Peters, J.
Year2022
TitleHow Crucial is Transformer in Decision Transformer?
Journal/Conference/Book TitleFoundation Models for Decision Making Workshop at Neural Information Processing Systems
URL(s) https://arxiv.org/abs/2211.14655
Link to PDFhttps://arxiv.org/pdf/2211.14655.pdf
Reference TypeJournal Article
Author(s)Urain, J.; Tateo, D.; Peters, J.
Year2022
TitleLearning Stable Vector Fields on Lie Groups
Journal/Conference/Book TitleRobotics and Automation Letters (RA-L)
KeywordsImitation learning, Lie groups, learning from demonstration, machine learning for robot control, reactive motion generation.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/2022msvfurain.pdf
Reference TypeConference Proceedings
Author(s)Watson, J.; Hanher, B.; Peters, J.
Year2022
TitleDifferentiable Simulators as Gaussian Processes
Journal/Conference/Book TitleR:SS Workshop: Differentiable Simulation for Robotics
Reference TypeConference Proceedings
Author(s)Watson, J.; Peters, J.
Year2022
TitleStationary Posterior Policy Iteration with Variational Inference
Journal/Conference/Book TitleThe Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Reference TypeConference Proceedings
Author(s)Bottero, A.G.; Luis, C.E.; Vinogradska, J.; Berkenkamp, F.; Peters, J.
Year2022
TitleInformation-Theoretic Safe Exploration with Gaussian Processes
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
Link to PDFhttps://proceedings.neurips.cc/paper_files/paper/2022/file/c628644624c1be9c8cfb1541fa6421fd-Paper-Conference.pdf
Reference TypeConference Proceedings
Author(s)Le, A. T.; Urain, J.; Lambert, A.; Chalvatzaki, G.; Boots, B.; Peters, J.
Year2022
TitleLearning Implicit Priors for Motion Optimization
Journal/Conference/Book TitleRSS 2022 Workshop on Implicit Representations for Robotic Manipulation
Reference TypeJournal Article
Author(s)Muratore, F.; Gienger, M.; Peters, J.
Year2021
TitleAssessing Transferability from Simulation to Reality for Reinforcement Learning
Journal/Conference/Book TitleIEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
AbstractLearning robot control policies from physics simulations is of great interest to the robotics community as it may render the learning process faster, cheaper, and safer by alleviating the need for expensive real-world experiments. However, the direct transfer of learned behavior from simulation to reality is a major challenge. Optimizing a policy on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB). In this case, the optimizer exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. We tackle this challenge by applying domain randomization, i.e., randomizing the parameters of the physics simulations during learning. We propose an algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) which uses an estimator of the SOB to formulate a stopping criterion for training. The introduced estimator quantifies the over-fitting to the set of domains experienced while training. Our experimental results on two different second order nonlinear systems show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator, which can be applied directly to real systems without any additional training.
PublisherIEEE
Volume43
Number4
Pages1172-1183
DateApril 2021
ISBN/ISSN0162-8828
URL(s) https://doi.org/10.1109/TPAMI.2019.2952353
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_Gienger_Peters--AssessingTransferabilityfromSimulationToRealityForReinforcementLearning.pdf
LanguageEnglish
Reference TypeJournal Article
Author(s)Rawal, N.; Koert, D.; Turan, C.; Kersting, K.; Peters, J.; Stock-Homburg, R.
Year2021
TitleExGenNet: Learning to Generate Robotic Facial Expression Using Facial Expression Recognition
Journal/Conference/Book TitleFrontiers in Robotics and AI
Volume8
Number730317
URL(s) https://www.frontiersin.org/articles/10.3389/frobt.2021.730317/full
Reference TypeJournal Article
Author(s)Tosatto, S.; Akrour, R.; Peters, J.
Year2021
TitleAn Upper Bound of the Bias of Nadaraya-Watson Kernel Regression under Lipschitz Assumptions
Journal/Conference/Book TitleStats
KeywordsNonparametric, Bias, Kernel Regression
AbstractThe Nadaraya-Watson kernel estimator is among the most popular nonparametric regression techniques thanks to its simplicity. Its asymptotic bias was studied by Rosenblatt in 1969 and has been reported in several related works. However, given its asymptotic nature, it gives no access to a hard bound. The increasing popularity of predictive tools for automated decision-making heightens the need for hard (non-probabilistic) guarantees. To alleviate this issue, we propose an upper bound of the bias which holds for finite bandwidths, using Lipschitz assumptions and mitigating some of the prerequisites of Rosenblatt's analysis. Our bound has potential applications in fields like surgical robots or self-driving cars, where some hard guarantees on the prediction error are needed.
Place PublishedBasel, Switzerland
Volume4
Pages1--17
URL(s) https://www.mdpi.com/2571-905X/4/1/1/htm
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Alumni/SamueleTosatto/tosatto-mdpi-2021.pdf
Reference TypeJournal Article
Author(s)Muratore, F.; Eilers, C.; Gienger, M.; Peters, J.
Year2021
TitleData-efficient Domain Randomization with Bayesian Optimization
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L), with Presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Keywordssim-2-real, domain randomization, bayesian optimization
AbstractWhen learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire, so learning in simulation is a popular strategy. Unfortunately, such policies are often not transferable to the real world due to a mismatch between the simulation and reality, called the 'reality gap'. Domain randomization methods tackle this problem by randomizing the physics simulator (source domain) during training according to a distribution over domain parameters in order to obtain more robust policies that are able to overcome the reality gap. Most domain randomization approaches sample the domain parameters from a fixed distribution. This solution is suboptimal in the context of sim-to-real transferability, since it yields policies that have been trained without explicitly optimizing for the reward on the real system (target domain). Additionally, a fixed distribution assumes there is prior knowledge about the uncertainty over the domain parameters. In this paper, we propose Bayesian Domain Randomization (BayRn), a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution during learning given sparse data from the real-world target domain. BayRn uses Bayesian optimization to search the space of source domain distribution parameters such that this leads to a policy which maximizes the real-world objective, allowing for adaptive distributions during policy optimization. We experimentally validate the proposed approach in sim-to-sim as well as in sim-to-real experiments, comparing against three baseline methods on two robotic tasks. Our results show that BayRn is able to perform sim-to-real transfer, while significantly reducing the required prior knowledge.
PublisherIEEE
URL(s) https://ieeexplore.ieee.org/document/9327467
Link to PDFhttps://arxiv.org/pdf/2003.02471.pdf
LanguageEnglish
Last Modified Date2021-01-06
Reference TypeJournal Article
Author(s)Hoefer, S.; Bekris, K.; Handa, A.; Gamboa, J.C.; Golemo, F.; Mozifian, M.; Atkeson, C.G.; Fox, D.; Goldberg, K.; Leonard, J.; Liu, C.K.; Peters, J.; Song, S.; Welinder, P.; White, M.
Year2021
TitleSim2Real in Robotics and Automation: Applications and Challenges
Journal/Conference/Book TitleIEEE Transactions on Automation Science and Engineering (T-ASE)
Volume18
Number2
Pages398-400
Link to PDFhttps://arxiv.org/pdf/2012.03806.pdf
Reference TypeJournal Article
Author(s)Akrour, R.; Atamna, A.; Peters, J.
Year2021
TitleConvex Optimization with an Interpolation-based Projection and its Application to Deep Learning
Journal/Conference/Book TitleMachine Learning (MACH)
Volume110
Number8
Pages2267-2289
Link to PDFhttps://arxiv.org/pdf/2011.07016.pdf
Reference TypeBook
Author(s)Belousov, B.; Abdulsamad, H.; Klink, P.; Parisi, S.; Peters, J.
Year2021
TitleReinforcement Learning Algorithms: Analysis and Applications
Journal/Conference/Book TitleStudies in Computational Intelligence
PublisherSpringer International Publishing
Edition1
URL(s) https://www.springer.com/gp/book/9783030411879
Reference TypeConference Paper
Author(s)Watson, J.; Lin, J. A.; Klink, P.; Peters, J.
Year2021
TitleNeural Linear Models with Functional Gaussian Process Priors
Journal/Conference/Book Title3rd Symposium on Advances in Approximate Bayesian Inference (AABI)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/AABI21.pdf
Reference TypeConference Proceedings
Author(s)Watson, J.; Peters, J.
Year2021
TitleAdvancing Trajectory Optimization with Approximate Inference: Exploration, Covariance Control and Adaptive Risk
Journal/Conference/Book TitleAmerican Control Conference (ACC)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/cdc21.pdf
Reference TypeConference Proceedings
Author(s)Watson, J.; Lin, J. A.; Klink, P.; Pajarinen, J.; Peters, J.
Year2021
TitleLatent Derivative Bayesian Last Layer Networks
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/AISTATS21.pdf
Reference TypeJournal Article
Author(s)Klink, P.; Abdulsamad, H.; Belousov, B.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2021
TitleA Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Link to PDFhttp://arxiv.org/abs/2102.13176
Reference TypeConference Proceedings
Author(s)Tosatto, S.; Chalvatzaki, G.; Peters, J.
Year2021
TitleContextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://arxiv.org/pdf/2010.13766.pdf
Reference TypeConference Proceedings
Author(s)Li, Q.; Chalvatzaki, G.; Peters, J.; Wang, Y.
Year2021
TitleDirected Acyclic Graph Neural Network for Human Motion Prediction
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
URL(s) https://ieeexplore.ieee.org/document/9561540
Reference TypeConference Proceedings
Author(s)Prasad, V.; Stock-Homburg, R.; Peters, J.
Year2021
TitleLearning Human-like Hand Reaching for Human-Robot Handshaking
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/VigneshPrasad/ICRA22-Prasad.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Silberbauer, J.; Watson, J.; Peters, J.
Year2021
TitleDifferentiable Physics Models for Real-world Offline Model-based Reinforcement Learning
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/LutterICRA2021.pdf
Reference TypeConference Proceedings
Author(s)Morgan, A.; Nandha, D.; Chalvatzaki, G.; D'Eramo, C.; Dollar, A.; Peters, J.
Year2021
TitleModel Predictive Actor-Critic: Accelerating Robot Skill Acquisition with Deep Reinforcement Learning
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
URL(s) https://ieeexplore.ieee.org/document/9561298
Reference TypeConference Proceedings
Author(s)Abdulsamad, H.; Nickl, P.; Klink, P.; Peters, J.
Year2021
TitleA Variational Infinite Mixture for Probabilistic Inverse Dynamics Learning
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
URL(s) https://arxiv.org/pdf/2011.05217.pdf
Reference TypeConference Paper
Author(s)Dam, T.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2021
TitleConvex Regularization in Monte-Carlo Tree Search
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
AbstractMonte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. The recent AlphaGo and AlphaZero algorithms have shown how to successfully combine these two paradigms in order to solve large-scale sequential decision problems. These methodologies exploit a variant of the well-known UCT algorithm to trade off exploitation of good actions and exploration of unvisited states, but their empirical success comes at the cost of poor sample-efficiency and high computation time. In this paper, we overcome these limitations by considering convex regularization in Monte-Carlo Tree Search (MCTS), which has been successfully used in RL to efficiently drive exploration. First, we introduce a unifying theory on the use of generic convex regularizers in MCTS, deriving the regret analysis and providing guarantees of exponential convergence rate. Second, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS, based on the relative entropy of the policy update, and on the Tsallis entropy of the policy. Finally, we empirically evaluate the proposed operators in AlphaGo and AlphaZero on problems of increasing dimensionality and branching factor, from a toy problem to several Atari games, showing their superiority w.r.t. representative baselines.
Link to PDFhttps://arxiv.org/pdf/2007.00391.pdf
Reference TypeJournal Article
Author(s)Tanneberg, D.; Ploeger, K.; Rueckert, E.; Peters, J.
Year2021
TitleSKID RAW: Skill Discovery from Raw Trajectories
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttps://arxiv.org/pdf/2103.14610.pdf
Reference TypeConference Proceedings
Author(s)Abdulsamad, H.; Dorau, T.; Belousov, B.; Zhu, J.-J.; Peters, J.
Year2021
TitleDistributionally Robust Trajectory Optimization Under Uncertain Dynamics via Relative Entropy Trust-Regions
Journal/Conference/Book TitlearXiv
Link to PDFhttps://arxiv.org/pdf/2103.15388.pdf
Reference TypeConference Proceedings
Author(s)Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J.
Year2021
TitleAn Empirical Analysis of Measure-Valued Derivatives for Policy Gradients
Journal/Conference/Book TitleInternational Joint Conference on Neural Networks (IJCNN)
KeywordsMeasure-Valued Derivative, Policy Gradient, Reinforcement Learning
AbstractReinforcement learning methods for robotics are increasingly successful due to the constant development of better policy gradient techniques. A precise (low variance) and accurate (low bias) gradient estimator is crucial to face increasingly complex tasks. Traditional policy gradient algorithms use the likelihood-ratio trick, which is known to produce unbiased but high variance estimates. More modern approaches exploit the reparametrization trick, which gives lower variance gradient estimates but requires differentiable value function approximators. In this work, we study a different type of stochastic gradient estimator: the Measure-Valued Derivative. This estimator is unbiased, has low variance, and can be used with differentiable and non-differentiable function approximators. We empirically evaluate this estimator in the actor-critic policy gradient setting and show that it can reach comparable performance with methods based on the likelihood-ratio or reparametrization tricks, both in low and high-dimensional action spaces.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoaoCarvalho/2021_ijcnn-mvd_rl.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Mannor, S.; Peters, J.; Fox, D.; Garg, A.
Year2021
TitleValue Iteration in Continuous Actions, States and Time
Journal/Conference/Book TitleInternational Conference on Machine Learning (ICML)
Link to PDFhttps://arxiv.org/pdf/2105.04682.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Mannor, S.; Peters, J.; Fox, D.; Garg, A.
Year2021
TitleRobust Value Iteration for Continuous Control Tasks
Journal/Conference/Book TitleRobotics: Science and Systems (RSS)
Link to PDFhttps://arxiv.org/pdf/2105.12189.pdf
Reference TypeJournal Article
Author(s)Funk, N.; Baumann, D.; Berenz, V.; Trimpe, S.
Year2021
TitleLearning event-triggered control from data through joint optimization
Journal/Conference/Book TitleIFAC Journal of Systems and Control
KeywordsEvent-triggered control, Reinforcement learning, Stability verification, Neural networks
AbstractWe present a framework for model-free learning of event-triggered control strategies. Event-triggered methods aim to achieve high control performance while only closing the feedback loop when needed. This enables resource savings, e.g., network bandwidth if control commands are sent via communication networks, as in networked control systems. Event-triggered controllers consist of a communication policy, determining when to communicate, and a control policy, deciding what to communicate. It is essential to jointly optimize the two policies since individual optimization does not necessarily yield the overall optimal solution. To address this need for joint optimization, we propose a novel algorithm based on hierarchical reinforcement learning. The resulting algorithm is shown to accomplish high-performance control in line with resource savings and scales seamlessly to nonlinear and high-dimensional systems. The method’s applicability to real-world scenarios is demonstrated through experiments on a six degrees of freedom real-time controlled manipulator. Further, we propose an approach towards evaluating the stability of the learned neural network policies.
Volume16
Pages100144
ISBN/ISSN2468-6018
URL(s) https://www.sciencedirect.com/science/article/pii/S2468601821000055
Electronic Resource Numberhttps://doi.org/10.1016/j.ifacsc.2021.100144
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/NiklasFunk/Learn_etc_v1.pdf
Reference TypeConference Proceedings
Author(s)Urain, J.; Li, A.; Liu, P.; D'Eramo, C.; Peters, J.
Year2021
TitleComposable Energy Policies for Reactive Motion Generation and Reinforcement Learning
Journal/Conference/Book TitleRobotics: Science and Systems (RSS)
AbstractReactive motion generation problems are usually solved by computing actions as a sum of policies. However, these policies are independent of each other and thus can have conflicting behaviors when summing their contributions together. We introduce Composable Energy Policies (CEP), a novel framework for modular reactive motion generation. CEP computes the control action by optimization over the product of a set of stochastic policies. This product of policies assigns high probability to those actions that satisfy all the components and low probability to the others. Optimizing over the product of the policies avoids the detrimental effect of conflicting behaviors between policies by choosing an action that satisfies all the objectives. Besides, we show that CEP naturally adapts to the Reinforcement Learning problem, allowing us to integrate, in a hierarchical fashion, any distribution as a prior, from multimodal distributions to non-smooth distributions, and to learn a new policy given them.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/cep_rss_2021.pdf
Reference TypeBook Section
Author(s)Hansel, K.; Moos, J.; Derstroff, C.
Year2021
TitleBenchmarking the Natural Gradient in Policy Gradient Methods and Evolution Strategies
Journal/Conference/Book TitleReinforcement Learning Algorithms: Analysis and Applications
PublisherSpringer
Pages69--84
URL(s) https://link.springer.com/chapter/10.1007/978-3-030-41188-6_7
Reference TypeConference Proceedings
Author(s)Lutter, M.; Clever, D.; Kirsten, R.; Listmann, K.; Peters, J.
Year2021
TitleBuilding Skill Learning Systems for Robotics
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Automation Science and Engineering (CASE)
URL(s) https://ieeexplore.ieee.org/document/9551562
Reference TypeConference Proceedings
Author(s)Knaust, M.; Koert, D.
Year2021
TitleGuided Robot Skill Learning: A User-Study on Learning Probabilistic Movement Primitives with Non-Experts
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsKOBO
URL(s) https://ieeexplore.ieee.org/document/9555785
Reference TypeJournal Article
Author(s)Lampariello, R.; Mishra, H.; Oumer, N.W.; Peters, J.
Year2021
TitleRobust Motion Prediction of a Free-Tumbling Satellite with On-Ground Experimental Validation
Journal/Conference/Book TitleJournal of Guidance, Control, and Dynamics
Volume44
Number10
Pages1777-1793
URL(s) https://arc.aiaa.org/doi/pdf/10.2514/1.G005745
Reference TypeConference Proceedings
Author(s)Liu, P.; Tateo, D.; Bou-Ammar, H.; Peters, J.
Year2021
TitleEfficient and Reactive Planning for High Speed Robot Air Hockey
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractHighly dynamic robotic tasks require high-speed and reactive robots. These tasks are particularly challenging due to the physical constraints, hardware limitations, and the high uncertainty of dynamics and sensor measurements. To face these issues, it is crucial to design robotic agents that generate precise and fast trajectories and react immediately to environmental changes. Air hockey is an example of this kind of task. Due to the environment's characteristics, it is possible to formalize the problem and derive clean mathematical solutions. For these reasons, this environment is perfect for pushing the performance of currently available general-purpose robotic manipulators to the limit. Using two Kuka Iiwa 14 robots, we show how to design a policy for general-purpose robotic manipulators for the air hockey game. We demonstrate that a real robot arm can perform fast-hitting movements and that the two robots can play against each other on a medium-size air hockey table in simulation.
URL(s) https://sites.google.com/view/robot-air-hockey
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PuzeLiu/IROS_2021_Air_Hockey.pdf
Reference TypeConference Proceedings
Author(s)Muratore, F.; Gruner, T.; Wiese, F.; Belousov, B.; Gienger, M.; Peters, J.
Year2021
TitleNeural Posterior Domain Randomization
Journal/Conference/Book TitleConference on Robot Learning (CoRL)
Keywordssim-to-real, domain randomization, likelihood-free inference
AbstractCombining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions on the form of the domain parameter distribution which prevents them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to closer match the observed dynamics. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_GWBPG--NeuralPosteriorDomainRandomization.pdf
LanguageEnglish
Last Modified Date2021-11-06
Reference TypeConference Proceedings
Author(s)Wibranek, B.; Liu, Y.; Funk, N.; Belousov, B.; Peters, J.; Tessmann, O.
Year2021
TitleReinforcement Learning for Sequential Assembly of SL-Blocks: Self-Interlocking Combinatorial Design Based on Machine Learning
Journal/Conference/Book TitleProceedings of the 39th eCAADe Conference
KeywordsSKILLS4ROBOTS
Link to PDFhttp://papers.cumincad.org/data/works/att/ecaade2021_247.pdf
Reference TypeJournal Article
Author(s)Bustamante, S.; Peters, J.; Schölkopf, B.; Grosse-Wentrup, M.; Jayaram, V.
Year2021
TitleArmSym: a virtual human-robot interaction laboratory for assistive robotics
Journal/Conference/Book TitleIEEE Transactions on Human-Machine Systems
Volume51
Number6
Pages568-577
Link to PDFhttps://elib.dlr.de/147359/1/ArmSym_elib.pdf
Reference TypeJournal Article
Author(s)D'Eramo, C.; Tateo, D.; Bonarini, A.; Restelli, M.; Peters, J.
Year2021
TitleMushroomRL: Simplifying Reinforcement Learning Research
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Volume22
Number131
Pages1-5
Link to PDFhttps://jmlr.org/papers/volume22/18-056/18-056.pdf
Reference TypeJournal Article
Author(s)D'Eramo, C.; Cini, A.; Nuara, A.; Pirotta, M.; Alippi, C.; Peters, J.; Restelli, M.
Year2021
TitleGaussian Approximation for Bias Reduction in Q-Learning
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Link to PDFhttps://www.jmlr.org/papers/volume22/20-633/20-633.pdf
Reference TypeConference Proceedings
Author(s)Funk, N.; Chalvatzaki, G.; Belousov, B.; Peters, J.
Year2021
TitleLearn2Assemble with Structured Representations and Search for Robotic Architectural Construction
Journal/Conference/Book TitleConference on Robot Learning (CoRL)
KeywordsStructured representations, Autonomous assembly, Manipulation
AbstractAutonomous robotic assembly requires a well-orchestrated sequence of high-level actions and smooth manipulation executions. Learning to assemble complex 3D structures remains a challenging problem that requires drawing connections between target designs and building blocks, and creating valid assembly sequences considering structural stability and feasibility. To address the combinatorial complexity of the assembly tasks, we propose a multi-head attention graph representation that can be trained with reinforcement learning (RL) to encode the spatial relations and provide meaningful assembly actions. Combining structured representations with model-free RL and Monte-Carlo planning allows agents to operate with various target shapes and building block types. We design a hierarchical control framework that learns to sequence the building blocks to construct arbitrary 3D designs and ensures their feasibility, as we plan the geometric execution with the robot-in-the-loop. We demonstrate the flexibility of the proposed structured representation and our algorithmic solution in a series of simulated 3D assembly tasks with robotic evaluation, which showcases our method's ability to learn to construct stable structures with a large number of building blocks. Code and videos are available at: https://sites.google.com/view/learn2assemble
URL(s) https://sites.google.com/view/learn2assemble
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/NiklasFunk/L2A_v1.pdf
Reference TypeConference Proceedings
Author(s)Liu, P.; Tateo, D.; Bou-Ammar, H.; Peters, J.
Year2021
TitleRobot Reinforcement Learning on the Constraint Manifold
Journal/Conference/Book TitleProceedings of the Conference on Robot Learning (CoRL)
KeywordsRobot Learning, Constrained Reinforcement Learning, Safe Exploration
AbstractReinforcement learning in robotics is extremely challenging due to many practical issues, including safety, mechanical constraints, and wear and tear. Typically, these issues are not considered in the machine learning literature. One crucial problem in applying reinforcement learning in the real world is Safe Exploration, which requires physical and safety constraints satisfaction throughout the learning process. To explore in such a safety-critical environment, leveraging known information such as robot models and constraints is beneficial to provide more robust safety guarantees. Exploiting this knowledge, we propose a novel method to learn robotics tasks in simulation efficiently while satisfying the constraints during the learning process.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PuzeLiu/CORL_2021_Learning_on_the_Manifold.pdf
Reference TypeJournal Article
Author(s)Abdulsamad, H.; Peters, J.
Year2021
TitleModel-Based Reinforcement Learning for Stochastic Hybrid Systems
Journal/Conference/Book TitlearXiv
Link to PDFhttps://arxiv.org/pdf/2111.06211.pdf
Reference TypeBook Section
Author(s)Palenicek, D.
Year2021
TitleA Survey on Constraining Policy Updates Using the KL Divergence
Journal/Conference/Book TitleReinforcement Learning Algorithms: Analysis and Applications
Pages49-57
URL(s) https://link.springer.com/chapter/10.1007/978-3-030-41188-6_5
Reference TypeConference Proceedings
Author(s)Bauer, S.; Wüthrich, W.; Widmaier, F.; Buchholz, A.; Stark, S.; Goyal, A.; Steinbrenner, T.; Akpo, J.; Joshi, S.; Berenz, V.; Agrawal, V.; Funk, N.; Urain, J.; Peters, J.; Watson, J.; et al.
Year2021
TitleReal Robot Challenge: A Robotics Competition in the Cloud
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
Link to PDFhttps://proceedings.mlr.press/v176/bauer22a/bauer22a.pdf
Reference TypeJournal Article
Author(s)Veiga, F.F.; Edin, B.B.; Peters, J.
Year2020
TitleGrip Stabilization through Independent Finger Tactile Feedback Control
Journal/Conference/Book TitleSensors (Special Issue on Sensors and Robot Control)
Volume20
Link to PDFhttps://www.mdpi.com/1424-8220/20/6/1748/pdf
Reference TypeJournal Article
Author(s)Vinogradska, J.; Bischoff, B.; Koller, T.; Achterhold, J.; Peters, J.
Year2020
TitleNumerical Quadrature for Probabilistic Policy Search
Journal/Conference/Book TitleIEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
Volume42
Number1
Pages164-175
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NuQuPS_preprint.pdf
Reference TypeJournal Article
Author(s)Lioutikov, R.; Maeda, G.; Veiga, F.F.; Kersting, K.; Peters, J.
Year2020
TitleLearning Attribute Grammars for Movement Primitive Sequencing
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
KeywordsSKILLS4ROBOTS
Volume39
Number1
Pages21-38
Reference TypeJournal Article
Author(s)Arenz, O.; Zhong, M.; Neumann, G.
Year2020
TitleTrust-Region Variational Inference with Gaussian Mixture Models
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Keywordsapproximate inference, variational inference, sampling, policy search, mcmc, markov chain monte carlo
AbstractMany methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/VIPS_JMLR.pdf
Reference TypeJournal Article
Author(s)Gomez-Gonzalez, S.; Neumann, G.; Schölkopf, B.; Peters, J.
Year2020
TitleAdaptation and Robust Learning of Probabilistic Movement Primitives
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume36
Number2
Pages366-379
Link to PDFhttps://arxiv.org/pdf/1808.10648.pdf
Reference TypeJournal Article
Author(s)Koert, D.; Trick, S.; Ewerton, M.; Lutter, M.; Peters, J.
Year2020
TitleIncremental Learning of an Open-Ended Collaborative Skill Library
Journal/Conference/Book TitleInternational Journal of Humanoid Robotics (IJHR)
KeywordsSKILLS4ROBOTS, KOBO
Volume17
Number1
URL(s) https://www.worldscientific.com/doi/10.1142/S0219843620500012
Reference TypeConference Proceedings
Author(s)Dam, T.; Klink, P.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2020
TitleGeneralized Mean Estimation in Monte-Carlo Tree Search
Journal/Conference/Book TitleProceedings of the International Joint Conference on Artificial Intelligence (IJCAI)
AbstractWe consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Moreover, we discuss a heuristic approach to balance the greediness of backups by tuning the power mean operator according to the number of visits to each node. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. UCT.
URL(s) https://www.ijcai.org/Proceedings/2020/0332.pdf
Link to PDFhttps://www.ijcai.org/Proceedings/2020/0332.pdf
Reference TypeJournal Article
Author(s)Loeckel, S.; Peters, J.; van Vliet, P.
Year2020
TitleA Probabilistic Framework for Imitating Human Race Driver Behavior
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L), with Presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Volume5
Number2
Link to PDFhttps://arxiv.org/pdf/2001.08255.pdf
Reference TypeConference Proceedings
Author(s)Motokura, K.; Takahashi, M.; Ewerton, M.; Peters, J.
Year2020
TitlePlucking Motions for Tea Harvesting Robots Using Probabilistic Movement Primitives
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (ICRA/RA-L), with Presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Volume5
Number2
ISBN/ISSN2377-3766
URL(s) https://ieeexplore.ieee.org/document/9013082
Reference TypeConference Proceedings
Author(s)Zelch, C.; Peters, J.; von Stryk, O.
Year2020
TitleLearning Control Policies from Optimal Trajectories
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/Members/Zelch_ICRA_2020.pdf
Reference TypeJournal Article
Author(s)Gomez-Gonzalez, S.; Prokudin, S.; Schölkopf, B.; Peters, J.
Year2020
TitleReal Time Trajectory Prediction Using Deep Conditional Generative Models
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (ICRA/RA-L), with Presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Volume5
Number2
Pages970-976
Link to PDFhttps://arxiv.org/pdf/1909.03895.pdf
Reference TypeConference Proceedings
Author(s)Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J.
Year2020
TitleA Nonparametric Off-Policy Policy Gradient
Journal/Conference/Book TitleProceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS)
Keywordsnonparametric, policy gradient, off policy, reinforcement learning
AbstractReinforcement learning (RL) algorithms still suffer from high sample complexity despite outstanding recent successes. The need for intensive interactions with the environment is especially observed in many widely popular policy gradient algorithms that perform updates using on-policy samples. The price of such inefficiency becomes evident in real-world scenarios such as interaction-driven robot learning, where the success of RL has been rather limited. We address this issue by building on the general sample efficiency of off-policy algorithms. With nonparametric regression and density estimation methods we construct a nonparametric Bellman equation in a principled manner, which allows us to obtain closed-form estimates of the value function, and to analytically express the full policy gradient. We provide a theoretical analysis of our estimate to show that it is consistent under mild smoothness assumptions and empirically show that our approach has better sample efficiency than state-of-the-art policy gradient methods.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SamueleTosatto/tosatto2020.pdf
Reference TypeConference Proceedings
Author(s)D'Eramo, C.; Tateo, D.; Bonarini, A.; Restelli, M.; Peters, J.
Year2020
TitleSharing Knowledge in Multi-Task Deep Reinforcement Learning
Journal/Conference/Book TitleInternational Conference on Learning Representations (ICLR)
Link to PDFhttps://openreview.net/pdf?id=rkgpv2VFvr
Reference TypeConference Proceedings
Author(s)Eilers, C.; Eschmann, J.; Menzenbach, R.; Belousov, B.; Muratore, F.; Peters, J.
Year2020
TitleUnderactuated Waypoint Trajectory Optimization for Light Painting Photography
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
KeywordsSKILLS4ROBOTS
AbstractDespite their abundance in robotics and nature, underactuated systems remain a challenge for control engineering. Trajectory optimization provides a generally applicable solution, however its efficiency strongly depends on the skill of the engineer to frame the problem in an optimizer-friendly way. This paper proposes a procedure that automates such problem reformulation for a class of tasks in which the desired trajectory is specified by a sequence of waypoints. The approach is based on introducing auxiliary optimization variables that represent waypoint activations. To validate the proposed method, a letter drawing task is set up where shapes traced by the tip of a rotary inverted pendulum are visualized using long exposure photography.
Custom 1https://www.youtube.com/watch?v=IiophaKtWG0&feature=youtu.be
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Eilers_Eschmann_Menzenbach_BMP--UnderactuatedWaypointTrajectoryOptimizationforLightPaintingPhotography.pdf
Reference TypeConference Proceedings
Author(s)Stock-Homburg, R.; Peters, J.; Schneider, K.; Prasad, V.; Nukovic, L.
Year2020
TitleEvaluation of the Handshake Turing Test for anthropomorphic Robots
Journal/Conference/Book TitleProceedings of the ACM/IEEE International Conference on Human Robot Interaction (HRI), Late Breaking Report
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/VigneshPrasad/HRI20-RSH.pdf
Reference TypeReport
Author(s)Tosatto, S.; Stadtmueller, J.; Peters, J.
Year2020
TitleDimensionality Reduction of Movement Primitives in Parameter Space
Journal/Conference/Book TitlearXiv
Keywordsmovement primitives, dimensionality reduction, imitation learning, robot learning
AbstractMovement primitives are an important policy class for real-world robotics. However, the high dimensionality of their parametrization makes the policy optimization expensive both in terms of samples and computation. Enabling an efficient representation of movement primitives facilitates the application of machine learning techniques such as reinforcement on robotics. Motions, especially in highly redundant kinematic structures, exhibit high correlation in the configuration space. For these reasons, prior work has mainly focused on the application of dimensionality reduction techniques in the configuration space. In this paper, we investigate the application of dimensionality reduction in the parameter space, identifying principal movements. The resulting approach is enriched with a probabilistic treatment of the parameters, inheriting all the properties of the Probabilistic Movement Primitives. We test the proposed technique both on a real robotic task and on a database of complex human movements. The empirical analysis shows that the dimensionality reduction in parameter space is more effective than in configuration space, as it enables the representation of the movements with a significant reduction of parameters.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SamueleTosatto/tosatto2020b.pdf
Reference TypeConference Proceedings
Author(s)Almeida Santos, A.; Gil, C.E.M.; Peters, J.; Steinke, F.
Year2020
TitleDecentralized Data-Driven Tuning of Droop Frequency Controllers
Journal/Conference/Book Title2020 IEEE PES Innovative Smart Grid Technologies Europe
Link to PDFhttps://www.eins.tu-darmstadt.de/fileadmin/user_upload/publications_pdf/20_ISGTEU_SanMorPetSte_paper.pdf
Reference TypeConference Proceedings
Author(s)Abdulsamad, H.; Peters, J.
Year2020
TitleHierarchical Decomposition of Nonlinear Dynamics and Control for System Identification and Policy Distillation
Journal/Conference/Book Title2nd Annual Conference on Learning for Dynamics and Control
Link to PDFhttps://arxiv.org/abs/2005.01432
Reference TypeJournal Article
Author(s)Ewerton, M.; Arenz, O.; Peters, J.
Year2020
TitleAssisted Teleoperation in Changing Environments with a Mixture of Virtual Guides
Journal/Conference/Book TitleAdvanced Robotics
Volume34
Number18
Link to PDFhttps://arxiv.org/pdf/2008.05251.pdf
Reference TypeConference Proceedings
Author(s)Becker, P.; Arenz, O.; Neumann, G.
Year2020
TitleExpected Information Maximization: Using the I-Projection for Mixture Density Estimation
Journal/Conference/Book TitleInternational Conference on Learning Representations (ICLR)
Link to PDF/uploads/Team/OlegArenz/beckerEIM.pdf
Reference TypeJournal Article
Author(s)Lauri, M.; Pajarinen, J.; Peters, J.; Frintrop, S.
Year2020
TitleMulti-Sensor Next-Best-View Planning as Matroid-Constrained Submodular Maximization
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
Volume5
Number4
Pages5323-5330
Link to PDFhttps://arxiv.org/pdf/2007.02084.pdf
Reference TypeJournal Article
Author(s)Agudelo-Espana, D.; Gomez-Gonzalez, S.; Bauer, S.; Schölkopf, B.; Peters, J.
Year2020
TitleBayesian Online Prediction of Change Points
Journal/Conference/Book TitleConference on Uncertainty in Artificial Intelligence (UAI)
Link to PDFhttps://arxiv.org/pdf/1902.04524.pdf
Reference TypeConference Proceedings
Author(s)Laux, M.; Arenz, O.; Pajarinen, J.; Peters, J.
Year2020
TitleDeep Adversarial Reinforcement Learning for Object Disentangling
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2020)
Link to PDF/uploads/Site/EditPublication/Melvin_Iros.pdf
Reference TypeJournal Article
Author(s)Koert, D.; Kircher, M.; Salikutluk, V.; D'Eramo, C.; Peters, J.
Year2020
TitleMulti-Channel Interactive Reinforcement Learning for Sequential Tasks
Journal/Conference/Book TitleFrontiers in Robotics and AI, Section Human-Robot Interaction
KeywordsSKILLS4ROBOTS, KOBO
URL(s) https://www.frontiersin.org/articles/10.3389/frobt.2020.00097/full
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DorotheaKoert/multi_channel_feedback_rl_sequential.pdf
Reference TypeReport
Author(s)Arenz, O.; Neumann, G.
Year2020
TitleNon-Adversarial Imitation Learning and its Connections to Adversarial Methods
Journal/Conference/Book TitlearXiv
KeywordsImitation Learning, Inverse Reinforcement Learning, Non-Adversarial Imitation Learning, Adversarial Imitation Learning, AIRL
AbstractMany modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, by framing imitation learning as a saddle point problem, adversarial methods can suffer from unstable optimization, and convergence can only be shown for small policy updates. We address these problems by proposing a framework for non-adversarial imitation learning. The resulting algorithms are similar to their adversarial counterparts and, thus, provide insights for adversarial imitation learning methods. Most notably, we show that AIRL is an instance of our non-adversarial formulation, which enables us to greatly simplify its derivations and obtain stronger convergence guarantees. We also show that our non-adversarial formulation can be used to derive novel algorithms by presenting a method for offline imitation learning that is inspired by the recent ValueDice algorithm, but does not rely on small policy updates for convergence. In our simulated robot experiments, our offline method for non-adversarial imitation learning seems to perform best when using many updates for policy and discriminator at each iteration and outperforms behavioral cloning and ValueDice.
Link to PDF/uploads/Team/OlegArenz/nail_arxiv.pdf
Reference TypeUnpublished Work
Author(s)Abi-Farraj, F.; Pacchierotti, C.; Arenz, O.; Neumann, G.; Giordano, P.
Year2020
TitleHaptic-based Guided Grasping in a Cluttered Environment
Journal/Conference/Book TitleIEEE Haptics Symposium
Link to PDFhttps://www.roboticvision.org/wp-content/uploads/Haptic-based-Guided-Grasping-in-a-Cluttered-Environment.pdf
Reference TypeConference Proceedings
Author(s)Keller, L.; Tanneberg, D.; Stark, S.; Peters, J.
Year2020
TitleModel-Based Quality-Diversity Search for Efficient Robot Learning
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttps://arxiv.org/pdf/2008.04589.pdf
Reference TypeConference Proceedings
Author(s)Klink, P.; D'Eramo, C.; Peters, J.; Pajarinen, J.
Year2020
TitleSelf-Paced Deep Reinforcement Learning
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PascalKlink/neurips-2020-2.pdf
Reference TypeConference Proceedings
Author(s)Ploeger, K.; Lutter, M.; Peters, J.
Year2020
TitleHigh Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards
Journal/Conference/Book TitleConference on Robot Learning (CoRL)
Link to PDFhttps://arxiv.org/pdf/2010.13483.pdf
Reference TypeConference Proceedings
Author(s)Urain, J.; Ginesi, M.; Tateo, D.; Peters, J.
Year2020
TitleImitationFlow: Learning Deep Stable Stochastic Dynamic Systems by Normalizing Flows
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsMovement Primitives, Imitation Learning
AbstractWe introduce ImitationFlow, a novel Deep generative model that allows learning complex globally stable, stochastic, nonlinear dynamics. Our approach extends the Normalizing Flows framework to learn stable Stochastic Differential Equations. We prove the Lyapunov stability for a class of Stochastic Differential Equations and we propose a learning algorithm to learn them from a set of demonstrated trajectories. Our model extends the set of stable dynamical systems that can be represented by state-of-the-art approaches, eliminates the Gaussian assumption on the demonstrations, and outperforms the previous algorithms in terms of representation accuracy. We show the effectiveness of our method with both standard datasets and a real robot experiment.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/2020iflowurain.pdf
Reference TypeConference Proceedings
Author(s)Abdulsamad, H.; Peters, J.
Year2020
TitleLearning Hybrid Dynamics and Control
Journal/Conference/Book TitleECML/PKDD Workshop on Deep Continuous-Discrete Machine Learning
Reference TypeJournal Article
Author(s)Tanneberg, D.; Rueckert, E.; Peters, J.
Year2020
TitleEvolutionary Training and Abstraction Yields Algorithmic Generalization of Neural Computers
Journal/Conference/Book TitleNature Machine Intelligence
KeywordsGOAL-Robots, SKILLS4ROBOTS
Volume2
Number12
Pages753-763
URL(s) https://rdcu.be/caRlg
Link to PDFhttps://arxiv.org/pdf/2105.07957.pdf
Reference TypeJournal Article
Author(s)Veiga, F. F.; Akrour, R.; Peters, J.
Year2020
TitleHierarchical Tactile-Based Control Decomposition of Dexterous In-Hand Manipulation Tasks
Journal/Conference/Book TitleFrontiers in Robotics and AI
URL(s) https://www.frontiersin.org/articles/10.3389/frobt.2020.521448/full
Reference TypeConference Proceedings
Author(s)Urain, J.; Tateo, D.; Ren, T.; Peters, J.
Year2020
TitleStructured policy representation: Imposing stability in arbitrarily conditioned dynamic systems
Journal/Conference/Book TitleNeurIPS 2020, 3rd Robot Learning Workshop
KeywordsMovement Primitives, Imitation Learning, Inductive Bias
AbstractWe present a new family of deep neural network-based dynamic systems. The presented dynamics are globally stable and can be conditioned with an arbitrary context state. We show how these dynamics can be used as structured robot policies. Global stability is one of the most important and straightforward inductive biases as it allows us to impose reasonable behaviors outside the region of the demonstrations.
Pages7
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/2020_structuredpolicy_urain.pdf
Reference TypeConference Proceedings
Author(s)Watson, J.; Imohiosen, A.; Peters, J.
Year2020
TitleActive Inference or Control as Inference? A Unifying View
Journal/Conference/Book TitleInternational Workshop on Active Inference
Reference TypeConference Proceedings
Author(s)Prasad, V.; Stock-Homburg, R.; Peters, J.
Year2020
TitleAdvances in Human-Robot Handshaking
Journal/Conference/Book TitleInternational Conference on Social Robotics
PublisherSpringer
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/VigneshPrasad/ICSR21-Prasad.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Clever, D.; Belousov, B.; Listmann, K.; Peters, J.
Year2020
TitleEvaluating the Robustness of HJB Optimal Feedback Control
Journal/Conference/Book TitleInternational Symposium on Robotics
Reference TypeConference Paper
Author(s)Rother, D.; Haider, T.; Eger, S.
Year2020
TitleCMCE at SemEval-2020 Task 1: Clustering on Manifolds of Contextualized Embeddings to Detect Historical Meaning Shifts
Journal/Conference/Book Title14th International Workshop on Semantic Evaluation (SemEval)
KeywordsNatural Language Processing, Unsupervised Clustering, Semantic Shift Detection, Semantic Evaluation
AbstractThis paper describes the system Clustering on Manifolds of Contextualized Embeddings (CMCE) submitted to the SemEval-2020 Task 1 on Unsupervised Lexical Semantic Change Detection. Subtask 1 asks to identify whether or not a word gained/lost a sense across two time periods. Subtask 2 is about computing a ranking of words according to the amount of change their senses underwent. Our system uses contextualized word embeddings from MBERT, whose dimensionality we reduce with an autoencoder and the UMAP algorithm, to be able to use a wider array of clustering algorithms that can automatically determine the number of clusters. We use Hierarchical Density Based Clustering (HDBSCAN) and compare it to Gaussian Mixture Models (GMMs) and other clustering algorithms. Remarkably, with only 10 dimensional MBERT embeddings (reduced from the original size of 768), our submitted model performs best on subtask 1 for English and ranks third in subtask 2 for English. In addition to describing our system, we discuss our hyperparameter configurations and examine why our system lags behind for the other languages involved in the shared task (German, Swedish, Latin). Our code is available at https://github.com/DavidRother/semeval2020-task1
Pages187-193
Link to PDFhttps://pure.mpg.de/rest/items/item_3278784/component/file_3278792/content
Reference TypeConference Proceedings
Author(s)Lutter, M.; Silberbauer, J.; Watson, J.; Peters, J.
Year2020
TitleA Differentiable Newton Euler Algorithm for Multi-body Model Learning
Journal/Conference/Book TitleR:SS Structured Approaches to Robot Learning Workshop
Reference TypeJournal Article
Author(s)Tanneberg, D.; Peters, J.; Rueckert, E.
Year2019
TitleIntrinsic Motivation and Mental Replay enable Efficient Online Adaptation in Stochastic Recurrent Networks
Journal/Conference/Book TitleNeural Networks
KeywordsGOAL-Robots, SKILLS4ROBOTS
Volume109
Pages67-80
URL(s) https://doi.org/10.1016/j.neunet.2018.10.005
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DanielTanneberg/tanneberg_NN18.pdf
Reference TypeJournal Article
Author(s)Koc, O.; Peters, J.
Year2019
TitleLearning to serve: an experimental study for a new learning from demonstrations framework
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (ICRA/RA-L), with Presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Learning_to_Serve_2019.pdf
Reference TypeConference Proceedings
Author(s)Lauri, M.; Pajarinen, J.; Peters, J.
Year2019
TitleInformation gathering in decentralized POMDPs by policy graph improvement
Journal/Conference/Book TitleProceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS)
KeywordsROBOLEAP, SKILLS4ROBOTS
Link to PDFhttps://arxiv.org/pdf/1902.09840
Reference TypeJournal Article
Author(s)Brandherm, F.; Peters, J.; Neumann, G.; Akrour, R.
Year2019
TitleLearning Replanning Policies with Direct Policy Search
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/RiadAkrour/florian_ral_sub.pdf
Reference TypeJournal Article
Author(s)Gebhardt, G.H.W.; Kupcsik, A.; Neumann, G.
Year2019
TitleThe Kernel Kalman Rule
Journal/Conference/Book TitleMachine Learning Journal (MLJ)
PublisherSpringer US
Volume108
Number12
Pages2113-2157
ISBN/ISSN0885-6125
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/GregorGebhardt/TheKernelKalmanRuleJournal.pdf
Reference TypeJournal Article
Author(s)Parisi, S.; Tangkaratt, V.; Peters, J.; Khan, M. E.
Year2019
TitleTD-Regularized Actor-Critic Methods
Journal/Conference/Book TitleMachine Learning (MLJ)
Volume108
Number8
Pages1467-1501
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SimoneParisi/parisi2019mlj.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Ritter, C.; Peters, J.
Year2019
TitleDeep Lagrangian Networks: Using Physics as Model Prior for Deep Learning
Journal/Conference/Book TitleInternational Conference on Learning Representations (ICLR)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/MichaelLutter/lutter_iclr_2019.pdf
Reference TypeJournal Article
Author(s)Koc, O.; Maeda, G.; Peters, J.
Year2019
TitleOptimizing the Execution of Dynamic Robot Movements with Learning Control
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume35
Number4
ISBN/ISSN1552-3098
Link to PDFhttps://arxiv.org/pdf/1807.01918.pdf
Reference TypeConference Proceedings
Author(s)Tosatto, S.; D'Eramo, C.; Pajarinen, J.; Restelli, M.; Peters, J.
Year2019
TitleExploration Driven By an Optimistic Bellman Equation
Journal/Conference/Book TitleProceedings of the International Joint Conference on Neural Networks (IJCNN)
Keywordsexploration; reinforcement learning; intrinsic motivation; Bosch-Forschungstiftung
AbstractExploring high-dimensional state spaces and finding sparse rewards are central problems in reinforcement learning. Exploration strategies are frequently either naïve (e.g., simplistic epsilon-greedy or Boltzmann policies), intractable (i.e., full Bayesian treatment of reinforcement learning), or rely heavily on heuristics. The lack of a tractable but principled exploration approach unnecessarily complicates the application of reinforcement learning to a broader range of problems. Efficient exploration can be accomplished by relying on the uncertainty of the state-action value function. To obtain the uncertainty, we maintain an ensemble of value function estimates and present an optimistic Bellman equation (OBE) for such ensembles. This OBE is derived from a relative entropy maximization principle and yields an implicit exploration bonus resulting in improved exploration during action selection. The implied exploration bonus can be seen as a well-principled type of intrinsic motivation and exhibits favorable theoretical properties. OBE can be applied to a wide range of algorithms. We propose two algorithms as an application of the principle: Optimistic Q-learning and Optimistic DQN, which outperform comparison methods on standard benchmarks.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SamueleTosatto/TosattoIJCNN2019.pdf
LanguageEnglish
Reference TypeConference Proceedings
Author(s)Wibranek, B.; Belousov, B.; Sadybakasov, A.; Tessmann, O.
Year2019
TitleInteractive Assemblies: Man-Machine Collaboration through Building Components for As-Built Digital Models
Journal/Conference/Book TitleComputer-Aided Architectural Design Futures (CAAD Futures)
KeywordsSKILLS4ROBOTS
Link to PDF/uploads/Team/BorisBelousov/wibranek_caad19.pdf
Reference TypeJournal Article
Author(s)Abi Farraj, F.; Pacchierotti, C.; Arenz, O.; Neumann, G.; Giordano, P.
Year2019
TitleA Haptic Shared-Control Architecture for Guided Multi-Target Robotic Grasping
Journal/Conference/Book TitleIEEE Transactions on Haptics
KeywordsGrasping, Task analysis, Manipulators, Grippers, Service robots
AbstractAlthough robotic telemanipulation has always been a key technology for the nuclear industry, little advancement has been seen over the last decades. Despite complex remote handling requirements, simple mechanically-linked master-slave manipulators still dominate the field. Nonetheless, there is a pressing need for more effective robotic solutions able to significantly speed up the decommissioning of legacy radioactive waste. This paper describes a novel haptic shared-control approach for assisting a human operator in the sort and segregation of different objects in a cluttered and unknown environment. A 3D scan of the scene is used to generate a set of potential grasp candidates on the objects at hand. These grasp candidates are then used to generate guiding haptic cues, which assist the operator in approaching and grasping the objects. The haptic feedback is designed to be smooth and continuous as the user switches from a grasp candidate to the next one, or from one object to another one, avoiding any discontinuity or abrupt changes. To validate our approach, we carried out two human-subject studies, enrolling 15 participants. We registered an average improvement of 20.8%, 20.1%, 32.5% in terms of completion time, linear trajectory, and perceived effectiveness, respectively, between the proposed approach and standard teleoperation.
URL(s) https://ieeexplore.ieee.org/document/8700204
Link to PDFhttps://inria.hal.science/hal-02113206/file/abi_farraj-TOH-sharedcontrol-grasping.pdf
Reference TypeConference Proceedings
Author(s)Akrour, R.; Pajarinen, J.; Neumann, G.; Peters, J.
Year2019
TitleProjections for Approximate Policy Iteration Algorithms
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
KeywordsROBOLEAP,SKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/RiadAkrour/papi.pdf
Reference TypeConference Proceedings
Author(s)Becker-Ehmck, P.; Peters, J.; van der Smagt, P.
Year2019
TitleSwitching Linear Dynamics for Variational Bayes Filtering
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
Link to PDFhttps://arxiv.org/pdf/1905.12434.pdf
Reference TypeConference Proceedings
Author(s)Belousov, B.; Abdulsamad, H.; Schultheis, M.; Peters, J.
Year2019
TitleBelief Space Model Predictive Control for Approximately Optimal System Identification
Journal/Conference/Book Title4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
KeywordsSKILLS4ROBOTS
AbstractThe fundamental problem of reinforcement learning is to control a dynamical system whose properties are not fully known in advance. Many articles nowadays address optimal exploration in this setting by investigating ideas such as curiosity, intrinsic motivation, empowerment, and others. Interestingly, closely related questions of optimal input design with the goal of producing the most informative system excitation have been studied in adjacent fields grounded in statistical decision theory. In the most general terms, the problem faced by a curious reinforcement learning agent can be stated as a sequential Bayesian optimal experimental design problem. It is well known that finding an optimal feedback policy for this type of setting is extremely hard and analytically intractable even for linear systems due to the non-linearity of the Bayesian filtering step. Therefore, approximations are needed. We consider one type of approximation based on replacing the feedback policy by repeated trajectory optimization in the belief space. By reasoning about the future uncertainty over the internal world model, the agent can decide what actions to take at every moment given its current belief and expected outcomes of future actions. Such an approach became computationally feasible relatively recently, thanks to advances in automatic differentiation. Being straightforward to implement, it can serve as a strong baseline for exploration algorithms in continuous robotic control tasks. Preliminary evaluations on a physical pendulum with unknown system parameters indicate that the proposed approach can infer the correct parameter values quickly and reliably, outperforming random excitation and naive sinusoidal excitation signals, and matching the performance of the best manually designed system identification controller based on the knowledge of the system dynamics.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/rldm19_belousov.pdf
Reference TypeJournal Article
Author(s)Belousov, B.; Peters, J.
Year2019
TitleEntropic Regularization of Markov Decision Processes
Journal/Conference/Book TitleEntropy
KeywordsSKILLS4ROBOTS
PublisherMDPI
Volume21
Number7
ISBN/ISSN1099-4300
URL(s) https://www.mdpi.com/1099-4300/21/7/674
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/entropy19_belousov.pdf
Reference TypeJournal Article
Author(s)Pajarinen, J.; Thai, H.L.; Akrour, R.; Peters, J.; Neumann, G.
Year2019
TitleCompatible natural gradient policy search
Journal/Conference/Book TitleMachine Learning (MLJ)
KeywordsROBOLEAP,SKILLS4ROBOTS
PublisherSpringer
Volume108
Number8
Pages1443-1466
DateSeptember 2019
Link to PDFhttps://link.springer.com/content/pdf/10.1007%2Fs10994-019-05807-0.pdf
Reference TypeJournal Article
Author(s)Celemin, C.; Maeda, G.; Peters, J.; Ruiz-del-Solar, J.; Kober, J.
Year2019
TitleReinforcement Learning of Motor Skills using Policy Search and Human Corrective Advice
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Volume38
Number14
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Alumni/JensKober/IJRR__Revision_.pdf
Reference TypeConference Proceedings
Author(s)Nass, D.; Belousov, B.; Peters, J.
Year2019
TitleEntropic Risk Measure in Policy Search
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsSKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/iros19_nass_v2.pdf
Reference TypeConference Proceedings
Author(s)Ozdenizci, O.; Meyer, T.; Wichmann, F.; Peters, J.; Schoelkopf, B.; Cetin, M.; Grosse-Wentrup, M.
Year2019
TitleNeural Signatures of Motor Skill in the Resting Brain
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Link to PDFhttps://arxiv.org/pdf/1907.09533.pdf
Reference TypeConference Proceedings
Author(s)Urain, J.; Peters, J.
Year2019
TitleGeneralized Multiple Correlation Coefficient as a Similarity Measurement between Trajectories
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractA similarity distance measure between two trajectories is an essential tool for understanding patterns in motion, for example, in Human-Robot Interaction or Imitation Learning. The problem has been studied in many fields, from Signal Processing and Probability Theory to Topology and Statistics. Yet, to date, none of the existing trajectory similarity metrics is invariant to all possible linear transformations of the trajectories (rotation, scaling, reflection, shear mapping, or squeeze mapping), and not all of them are robust to noisy signals or fast enough for real-time trajectory classification. To overcome these limitations, this paper proposes a similarity distance metric that remains invariant under any linear transformation. Based on Pearson's Correlation Coefficient and the Coefficient of Determination, our similarity metric, the Generalized Multiple Correlation Coefficient (GMCC), is presented as the natural extension of the Multiple Correlation Coefficient. The motivation of this paper is two-fold: first, to introduce a new correlation metric with the properties needed to compute similarities between trajectories invariant to linear transformations, and to compare it with state-of-the-art similarity distances; second, to present a natural way of integrating the similarity metric into an Imitation Learning scenario for clustering robot trajectories.
Place PublishedMacau, China
DateNovember 4-8, 2019
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JulenUrainDeJesus/julen_IROS_2019.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Peters, J.
Year2019
TitleDeep Lagrangian Networks for end-to-end learning of energy-based control for under-actuated systems
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/MichaelLutter/IROS_2019_Final_DeLaN_Energy_Control.pdf
Reference TypeJournal Article
Author(s)Koert, D.; Pajarinen, J.; Schotschneider, A.; Trick, S.; Rothkopf, C.; Peters, J.
Year2019
TitleLearning Intention Aware Online Adaptation of Movement Primitives
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L), with presentation at the IEEE International Conference on Intelligent Robots and Systems (IROS)
KeywordsSKILLS4ROBOTS, KOBO
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DorotheaKoert/final_ral_2019_koert.pdf
Reference TypeConference Proceedings
Author(s)Celik, O.; Abdulsamad, H.; Peters, J.
Year2019
TitleChance-Constrained Trajectory Optimization for Nonlinear Systems with Unknown Stochastic Dynamics
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractIterative trajectory optimization techniques for non-linear dynamical systems are among the most powerful and sample-efficient methods of model-based reinforcement learning and approximate optimal control. By leveraging time-variant local linear-quadratic approximations of system dynamics and rewards, such methods are able to find both a target-optimal trajectory and time-variant optimal feedback controllers. However, the local linear-quadratic approximations are a major source of optimization bias that leads to catastrophic greedy updates, raising the issue of proper regularization. Moreover, the approximate models' disregard for any physical state-action limits of the system causes further aggravation of the problem, as the optimization moves towards unreachable areas of the state-action space. In this paper, we address these drawbacks in the scenario of online-fitted stochastic dynamics. We propose modeling state and action physical limits as probabilistic chance constraints and introduce a new trajectory optimization technique that integrates such probabilistic constraints by optimizing a relaxed quadratic program. Our empirical evaluations show a significant improvement in the robustness of the learning process, which enables our approach to perform more effective updates and avoid the premature convergence observed in other state-of-the-art techniques.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/HanyAbdulsamad/celik2019chance.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Peters, J.
Year2019
TitleDeep Optimal Control: Using the Euler-Lagrange Equation to learn an Optimal Feedback Control Law
Journal/Conference/Book Title4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/MichaelLutter/RLDM2019_Deep_Optimal_Control.pdf
Reference TypeConference Proceedings
Author(s)Trick, S.; Koert, D.; Peters, J.; Rothkopf, C.
Year2019
TitleMultimodal Uncertainty Reduction for Intention Recognition in Human-Robot Interaction
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsSKILLS4ROBOTS, KOBO
Link to PDFhttps://arxiv.org/pdf/1907.02426.pdf
Reference TypeConference Proceedings
Author(s)Stark, S.; Peters, J.; Rueckert, E.
Year2019
TitleExperience Reuse with Probabilistic Movement Primitives
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGOAL-Robots
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SvenjaStark/stark_iros2019_update.pdf
Reference TypeConference Proceedings
Author(s)Liu, Z.; Hitzmann, A.; Ikemoto, S.; Stark, S.; Peters, J.; Hosoda, K.
Year2019
TitleLocal Online Motor Babbling: Learning Motor Abundance of a Musculoskeletal Robot Arm
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractMotor babbling and goal babbling have been used for sensorimotor learning of highly redundant systems in soft robotics. Recent works in goal babbling have demonstrated successful learning of inverse kinematics (IK) on such systems, and suggest that babbling in the goal space better resolves motor redundancy by learning as few yet efficient sensorimotor mappings as possible. However, for musculoskeletal robot systems, motor redundancy can provide useful information to explain muscle activation patterns, hence the term motor abundance. In this work, we introduce some simple heuristics to empirically define the unknown goal space, and learn the IK of a 10 DoF musculoskeletal robot arm using directed goal babbling. We then further propose local online motor babbling guided by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which bootstraps on the goal babbling samples for initialization, such that motor abundance can be queried online for any static goal. Our approach leverages the resolving of redundancies and the efficient guided exploration of motor abundance in two stages of learning, allowing both kinematic accuracy and motor variability at the queried goal. The results show that local online motor babbling guided by CMA-ES can efficiently explore motor abundance on musculoskeletal robot systems and give useful insights into muscle stiffness and synergy.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SvenjaStark/Liu_IROS_2019.pdf
Reference TypeConference Proceedings
Author(s)Belousov, B.; Sadybakasov, A.; Wibranek, B.; Veiga, F.; Tessmann, O.; Peters, J.
Year2019
TitleBuilding a Library of Tactile Skills Based on FingerVision
Journal/Conference/Book TitleProceedings of the IEEE-RAS 19th International Conference on Humanoid Robots (Humanoids)
KeywordsSKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/belousov19_fingervision.pdf
Reference TypeConference Proceedings
Author(s)Schultheis, M.; Belousov, B.; Abdulsamad, H.; Peters, J.
Year2019
TitleReceding Horizon Curiosity
Journal/Conference/Book TitleProceedings of the 3rd Conference on Robot Learning (CoRL)
KeywordsSKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/schultheis19_rhc.pdf
Reference TypeConference Proceedings
Author(s)Lutter, M.; Belousov, B.; Listmann, K.; Clever, D.; Peters, J.
Year2019
TitleHJB Optimal Feedback Control with Deep Differential Value Functions and Action Constraints
Journal/Conference/Book TitleProceedings of the 3rd Conference on Robot Learning (CoRL)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/MichaelLutter/CoRL2019_Deep_Optimal_HJB_Control.pdf
Reference TypeJournal Article
Author(s)Ewerton, M.; Arenz, O.; Maeda, G.; Koert, D.; Kolev, Z.; Takahashi, M.; Peters, J.
Year2019
TitleLearning Trajectory Distributions for Assisted Teleoperation and Path Planning
Journal/Conference/Book TitleFrontiers in Robotics and AI
URL(s) https://www.frontiersin.org/articles/10.3389/frobt.2019.00089/full
Link to PDFhttps://www.frontiersin.org/articles/10.3389/frobt.2019.00089/full
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Maeda, G.; Koert, D.; Kolev, Z.; Takahashi, M.; Peters, J.
Year2019
TitleReinforcement Learning of Trajectory Distributions: Applications in Assisted Teleoperation and Motion Planning
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
AbstractThe majority of learning from demonstration approaches do not address suboptimal demonstrations or cases when drastic changes in the environment occur after the demonstrations were made. For example, in real teleoperation tasks, the demonstrations provided by the user are often suboptimal due to interface and hardware limitations. In tasks involving co-manipulation and manipulation planning, the environment often changes due to unexpected obstacles, rendering previous demonstrations invalid. This paper presents a reinforcement learning algorithm that exploits relevance functions to tackle such problems. It introduces the Pearson correlation as a measure of the relevance of policy parameters with respect to each of the components of the cost function to be optimized. The method is demonstrated in a static environment where the quality of the teleoperation is compromised by the visual interface (operating a robot in a three-dimensional task by using a simple 2D monitor). Afterward, we tested the method in a dynamic environment using a real 7-DoF robot arm, where distributions are computed online via Gaussian Process regression.
Place PublishedMacau, China
Pages4294-4300
DateNovember 4-8, 2019
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Member/PubMarcoEwerton/Ewerton_IROS_2019.pdf
Reference TypeConference Proceedings
Author(s)Wibranek, B.; Belousov, B.; Sadybakasov, A.; Peters, J.; Tessmann, O.
Year2019
TitleInteractive Structure: Robotic Repositioning of Vertical Elements in Man-Machine Collaborative Assembly through Vision-Based Tactile Sensing
Journal/Conference/Book TitleProceedings of the 37th eCAADe and 23rd SIGraDi Conference
KeywordsSKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/wibranek_sigradi19.pdf
Reference TypeConference Proceedings
Author(s)Klink, P.; Abdulsamad, H.; Belousov, B.; Peters, J.
Year2019
TitleSelf-Paced Contextual Reinforcement Learning
Journal/Conference/Book TitleProceedings of the 3rd Conference on Robot Learning (CoRL)
AbstractGeneralization and adaptation of learned skills to novel situations is a core requirement for intelligent autonomous robots. Although contextual reinforcement learning provides a principled framework for learning and generalization of behaviors across related tasks, it generally relies on uninformed sampling of environments from an unknown, uncontrolled context distribution, thus missing the benefits of structured, sequential learning. We introduce a novel relative entropy reinforcement learning algorithm that gives the agent the freedom to control the intermediate task distribution, allowing for its gradual progression towards the target context distribution. Empirical evaluation shows that the proposed curriculum learning scheme drastically improves sample efficiency and enables learning in scenarios with both broad and sharp target context distributions in which classical approaches perform sub-optimally.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/PascalKlink/sprl.pdf
Reference TypeConference Paper
Author(s)Watson, J.; Abdulsamad, H.; Peters, J.
Year2019
TitleStochastic Optimal Control as Approximate Input Inference
Journal/Conference/Book TitleProceedings of the 3rd Conference on Robot Learning (CoRL)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JoeWatson/Watson19I2c.pdf
Reference TypeConference Paper
Author(s)Abdulsamad, H.; Naveh, K.; Peters, J.
Year2019
TitleModel-Based Relative Entropy Policy Search for Stochastic Hybrid Systems
Journal/Conference/Book Title4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Reference TypeJournal Article
Author(s)Gomez Gonzalez, S.; Nemmour, Y.; Schoelkopf, B.; Peters, J.
Year2019
TitleReliable Real Time Ball Tracking for Robot Table Tennis
Journal/Conference/Book TitleRobotics
Volume8
Number4
Pages90
URL(s) https://www.mdpi.com/2218-6581/8/4/90
Reference TypeJournal Article
Author(s)Schuermann, T.; Mohler, B.J.; Peters, J.; Beckerle, P.
Year2019
TitleHow Cognitive Models of Human Body Experience Might Push Robotics
Journal/Conference/Book TitleFrontiers in Neurorobotics
Reference TypeConference Proceedings
Author(s)Delfosse, Q.; Stark, S.; Tanneberg, D.; Santucci, V. G.; Peters, J.
Year2019
TitleOpen-Ended Learning of Grasp Strategies using Intrinsically Motivated Self-Supervision
Journal/Conference/Book TitleWorkshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/SvenjaStark/delfosse_iros2019.pdf
Reference TypeConference Paper
Author(s)Muratore, F.; Gienger, M.; Peters, J.
Year2019
TitleAssessing Transferability in Reinforcement Learning from Randomized Simulations
Journal/Conference/Book Title4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Keywordsdomain randomization, simulation optimization, sim-2-real
AbstractExploration-based reinforcement learning of control policies on physical systems is generally time-intensive and can lead to catastrophic failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the device. For this reason, much work in reinforcement learning has focused on model-free methods. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of reinforcement learning to real-world systems. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by randomizing the simulator’s parameters and training from model ensembles. We propose an algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses an estimator of the SOB to formulate a stopping criterion for training. We show that the simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_GP--RLDM_2019.pdf
LanguageEnglish
Reference TypeConference Proceedings
Author(s)Look, A.; Kandemir, M.
Year2019
TitleDifferential Bayesian Neural Nets
Journal/Conference/Book TitleNeurIPS Bayesian Workshop
AbstractNeural Ordinary Differential Equations (N-ODEs) are a powerful building block for learning systems, which extend residual networks to a continuous-time dynamical system. We propose a Bayesian version of N-ODEs that enables well-calibrated quantification of prediction uncertainty, while maintaining the expressive power of their deterministic counterpart. We assign Bayesian Neural Nets (BNNs) to both the drift and the diffusion terms of a Stochastic Differential Equation (SDE) that models the flow of the activation map in time. We infer the posterior on the BNN weights using a straightforward adaptation of Stochastic Gradient Langevin Dynamics (SGLD). We illustrate significantly improved stability on two synthetic time series prediction tasks and report better model fit on UCI regression benchmarks with our method when compared to its non-Bayesian counterpart.
Reference TypeReport
Author(s)Tanneberg, D.; Rueckert, E.; Peters, J.
Year2019
TitleLearning Algorithmic Solutions to Symbolic Planning Tasks with a Neural Computer Architecture
Journal/Conference/Book TitlearXiv
KeywordsGOAL-Robots, SKILLS4ROBOTS
URL(s) https://arxiv.org/pdf/1911.00926.pdf
Reference TypeConference Paper
Author(s)Klink, P.; Peters, J.
Year2019
TitleMeasuring Similarities between Markov Decision Processes
Journal/Conference/Book Title4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Reference TypeJournal Article
Author(s)Kroemer, O.; Leischnig, S.; Luettgen, S.; Peters, J.
Year2018
TitleA Kernel-based Approach to Learning Contact Distributions for Robot Manipulation Tasks
Journal/Conference/Book TitleAutonomous Robots (AURO)
Volume42
Number3
Pages581-600
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Alumni/OliverKroemer/KroemerAuRo17Updated2.pdf
Reference TypeJournal Article
Author(s)Paraschos, A.; Daniel, C.; Peters, J.; Neumann, G.
Year2018
TitleUsing Probabilistic Movement Primitives in Robotics
Journal/Conference/Book TitleAutonomous Robots (AURO)
Volume42
Number3
Pages529-551
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/AlexandrosParaschos/promps_auro.pdf
Reference TypeJournal Article
Author(s)Yi, Z.; Zhang, Y.; Peters, J.
Year2018
TitleBiomimetic Tactile Sensors and Signal Processing with Spike Trains: A Review
Journal/Conference/Book TitleSensors & Actuators: A. Physical
Volume269
Pages41-52
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/SNA2018yi.pdf
Reference TypeJournal Article
Author(s)Paraschos, A.; Rueckert, E.; Peters, J.; Neumann, G.
Year2018
TitleProbabilistic Movement Primitives under Unknown System Dynamics
Journal/Conference/Book TitleAdvanced Robotics (ARJ)
Volume32
Number6
Pages297-310
Link to PDFhttps://www.ias.tu-darmstadt.de/uploads/Alumni/AlexandrosParaschos/Paraschos_AR_2018.pdf
Reference TypeJournal Article
Author(s)Manschitz, S.; Gienger, M.; Kober, J.; Peters, J.
Year2018
TitleMixture of Attractors: A Novel Movement Primitive Representation for Learning Motor Skills from Demonstrations
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
Volume3
Number2
Pages926-933
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/SimonManschitz/ManschitzRAL2018.pdf
Reference TypeConference Proceedings
Author(s)Lioutikov, R.; Maeda, G.; Veiga, F.F.; Kersting, K.; Peters, J.
Year2018
TitleInducing Probabilistic Context-Free Grammars for the Sequencing of Robot Movement Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/RudolfLioutikov/lioutikov_movement_pcfg_icra2018.pdf
Reference TypeConference Proceedings
Author(s)Gebhardt, G.H.W.; Daun, K.; Schnaubelt, M.; Neumann, G.
Year2018
TitleLearning Robust Policies for Object Manipulation with Robot Swarms
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Keywordsswarm robotics, policy search, kernel methods, kilobots
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/GregorGebhardt/LearningRobustPoliciesForObjectManipulationWithRobotSwarms.pdf
Reference TypeJournal Article
Author(s)Vinogradska, J.; Bischoff, B.; Peters, J.
Year2018
TitleApproximate Value Iteration based on Numerical Quadrature
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L), with presentation at the IEEE International Conference on Robotics and Automation (ICRA)
Volume3
Number2
Pages1330-1337
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NQVI_RAL_manuscript.pdf
Reference TypeConference Proceedings
Author(s)Pinsler, R.; Akrour, R.; Osa, T.; Peters, J.; Neumann, G.
Year2018
TitleSample and Feedback Efficient Hierarchical Reinforcement Learning from Human Preferences
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
KeywordsIAS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/RiadAkrour/icra18_robert.pdf
Reference TypeConference Proceedings
Author(s)Koert, D.; Maeda, G.; Neumann, G.; Peters, J.
Year2018
TitleLearning Coupled Forward-Inverse Models with Combined Prediction Errors
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand,SKILLS4ROBOTS
AbstractChallenging tasks in unstructured environments require robots to learn complex models. Given a large amount of information, learning multiple simple models can offer an efficient alternative to a monolithic complex network. Training multiple models---that is, learning their parameters and their responsibilities---has been shown to be prohibitively hard as optimization is prone to local minima. To efficiently learn multiple models for different contexts, we thus develop a new algorithm based on expectation maximization (EM). In contrast to comparable concepts, this algorithm trains multiple modules of paired forward-inverse models by using the prediction errors of both forward and inverse models simultaneously. In particular, we show that our method yields a substantial improvement over only considering the errors of the forward models on tasks where the inverse space contains multiple solutions.
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/DorotheaKoert/cfim_final.pdf
Reference TypeJournal Article
Author(s)Osa, T.; Pajarinen, J.; Neumann, G.; Bagnell, J.A.; Abbeel, P.; Peters, J.
Year2018
TitleAn Algorithmic Perspective on Imitation Learning
Journal/Conference/Book TitleFoundations and Trends in Robotics
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/1811.06711
Reference TypeJournal Article
Author(s)Veiga, F.; Peters, J.; Hermans, T.
Year2018
TitleGrip Stabilization of Novel Objects using Slip Prediction
Journal/Conference/Book TitleIEEE Transactions on Haptics
Volume11
Number4
Pages531-542
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/veigaToH2018.pdf
Reference TypeJournal Article
Author(s)Koc, O.; Maeda, G.; Peters, J.
Year2018
TitleOnline optimal trajectory generation for robot table tennis
Journal/Conference/Book TitleRobotics and Autonomous Systems (RAS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Online_optimal_trajectory_generation.pdf
Reference TypeJournal Article
Author(s)Ewerton, M.; Rother, D.; Weimar, J.; Kollegger, G.; Wiemeyer, J.; Peters, J.; Maeda, G.
Year2018
TitleAssisting Movement Training and Execution with Visual and Haptic Feedback
Journal/Conference/Book TitleFrontiers in Neurorobotics
Keywords3rd-Hand, BIMROB, RoMaNS, SKILLS4ROBOTS, NEDO
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/fnbot-12-00024.pdf
Reference TypeConference Proceedings
Author(s)Belousov, B.; Peters, J.
Year2018
TitleEntropic Regularization of Markov Decision Processes
Journal/Conference/Book Title38th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering
Keywordsreinforcement learning; actor-critic methods; entropic proximal mappings; policy search
AbstractThe problem of synthesis of an optimal feedback controller for a given Markov decision process (MDP) can in principle be solved by value iteration or policy iteration. However, if system dynamics and the reward function are unknown, the only way for a learning agent to discover an optimal controller is through interaction with the MDP. During data gathering, it is crucial to account for the lack of information, because otherwise ignorance will push the agent towards dangerous areas of the state space. To prevent such behavior and smoothen learning dynamics, prior works proposed to bound the information loss measured by the Kullback-Leibler (KL) divergence at every policy improvement step. In this paper, we consider a broader family of f-divergences that preserve the beneficial property of the KL divergence of providing the policy improvement step in closed form accompanied by a compatible dual objective for policy evaluation. Such an entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures. In particular, common least squares value function fitting coupled with advantage-weighted maximum likelihood policy estimation is shown to correspond to the Pearson χ2-divergence penalty. Other connections can be established by considering different choices of the penalty generator function f.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/maxent18_belousov.pdf
Reference TypeConference Proceedings
Author(s)Parmas, P.; Doya, K.; Rasmussen, C.; Peters, J.
Year2018
TitlePIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
Reference TypeConference Proceedings
Author(s)Arenz, O.; Zhong, M.; Neumann, G.
Year2018
TitleEfficient Gradient-Free Variational Inference using Policy Search
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
KeywordsVariational Inference, Policy Search, Sampling
AbstractInference from complex distributions is a common problem in machine learning needed for many Bayesian methods. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multi-variate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems given by learning the individual mixture components and the coefficients. The number of mixture components is adapted online in order to allow for arbitrary exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.
Editor(s)Dy, Jennifer and Krause, Andreas
PublisherPMLR
Volume80
Pages234-243
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/VIPS_full.pdf
Reference TypeJournal Article
Author(s)Buechler, D.; Calandra, R.; Schoelkopf, B.; Peters, J.
Year2018
TitleControl of Musculoskeletal Systems using Learned Dynamics Models
Journal/Conference/Book TitleIEEE Robotics and Automation Letters, and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttps://ei.is.tuebingen.mpg.de/uploads_file/attachment/attachment/422/RAL18final.pdf
Reference TypeJournal Article
Author(s)Sosic, A.; Rueckert, E.; Peters, J.; Zoubir, A.M.; Koeppl, H.
Year2018
TitleInverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Volume19
Number69
Pages1-45
Reference TypeConference Proceedings
Author(s)Akrour, R.; Veiga, F.; Peters, J.; Neumann, G.
Year2018
TitleRegularizing Reinforcement Learning with State Abstraction
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/RiadAkrour/iros18_riad.pdf
Reference TypeConference Proceedings
Author(s)Gondaliya, K.D.; Peters, J.; Rueckert, E.
Year2018
TitleLearning to Categorize Bug Reports with LSTM Networks
Journal/Conference/Book TitleProceedings of the International Conference on Advances in System Testing and Validation Lifecycle
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/VALID2018Gondaliya.pdf
Reference TypeJournal Article
Author(s)Osa, T.; Peters, J.; Neumann, G.
Year2018
TitleHierarchical Reinforcement Learning of Multiple Grasping Strategies with Human Instructions
Journal/Conference/Book TitleAdvanced Robotics
Volume32
Number18
Pages955-968
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/advanced_roboitcs_18osa.pdf
Reference TypeConference Paper
Author(s)Muratore, F.; Treede, F.; Gienger, M.; Peters, J.
Year2018
TitleDomain Randomization for Simulation-Based Policy Optimization with Transferability Assessment
Journal/Conference/Book TitleConference on Robot Learning (CoRL)
Keywordsdomain randomization, simulation optimization, sim-2-real
AbstractExploration-based reinforcement learning on real robot systems is generally time-intensive and can lead to catastrophic robot failures. Therefore, simulation-based policy search appears to be an appealing alternative. Unfortunately, running policy search on a slightly faulty simulator can easily lead to the maximization of the ‘Simulation Optimization Bias’ (SOB), where the policy exploits modeling errors of the simulator such that the resulting behavior can potentially damage the robot. For this reason, much work in robot reinforcement learning has focused on model-free methods that learn on real-world systems. The resulting lack of safe simulation-based policy learning techniques imposes severe limitations on the application of robot reinforcement learning. In this paper, we explore how physics simulations can be utilized for a robust policy optimization by perturbing the simulator’s parameters and training from model ensembles. We propose a new algorithm called Simulation-based Policy Optimization with Transferability Assessment (SPOTA) that uses a biased estimator of the SOB to formulate a stopping criterion for training. We show that the new simulation-based policy search algorithm is able to learn a control policy exclusively from a randomized simulator that can be applied directly to a different system without using any data from the latter.
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_Treede_Gienger_Peters--SPOTA_CoRL2018.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/FabioMuratore/Muratore_Treede_Gienger_Peters--SPOTA_CoRL2018.pdf
LanguageEnglish
Reference TypeConference Proceedings
Author(s)Koert, D.; Trick, S.; Ewerton, M.; Lutter, M.; Peters, J.
Year2018
TitleOnline Learning of an Open-Ended Skill Library for Collaborative Tasks
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsSKILLS4ROBOTS, KOBO
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DorotheaKoert/incremental_promp_2018.pdf
Reference TypeConference Paper
Author(s)Akrour, R.; Peters, J.; Neumann, G.
Year2018
TitleConstraint-Space Projection Direct Policy Search
Journal/Conference/Book TitleEuropean Workshops on Reinforcement Learning (EWRL)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/RiadAkrour/ewrl18_riad.pdf
Reference TypeConference Proceedings
Author(s)Hoelscher, J.; Koert, D.; Peters, J.; Pajarinen, J.
Year2018
TitleUtilizing Human Feedback in POMDP Execution and Specification
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsROMANS,SKILLS4ROBOTS,ROBOLEAP
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/DorotheaKoert/pomdp_user_interaction_2018.pdf
Reference TypeConference Proceedings
Author(s)Belousov, B.; Peters, J.
Year2018
TitleMean Squared Advantage Minimization as a Consequence of Entropic Policy Improvement Regularization
Journal/Conference/Book TitleEuropean Workshops on Reinforcement Learning (EWRL)
Keywordspolicy optimization, entropic proximal mappings, actor-critic algorithms
AbstractPolicy improvement regularization with entropy-like f-divergence penalties provides a unifying perspective on actor-critic algorithms, rendering policy improvement and policy evaluation steps as primal and dual subproblems of the same optimization problem. For small policy improvement steps, we show that all f-divergences with twice differentiable generator function f yield a mean squared advantage minimization objective for the policy evaluation step and an advantage-weighted maximum log-likelihood objective for the policy improvement step. The mean squared advantage objective fits in-between the well-known mean squared Bellman error and the mean squared temporal difference error objectives, requiring only the expectation of the temporal difference error with respect to the next state and not the policy, in contrast to the Bellman error, which requires both, and the temporal difference error, which requires none. The advantage-weighted maximum log-likelihood policy improvement rule emerges as a linear approximation to a more general weighting scheme where weights are a monotone function of the advantage. Thus, the entropic policy regularization framework provides a rigorous justification for the common practice of least squares value function fitting accompanied by advantage-weighted maximum log-likelihood policy parameters estimation, at the same time pointing at the direction in which this classical actor-critic approach can be extended.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/BorisBelousov/ewrl18_belousov.pdf
Reference TypeUnpublished Work
Author(s)Pinsler, R.; Maag, M.; Arenz, O.; Neumann, G.
Year2018
TitleInverse Reinforcement Learning of Bird Flocking Behavior
Journal/Conference/Book TitleSwarms: From Biology to Robotics and Back (ICRA Workshop)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/PinslerEtAl_ICRA2018swarms.pdf
Reference TypeJournal Article
Author(s)Kupcsik, A.G.; Deisenroth, M.P.; Peters, J.; Ai Poh, L.; Vadakkepat, V.; Neumann, G.
Year2017
TitleModel-based Contextual Policy Search for Data-Efficient Generalization of Robot Skills
Journal/Conference/Book TitleArtificial Intelligence
KeywordsComPLACS
Volume247
Pages415-439
DateJune 2017
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kupcsik_AIJ_2015.pdf
Reference TypeJournal Article
Author(s)Wang, Z.; Boularias, A.; Muelling, K.; Schoelkopf, B.; Peters, J.
Year2017
TitleAnticipatory Action Selection for Human-Robot Table Tennis
Journal/Conference/Book TitleArtificial Intelligence
Volume247
Pages399-414
DateJune 2017
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Anticipatory_Action_Selection.pdf
Reference TypeJournal Article
Author(s)Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Kroemer, O.; Peters, J.
Year2017
TitleProbabilistic Movement Primitives for Coordination of Multiple Human-Robot Collaborative Tasks
Journal/Conference/Book TitleAutonomous Robots (AURO)
Keywords3rd-Hand, BIMROB
Volume41
Number3
Pages593-612
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/gjm_2016_AURO_c.pdf
Reference TypeJournal Article
Author(s)Parisi, S.; Pirotta, M.; Peters, J.
Year2017
TitleManifold-based Multi-objective Policy Search with Sample Reuse
Journal/Conference/Book TitleNeurocomputing
Keywordsmulti-objective, reinforcement learning, policy search, black-box optimization, importance sampling
Volume263
Pages3-14
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/parisi_neurocomp_morl.pdf
Reference TypeBook Section
Author(s)Peters, J.; Lee, D.; Kober, J.; Nguyen-Tuong, D.; Bagnell, J.; Schaal, S.
Year2017
TitleChapter 15: Robot Learning
Journal/Conference/Book TitleSpringer Handbook of Robotics, 2nd Edition
PublisherSpringer International Publishing
Pages357-394
Reference TypeJournal Article
Author(s)Maeda, G.; Ewerton, M.; Neumann, G.; Lioutikov, R.; Peters, J.
Year2017
TitlePhase Estimation for Fast Action Recognition and Trajectory Generation in Human-Robot Collaboration
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Keywords3rd-Hand, BIMROB
Volume36
Number13-14
Pages1579-1594
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/phase_estim_IJRR.pdf
Reference TypeJournal Article
Author(s)Padois, V.; Ivaldi, S.; Babič, J.; Mistry, M.; Peters, J.; Nori, F.
Year2017
TitleWhole-body multi-contact motion in humans and humanoids
Journal/Conference/Book TitleRobotics and Autonomous Systems
Volume90
Pages97-117
DateApril 2017
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ias_padois_et_al_revised_finalised.pdf
Reference TypeConference Proceedings
Author(s)Tangkaratt, V.; van Hoof, H.; Parisi, S.; Neumann, G.; Peters, J.; Sugiyama, M.
Year2017
TitlePolicy Search with High-Dimensional Context Variables
Journal/Conference/Book TitleProceedings of the AAAI Conference on Artificial Intelligence (AAAI)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/tangkaratt2017policy.pdf
Reference TypeJournal Article
Author(s)Ivaldi, S.; Lefort, S.; Peters, J.; Chetouani, M.; Provasi, J.; Zibetti, E.
Year2017
TitleTowards Engagement Models that Consider Individual Factors in HRI: On the Relation of Extroversion and Negative Attitude Towards Robots to Gaze and Speech During a Human-Robot Assembly Task
Journal/Conference/Book TitleInternational Journal of Social Robotics
Volume9
Number1
Pages63-86
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IJSR_edhhi.pdf
Reference TypeConference Proceedings
Author(s)Gebhardt, G.H.W.; Kupcsik, A.G.; Neumann, G.
Year2017
TitleThe Kernel Kalman Rule - Efficient Nonparametric Inference with Recursive Least Squares
Journal/Conference/Book TitleProceedings of the AAAI Conference on Artificial Intelligence (AAAI)
AbstractNonparametric inference techniques provide promising tools for probabilistic reasoning in high-dimensional nonlinear systems. Most of these techniques embed distributions into reproducing kernel Hilbert spaces (RKHS) and rely on the kernel Bayes’ rule (KBR) to manipulate the embeddings. However, the computational demands of the KBR scale poorly with the number of samples and the KBR often suffers from numerical instabilities. In this paper, we present the kernel Kalman rule (KKR) as an alternative to the KBR. The derivation of the KKR is based on recursive least squares, inspired by the derivation of the Kalman innovation update. We apply the KKR to filtering tasks where we use RKHS embeddings to represent the belief state, resulting in the kernel Kalman filter (KKF). We show on a nonlinear state estimation task with high dimensional observations that our approach provides a significantly improved estimation accuracy while the computational demands are significantly decreased.
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/GregorGebhardt/TheKernelKalmanRule.pdf
Reference TypeJournal Article
Author(s)Yi, Z.; Zhang, Y.; Peters, J.
Year2017
TitleBioinspired Tactile Sensor for Surface Roughness Discrimination
Journal/Conference/Book TitleSensors and Actuators A: Physical
Volume255
Pages46-53
Date1 March 2017
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bioinspired_tactile_sensor.pdf
Reference TypeJournal Article
Author(s)Osa, T.; Ghalamzan, E. A. M.; Stolkin, R.; Lioutikov, R.; Peters, J.; Neumann, G.
Year2017
TitleGuiding Trajectory Optimization by Demonstrated Distributions
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
PublisherIEEE
Volume2
Number2
Pages819-826
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Osa_RAL_2017.pdf
Reference TypeJournal Article
Author(s)Kroemer, O.; Peters, J.
Year2017
TitleA Comparison of Autoregressive Hidden Markov Models for Multi-Modal Manipulations with Variable Masses
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation, and IEEE Robotics and Automation Letters (RA-L)
Keywords3rd-Hand, TACMAN
Volume2
Number2
Pages1101-1108
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Kroemer_RAL_2017.pdf
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Kollegger, G.; Maeda, G.; Wiemeyer, J.; Peters, J.
Year2017
TitleIterative Feedback-basierte Korrekturstrategien beim Bewegungslernen von Mensch-Roboter-Dyaden
Journal/Conference/Book TitleDVS Sportmotorik 2017
KeywordsBIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_motorik2017.pdf
Reference TypeConference Proceedings
Author(s)Kollegger, G.; Reinhardt, N.; Ewerton, M.; Peters, J.; Wiemeyer, J.
Year2017
TitleDie Bedeutung der Beobachtungsperspektive beim Bewegungslernen von Mensch-Roboter-Dyaden
Journal/Conference/Book TitleDVS Sportmotorik 2017
KeywordsBIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/kollegger_motorik2017.pdf
Reference TypeConference Proceedings
Author(s)Wiemeyer, J.; Peters, J.; Kollegger, G.; Ewerton, M.
Year2017
TitleBIMROB – Bidirektionale Interaktion von Mensch und Roboter beim Bewegungslernen
Journal/Conference/Book TitleDVS Sportmotorik 2017
KeywordsBIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/wiemeyer_motorik2017.pdf
Reference TypeConference Proceedings
Author(s)Farraj, F. B.; Osa, T.; Pedemonte, N.; Peters, J.; Neumann, G.; Giordano, P.R.
Year2017
TitleA Learning-based Shared Control Architecture for Interactive Task Execution
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/firas_ICRA17.pdf
Reference TypeConference Proceedings
Author(s)Wilbers, D.; Lioutikov, R.; Peters, J.
Year2017
TitleContext-Driven Movement Primitive Adaptation
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand
Link to PDF/uploads/Member/PubRudolfLioutikov/wilbers_icra_2017.pdf
Reference TypeConference Proceedings
Author(s)End, F.; Akrour, R.; Peters, J.; Neumann, G.
Year2017
TitleLayered Direct Policy Search for Learning Hierarchical Skills
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Research/Overview/icra17_felix.pdf
Reference TypeConference Proceedings
Author(s)Gabriel, A.; Akrour, R.; Peters, J.; Neumann, G.
Year2017
TitleEmpowered Skills
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsIAS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Research/Overview/icra17_alex.pdf
Reference TypeConference Proceedings
Author(s)Abdulsamad, H.; Arenz, O.; Peters, J.; Neumann, G.
Year2017
TitleState-Regularized Policy Search for Linearized Dynamical Systems
Journal/Conference/Book TitleProceedings of the International Conference on Automated Planning and Scheduling (ICAPS)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Abdulsamad_ICAPS_2017.pdf
Reference TypeJournal Article
Author(s)Lioutikov, R.; Neumann, G.; Maeda, G.; Peters, J.
Year2017
TitleLearning Movement Primitive Libraries through Probabilistic Segmentation
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Keywords3rd-Hand
Volume36
Number8
Pages879-894
Link to PDF/uploads/Publications/lioutikov_probs_ijrr2017.pdf
Reference TypeConference Proceedings
Author(s)Fiebig, K.H.; Jayaram, V.; Hesse, T.; Blank, A.; Peters, J.; Grosse-Wentrup, M.
Year2017
TitleBayesian Regression for Artifact Correction in Electroencephalography
Journal/Conference/Book TitleProceedings of the 7th Graz Brain-Computer Interface Conference
Reference TypeConference Proceedings
Author(s)Akrour, R.; Sorokin, D.; Peters, J.; Neumann, G.
Year2017
TitleLocal Bayesian Optimization of Motor Skills
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
Link to PDFhttp://proceedings.mlr.press/v70/akrour17a/akrour17a.pdf
Reference TypeConference Proceedings
Author(s)Gebhardt, G.H.W.; Daun, K.; Schnaubelt, M.; Hendrich, A.; Kauth, D.; Neumann, G.
Year2017
TitleLearning to Assemble Objects with a Robot Swarm
Journal/Conference/Book TitleProceedings of the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS)
Keywordsmulti-agent learning, reinforcement learning, swarm robotics
AbstractNature provides us with a multitude of examples that show how swarms of simple agents are much richer in their abilities than a single individual. This insight is a main principle that swarm robotics tries to exploit. In recent years, large swarms of low-cost robots such as the Kilobots have become available. This makes it possible to bring algorithms developed for swarm robotics from simulations to the real world. Recently, the Kilobots have been used for an assembly task with multiple objects: a human operator controlled a light source to guide the swarm of light-sensitive robots such that they successfully assembled an object of multiple parts. However, hand-coding the control of the light source for autonomous assembly is not straightforward, as the interactions of the swarm with the object or the reaction to the light source are hard to model.
PublisherInternational Foundation for Autonomous Agents and Multiagent Systems
Pages1547-1549
URL(s) http://dl.acm.org/citation.cfm?id=3091282.3091357
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/GregorGebhardt/LearningToAssembleObjectsWithARobotSwarm.pdf
Reference TypeConference Proceedings
Author(s)Kollegger, G.; Ewerton, M.; Wiemeyer, J.; Peters, J.
Year2017
TitleBIMROB – Bidirectional Interaction between human and robot for the learning of movements – Robot trains human – Human trains robot
Journal/Conference/Book Title23. Sportwissenschaftlicher Hochschultag der dvs
KeywordsBIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/kollegger_dvs_hochschultag_2017.pdf
Reference TypeConference Proceedings
Author(s)Tosatto, S.; Pirotta, M.; D'Eramo, C.; Restelli, M.
Year2017
TitleBoosted Fitted Q-Iteration
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
KeywordsBosch-Forschungstiftung
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/tosatto_icml2017.pdf
Reference TypeConference Paper
Author(s)Belousov, B.; Neumann, G.; Rothkopf, C.A.; Peters, J.
Year2017
TitleCatching Heuristics Are Optimal Control Policies
Journal/Conference/Book TitleProceedings of the Karniel Thirteenth Computational Motor Control Workshop
KeywordsSKILLS4ROBOTS
AbstractTwo seemingly contradictory theories attempt to explain how humans move to intercept an airborne ball. One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher’s policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Belousov_ANIPS_2016.pdf
Reference TypeConference Proceedings
Author(s)Busch, B.; Maeda, G.; Mollard, Y.; Demangeat, M.; Lopes, M.
Year2017
TitlePostural Optimization for an Ergonomic Human-Robot Interaction
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Keywords3rd-Hand
Reference TypeConference Proceedings
Author(s)Pajarinen, J.; Kyrki, V.; Koval, M.; Srinivasa, S.; Peters, J.; Neumann, G.
Year2017
TitleHybrid Control Trajectory Optimization under Uncertainty
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsRoMaNS
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/pajarinen_iros_2017.pdf
Reference TypeConference Proceedings
Author(s)Parisi, S.; Ramstedt, S.; Peters, J.
Year2017
TitleGoal-Driven Dimensionality Reduction for Reinforcement Learning
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/parisi2017iros.pdf
Reference TypeJournal Article
Author(s)Paraschos, A.; Lioutikov, R.; Peters, J.; Neumann, G.
Year2017
TitleProbabilistic Prioritization of Movement Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Intelligent Robots and Systems (IROS), and IEEE Robotics and Automation Letters (RA-L)
Keywordscodyco
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/AlexandrosParaschos/paraschos_prob_prio.pdf
Reference TypeJournal Article
Author(s)van Hoof, H.; Tanneberg, D.; Peters, J.
Year2017
TitleGeneralized Exploration in Policy Search
Journal/Conference/Book TitleMachine Learning (MLJ)
Volume106
Number9-10
Pages1705-1724
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/vanHoof_MLJ_2017.pdf
Reference TypeJournal Article
Author(s)van Hoof, H.; Neumann, G.; Peters, J.
Year2017
TitleNon-parametric Policy Search with Limited Information Loss
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
KeywordsTACMAN, reinforcement learning
Volume18
Number73
Pages1-46
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Alumni/HerkeVanHoof/vanHoof_JMLR_2017.pdf
Reference TypeJournal Article
Author(s)Vinogradska, J.; Bischoff, B.; Nguyen-Tuong, D.; Peters, J.
Year2017
TitleStability of Controllers for Gaussian Process Forward Models
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Volume18
Number100
Pages1-37
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/16-590.pdf
Reference TypeJournal Article
Author(s)Dermy, O.; Paraschos, A.; Ewerton, M.; Charpillet, F.; Peters, J.; Ivaldi, S.
Year2017
TitlePrediction of intention during interaction with iCub with Probabilistic Movement Primitives
Journal/Conference/Book TitleFrontiers in Robotics and AI
KeywordsCoDyCo, BIMROB
Volume4
Pages45
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/frobt-04-00045.pdf
Reference TypeGeneric
Author(s)Ewerton, M.; Maeda, G.; Rother, D.; Weimar, J.; Lotter, L.; Kollegger, G.; Wiemeyer, J.; Peters, J.
Year2017
TitleAssisting the practice of motor skills by humans with a probability distribution over trajectories
Journal/Conference/Book TitleWorkshop Human-in-the-loop robotic manipulation: on the influence of the human role at IROS 2017, Vancouver, Canada
Keywords3rd-Hand, BIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Member/PubMarcoEwerton/WORKSHOP_IROS_2017.pdf
Reference TypeConference Proceedings
Author(s)Tanneberg, D.; Peters, J.; Rueckert, E.
Year2017
TitleOnline Learning with Stochastic Recurrent Neural Networks using Intrinsic Motivation Signals
Journal/Conference/Book TitleProceedings of the Conference on Robot Learning (CoRL)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/DanielTanneberg/corl17_01.pdf
Reference TypeConference Proceedings
Author(s)Maeda, G.; Ewerton, M.; Osa, T.; Busch, B.; Peters, J.
Year2017
TitleActive Incremental Learning of Robot Movement Primitives
Journal/Conference/Book TitleProceedings of the Conference on Robot Learning (CoRL)
Keywords3rd-Hand, BIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/maedaCoRL_20171014.pdf
Reference TypeConference Proceedings
Author(s)Rueckert, E.; Nakatenus, M.; Tosatto, S.; Peters, J.
Year2017
TitleLearning Inverse Dynamics Models in O(n) time with LSTM networks
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGOAL-Robots, SKILLS4ROBOTS, Bosch-Forschungstiftung
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Humanoids2017Rueckert.pdf
Reference TypeConference Proceedings
Author(s)Tanneberg, D.; Peters, J.; Rueckert, E.
Year2017
TitleEfficient Online Adaptation with Stochastic Recurrent Neural Networks
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/DanielTanneberg/humanoids17_01.pdf
Reference TypeConference Proceedings
Author(s)Stark, S.; Peters, J.; Rueckert, E.
Year2017
TitleA Comparison of Distance Measures for Learning Nonparametric Motor Skill Libraries
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/SvenjaStark/stark_humanoids2017.pdf
Reference TypeJournal Article
Author(s)Kollegger, G.; Ewerton, M.; Wiemeyer, J.; Peters, J.
Year2017
TitleBIMROB -- Bidirectional Interaction Between Human and Robot for the Learning of Movements
Journal/Conference/Book TitleProceedings of the 11th International Symposium on Computer Science in Sport (IACSS 2017)
KeywordsBIMROB
Editor(s)Lames, M.; Saupe, D.; Wiemeyer, J.
PublisherSpringer International Publishing
Pages151--163
ISBN/ISSN978-3-319-67846-7
URL(s) https://doi.org/10.1007/978-3-319-67846-7_15
Reference TypeConference Proceedings
Author(s)Thiem, S.; Stark, S.; Tanneberg, D.; Peters, J.; Rueckert, E.
Year2017
TitleSimulation of the underactuated Sake Robotics Gripper in V-REP
Journal/Conference/Book TitleWorkshop at the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGOAL-Robots, SKILLS4ROBOTS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/PubElmarRueckert/Humanoids2017Thiem.pdf
Reference TypeConference Proceedings
Author(s)Grossberger, L.; Hohmann, M.R.; Peters, J.; Grosse-Wentrup, M.
Year2017
TitleInvestigating Music Imagery as a Cognitive Paradigm for Low-Cost Brain-Computer Interfaces
Journal/Conference/Book TitleProceedings of the 7th Graz Brain-Computer Interface Conference
Reference TypeConference Proceedings
Author(s)Kollegger, G.; Wiemeyer, J.; Ewerton, M.; Peters, J.
Year2017
TitleBIMROB - Bidirectional Interaction between human and robot for the learning of movements - Robot trains human - Human trains robot
Journal/Conference/Book TitleInnovation & Technologie im Sport - 23. Sportwissenschaftlicher Hochschultag der deutschen Vereinigung für Sportwissenschaft
KeywordsBIMROB
Editor(s)Schwirtz, A.; Mess, F.; Demetriou, Y.; Senner, V.
Place PublishedHamburg
PublisherCzwalina-Feldhaus
Pages179
Reference TypeReport
Author(s)Belousov, B.; Peters, J.
Year2017
Titlef-Divergence Constrained Policy Improvement
Journal/Conference/Book TitlearXiv
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/1801.00056.pdf
Reference TypeConference Proceedings
Author(s)Osa, T.; Peters, J.; Neumann, G.
Year2016
TitleExperiments with Hierarchical Reinforcement Learning of Multiple Grasping Policies
Journal/Conference/Book TitleProceedings of the International Symposium on Experimental Robotics (ISER)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/osa_ISER2016.pdf
Reference TypeConference Proceedings
Author(s)Arenz, O.; Abdulsamad, H.; Neumann, G.
Year2016
TitleOptimal Control and Inverse Optimal Control by Distribution Matching
Journal/Conference/Book TitleProceedings of the International Conference on Intelligent Robots and Systems (IROS)
KeywordsImitation Learning, Inverse Optimal Control, Optimal Control
AbstractOptimal control is a powerful approach to achieve optimal behavior. However, it typically requires a manual specification of a cost function which often contains several objectives, such as reaching goal positions at different time steps or energy efficiency. Manually trading-off these objectives is often difficult and requires a high engineering effort. In this paper, we present a new approach to specify optimal behavior. We directly specify the desired behavior by a distribution over future states or features of the states. For example, the experimenter could choose to reach certain mean positions with given accuracy/variance at specified time steps. Our approach also unifies optimal control and inverse optimal control in one framework. Given a desired state distribution, we estimate a cost function such that the optimal controller matches the desired distribution. If the desired distribution is estimated from expert demonstrations, our approach performs inverse optimal control. We evaluate our approach on several optimal and inverse optimal control tasks on non-linear systems using incremental linearizations similar to differential dynamic programming approaches.
PublisherIEEE
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/OC and IOC By Matching Distributions_withSupplements.pdf
Reference TypeJournal Article
Author(s)Rueckert, E.; Kappel, D.; Tanneberg, D.; Pecevski, D.; Peters, J.
Year2016
TitleRecurrent Spiking Networks Solve Planning Tasks
Journal/Conference/Book TitleNature PG: Scientific Reports
Keywords3rdHand, CoDyCo
PublisherNature Publishing Group
Volume6
Number21142
Date2016/02/18/online
ISBN/ISSN10.1038/srep21142
Custom 2http://www.nature.com/articles/srep21142#supplementary-information
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/srep21142
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/srep21142
Reference TypeConference Proceedings
Author(s)Kohlschuetter, J.; Peters, J.; Rueckert, E.
Year2016
TitleLearning Probabilistic Features from EMG Data for Predicting Knee Abnormalities
Journal/Conference/Book TitleProceedings of the XIV Mediterranean Conference on Medical and Biological Engineering and Computing (MEDICON)
KeywordsCoDyCo, TACMAN
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/KohlschuetterMEDICON_2016.pdf
Reference TypeJournal Article
Author(s)Maeda, G.; Ewerton, M.; Koert, D.; Peters, J.
Year2016
TitleAcquiring and Generalizing the Embodiment Mapping from Human Observations to Robot Skills
Journal/Conference/Book TitleIEEE Robotics and Automation Letters (RA-L)
Keywords3rd-Hand, BIMROB
Volume1
Number2
Pages784--791
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/GuilhermeMaeda/maeda_RAL_golf_2016.pdf
Reference TypeConference Proceedings
Author(s)Modugno, V.; Neumann, G.; Rueckert, E.; Oriolo, G.; Peters, J.; Ivaldi, S.
Year2016
TitleLearning soft task priorities for control of redundant robots
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsCoDyCo
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/main_revised.pdf
Reference TypeConference Proceedings
Author(s)Buechler, D.; Ott, H.; Peters, J.
Year2016
TitleA Lightweight Robotic Arm with Pneumatic Muscles for Robot Learning
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Maeda, G.; Neumann, G.; Kisner, V.; Kollegger, G.; Wiemeyer, J.; Peters, J.
Year2016
TitleMovement Primitives with Multiple Phase Parameters
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsBIMROB, 3rd-Hand
Pages201--206
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_icra_2016_stockholm.pdf
Reference TypeJournal Article
Author(s)Daniel, C.; Neumann, G.; Kroemer, O.; Peters, J.
Year2016
TitleHierarchical Relative Entropy Policy Search
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Volume17
Pages1-50
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Daniel2016JMLR.pdf
Reference TypeReport
Author(s)Veiga, F.F.; Peters, J.
Year2016
TitleCan Modular Finger Control for In-Hand Object Stabilization be accomplished by Independent Tactile Feedback Control Laws?
Journal/Conference/Book TitlearXiv
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/1612.08202.pdf
Reference TypeJournal Article
Author(s)Abdolmaleki, A.; Lau, N.; Reis, L.; Peters, J.; Neumann, G.
Year2016
TitleContextual Policy Search for Linear and Nonlinear Generalization of a Humanoid Walking Controller
Journal/Conference/Book TitleJournal of Intelligent & Robotic Systems
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/contextualWalking.pdf
Reference TypeConference Proceedings
Author(s)Vinogradska, J.; Bischoff, B.; Nguyen-Tuong, D.; Romer, A.; Schmidt, H.; Peters, J.
Year2016
TitleStability of Controllers for Gaussian Process Forward Models
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Publications/Vinogradska_ICML_2016.pdf
Reference TypeConference Proceedings
Author(s)Akrour, R.; Abdolmaleki, A.; Abdulsamad, H.; Neumann, G.
Year2016
TitleModel-Free Trajectory Optimization for Reinforcement Learning
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/akrour16.pdf
Reference TypeConference Proceedings
Author(s)Sharma, D.; Tanneberg, D.; Grosse-Wentrup, M.; Peters, J.; Rueckert, E.
Year2016
TitleAdaptive Training Strategies for BCIs
Journal/Conference/Book TitleCybathlon Symposium
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/ElmarR%c3%bcckert/Cybathlon16_AdaptiveTrainingRL.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Peters, J.; Rasmussen, C.E.; Deisenroth, M.P.
Year2016
TitleManifold Gaussian Processes for Regression
Journal/Conference/Book TitleProceedings of the International Joint Conference on Neural Networks (IJCNN)
KeywordsCoDyCo
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/1402.5876v4
Reference TypeConference Proceedings
Author(s)Weber, P.; Rueckert, E.; Calandra, R.; Peters, J.; Beckerle, P.
Year2016
TitleA Low-cost Sensor Glove with Vibrotactile Feedback and Multiple Finger Joint and Hand Motion Sensing for Human-Robot Interaction
Journal/Conference/Book TitleProceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN)
KeywordsCoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/ElmarR%c3%bcckert/ROMANS16_daglove.pdf
Reference TypeJournal Article
Author(s)Rueckert, E.; Camernik, J.; Peters, J.; Babic, J.
Year2016
TitleProbabilistic Movement Models Show that Postural Control Precedes and Predicts Volitional Motor Control
Journal/Conference/Book TitleNature PG: Scientific Reports
KeywordsCoDyCo; TACMAN
Volume6
Number28455
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/srep28455
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/srep28455
Reference TypeJournal Article
Author(s)Daniel, C.; van Hoof, H.; Peters, J.; Neumann, G.
Year2016
TitleProbabilistic Inference for Determining Options in Reinforcement Learning
Journal/Conference/Book TitleMachine Learning (MLJ)
Volume104
Number2-3
Pages337-357
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Daniel2016ECML.pdf
Reference TypeConference Proceedings
Author(s)Manschitz, S.; Gienger, M.; Kober, J.; Peters, J.
Year2016
TitleProbabilistic Decomposition of Sequential Force Interaction Tasks into Movement Primitives
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsHonda, HRI-Collaboration
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/SimonManschitz/ManschitzIROS2016.pdf
Reference TypeConference Proceedings
Author(s)van Hoof, H.; Chen, N.; Karl, M.; van der Smagt, P.; Peters, J.
Year2016
TitleStable Reinforcement Learning with Autoencoders for Tactile and Visual Data
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsTACMAN, tactile manipulation
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/hoof2016IROS.pdf
Reference TypeConference Proceedings
Author(s)Yi, Z.; Calandra, R.; Veiga, F.; van Hoof, H.; Hermans, T.; Zhang, Y.; Peters, J.
Year2016
TitleActive Tactile Object Exploration with Gaussian Processes
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsTACMAN, tactile manipulation
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Publications/Other/iros2016yi.pdf
Reference TypeConference Proceedings
Author(s)Koc, O.; Peters, J.; Maeda, G.
Year2016
TitleA New Trajectory Generation Framework in Robotic Table Tennis
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
Reference TypeConference Proceedings
Author(s)Belousov, B.; Neumann, G.; Rothkopf, C.; Peters, J.
Year2016
TitleCatching Heuristics Are Optimal Control Policies
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
KeywordsSKILLS4ROBOTS
AbstractTwo seemingly contradictory theories attempt to explain how humans move to intercept an airborne ball. One theory posits that humans predict the ball trajectory to optimally plan future actions; the other claims that, instead of performing such complicated computations, humans employ heuristics to reactively choose appropriate actions based on immediate visual feedback. In this paper, we show that interception strategies appearing to be heuristics can be understood as computational solutions to the optimal control problem faced by a ball-catching agent acting under uncertainty. Modeling catching as a continuous partially observable Markov decision process and employing stochastic optimal control theory, we discover that the four main heuristics described in the literature are optimal solutions if the catcher has sufficient time to continuously visually track the ball. Specifically, by varying model parameters such as noise, time to ground contact, and perceptual latency, we show that different strategies arise under different circumstances. The catcher's policy switches between generating reactive and predictive behavior based on the ratio of system to observation noise and the ratio between reaction time and task duration. Thus, we provide a rational account of human ball-catching behavior and a unifying explanation for seemingly contradictory theories of target interception on the basis of stochastic optimal control.
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Belousov_ANIPS_2016.pdf
Reference TypeGeneric
Author(s)Maeda, G.; Maloo, A.; Ewerton, M.; Lioutikov, R.; Peters, J.
Year2016
TitleProactive Human-Robot Collaboration with Interaction Primitives
Journal/Conference/Book TitleInternational Workshop on Human-Friendly Robotics (HFR), Genoa, Italy
Keywords3rd-Hand, BIMROB
Reference TypeConference Proceedings
Author(s)Maeda, G.; Maloo, A.; Ewerton, M.; Lioutikov, R.; Peters, J.
Year2016
TitleAnticipative Interaction Primitives for Human-Robot Collaboration
Journal/Conference/Book TitleAAAI Fall Symposium Series. Shared Autonomy in Research and Practice, Arlington, VA, USA
Keywords3rd-Hand, BIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/maeda-maloo_AAAI_symposium.pdf
Reference TypeBook Section
Author(s)Peters, J.; Tedrake, R.; Roy, N.; Morimoto, J.
Year2016
TitleRobot Learning
Journal/Conference/Book TitleEncyclopedia of Machine Learning, 2nd Edition, Invited Article
Reference TypeConference Proceedings
Author(s)Tanneberg, D.; Paraschos, A.; Peters, J.; Rueckert, E.
Year2016
TitleDeep Spiking Networks for Model-based Planning in Humanoids
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo; TACMAN
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/DanielTanneberg/tanneberg_humanoids16.pdf
Reference TypeConference Proceedings
Author(s)Huang, Y.; Buechler, D.; Koc, O.; Schoelkopf, B.; Peters, J.
Year2016
TitleJointly Learning Trajectory Generation and Hitting Point Prediction in Robot Table Tennis
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Reference TypeConference Proceedings
Author(s)Koert, D.; Maeda, G.J.; Lioutikov, R.; Neumann, G.; Peters, J.
Year2016
TitleDemonstration Based Trajectory Optimization for Generalizable Robot Motions
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Keywords3rd-Hand,SKILLS4ROBOTS
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/DorotheaKoert/Debato.pdf
Reference TypeConference Proceedings
Author(s)Gomez-Gonzalez, S.; Neumann, G.; Schoelkopf, B.; Peters, J.
Year2016
TitleUsing Probabilistic Movement Primitives for Striking Movements
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Maeda, G.J.; Kollegger, G.; Wiemeyer, J.; Peters, J.
Year2016
TitleIncremental Imitation Learning of Context-Dependent Motor Skills
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsBIMROB, 3rd-Hand
Pages351--358
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_humanoids_2016.pdf
Reference TypeConference Proceedings
Author(s)Azad, M.; Ortenzi, V.; Lin, H.C.; Rueckert, E.; Mistry, M.
Year2016
TitleModel Estimation and Control of Compliant Contact Normal Force
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Humanoids2016Azad.pdf
Reference TypeConference Proceedings
Author(s)Kollegger, G.; Ewerton, M.; Peters, J.; Wiemeyer, J.
Year2016
TitleBidirektionale Interaktion zwischen Mensch und Roboter beim Bewegungslernen (BIMROB)
Journal/Conference/Book Title11. Symposium der DVS Sportinformatik
KeywordsBIMROB
Link to PDFhttp://www.sportinformatik2016.ovgu.de/Tagung/Abstracts.html
Reference TypeConference Proceedings
Author(s)Parisi, S.; Blank, A.; Viernickel, T.; Peters, J.
Year2016
TitleLocal-utopia Policy Selection for Multi-objective Reinforcement Learning
Journal/Conference/Book TitleProceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/parisi2016local.pdf
Reference TypeBook Section
Author(s)Peters, J.; Bagnell, J.A.
Year2016
TitlePolicy gradient methods
Journal/Conference/Book TitleEncyclopedia of Machine Learning, 2nd Edition, Invited Article
Reference TypeConference Proceedings
Author(s)Fiebig, K.-H.; Jayaram, V.; Peters, J.; Grosse-Wentrup, M.
Year2016
TitleMulti-Task Logistic Regression in Brain-Computer Interfaces
Journal/Conference/Book TitleIEEE SMC 2016 — 6th Workshop on Brain-Machine Interface Systems
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/smc_2016_FiJaPeGW_mtl_logreg_v2.pdf
Reference TypeJournal Article
Author(s)Yi, Z.; Zhang, Y.; Peters, J.
Year2016
TitleSurface Roughness Discrimination Using Bioinspired Tactile Sensors
Journal/Conference/Book TitleProceedings of the 16th International Conference on Biomedical Engineering
Reference TypeUnpublished Work
Author(s)Arenz, O.; Neumann, G.
Year2016
TitleIterative Cost Learning from Different Types of Human Feedback
Journal/Conference/Book TitleIROS 2016 Workshop on Human-Robot Collaboration
KeywordsInverse Reinforcement Learning, Preference Learning
AbstractHuman-robot collaboration in unstructured environments often involves different types of interactions. These interactions usually occur frequently during normal operation and may provide valuable information about the task to the robot. It is therefore sensible to utilize this data for lifelong robot learning. Learning from human interactions is an active field of research, e.g., Inverse Reinforcement Learning, which aims at learning from demonstrations, or Preference Learning, which aims at learning from human preferences. However, learning from a combination of different types of feedback is still little explored. In this paper, we propose a method for inferring a reward function from a combination of expert demonstrations, pairwise preferences, star ratings as well as oracle-based evaluations of the true reward function. Our method extends Maximum Entropy Inverse Reinforcement Learning in order to account for the additional types of human feedback by framing them as constraints to the original optimization problem. We demonstrate on a gridworld, that the resulting optimization problem can be solved based on the Alternating Direction Method of Multipliers (ADMM), even when confronted with a large amount of training data.
URL(s) http://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/arenz_workshop_IROS16.pdf
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/arenz_workshop_IROS16.pdf
Reference TypeUnpublished Work
Author(s)Arenz, O.; Abdulsamad, H.; Neumann, G.
Year2016
Title(Inverse) Optimal Control for Matching Higher-Order Moments
Journal/Conference/Book TitleDGR Days (Leipzig)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/OlegArenz/oleg_dgr_2016.pdf
Reference TypeJournal Article
Author(s)Calandra, R.; Seyfarth, A.; Peters, J.; Deisenroth, M.
Year2015
TitleBayesian Optimization for Learning Gaits under Uncertainty
Journal/Conference/Book TitleAnnals of Mathematics and Artificial Intelligence (AMAI)
KeywordsCoDyCo
AbstractDesigning gaits and corresponding control policies is a key challenge in robot locomotion. Even with a viable controller parameterization, finding near-optimal parameters can be daunting. Typically, this kind of parameter optimization requires specific expert knowledge and extensive robot experiments. Automatic black-box gait optimization methods greatly reduce the need for human expertise and time-consuming design processes. Many different approaches for automatic gait optimization have been suggested to date, such as grid search and evolutionary algorithms. In this article, we thoroughly discuss multiple of these optimization methods in the context of automatic gait optimization. Moreover, we extensively evaluate Bayesian optimization, a model-based approach to black-box optimization under uncertainty, on both simulated problems and real robots. This evaluation demonstrates that Bayesian optimization is particularly suited for robotic applications, where it is crucial to find a good set of gait parameters in a small number of experiments.
URL(s) http://dx.doi.org/10.1007/s10472-015-9463-9
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Calandra2015a.pdf
Reference TypeJournal Article
Author(s)Mariti, C.; Muscolo, G.G.; Peters, J.; Puig, D.; Recchiuto, C.T.; Sighieri, C.; Solanas, A.; von Stryk, O.
Year2015
TitleDeveloping biorobotics for veterinary research into cat movements
Journal/Conference/Book TitleJournal of Veterinary Behavior: Clinical Applications and Research
Volume10
Number3
Pages248-254
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Developing_biorobotics_for_veterinary_research.pdf
Reference TypeConference Proceedings
Author(s)van Hoof, H.; Peters, J.; Neumann, G.
Year2015
TitleLearning of Non-Parametric Control Policies with High-Dimensional State Features
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)
KeywordsTACMAN
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/hoof2015learning.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Ivaldi, S.; Deisenroth, M.; Rueckert, E.; Peters, J.
Year2015
TitleLearning Inverse Dynamics Models with Contacts
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsCoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Calandra_ICRA15.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Daniel, C.; Neumann, G.; van Hoof, H.; Peters, J.
Year2015
TitleTowards Learning Hierarchical Skills for Multi-Phase Manipulation Tasks
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/KroemerICRA15.pdf
Reference TypeConference Proceedings
Author(s)Rueckert, E.; Mundo, J.; Paraschos, A.; Peters, J.; Neumann, G.
Year2015
TitleExtracting Low-Dimensional Control Variables for Movement Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rdHand, CoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Rueckert_ICRA14LMProMPsFinal.pdf
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Neumann, G.; Lioutikov, R.; Ben Amor, H.; Peters, J.; Maeda, G.
Year2015
TitleLearning Multiple Collaborative Tasks with a Mixture of Interaction Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand, CompLACS, BIMROB
Pages1535--1542
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_icra_2015_seattle.pdf
Reference TypeConference Proceedings
Author(s)Traversaro, S.; Del Prete, A.; Ivaldi, S.; Nori, F.
Year2015
TitleAvoiding to rely on Inertial Parameters in Estimating Joint Torques with proximal F/T sensing
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsCoDyCo
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICRA15_2129_FI.pdf
Reference TypeConference Proceedings
Author(s)Lopes, M.; Peters, J.; Piater, J.; Toussaint, M.; Baisero, A.; Busch, B.; Erkent, O.; Kroemer, O.; Lioutikov, R.; Maeda, G.; Mollard, Y.; Munzer, T.; Shukla, D.
Year2015
TitleSemi-Autonomous 3rd-Hand Robot
Journal/Conference/Book TitleWorkshop on Cognitive Robotics in Future Manufacturing Scenarios, European Robotics Forum, Vienna, Austria
Keywords3rdhand
Link to PDFhttps://iis.uibk.ac.at/public/papers/Lopes-2015-CogRobFoF.pdf
Reference TypeConference Proceedings
Author(s)Lioutikov, R.; Neumann, G.; Maeda, G.J.; Peters, J.
Year2015
TitleProbabilistic Segmentation Applied to an Assembly Task
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Keywords3rd-hand
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/lioutikov_humanoids_2015.pdf
Reference TypeConference Proceedings
Author(s)Paraschos, A.; Rueckert, E.; Peters, J.; Neumann, G.
Year2015
TitleModel-Free Probabilistic Movement Primitives for Physical Interaction
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsCoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/PubAlexParaschos/Paraschos_IROS_2015.pdf
Reference TypeConference Paper
Author(s)Rueckert, E.; Lioutikov, R.; Calandra, R.; Schmidt, M.; Beckerle, P.; Peters, J.
Year2015
TitleLow-cost Sensor Glove with Force Feedback for Learning from Demonstrations using Probabilistic Trajectory Representations
Journal/Conference/Book TitleICRA 2015 Workshop on Tactile and force sensing for autonomous compliant intelligent robots
KeywordsCoDyCo
URL(s) http://arxiv.org/abs/1510.03253
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Workshops/ICRA2015TactileForce/13_icra_ws_tactileforce.pdf
Reference TypeGeneric
Author(s)Ewerton, M.; Neumann, G.; Lioutikov, R.; Ben Amor, H.; Peters, J.; Maeda, G.
Year2015
TitleModeling Spatio-Temporal Variability in Human-Robot Interaction with Probabilistic Movement Primitives
Journal/Conference/Book TitleWorkshop on Machine Learning for Social Robotics, ICRA
Keywords3rd-Hand, CompLACS, BIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_workshop_ml_social_robotics_icra_2015.pdf
Reference TypeConference Proceedings
Author(s)Parisi, S.; Abdulsamad, H.; Paraschos, A.; Daniel, C.; Peters, J.
Year2015
TitleReinforcement Learning vs Human Programming in Tetherball Robot Games
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsSCARL
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Member/PubSimoneParisi/parisi_iros_2015.pdf
Reference TypeConference Proceedings
Author(s)Veiga, F.F.; van Hoof, H.; Peters, J.; Hermans, T.
Year2015
TitleStabilizing Novel Objects by Learning to Predict Tactile Slip
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsTACMAN
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/IROS2015veiga.pdf
Reference TypeConference Proceedings
Author(s)Huang, Y.; Schoelkopf, B.; Peters, J.
Year2015
TitleLearning Optimal Striking Points for A Ping-Pong Playing Robot
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/YanlongHuang/Yanlong_IROS2015
Reference TypeConference Proceedings
Author(s)Manschitz, S.; Kober, J.; Gienger, M.; Peters, J.
Year2015
TitleProbabilistic Progress Prediction and Sequencing of Concurrent Movement Primitives
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsHonda
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/ManschitzIROS2015_v2.pdf
Reference TypeConference Proceedings
Author(s)Ewerton, M.; Maeda, G.J.; Peters, J.; Neumann, G.
Year2015
TitleLearning Motor Skills from Partially Observed Movements Executed at Different Speeds
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsBIMROB, 3rd-hand
Pages456--463
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/MarcoEwerton/ewerton_iros_2015_hamburg.pdf
Reference TypeConference Proceedings
Author(s)Wahrburg, A.; Zeiss, S.; Matthias, B.; Peters, J.; Ding, H.
Year2015
TitleCombined Pose-Wrench and State Machine Representation for Modeling Robotic Assembly Skills
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsABB
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Wahrburg_IROS_2015.pdf
Reference TypeJournal Article
Author(s)Daniel, C.; Kroemer, O.; Viering, M.; Metz, J.; Peters, J.
Year2015
TitleActive Reward Learning with a Novel Acquisition Function
Journal/Conference/Book TitleAutonomous Robots (AURO)
KeywordsComPLACS
Volume39
Number3
Pages389-405
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/ChristianDaniel/ActiveRewardLearning.pdf
Reference TypeConference Proceedings
Author(s)Fritsche, L.; Unverzagt, F.; Peters, J.; Calandra, R.
Year2015
TitleFirst-Person Tele-Operation of a Humanoid Robot
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Fritsche_Humanoids15.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Ivaldi, S.; Deisenroth, M.; Peters, J.
Year2015
TitleLearning Torque Control in Presence of Contacts using Tactile Sensing from Robot Skin
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Calandra_humanoids2015.pdf
Reference TypeJournal Article
Author(s)Manschitz, S.; Kober, J.; Gienger, M.; Peters, J.
Year2015
TitleLearning Movement Primitive Attractor Goals and Sequential Skills from Kinesthetic Demonstrations
Journal/Conference/Book TitleRobotics and Autonomous Systems
KeywordsHonda, HRI-Collaboration
Volume74
Pages97-107
ISBN/ISSN0921-8890
URL(s) http://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/ManschitzRAS2015_v2.pdf
Reference TypeConference Proceedings
Author(s)Maeda, G.; Neumann, G.; Ewerton, M.; Lioutikov, R.; Peters, J.
Year2015
TitleA Probabilistic Framework for Semi-Autonomous Robots Based on Interaction Primitives with Phase Estimation
Journal/Conference/Book TitleProceedings of the International Symposium of Robotics Research (ISRR)
Keywords3rd-Hand, BIMROB
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/ISRR_uploaded_20150814_small.pdf
Reference TypeConference Proceedings
Author(s)Koc, O.; Maeda, G.; Neumann, G.; Peters, J.
Year2015
TitleOptimizing Robot Striking Movement Primitives with Iterative Learning Control
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Keywords3rd-Hand
Reference TypeConference Proceedings
Author(s)Hoelscher, J.; Peters, J.; Hermans, T.
Year2015
TitleEvaluation of Interactive Object Recognition with Tactile Sensing
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsTACMAN, tactile manipulation
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Theses/hoelscher_ichr2015.pdf
Reference TypeConference Proceedings
Author(s)van Hoof, H.; Hermans, T.; Neumann, G.; Peters, J.
Year2015
TitleLearning Robot In-Hand Manipulation with Tactile Features
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsTACMAN, tactile manipulation
URL(s) http://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/HoofHumanoids2015.pdf
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/HoofHumanoids2015.pdf
Reference TypeConference Proceedings
Author(s)Leischnig, S.; Luettgen, S.; Kroemer, O.; Peters, J.
Year2015
TitleA Comparison of Contact Distribution Representations for Learning to Predict Object Interactions
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsTACMAN, tactile manipulation
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Site/EditPublication/Leischnig-Humanoids-2015.pdf
Reference TypeConference Proceedings
Author(s)Abdolmaleki, A.; Lioutikov, R.; Peters, J.; Lau, N.; Reis, L.; Neumann, G.
Year2015
TitleModel-Based Relative Entropy Stochastic Search
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
KeywordsLearnRobotS
Place PublishedCambridge, MA
PublisherMIT Press
Link to PDFhttp://www.ausy.tu-darmstadt.de/uploads/Team/GerhardNeumann/Abdolmaleki_NIPS2015.pdf
Reference TypeConference Proceedings
Author(s)Dann, C.; Neumann, G.; Peters, J.
Year2015
TitlePolicy Evaluation with Temporal Differences: A Survey and Comparison
Journal/Conference/Book TitleProceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling (ICAPS)
Pages359-360
Reference TypeJournal Article
Author(s)Lioutikov, R.; Paraschos, A.; Peters, J.; Neumann, G.
Year2014
TitleGeneralizing Movements with Information Theoretic Stochastic Optimal Control
Journal/Conference/Book TitleJournal of Aerospace Information Systems
Volume11
Number9
Pages579-595
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/lioutikov_2014_itsoc.pdf
Reference TypeJournal Article
Author(s)Neumann, G.; Daniel, C.; Paraschos, A.; Kupcsik, A.; Peters, J.
Year2014
TitleLearning Modular Policies for Robotics
Journal/Conference/Book TitleFrontiers in Computational Neuroscience
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/fncom-08-00062.pdf
Reference TypeConference Proceedings
Author(s)Nori, F.; Peters, J.; Padois, V.; Babic, J.; Mistry, M.; Ivaldi, S.
Year2014
TitleWhole-body motion in humans and humanoids
Journal/Conference/Book TitleProceedings of the Workshop on New Research Frontiers for Intelligent Autonomous Systems (NRF-IAS)
KeywordsCoDyCo
Pages81-92
URL(s) http://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/nori2014iascodyco.pdf
Reference TypeJournal Article
Author(s)Dann, C.; Neumann, G.; Peters, J.
Year2014
TitlePolicy Evaluation with Temporal Differences: A Survey and Comparison
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
KeywordsComPLACS
Volume15
NumberMarch
Pages809-883
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/dann14a.pdf
Reference TypeJournal Article
Author(s)Meyer, T.; Peters, J.; Zander, T.O.; Schoelkopf, B.; Grosse-Wentrup, M.
Year2014
TitlePredicting Motor Learning Performance from Electroencephalographic Data
Journal/Conference/Book TitleJournal of Neuroengineering and Rehabilitation
KeywordsTeam Athena-Minerva
Volume11
Number1
URL(s) http://www.ias.tu-darmstadt.de/uploads/Publications/Meyer_JNER_2013.pdf
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Meyer_JNER_2013.pdf
Reference TypeJournal Article
Author(s)Bocsi, B.; Csato, L.; Peters, J.
Year2014
TitleIndirect Robot Model Learning for Tracking Control
Journal/Conference/Book TitleAdvanced Robotics
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Bocsi_AR_2014.pdf
Reference TypeJournal Article
Author(s)Ben Amor, H.; Saxena, A.; Hudson, N.; Peters, J.
Year2014
TitleSpecial issue on autonomous grasping and manipulation
Journal/Conference/Book TitleAutonomous Robots (AURO)
Reference TypeJournal Article
Author(s)Deisenroth, M.P.; Fox, D.; Rasmussen, C.E.
Year2014
TitleGaussian Processes for Data-Efficient Learning in Robotics and Control
Journal/Conference/Book TitleIEEE Transactions on Pattern Analysis and Machine Intelligence
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/pami_final_w_appendix.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/pami_final_w_appendix.pdf
Reference TypeJournal Article
Author(s)Wierstra, D.; Schaul, T.; Glasmachers, T.; Sun, Y.; Peters, J.; Schmidhuber, J.
Year2014
TitleNatural Evolution Strategies
Journal/Conference/Book TitleJournal of Machine Learning Research (JMLR)
Volume15
NumberMarch
Pages949-980
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/wierstra14a.pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Englert, P.; Peters, J.; Fox, D.
Year2014
TitleMulti-Task Policy Search for Robotics
Journal/Conference/Book TitleProceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Deisenroth_ICRA_2014.pdf
Reference TypeConference Proceedings
Author(s)Bischoff, B.; Nguyen-Tuong, D.; van Hoof, H.; McHutchon, A.; Rasmussen, C.E.; Knoll, A.; Peters, J.; Deisenroth, M.P.
Year2014
TitlePolicy Search For Learning Robot Control Using Sparse Data
Journal/Conference/Book TitleProceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Bischoff_ICRA_2014.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Seyfarth, A.; Peters, J.; Deisenroth, M.P.
Year2014
TitleAn Experimental Comparison of Bayesian Optimization for Bipedal Locomotion
Journal/Conference/Book TitleProceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA)
KeywordsCoDyCo
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/Calandra_ICRA2014.pdf
Reference TypeBook
Author(s)Kober, J.; Peters, J.
Year2014
TitleLearning Motor Skills - From Algorithms to Robot Experiments
Journal/Conference/Book TitleSpringer Tracts in Advanced Robotics 97 (STAR Series), Springer
ISBN/ISSN978-3-319-03193-4
Reference TypeConference Proceedings
Author(s)Kroemer, O.; van Hoof, H.; Neumann, G.; Peters, J.
Year2014
TitleLearning to Predict Phases of Manipulation Tasks as Hidden States
Journal/Conference/Book TitleProceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA)
KeywordsTACMAN, 3rd-Hand
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_ICRA_2014.pdf
Reference TypeConference Proceedings
Author(s)Ben Amor, H.; Neumann, G.; Kamthe, S.; Kroemer, O.; Peters, J.
Year2014
TitleInteraction Primitives for Human-Robot Cooperation Tasks
Journal/Conference/Book TitleProceedings of 2014 IEEE International Conference on Robotics and Automation (ICRA)
KeywordsCoDyCo, ComPLACS
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/icraHeniInteract.pdf
Reference TypeConference Proceedings
Author(s)Haji Ghassemi, N.; Deisenroth, M.P.
Year2014
TitleApproximate Inference for Long-Term Forecasting with Periodic Gaussian Processes
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Analytic_Long-Term_Forecasting.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Gopalan, N.; Seyfarth, A.; Peters, J.; Deisenroth, M.P.
Year2014
TitleBayesian Gait Optimization for Bipedal Locomotion
Journal/Conference/Book TitleProceedings of the 2014 Learning and Intelligent Optimization Conference (LION8)
KeywordsCoDyCo
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/Calandra_LION8.pdf
Reference TypeConference Proceedings
Author(s)Kamthe, S.; Peters, J.; Deisenroth, M.
Year2014
TitleMulti-modal filtering for non-linear estimation
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Link to PDFhttps://spiral.imperial.ac.uk:8443/bitstream/10044/1/12921/2/ICASSP_Final.pdf
Reference TypeConference Proceedings
Author(s)Manschitz, S.; Kober, J.; Gienger, M.; Peters, J.
Year2014
TitleLearning to Unscrew a Light Bulb from Demonstrations
Journal/Conference/Book TitleProceedings of ISR/ROBOTIK 2014
KeywordsHRI-Collaboration
Reference TypeJournal Article
Author(s)Muelling, K.; Boularias, A.; Schoelkopf, B.; Peters, J.
Year2014
TitleLearning Strategies in Table Tennis using Inverse Reinforcement Learning
Journal/Conference/Book TitleBiological Cybernetics
Volume108
Number5
Pages603-619
Custom 1DOI 10.1007/s00422-014-0599-1
Custom 2http://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/Muelling_BICY_2014.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Research/Overview/Learning_strategies_in_table_tennis_usin.pdf
Reference TypeJournal Article
Author(s)Saut, J.-P.; Ivaldi, S.; Sahbani, A.; Bidaud, P.
Year2014
TitleGrasping objects localized from uncertain point cloud data
Journal/Conference/Book TitleRobotics and Autonomous Systems
KeywordsCoDyCo
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/auro2013_final.pdf
Reference TypeConference Proceedings
Author(s)Lioutikov, R.; Kroemer, O.; Peters, J.; Maeda, G.
Year2014
TitleLearning Manipulation by Sequencing Motor Primitives with a Two-Armed Robot
Journal/Conference/Book TitleProceedings of the 13th International Conference on Intelligent Autonomous Systems (IAS)
Keywords3rd-Hand
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Member/PubRudolfLioutikov/lioutikov_ias13_conf.pdf
Reference TypeConference Proceedings
Author(s)Daniel, C.; Viering, M.; Metz, J.; Kroemer, O.; Peters, J.
Year2014
TitleActive Reward Learning
Journal/Conference/Book TitleProceedings of Robotics: Science & Systems (R:SS)
KeywordsComPLACS
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Daniel_RSS_2014.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Peters, J.
Year2014
TitlePredicting Object Interactions from Contact Distributions
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
Keywords3rd-Hand, TACMAN, CoDyCo
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/KroemerIROS2014.pdf
Reference TypeConference Proceedings
Author(s)Chebotar, Y.; Kroemer, O.; Peters, J.
Year2014
TitleLearning Robot Tactile Sensing for Object Manipulation
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
Keywords3rd-Hand, TACMAN
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/ChebotarIROS2014.pdf
Reference TypeConference Proceedings
Author(s)Manschitz, S.; Kober, J.; Gienger, M.; Peters, J.
Year2014
TitleLearning to Sequence Movement Primitives from Demonstrations
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsHRI-Collaboration, Honda
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/ManschitzIROS2014.pdf
Reference TypeConference Proceedings
Author(s)Luck, K.S.; Neumann, G.; Berger, E.; Peters, J.; Ben Amor, H.
Year2014
TitleLatent Space Policy Search for Robotics
Journal/Conference/Book TitleProceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems (IROS)
KeywordsComPLACS, CoDyCo
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Luck_IROS_2014.pdf
Reference TypeJournal Article
Author(s)van Hoof, H.; Kroemer, O.; Peters, J.
Year2014
TitleProbabilistic Segmentation and Targeted Exploration of Objects in Cluttered Environments
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume30
Number5
Pages1198-1209
ISBN/ISSN1552-3098
URL(s) http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6870500&tag=1
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/hoof2014probabilistic.pdf
Reference TypeConference Proceedings
Author(s)Gomez, V.; Kappen, B.; Peters, J.; Neumann, G.
Year2014
TitlePolicy Search for Path Integral Control
Journal/Conference/Book TitleProceedings of the European Conference on Machine Learning (ECML)
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Gomez_ECML_2014.pdf
Reference TypeConference Proceedings
Author(s)Maeda, G.J.; Ewerton, M.; Lioutikov, R.; Ben Amor, H.; Peters, J.; Neumann, G.
Year2014
TitleLearning Interaction for Collaborative Tasks with Probabilistic Movement Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Keywords3rd-Hand, CompLACS
Pages527-534
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/PubGJMaeda/maeda2014InteractionProMP_HUMANOIDS.pdf
Reference TypeConference Proceedings
Author(s)Brandl, S.; Kroemer, O.; Peters, J.
Year2014
TitleGeneralizing Pouring Actions Between Objects using Warped Parameters
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Keywords3rd-Hand
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/BrandlHumanoids2014Final.pdf
Reference TypeConference Proceedings
Author(s)Colome, A.; Neumann, G.; Peters, J.; Torras, C.
Year2014
TitleDimensionality Reduction for Probabilistic Movement Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Colome_Humanoids_2014.pdf
Reference TypeConference Proceedings
Author(s)Rueckert, E.; Mindt, M.; Peters, J.; Neumann, G.
Year2014
TitleRobust Policy Updates for Stochastic Optimal Control
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/AICOHumanoidsFinal.pdf
Reference TypeConference Proceedings
Author(s)Ivaldi, S.; Peters, J.; Padois, V.; Nori, F.
Year2014
TitleTools for simulating humanoid robot dynamics: a survey based on user feedback
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/ivaldi2014simulators.pdf
Reference TypeJournal Article
Author(s)Droniou, A.; Ivaldi, S.; Sigaud, O.
Year2014
TitleDeep unsupervised network for multimodal perception, representation and classification
Journal/Conference/Book TitleRobotics and Autonomous Systems
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Deep unsupervised network_2014.pdf
Reference TypeConference Proceedings
Author(s)Hermans, T.; Veiga, F.; Hoelscher, J.; van Hoof, H.; Peters, J.
Year2014
TitleDemonstration: Learning for Tactile Manipulation
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS/NeurIPS), Demonstration Track.
KeywordsTACMAN, tactile manipulation
AbstractTactile sensing affords robots the opportunity to dexterously manipulate objects in-hand without the need for strong object models and planning. Our demonstration focuses on learning for tactile, in-hand manipulation by robots. We address learning problems related to the control of objects in-hand, as well as perception problems encountered by a robot exploring its environment with a tactile sensor. We demonstrate applications for three specific learning problems: learning to detect slip for grasp stability, learning to reposition objects in-hand, and learning to identify objects and object properties through tactile exploration. We address the problem of learning to detect slip of grasped objects. We show that the robot can learn a detector for slip events which generalizes to novel objects. We leverage this slip detector to produce a feedback controller that can stabilize objects during grasping and manipulation. Our work compares a number of supervised learning approaches and feature representations in order to achieve reliable slip detection. Tactile sensors provide observations of high enough dimension to cause problems for traditional reinforcement learning methods. As such, we introduce a novel reinforcement learning (RL) algorithm which learns transition functions embedded in a reproducing kernel Hilbert space (RKHS). The resulting policy search algorithm provides robust policy updates which can efficiently deal with high-dimensional sensory input. We demonstrate the method on the problem of repositioning a grasped object in the hand. Finally, we present a method for learning to classify objects through tactile exploration. The robot collects data from a number of objects through various exploratory motions. The robot learns a classifier for each object to be used during exploration of its environment to detect objects in cluttered environments. Here again we compare a number of learning methods and features present in the literature and synthesize a method that works best in the human environments the robot is likely to encounter. Users will be able to interact with a robot hand by giving it objects to grasp and attempting to remove these objects from the robot. The hand will also perform some basic in-hand manipulation tasks such as rolling the object between the fingers and rotating the object about a fixed grasp point. Users will also be able to interact with a touch sensor capable of classifying objects as well as semantic events such as slipping from a stable contact location.
Place PublishedCambridge, MA
PublisherMIT Press
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/TuckerHermans/learning_tactile_manipulation_demo.pdf
Reference TypeConference Proceedings
Author(s)Lioutikov, R.; Paraschos, A.; Peters, J.; Neumann, G.
Year2014
TitleSample-Based Information-Theoretic Stochastic Optimal Control
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
Keywords3rd-Hand
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Team/RudolfLioutikov/lioutikov_icra_2014.pdf
Reference TypeJournal Article
Author(s)Muelling, K.; Kober, J.; Kroemer, O.; Peters, J.
Year2013
TitleLearning to Select and Generalize Striking Movements in Robot Table Tennis
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
KeywordsGeRT
Volume32
Number3
Pages263-279
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Muelling_IJRR_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Muelling_IJRR_2013.pdf
Reference TypeConference Proceedings
Author(s)Daniel, C.; Neumann, G.; Kroemer, O.; Peters, J.
Year2013
TitleLearning Sequential Motor Tasks
Journal/Conference/Book TitleProceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA)
KeywordsGeRT, CompLACS
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Daniel_ICRA_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Daniel_ICRA_2013.pdf
Reference TypeConference Proceedings
Author(s)Englert, P.; Paraschos, A.; Peters, J.; Deisenroth, M. P.
Year2013
TitleModel-based Imitation Learning by Probabilistic Trajectory Matching
Journal/Conference/Book TitleProceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA)
AbstractOne of the most elegant ways of teaching new skills to robots is to provide demonstrations of a task and let the robot imitate this behavior. Such imitation learning is a non-trivial task: Different anatomies of robot and teacher, and reduced robustness towards changes in the control task are two major difficulties in imitation learning. We present an imitation-learning approach to efficiently learn a task from expert demonstrations. Instead of finding policies indirectly, either via state-action mappings (behavioral cloning), or cost function learning (inverse reinforcement learning), our goal is to find policies directly such that predicted trajectories match observed ones. To achieve this aim, we model the trajectory of the teacher and the predicted robot trajectory by means of probability distributions. We match these distributions by minimizing their Kullback-Leibler divergence. In this paper, we propose to learn probabilistic forward models to compute a probability distribution over trajectories. We compare our approach to model-based reinforcement learning methods with hand-crafted cost functions. Finally, we evaluate our method with experiments on a real compliant robot.
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Englert_ICRA_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Englert_ICRA_2013.pdf
Reference TypeConference Proceedings
Author(s)Gopalan, N.; Deisenroth, M. P.; Peters, J.
Year2013
TitleFeedback Error Learning for Rhythmic Motor Primitives
Journal/Conference/Book TitleProceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA)
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Gopalan_ICRA_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Gopalan_ICRA_2013.pdf
Reference TypeJournal Article
Author(s)Wang, Z.; Muelling, K.; Deisenroth, M. P.; Ben Amor, H.; Vogt, D.; Schoelkopf, B.; Peters, J.
Year2013
TitleProbabilistic Movement Modeling for Intention Inference in Human-Robot Interaction
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Volume32
Number7
Pages841-858
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_IJRR_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_IJRR_2013.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.; Muelling, K.; Kroemer, O.; Neumann, G.
Year2013
TitleTowards Robot Skill Learning: From Simple Skills to Table Tennis
Journal/Conference/Book TitleProceedings of the European Conference on Machine Learning (ECML), Nectar Track
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/peters_ECML_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/peters_ECML_2013.pdf
Reference TypeJournal Article
Author(s)Kober, J.; Bagnell, D.; Peters, J.
Year2013
TitleReinforcement Learning in Robotics: A Survey
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Volume32
Number11
Pages1238-1274
URL(s) http://www.ias.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf
Reference TypeConference Proceedings
Author(s)Kupcsik, A.G.; Deisenroth, M.P.; Peters, J.; Neumann, G.
Year2013
TitleData-Efficient Generalization of Robot Skills with Contextual Policy Search
Journal/Conference/Book TitleProceedings of the National Conference on Artificial Intelligence (AAAI)
KeywordsGeRT, ComPLACS
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kupcsik_AAAI_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kupcsik_AAAI_2013.pdf
Reference TypeConference Proceedings
Author(s)Bocsi, B.; Csato, L.; Peters, J.
Year2013
TitleAlignment-based Transfer Learning for Robot Models
Journal/Conference/Book TitleProceedings of the 2013 International Joint Conference on Neural Networks (IJCNN)
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_IJCNN_2013.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_IJCNN_2013.pdf
Reference TypeConference Proceedings
Author(s)Daniel, C.; Neumann, G.; Peters, J.
Year2013
TitleAutonomous Reinforcement Learning with Hierarchical REPS
Journal/Conference/Book TitleProceedings of the 2013 International Joint Conference on Neural Networks (IJCNN)
KeywordsGeRT, CompLACS
Reference TypeJournal Article
Author(s)Englert, P.; Paraschos, A.; Peters, J.; Deisenroth, M.P.
Year2013
TitleProbabilistic Model-based Imitation Learning
Journal/Conference/Book TitleAdaptive Behavior Journal
Volume21
Pages388-403
URL(s) http://www.ias.tu-darmstadt.de/uploads/Publications/Englert_ABJ_2013.pdf
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Englert_ABJ_2013.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.; Muelling, K.; Nguyen-Tuong, D.; Kroemer, O.
Year2013
TitleLearning Skills with Motor Primitives
Journal/Conference/Book TitleProceedings of the 16th Yale Learning Workshop
Reference TypeConference Proceedings
Author(s)Neumann, G.; Kupcsik, A.G.; Deisenroth, M.P.; Peters, J.
Year2013
TitleInformation-Theoretic Motor Skill Learning
Journal/Conference/Book TitleProceedings of the AAAI 2013 Workshop on Intelligent Robotic Systems
KeywordsComPLACS
Reference TypeConference Proceedings
Author(s)Ben Amor, H.; Vogt, D.; Ewerton, M.; Berger, E.; Jung, B.; Peters, J.
Year2013
TitleLearning Responsive Robot Behavior by Imitation
Journal/Conference/Book TitleProceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsCoDyCo
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/iros2013Heni.pdf
Reference TypeJournal Article
Author(s)Deisenroth, M. P.; Neumann, G.; Peters, J.
Year2013
TitleA Survey on Policy Search for Robotics
Journal/Conference/Book TitleFoundations and Trends in Robotics
KeywordsCompLACS
Volume2
Number1-2
Pages1-142
URL(s) http://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/PolicySearchReview.pdf
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Site/EditPublication/PolicySearchReview.pdf
Reference TypeConference Proceedings
Author(s)van Hoof, H.; Kroemer, O.; Peters, J.
Year2013
TitleProbabilistic Interactive Segmentation for Anthropomorphic Robots in Cluttered Environments
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGeRT, ComPLACS
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/hoof-HUMANOIDS.pdf
Reference TypeConference Proceedings
Author(s)Paraschos, A.; Neumann, G.; Peters, J.
Year2013
TitleA Probabilistic Approach to Robot Trajectory Generation
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo, ComPLACS
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Paraschos_Humanoids_2013.pdf
Reference TypeConference Proceedings
Author(s)Berger, E.; Vogt, D.; Haji-Ghassemi, N.; Jung, B.; Ben Amor, H.
Year2013
TitleInferring Guidance Information in Cooperative Human-Robot Tasks
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsCoDyCo
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/humanoids2013Heni.pdf
Reference TypeConference Proceedings
Author(s)Paraschos, A.; Daniel, C.; Peters, J.; Neumann, G.
Year2013
TitleProbabilistic Movement Primitives
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS / NeurIPS)
KeywordsCoDyCo, ComPLACS
Place PublishedCambridge, MA
PublisherMIT Press
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Paraschos_NIPS_2013a.pdf
Reference TypeBook Section
Author(s)Sigaud, O.; Peters, J.
Year2012
TitleRobot Learning
Journal/Conference/Book TitleEncyclopedia of the Sciences of Learning, Springer Verlag
PublisherSpringer Verlag
ISBN/ISSN978-1-4419-1428-6
URL(s) http://dx.doi.org/10.1007/978-3-642-05181-4_1
Reference TypeJournal Article
Author(s)Lampert, C.H.; Peters, J.
Year2012
TitleReal-Time Detection of Colored Objects In Multiple Camera Streams With Off-the-Shelf Hardware Components
Journal/Conference/Book TitleJournal of Real-Time Image Processing
Volume7
Number1
Pages31-41
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/rtblob-jrtip2010_6651[0].pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/rtblob-jrtip2010_6651[0].pdf
Reference TypeConference Proceedings
Author(s)Daniel, C.; Neumann, G.; Peters, J.
Year2012
TitleHierarchical Relative Entropy Policy Search
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS 2012)
KeywordsGeRT, CompLACS
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Member/ChristianDaniel/DanielAISTATS2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Member/ChristianDaniel/DanielAISTATS2012.pdf
Reference TypeJournal Article
Author(s)Deisenroth, M.P.; Turner, R.; Huber, M.; Hanebeck, U.D.; Rasmussen, C.E.
Year2012
TitleRobust Filtering and Smoothing with Gaussian Processes
Journal/Conference/Book TitleIEEE Transactions on Automatic Control
KeywordsGaussian process, filtering, smoothing
AbstractWe propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of "system identification" is more robust than finding point estimates of a parametric function representation. Our principled filtering/smoothing approach for GP dynamic systems is based on analytic moment matching in the context of the forward-backward algorithm. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.
PublisherIEEE
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/deisenroth_IEEE-TAC2012.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Ugur, E.; Oztop, E. ; Peters, J.
Year2012
TitleA Kernel-based Approach to Direct Action Perception
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_ICRA_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_ICRA_2012.pdf
Reference TypeConference Proceedings
Author(s)Bocsi, B.; Hennig, P.; Csato, L.; Peters, J.
Year2012
TitleLearning Tracking Control with Forward Models
Journal/Conference/Book TitleProceedings of the International Conference on Robotics and Automation (ICRA)
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_ICRA_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_ICRA_2012.pdf
Reference TypeJournal Article
Author(s)Kober, J.; Wilhelm, A.; Oztop, E.; Peters, J.
Year2012
TitleReinforcement Learning to Adjust Parametrized Motor Primitives to New Situations
Journal/Conference/Book TitleAutonomous Robots (AURO)
KeywordsSkill learning; Motor primitives; Reinforcement learning; Meta-parameters; Policy learning
PublisherSpringer US
Volume33
Number4
Pages361-379
ISBN/ISSN0929-5593
URL(s) http://dx.doi.org/10.1007/s10514-012-9290-3
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/kober_auro2012.pdf
LanguageEnglish
Reference TypeJournal Article
Author(s)Vitzthum, A.; Ben Amor, H.; Heumer, G.; Jung, B.
Year2012
TitleXSAMPL3D - An Action Description Language for the Animation of Virtual Characters
Journal/Conference/Book TitleJournal of Virtual Reality and Broadcasting
Volume9
Number1
URL(s) http://www.jvrb.org/9.2012
Link to PDFhttp://www.jvrb.org/9.2012/3262/920121.pdf
Reference TypeConference Proceedings
Author(s)Wang, Z.; Deisenroth, M.P.; Ben Amor, H.; Vogt, D.; Schoelkopf, B.; Peters, J.
Year2012
TitleProbabilistic Modeling of Human Movements for Intention Inference
Journal/Conference/Book TitleProceedings of Robotics: Science and Systems (R:SS)
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_RSS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_RSS_2012.pdf
Reference TypeJournal Article
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2012
TitleOnline Kernel-based Learning for Task-Space Tracking Robot Control
Journal/Conference/Book TitleIEEE Transactions on Neural Networks and Learning Systems
Volume23
Number9
Pages1417-1425
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/NguyenTuong_TNN_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/NguyenTuong_TNN_2012.pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Mohamed, S.
Year2012
TitleExpectation Propagation in Gaussian Process Dynamical Systems
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 26 (NIPS/NeurIPS), Cambridge, MA: MIT Press.
AbstractRich and complex time-series data, such as those generated from engineering systems, financial markets, videos, or neural recordings are now a common feature of modern data analysis. Explaining the phenomena underlying these diverse data sets requires flexible and accurate models. In this paper, we promote Gaussian process dynamical systems as a rich model class that is appropriate for such an analysis. We present a new approximate message-passing algorithm for Bayesian state estimation and inference in Gaussian process dynamical systems, a non-parametric probabilistic generalization of commonly used state-space models. We derive our message-passing algorithm using Expectation Propagation and provide a unifying perspective on message passing in general state-space models. We show that existing Gaussian filters and smoothers appear as special cases within our inference framework, and that these existing approaches can be improved upon using iterated message passing. Using both synthetic and real-world data, we demonstrate that iterated message passing can improve inference in a wide range of tasks in Bayesian state estimation, thus leading to improved predictions and more effective decision making.
PublisherThe MIT Press
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_NIPS_2012.pdf
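The forward-backward message passing mentioned in the abstract above reduces, for linear-Gaussian models, to the Rauch-Tung-Striebel smoother; EP then iterates such forward and backward messages. A sketch of one backward message in that linear special case (the function and argument names are illustrative, not from the paper):

```python
import numpy as np

def rts_backward_step(mu_f, P_f, mu_p, P_p, mu_s, P_s, A):
    """One backward (smoothing) message, linear-Gaussian special case.

    mu_f/P_f are the filtered moments of the current step, mu_p/P_p the
    one-step-ahead predictive moments, and mu_s/P_s the already-smoothed
    moments of the next step.
    """
    G = P_f @ A.T @ np.linalg.inv(P_p)      # smoother gain
    mu_sm = mu_f + G @ (mu_s - mu_p)        # corrected mean
    P_sm = P_f + G @ (P_s - P_p) @ G.T      # corrected covariance
    return mu_sm, P_sm
```

Incorporating future observations can only reduce uncertainty, so the smoothed covariance never exceeds the filtered one.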
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.; Muelling, K.; Nguyen-Tuong, D.; Kroemer, O.
Year2012
TitleRobot Skill Learning
Journal/Conference/Book TitleProceedings of the European Conference on Artificial Intelligence (ECAI)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/peters_ECAI2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/peters_ECAI2012.pdf
Reference TypeConference Proceedings
Author(s)Boularias, A.; Kroemer, O.; Peters, J.
Year2012
TitleStructured Apprenticeship Learning
Journal/Conference/Book TitleProceedings of the European Conference on Machine Learning (ECML)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Boularias_ECML_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Boularias_ECML_2012.pdf
Reference TypeConference Proceedings
Author(s)Meyer, T.; Peters, J.; Broetz, D.; Zander, T.; Schoelkopf, B.; Soekadar, S.; Grosse-Wentrup, M.
Year2012
TitleA Brain-Robot Interface for Studying Motor Learning after Stroke
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsTeam Athena-Minerva
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Meyer_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Meyer_IROS_2012.pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Calandra, R.; Seyfarth, A.; Peters, J.
Year2012
TitleToward Fast Policy Search for Learning Legged Locomotion
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Keywordslegged locomotion, policy search, reinforcement learning, Gaussian process
AbstractLegged locomotion is one of the most versatile forms of mobility. However, despite the importance of legged locomotion and the large number of legged robotics studies, no biped or quadruped matches the agility and versatility of their biological counterparts to date. Approaches to designing controllers for legged locomotion systems are often based on either the assumption of perfectly known dynamics or mechanical designs that substantially reduce the dimensionality of the problem. The few existing approaches for learning controllers for legged systems either require exhaustive real-world data or they improve controllers only conservatively, leading to slow learning. We present a data-efficient approach to learning feedback controllers for legged locomotive systems, based on learned probabilistic forward models for generating walking policies. On a compass walker, we show that our approach allows for learning gait policies from very little data. Moreover, we analyze learned locomotion models of a biomechanically inspired biped. Our approach has the potential to scale to high-dimensional humanoid robots with little loss in efficiency.
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_IROS_2012.pdf
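The data-efficient, model-based policy search described in the abstract above can be caricatured in one dimension: fit a probabilistic-model stand-in from a little real data, then improve the controller entirely on the learned model. A sketch under strong simplifying assumptions (scalar linear dynamics, a least-squares model instead of a GP, grid search instead of gradient-based policy search; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown "real" dynamics (scalar): x' = a*x + b*u; the task cost is x^2.
a_true, b_true = 0.9, 0.5

def rollout_true(k, x0=1.0, T=10):
    """Run the linear state-feedback policy u = -k*x on the real system."""
    xs, x = [], x0
    for _ in range(T):
        x = a_true * x + b_true * (-k * x)
        xs.append(x)
    return xs

# 1) Collect a small batch of real transitions and fit a forward model.
X = rng.normal(size=(50, 2))                      # columns: state, action
y = a_true * X[:, 0] + b_true * X[:, 1] + 0.01 * rng.normal(size=50)
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# 2) Improve the policy purely on the learned model (no new real rollouts).
def model_cost(k, x0=1.0, T=10):
    x, c = x0, 0.0
    for _ in range(T):
        x = a_hat * x + b_hat * (-k * x)
        c += x * x
    return c

best_k = min(np.linspace(0.0, 3.0, 301), key=model_cost)
```

Because policy evaluation happens on the model, only the initial 50 transitions touch the real system, which is the source of the data efficiency.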
Reference TypeConference Proceedings
Author(s)van Hoof, H.; Kroemer, O.; Ben Amor, H.; Peters, J.
Year2012
TitleMaximally Informative Interaction Learning for Scene Exploration
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/VanHoof_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/VanHoof_IROS_2012.pdf
Reference TypeConference Proceedings
Author(s)Daniel, C.; Neumann, G.; Peters, J.
Year2012
TitleLearning Concurrent Motor Skills in Versatile Solution Spaces
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGeRT, CompLACS
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Daniel_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Daniel_IROS_2012.pdf
Reference TypeConference Proceedings
Author(s)Ben Amor, H.; Kroemer, O.; Hillenbrand, U.; Neumann, G.; Peters, J.
Year2012
TitleGeneralization of Human Grasping for Multi-Fingered Robot Hands
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/BenAmor_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/BenAmor_IROS_2012.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Muelling, K.; Peters, J.
Year2012
TitleLearning Throwing and Catching Skills
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Video Track
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kober_IROS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kober_IROS_2012.pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Peters, J.
Year2012
TitleSolving Nonlinear Continuous State-Action-Observation POMDPs for Mechanical Systems with Gaussian Noise
Journal/Conference/Book TitleProceedings of the European Workshop on Reinforcement Learning (EWRL)
Link to PDFhttp://www.ias.tu-darmstadt.de/uploads/Publications/Deisenroth_EWRL_2012.pdf
Reference TypeConference Proceedings
Author(s)Muelling, K.; Kober, J.; Kroemer, O.; Peters, J.
Year2012
TitleLearning to Select and Generalize Striking Movements in Robot Table Tennis
Journal/Conference/Book TitleProceedings of the AAAI 2012 Fall Symposium on Robots that Learn Interactively from Human Teachers
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/aaaifss12rliht_submission_2.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/aaaifss12rliht_submission_2.pdf
Reference TypeConference Proceedings
Author(s)Calandra, R.; Raiko, T.; Deisenroth, M.P.; Montesino Pouzols, F.
Year2012
TitleLearning Deep Belief Networks from Non-Stationary Streams
Journal/Conference/Book TitleInternational Conference on Artificial Neural Networks (ICANN)
Keywordsdeep learning, non-stationary data
AbstractDeep learning has proven to be beneficial for complex tasks such as classifying images. However, this approach has been mostly applied to static datasets. The analysis of non-stationary (e.g., concept drift) streams of data involves specific issues connected with the temporal and changing nature of the data. In this paper, we propose a proof-of-concept method, called Adaptive Deep Belief Networks, of how deep learning can be generalized to learn online from changing streams of data. We do so by exploiting the generative properties of the model to incrementally re-train the Deep Belief Network whenever new data are collected. This approach eliminates the need to store past observations and, therefore, requires only constant memory consumption. Hence, our approach can be valuable for life-long learning from non-stationary data streams.
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Site/EditPublication/calandra_icann2012.pdf
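The constant-memory idea in the abstract above, regenerating samples from the current generative model instead of storing past observations, can be sketched with a model far simpler than a Deep Belief Network. The class below is an illustrative analogue using a single Gaussian, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(1)

class ReplayGaussian:
    """Constant-memory learning from a data stream via generative replay.

    A much-simplified analogue of Adaptive Deep Belief Networks: rather
    than storing past observations, samples are regenerated from the
    current model and refit together with each new batch, so memory use
    stays constant however long the stream runs.
    """
    def __init__(self, mu=0.0, sigma=1.0, replay_size=200):
        self.mu, self.sigma, self.replay_size = mu, sigma, replay_size

    def update(self, batch):
        replay = rng.normal(self.mu, self.sigma, self.replay_size)
        data = np.concatenate([replay, batch])
        self.mu, self.sigma = data.mean(), data.std()
```

The replay set acts as a summary of everything seen so far; when the stream drifts, repeated updates gradually pull the model toward the new distribution.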
Reference TypeConference Proceedings
Author(s)Meyer, T.; Peters, J.; Broetz, D.; Zander, T.; Schoelkopf, B.; Soekadar, S.; Grosse-Wentrup, M.
Year2012
TitleInvestigating the Neural Basis for Stroke Rehabilitation by Brain-Computer Interfaces
Journal/Conference/Book TitleInternational Conference on Neurorehabilitation
KeywordsTeam Athena-Minerva
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Ben Amor, H.; Ewerton, M.; Peters, J.
Year2012
TitlePoint Cloud Completion Using Extrusions
Journal/Conference/Book TitleProceedings of the International Conference on Humanoid Robots (HUMANOIDS)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_Humanoids_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_Humanoids_2012.pdf
Reference TypeConference Proceedings
Author(s)Boularias, A.; Kroemer, O.; Peters, J.
Year2012
TitleAlgorithms for Learning Markov Field Policies
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 26 (NIPS/NeurIPS), Cambridge, MA: MIT Press.
KeywordsGeRT
Place PublishedCambridge, MA
PublisherMIT Press
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Boularias_NIPS_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Boularias_NIPS_2012.pdf
Reference TypeBook
Author(s)Deisenroth, M.P.; Szepesvari, C.; Peters, J.
Year2012
Journal/Conference/Book TitleProceedings of the 10th European Workshop on Reinforcement Learning
Editor(s)Deisenroth, M.P.; Szepesvari, C.; Peters, J.
Place PublishedJMLR W&C
Volume24
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_EWRLproceedings_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_EWRLproceedings_2012.pdf
Reference TypeJournal Article
Author(s)Piater, J.; Jodogne, S.; Detry, R.; Kraft, D.; Krueger, N.; Kroemer, O.; Peters, J.
Year2011
TitleLearning Visual Representations for Perception-Action Systems
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Volume30
Number3
Pages294-307
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Piater_IJRR_2010.pdf
Reference TypeJournal Article
Author(s)Detry, R.; Kraft, D.; Kroemer, O.; Peters, J.; Krueger, N.; Piater, J.
Year2011
TitleLearning Grasp Affordance Densities
Journal/Conference/Book TitlePaladyn Journal of Behavioral Robotics
KeywordsGeRT
Volume2
Number1
Pages1-17
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Detry_PJBR_2011.pdf
Reference TypeJournal Article
Author(s)Kober, J.; Peters, J.
Year2011
TitlePolicy Search for Motor Primitives in Robotics
Journal/Conference/Book TitleMachine Learning (MLJ)
Volume84
Number1-2
Pages171-203
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/kober_MACH_2011.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/kober_MACH_2011.pdf
Reference TypeJournal Article
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2011
TitleIncremental Sparsification for Real-time Online Model Learning
Journal/Conference/Book TitleNeurocomputing
Volume74
Number11
Pages1859-1867
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Nguyen_NEURO_2011.pdf
Reference TypeJournal Article
Author(s)Gomez Rodriguez, M.; Peters, J.; Hill, J.; Schoelkopf, B.; Gharabaghi, A.; Grosse-Wentrup, M.
Year2011
TitleClosing the Sensorimotor Loop: Haptic Feedback Helps Decoding of Motor Imagery
Journal/Conference/Book TitleJournal of Neural Engineering
KeywordsTeam Athena-Minerva
Volume8
Number3
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Gomez-RodriguezJNE2011.pdf
Reference TypeConference Proceedings
Author(s)Lampariello, R.; Nguyen-Tuong, D.; Castellini, C.; Hirzinger, G.; Peters, J.
Year2011
TitleEnergy-optimal robot catching in real-time
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Lampariello_ICRA_2011.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Peters, J.
Year2011
TitleA Flexible Hybrid Framework for Modeling Complex Manipulation Tasks
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
KeywordsGeRT
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_ICRA_2011.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Peters, J.
Year2011
TitleActive Exploration for Robot Parameter Selection in Episodic Reinforcement Learning
Journal/Conference/Book TitleProceedings of the 2011 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL)
KeywordsGeRT
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_ADPRL_2011.pdf
Reference TypeJournal Article
Author(s)Kroemer, O.; Lampert, C.H.; Peters, J.
Year2011
TitleLearning Dynamic Tactile Sensing with Robust Vision-based Training
Journal/Conference/Book TitleIEEE Transactions on Robotics (T-Ro)
Volume27
Number3
Pages545-557
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer_TRo_2011.pdf
Reference TypeConference Proceedings
Author(s)Boularias, A.; Kroemer, O.; Peters, J.
Year2011
TitleLearning Robot Grasping from 3D Images with Markov Random Fields
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Publications/Boularias_IROS_2011.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Boularias_IROS_2011.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Peters, J.
Year2011
TitleA Non-Parametric Approach to Dynamic Programming
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 25 (NIPS/NeurIPS)
KeywordsGeRT
Place PublishedCambridge, MA
PublisherMIT Press
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Kroemer2011NIPS.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kroemer2011NIPS.pdf
Reference TypeConference Proceedings
Author(s)van Hoof, H.; van der Zant, T.; Wiering, M.A.
Year2011
TitleAdaptive Visual Face Tracking for an Autonomous Robot
Journal/Conference/Book TitleProceedings of the Belgian-Dutch Artificial Intelligence Conference (BNAIC 11)
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/VanHoof_BNAIC_2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/VanHoof_BNAIC_2011.pdf
Reference TypeJournal Article
Author(s)Muelling, K.; Kober, J.; Peters, J.
Year2011
TitleA Biomimetic Approach to Robot Table Tennis
Journal/Conference/Book TitleAdaptive Behavior Journal
Volume19
Number5
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Muelling_ABJ2011.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Muelling_ABJ2011.pdf
Reference TypeConference Proceedings
Author(s)Bocsi, B.; Nguyen-Tuong, D.; Csato, L.; Schoelkopf, B.; Peters, J.
Year2011
TitleLearning Inverse Kinematics with Structured Prediction
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_IROS_2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Bocsi_IROS_2011.pdf
Reference TypeConference Proceedings
Author(s)Wang, Z.; Lampert, C.H.; Muelling, K.; Schoelkopf, B.; Peters, J.
Year2011
TitleLearning Anticipation Policies for Robot Table Tennis
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_IROS_2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_IROS_2011.pdf
Reference TypeConference Proceedings
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2011
TitleLearning Task-Space Tracking Control with Kernels
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Nguyen_IROS_2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Nguyen_IROS_2011.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Peters, J.
Year2011
TitleLearning Elementary Movements Jointly with a Higher Level Task
Journal/Conference/Book TitleIEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Kober_IROS_2011.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kober_IROS_2011.pdf
Reference TypeConference Proceedings
Author(s)Gomez Rodriguez, M.; Grosse-Wentrup, M.; Hill, J.; Schoelkopf, B.; Gharabaghi, A.; Peters, J.
Year2011
TitleTowards Brain-Robot Interfaces for Stroke Rehabilitation
Journal/Conference/Book TitleProceedings of the International Conference on Rehabilitation Robotics (ICORR)
KeywordsTeam Athena-Minerva
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Gomez_ICORR_2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Gomez_ICORR_2011.pdf
Reference TypeConference Proceedings
Author(s)Wang, Z.; Boularias, A.; Muelling, K.; Peters, J.
Year2011
TitleBalancing Safety and Exploitability in Opponent Modeling
Journal/Conference/Book TitleProceedings of the Twenty-Fifth National Conference on Artificial Intelligence (AAAI)
KeywordsGeRT
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Wang_AAAI_2011.pdf
Reference TypeJournal Article
Author(s)Hachiya, H.; Peters, J.; Sugiyama, M.
Year2011
TitleReward Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Journal/Conference/Book TitleNeural Computation
Volume23
Number11
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Hachiya_NC2011.pdf
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Hachiya_NC2011.pdf
Reference TypeJournal Article
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2011
TitleModel Learning in Robotics: a Survey
Journal/Conference/Book TitleCognitive Processing
Volume12
Number4
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Nguyen_CP_2011.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Oztop, E.; Peters, J.
Year2011
TitleReinforcement Learning to adjust Robot Movements to New Situations
Journal/Conference/Book TitleProceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Best Paper Track
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Kober_IJCAI_2011.pdf
Reference TypeConference Proceedings
Author(s)Boularias, A.; Kober, J.; Peters, J.
Year2011
TitleRelative Entropy Inverse Reinforcement Learning
Journal/Conference/Book TitleProceedings of Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2011)
KeywordsGeRT
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/boularias11a.pdf
Reference TypeJournal Article
Author(s)Wierstra, D.; Foerster, A.; Peters, J.; Schmidhuber, J.
Year2010
TitleRecurrent Policy Gradients
Journal/Conference/Book TitleLogic Journal of the IGPL
Volume18
Number5
Pages620-634
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/jzp049v1_5879.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Oztop, E.; Peters, J.
Year2010
TitleReinforcement Learning to adjust Robot Movements to New Situations
Journal/Conference/Book TitleProceedings of Robotics: Science and Systems (R:SS)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/RSS2010-Kober_6438[0].pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Detry, R.; Piater, J.; Peters, J.
Year2010
TitleAdapting Preshaped Grasping Movements using Vision Descriptors
Journal/Conference/Book TitleFrom Animals to Animats 11, International Conference on the Simulation of Adaptive Behavior (SAB)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/SAB2010-Kroemer_6437[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/SAB2010-Kroemer_6437[0].pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Detry, R.; Piater, J.; Peters, J.
Year2010
TitleGrasping with Vision Descriptors and Motor Primitives
Journal/Conference/Book TitleProceedings of the International Conference on Informatics in Control, Automation and Robotics (ICINCO)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/ICINCO2010-Kroemer_6436[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/ICINCO2010-Kroemer_6436[0].pdf
Reference TypeConference Proceedings
Author(s)Muelling, K.; Kober, J.; Peters, J.
Year2010
TitleSimulating Human Table Tennis with a Biomimetic Robot Setup
Journal/Conference/Book TitleFrom Animals to Animats 11, International Conference on the Simulation of Adaptive Behavior (SAB)
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/SAB2010-Muelling_6626[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/SAB2010-Muelling_6626[0].pdf
Reference TypeConference Proceedings
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2010
TitleIncremental Sparsification for Real-time Online Model Learning
Journal/Conference/Book TitleProceedings of Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/AISTATS2010-Nguyen-Tuong.pdf
Reference TypeJournal Article
Author(s)Kober, J.; Peters, J.
Year2010
TitleImitation and Reinforcement Learning - Practical Algorithms for Motor Primitive Learning in Robotics
Journal/Conference/Book TitleIEEE Robotics and Automation Magazine
Volume17
Number2
Pages55-62
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/kober_RAM_2010.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/kober_RAM_2010.pdf
Reference TypeJournal Article
Author(s)Kroemer, O.; Detry, R.; Piater, J.; Peters, J.
Year2010
TitleCombining Active Learning and Reactive Control for Robot Grasping
Journal/Conference/Book TitleRobotics and Autonomous Systems
KeywordsGeRT
Volume58
Number9
Pages1105-1116
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/KroemerJRAS_6636[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/KroemerJRAS_6636[0].pdf
Reference TypeBook Section
Author(s)Nguyen-Tuong, D.; Peters, J.; Seeger, M.
Year2010
TitleReal-Time Local GP Model Learning
Journal/Conference/Book TitleFrom Motor Learning to Interaction Learning in Robots, Springer Verlag
Number264
Reprint Edition978-3-642-05180-7
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/LGP_IROS_Chapter_6233.pdf
Reference TypeBook Section
Author(s)Peters, J.; Tedrake, R.; Roy, N.; Morimoto, J.
Year2010
TitleRobot Learning
Journal/Conference/Book TitleEncyclopedia of Machine Learning
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/EncyclopediaMachineLearning-Peters-RobotLearning_[0].pdf
Reference TypeBook
Author(s)Sigaud, O.; Peters, J.
Year2010
TitleFrom Motor Learning to Interaction Learning in Robots
Journal/Conference/Book TitleStudies in Computational Intelligence, Springer Verlag
Number of VolumesSpringer V
Number264
Reprint Edition978-3-642-05180-7
Link to PDFhttp://dx.doi.org/10.1007/978-3-642-05181-4
Reference TypeBook Section
Author(s)Kober, J.; Mohler, B.; Peters, J.
Year2010
TitleImitation and Reinforcement Learning for Motor Primitives with Perceptual Coupling
Journal/Conference/Book TitleFrom Motor Learning to Interaction Learning in Robots, Springer Verlag
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Imitation%20and%20Reinforcement%20Learning%20for%20Motor%20Primitives%20with%20Perceptual%20Coupling_6234[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Muelling, K.; Altun, Y.
Year2010
TitleRelative Entropy Policy Search
Journal/Conference/Book TitleProceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI), Physically Grounded AI Track
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Team/JanPeters/Peters2010_REPS.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Muelling, K.; Kroemer, O.; Lampert, C.H.; Schoelkopf, B.; Peters, J.
Year2010
TitleMovement Templates for Learning of Hitting and Batting
Journal/Conference/Book TitleIEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/ICRA2010-Kober_6231[1].pdf
Reference TypeConference Proceedings
Author(s)Nguyen-Tuong, D.; Peters, J.
Year2010
TitleUsing Model Knowledge for Learning Inverse Dynamics
Journal/Conference/Book TitleIEEE International Conference on Robotics and Automation
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICRA2010-NguyenTuong_6232.pdf
Reference TypeJournal Article
Author(s)Sehnke, F.; Osendorfer, C.; Rueckstiess, T.; Graves, A.; Peters, J.; Schmidhuber, J.
Year2010
TitleParameter-exploring Policy Gradients
Journal/Conference/Book TitleNeural Networks
Volume23
Number4
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Neural-Networks-2010-Sehnke.pdf
Reference TypeBook Section
Author(s)Peters, J.; Bagnell, J.A.
Year2010
TitlePolicy gradient methods
Journal/Conference/Book TitleEncyclopedia of Machine Learning (invited article)
Number of VolumesSpringer V
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Peters_EOMLA_submitted_6074[0].pdf
Reference TypeJournal Article
Author(s)Morimura, T.; Uchibe, E.; Yoshimoto, J.; Peters, J.; Doya, K.
Year2010
TitleDerivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning
Journal/Conference/Book TitleNeural Computation
Volume22
Number2
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/LSD_revise_ver3_5904[0].pdf
Reference TypeBook Section
Author(s)Detry, R.; Baseski, E.; Popovic, M.; Touati, Y.; Krueger, N.; Kroemer, O.; Peters, J.; Piater, J.
Year2010
TitleLearning Continuous Grasp Affordances by Sensorimotor Exploration
Journal/Conference/Book TitleFrom Motor Learning to Interaction Learning in Robots, Springer Verlag
Number264
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Detry-2010-MotorInteractionLearning_[0].pdf
Reference TypeConference Proceedings
Author(s)Erkan, A.; Kroemer, O.; Detry, R.; Altun, Y.; Piater, J.; Peters, J.
Year2010
TitleLearning Probabilistic Discriminative Models of Grasp Affordances under Limited Supervision
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
KeywordsGeRT
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/erkan_IROS_2010.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/erkan_IROS_2010.pdf
Reference TypeConference Proceedings
Author(s)Muelling, K.; Kober, J.; Peters, J.
Year2010
TitleA Biomimetic Approach to Robot Table Tennis
Journal/Conference/Book TitleProceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/Muelling_ABJ2011.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/Muelling_ABJ2011.pdf
Reference TypeConference Proceedings
Author(s)Gomez Rodriguez, M.; Grosse-Wentrup, M.; Peters, J.; Naros, G.; Hill, J.; Gharabaghi, A.; Schoelkopf, B.
Year2010
TitleEpidural ECoG Online Decoding of Arm Movement Intention in Hemiparesis
Journal/Conference/Book Title1st ICPR Workshop on Brain Decoding: Pattern Recognition Challenges in Neuroimaging
KeywordsTeam Athena-Minerva
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICPR-WBD-2010-Gomez-Rodriguez.pdf
Reference TypeConference Proceedings
Author(s)Gomez Rodriguez, M.; Peters, J.; Hill, J.; Schoelkopf, B.; Gharabaghi, A.; Grosse-Wentrup, M.
Year2010
TitleClosing the Sensorimotor Loop: Haptic Feedback Facilitates Decoding of Arm Movement Imagery
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Systems, Man, and Cybernetics (Workshop on Brain-Machine Interfaces)
KeywordsTeam Athena-Minerva
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/eeg-smc2010_6591.pdf
Reference TypeConference Proceedings
Author(s)Gomez Rodriguez, M.; Peters, J.; Hill, J.; Gharabaghi, A.; Schoelkopf, B.; Grosse-Wentrup, M.
Year2010
TitleBCI and robotics framework for stroke rehabilitation
Journal/Conference/Book TitleProceedings of the 4th International BCI Meeting, May 31 - June 4, 2010. Asilomar, CA, USA
KeywordsTeam Athena-Minerva
Link to PDFhttp://bcimeeting.org/2010/
Reference TypeConference Proceedings
Author(s)Lampert, C. H.; Kroemer, O.
Year2010
TitleWeakly-Paired Maximum Covariance Analysis for Multimodal Dimensionality Reduction and Transfer Learning
Journal/Conference/Book TitleProceedings of the 11th European Conference on Computer Vision (ECCV 2010)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/lampert-eccv2010.pdf
Reference TypeConference Proceedings
Author(s)Chiappa, S.; Peters, J.
Year2010
TitleMovement extraction by detecting dynamics switches and repetitions
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 24 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Chiappa_NIPS_2011.pdf
Reference TypeConference Proceedings
Author(s)Alvarez, M.; Peters, J.; Schoelkopf, B.; Lawrence, N.
Year2010
TitleSwitched Latent Force Models for Movement Segmentation
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 24 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Alvarez_NIPS_2011.pdf
Reference TypeJournal Article
Author(s)Peters, J.; Kober, J.; Schaal, S.
Year2010
TitlePolicy learning algorithms for motor learning (Algorithmen zum automatischen Erlernen von Motorfaehigkeiten)
Journal/Conference/Book TitleAutomatisierungstechnik
Keywordsreinforcement learning, motor skills
AbstractRobot learning methods which allow autonomous robots to adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics. If possible, scaling was usually only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.
Volume58
Number12
Pages688-694
Short TitlePolicy learning algorithms for motor learning (Algorithmen zum automatischen Erlernen von Motorfähigkeiten)
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/at-Automatisierungstechnik-Algorithmen_zum_Automatischen_Erlernen_von_Motorfhigkeiten
Reference TypeConference Proceedings
Author(s)Muelling, K.; Kober, J.; Peters, J.
Year2010
TitleLearning Table Tennis with a Mixture of Motor Primitives
Journal/Conference/Book Title10th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2010)
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Muelling_ICHR_2012.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Muelling_ICHR_2012.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Muelling, K.; Kober, J.
Year2010
TitleExperiments with Motor Primitives to learn Table Tennis
Journal/Conference/Book Title12th International Symposium on Experimental Robotics (ISER 2010)
Reference TypeConference Proceedings
Author(s)Hachiya, H.; Peters, J.; Sugiyama, M.
Year2009
TitleEfficient Sample Reuse in EM-based Policy Search
Journal/Conference/Book TitleProceedings of the 16th European Conference on Machine Learning (ECML 2009)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ECML-PKDD-2009-Hachiya_6068[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.; Muelling, K.; Nguyen-Tuong, D.; Kroemer, O.
Year2009
TitleTowards Motor Skill Learning for Robotics
Journal/Conference/Book TitleProceedings of the International Symposium on Robotics Research (ISRR), Invited Paper
AbstractLearning robots that can acquire new motor skills and refine existing ones has been a long standing vision of robotics, artificial intelligence, and the cognitive sciences. Early steps towards this goal in the 1980s made clear that reasoning and human insights will not suffice. Instead, new hope has been offered by the rise of modern machine learning approaches. However, to date, it becomes increasingly clear that off-the-shelf machine learning approaches will not suffice for motor skill learning as these methods often do not scale into the high-dimensional domains of manipulator and humanoid robotics nor do they fulfill the real-time requirement of our domain. As an alternative, we propose to break the generic skill learning problem into parts that we can understand well from a robotics point of view. After designing appropriate learning approaches for these basic components, these will serve as the ingredients of a general approach to motor skill learning. In this paper, we discuss our recent and current progress in this direction. For doing so, we present our work on learning to control, on learning elementary movements as well as our steps towards learning of complex tasks. We show several evaluations both using real robots as well as physically realistic simulations.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/peters_ISRR_2007.pdf
Reference TypeConference Proceedings
Author(s)Nguyen Tuong, D.; Seeger, M.; Peters, J.
Year2009
TitleLocal Gaussian Process Regression for Real Time Online Model Learning and Control
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 22 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/3403-local-gaussian-process-regression-for-real-time-online-model-learning.pdf
Reference TypeConference Proceedings
Author(s)Neumann, G.; Peters, J.
Year2009
TitleFitted Q-iteration by Advantage Weighted Regression
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 22 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NIPS2008-Neumann_5520%5B0%5D.pdf
Reference TypeJournal Article
Author(s)Hachiya, H.; Akiyama, T.; Sugiyama, M.; Peters, J.
Year2009
TitleAdaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning
Journal/Conference/Book TitleNeural Networks
Keywordsoff-policy reinforcement learning; value function approximation; policy iteration; adaptive importance sampling; importance-weighted cross-validation; efficient sample reuse
AbstractOff-policy reinforcement learning is aimed at efficiently using data samples gathered from a different policy than the currently optimized one. A common approach is to use importance sampling techniques for compensating for the bias of value function estimators caused by the difference between the data-sampling policy and the target policy. However, existing off-policy methods often do not take the variance of the value function estimators explicitly into account and, therefore, their performance tends to be unstable. To cope with this problem, we propose using an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on a variant of cross-validation. We demonstrate the usefulness of the proposed approach through simulations.
Volume22
Number10
Pages1399-1410
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/hachiya-AdaptiveImportanceSampling_5530.pdf
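The bias–variance trade-off described in this abstract can be illustrated with a minimal sketch (the function and variable names below are my own, not the paper's): a per-episode importance weight raised to a flattening exponent ν interpolates between ignoring the policy mismatch entirely (ν = 0, low variance but biased) and full importance sampling (ν = 1, unbiased but high variance). The paper selects ν by a variant of cross-validation; here ν is simply a parameter.

```python
import numpy as np

def flattened_is_estimate(rewards, pi_target, pi_behavior, nu):
    """Off-policy return estimate with a flattened importance weight.

    rewards:      per-step rewards from one episode
    pi_target:    action probabilities under the evaluated policy
    pi_behavior:  action probabilities under the data-collecting policy
    nu:           flattening parameter in [0, 1]; nu=0 ignores the
                  correction (low variance, high bias), nu=1 is full
                  importance sampling (unbiased, high variance)
    """
    ratios = np.asarray(pi_target, float) / np.asarray(pi_behavior, float)
    weight = np.prod(ratios) ** nu   # flattened episode weight
    return weight * np.sum(rewards)
```

When the two policies coincide, every ratio is 1 and the estimate reduces to the plain return for any ν, as one would expect.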
Reference TypeConference Proceedings
Author(s)Kober, J.; Peters, J.
Year2009
TitlePolicy Search for Motor Primitives in Robotics
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 22 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NIPS2008-Kober-Peters_5411[0].pdf
Reference TypeConference Proceedings
Author(s)Chiappa, S.; Kober, J.; Peters, J.
Year2009
TitleUsing Bayesian Dynamical Systems for Motion Template Libraries
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems 22 (NIPS/NeurIPS), Cambridge, MA: MIT Press
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NIPS2008-Chiappa_5400[0].pdf
Reference TypeJournal Article
Author(s)Deisenroth, M.P.; Rasmussen, C.E.; Peters, J.
Year2009
TitleGaussian Process Dynamic Programming
Journal/Conference/Book TitleNeurocomputing
Number72
Pages1508-1524
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Neurocomputing-2009-Deisenroth-Preprint_5531.pdf
Reference TypeConference Proceedings
Author(s)Hoffman, M.; de Freitas, N.; Doucet, A.; Peters, J.
Year2009
TitleAn Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward
Journal/Conference/Book TitleProceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AIStats)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/AIStats2009-Hoffman_5658.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.
Year2009
TitleUsing Reward-Weighted Imitation for Robot Reinforcement Learning
Journal/Conference/Book TitleProceedings of the 2009 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/peters_ADPRL_2009.pdf
Reference TypeConference Proceedings
Author(s)Hachiya, H.; Akiyama, T.; Sugiyama, M.; Peters, J.
Year2009
TitleEfficient Data Reuse in Value Function Approximation
Journal/Conference/Book TitleProceedings of the 2009 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ADPRL2009-Hachiya.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Peters, J.
Year2009
TitleLearning Motor Primitives for Robotics
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/ICRA2009-Kober_5661[0].pdf
Reference TypeConference Proceedings
Author(s)Piater, J.; Jodogne, S.; Detry, R.; Kraft, D.; Krueger, N.; Kroemer, O.; Peters, J.
Year2009
TitleLearning Visual Representations for Interactive Systems
Journal/Conference/Book TitleProceedings of the International Symposium on Robotics Research (ISRR), Invited Paper
AbstractWe describe two quite different methods for associating action parameters to visual percepts. Our RLVC algorithm performs reinforcement learning directly on the visual input space. To make this very large space manageable, RLVC interleaves the reinforcement learner with a supervised classification algorithm that seeks to split perceptual states so as to reduce perceptual aliasing. This results in an adaptive discretization of the perceptual space based on the presence or absence of visual features. Its extension RLJC also handles continuous action spaces. In contrast to the minimalistic visual representations produced by RLVC and RLJC, our second method learns structural object models for robust object detection and pose estimation by probabilistic inference. To these models, the method associates grasp experiences autonomously learned by trial and error. These experiences form a non-parametric representation of grasp success likelihoods over gripper poses, which we call a grasp density. Thus, object detection in a novel scene simultaneously produces suitable grasping options.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Piater-2009-ISRR.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Peters, J.
Year2009
TitleLearning new basic Movements for Robotics
Journal/Conference/Book TitleProceedings of Autonome Mobile Systeme (AMS 2009)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/paper_16.pdf
Reference TypeConference Proceedings
Author(s)Muelling, K.; Peters, J.
Year2009
TitleA computational model of human table tennis for robot application
Journal/Conference/Book TitleProceedings of Autonome Mobile Systeme (AMS 2009)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/A_computational_model_of_human_table_tennis.pdf
Reference TypeConference Proceedings
Author(s)Kroemer, O.; Detry, R.; Piater, J.; Peters, J.
Year2009
TitleActive Learning Using Mean Shift Optimization for Robot Grasping
Journal/Conference/Book TitleProceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/kroemer_IROS_2009.pdf
Reference TypeConference Proceedings
Author(s)Nguyen Tuong, D.; Seeger, M.; Peters, J.
Year2009
TitleSparse Online Model Learning for Robot Control with Support Vector Regression
Journal/Conference/Book TitleProceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Sparse_Online_Model_Learning_for_Robot_Control.pdf
Reference TypeJournal Article
Author(s)Peters, J.; Ng, A.
Year2009
TitleGuest Editorial: Special Issue on Robot Learning, Part B
Journal/Conference/Book TitleAutonomous Robots (AURO)
Volume27
Number2
Pages91-92
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Peters-Ng2009_Article_GuestEditorialSpecialIssueOnRo.pdf
Reference TypeConference Proceedings
Author(s)Sigaud, O.; Peters, J.
Year2009
TitleFrom Motor Learning to Interaction Learning in Robots
Journal/Conference/Book TitleProceedings of Journees Nationales de la Recherche en Robotique
Pages189-195
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/JNRR2009-Sigaud_[0].pdf
Reference TypeConference Proceedings
Author(s)Neumann, G.; Maass, W.; Peters, J.
Year2009
TitleLearning Complex Motions by Sequencing Simpler Motion Templates
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML2009)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICML2009-Neumann.pdf
Reference TypeConference Proceedings
Author(s)Detry, R.; Baseski, E.; Popovic, M.; Touati, Y.; Krueger, N.; Kroemer, O.; Peters, J.; Piater, J.
Year2009
TitleLearning Object-specific Grasp Affordance Densities
Journal/Conference/Book TitleProceedings of the International Conference on Development & Learning (ICDL 2009)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/ICDL2009-Detry_[0].pdf
Reference TypeJournal Article
Author(s)Nguyen Tuong, D.; Seeger, M.; Peters, J.
Year2009
TitleModel Learning with Local Gaussian Process Regression
Journal/Conference/Book TitleAdvanced Robotics
Volume23
Number15
Pages2015-2034
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Nguyen-Tuong-ModelLearningLocalGaussian.pdf
Reference TypeJournal Article
Author(s)Kober, J.; Peters, J.
Year2009
TitleReinforcement Learning fuer Motor-Primitive
Journal/Conference/Book TitleKuenstliche Intelligenz
Link to PDFhttp://www.kuenstliche-intelligenz.de/index.php?id=7779&tx_ki_pi1[showUid]=1820&cHash=a9015a9e57
Reference TypeJournal Article
Author(s)Peters, J.; Morimoto, J.; Tedrake, R.; Roy, N.
Year2009
TitleRobot Learning
Journal/Conference/Book TitleIEEE Robotics & Automation Magazine
Keywordsrobot learning, tc spotlight
Volume16
Number3
Pages19-20
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/05233410.pdf
Reference TypeJournal Article
Author(s)Peters, J.; Ng, A.
Year2009
TitleGuest Editorial: Special Issue on Robot Learning, Part A
Journal/Conference/Book TitleAutonomous Robots (AURO)
Volume27
Number1
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Peters-Ng2009_Article_GuestEditorialSpecialIssueOnRo.pdf
Reference TypeConference Proceedings
Author(s)Lampert, C.H.; Peters, J.
Year2009
TitleActive Structured Learning for High-Speed Object Detection
Journal/Conference/Book TitleProceedings of the DAGM (Pattern Recognition)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/DAGM2009-Lampert.pdf
Reference TypeConference Proceedings
Author(s)Gomez Rodriguez, M.; Kober, J.; Schoelkopf, B.
Year2009
TitleDenoising photographs using dark frames optimized by quadratic programming
Journal/Conference/Book TitleProceedings of the First IEEE International Conference on Computational Photography (ICCP 2009)
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/ICCP09-GomezRodriguez_5491[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/ICCP09-GomezRodriguez_5491[0].pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Peters, J.; Rasmussen, C.E.
Year2008
TitleApproximate Dynamic Programming with Gaussian Processes
Journal/Conference/Book TitleAmerican Control Conference
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Main/PublicationsByYear/deisenroth_ACC2008.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Seeger, M.
Year2008
TitleComputed Torque Control with Nonparametric Regression Techniques
Journal/Conference/Book TitleAmerican Control Conference
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NguyenTuong_ACC2008.pdf
Reference TypeConference Proceedings
Author(s)Deisenroth, M.P.; Rasmussen, C.E.; Peters, J.
Year2008
TitleModel-Based Reinforcement Learning with Continuous States and Actions
Journal/Conference/Book TitleProceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)
Pages19-24
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/deisenroth_ESANN2008.pdf
Reference TypeJournal Article
Author(s)Steinke, F.; Hein, M.; Peters, J.; Schoelkopf, B.
Year2008
TitleManifold-valued Thin-Plate Splines with Applications in Computer Graphics
Journal/Conference/Book TitleComputer Graphics Forum (Special Issue on Eurographics 2008)
Volume27
Number2
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Steinke_EGFinal-1049.pdf
Reference TypeConference Proceedings
Author(s)Nguyen Tuong, D.; Peters, J.
Year2008
TitleLearning Inverse Dynamics: a Comparison
Journal/Conference/Book TitleProceedings of the European Symposium on Artificial Neural Networks (ESANN 2008)
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NguyenTuong_ACC2008.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Nguyen-Tuong, D.
Year2008
TitleReal-Time Learning of Resolved Velocity Control on a Mitsubishi PA-10
Journal/Conference/Book TitleInternational Conference on Robotics and Automation (ICRA)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICRA2008-Peters_4865[0].pdf
Reference TypeConference Proceedings
Author(s)Hachiya, H.; Akiyama, T.; Sugiyama, M.; Peters, J.
Year2008
TitleAdaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Journal/Conference/Book TitleProceedings of the Twenty-Third National Conference on Artificial Intelligence (AAAI 2008)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/AAAI-2008-Hachiya_5096[0].pdf
Reference TypeConference Proceedings
Author(s)Wierstra, D.; Schaul, T.; Peters, J.; Schmidhuber, J.
Year2008
TitleNatural Evolution Strategies
Journal/Conference/Book Title2008 IEEE Congress on Evolutionary Computation
AbstractThis paper presents Natural Evolution Strategies (NES), a novel algorithm for performing real-valued black box function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method. Natural Evolution Strategies search the fitness landscape using a multivariate normal distribution with a self-adapting mutation matrix to generate correlated mutations in promising regions. NES shares this property with Covariance Matrix Adaption (CMA), an Evolution Strategy (ES) which has been shown to perform well on a variety of high-precision optimization tasks. The Natural Evolution Strategies algorithm, however, is simpler, less ad-hoc and more principled. Self-adaptation of the mutation matrix is derived using a Monte Carlo estimate of the natural gradient towards better expected fitness. By following the natural gradient instead of the 'vanilla' gradient, we can ensure efficient update steps while preventing early convergence due to overly greedy updates, resulting in reduced sensitivity to local suboptima. We show NES has competitive performance with CMA on several tasks, while outperforming it on one task that is rich in deceptive local optima, the Rastrigin benchmark.
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/wierstra-CEC2008.pdf
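The search scheme sketched in this abstract can be illustrated with a toy example. This is not the authors' implementation: it uses a diagonal (separable) search distribution rather than a full mutation matrix, rank-based utilities instead of raw fitness, and hyperparameters chosen only for the demo; all names are illustrative.

```python
import numpy as np

def nes_minimize(f, mu, sigma, pop=20, lr_mu=1.0, lr_sigma=0.1, iters=200, seed=0):
    """Sketch of a separable natural-evolution-strategy search.

    Samples a population from N(mu, diag(sigma^2)), ranks it by fitness,
    and moves the distribution parameters along a Monte Carlo estimate
    of the gradient of expected (rank-transformed) fitness.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.array(mu, float), np.array(sigma, float)
    for _ in range(iters):
        s = rng.standard_normal((pop, mu.size))   # standard-normal perturbations
        x = mu + sigma * s                        # candidate solutions
        fit = np.array([f(xi) for xi in x])
        # rank-based utilities: the best (lowest) sample gets the largest weight
        order = np.argsort(fit)
        util = np.empty(pop)
        util[order] = np.linspace(1.0, -1.0, pop)
        util /= pop
        # gradient-style updates of mean and per-dimension step sizes
        mu += lr_mu * sigma * (util @ s)
        sigma *= np.exp(lr_sigma * (util @ (s**2 - 1)) / 2)
    return mu

# Toy usage: minimize the 2-D sphere function from a shifted start.
best = nes_minimize(lambda x: np.sum(x**2), mu=[2.0, -1.5], sigma=[1.0, 1.0])
```

The exponential update of `sigma` keeps the step sizes positive and shrinks them once the best samples cluster near the mean, which is what drives convergence in this sketch.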
Reference TypeConference Proceedings
Author(s)Nguyen Tuong, D.; Peters, J.
Year2008
TitleLocal Gaussian Process Regression for Real-time Model-based Robot Control
Journal/Conference/Book TitleInternational Conference on Intelligent Robot Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2008-Nguyen_[0].pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Mohler, B.; Peters, J.
Year2008
TitleLearning Perceptual Coupling for Motor Primitives
Journal/Conference/Book TitleInternational Conference on Intelligent Robot Systems (IROS)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2008-Kober_5414[0].pdf
Reference TypeBook
Author(s)Lesperance, Y.; Lakemeyer, G.; Peters, J.; Pirri, F.
Year2008
TitleProceedings of the 6th International Cognitive Robotics Workshop (CogRob 2008)
Journal/Conference/Book TitleJuly 21-22, 2008, Patras, Greece, ISBN 978-960-6843-09-9
Reference TypeConference Proceedings
Author(s)Wierstra, D.; Schaul, T.; Peters, J.; Schmidhuber, J.
Year2008
TitleFitness Expectation Maximization
Journal/Conference/Book Title10th International Conference on Parallel Problem Solving from Nature (PPSN 2008)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ppsn08.pdf
Reference TypeJournal Article
Author(s)Nakanishi, J.; Cory, R.; Mistry, M.; Peters, J.; Schaal, S.
Year2008
TitleOperational space control: A theoretical and empirical comparison
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Keywordstask space control, operational space control, redundancy resolution, humanoid robotics
AbstractDexterous manipulation with a highly redundant movement system is one of the hallmarks of human motor skills. From numerous behavioral studies, there is strong evidence that humans employ compliant task space control, i.e., they focus control only on task variables while keeping redundant degrees-of-freedom as compliant as possible. This strategy is robust towards unknown disturbances and simultaneously safe for the operator and the environment. The theory of operational space control in robotics aims to achieve similar performance properties. However, despite various compelling theoretical lines of research, advanced operational space control is hardly found in actual robotics implementations, in particular new kinds of robots like humanoids and service robots, which would strongly profit from compliant dexterous manipulation. To analyze the pros and cons of different approaches to operational space control, this paper focuses on a theoretical and empirical evaluation of different methods that have been suggested in the literature, but also some new variants of operational space controllers. We address formulations at the velocity, acceleration and force levels. First, we formulate all controllers in a common notational framework, including quaternion-based orientation control, and discuss some of their theoretical properties. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm with several benchmark tasks. As an aside, we also introduce a novel parameter estimation algorithm for rigid body dynamics, which ensures physical consistency, as this issue was crucial for our successful robot implementations. Our extensive empirical results demonstrate that one of the simplified acceleration-based approaches can be advantageous in terms of task performance, ease of parameter tuning, and general robustness and compliance in face of inevitable modeling errors.
Volume27
Number6
Pages737-757
Short TitleOperational space control: A theoretical and empirical comparison
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Int-J-Robot-Res-2008-27-737_5027[0].pdf
Reference TypeConference Proceedings
Author(s)Wierstra, D.; Schaul, T.; Peters, J.; Schmidhuber, J.
Year2008
TitleEpisodic Reinforcement Learning by Logistic Reward-Weighted Regression
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Neural Networks (ICANN)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/wierstra_ICANN08.pdf
Reference TypeConference Proceedings
Author(s)Sehnke, F.; Osendorfer, C.; Rueckstiess, T.; Graves, A.; Peters, J.; Schmidhuber, J.
Year2008
TitlePolicy Gradients with Parameter-based Exploration for Control
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Neural Networks (ICANN)
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/icann2008sehnke.pdf
Reference TypeBook
Author(s)Peters, J.
Year2008
TitleMachine Learning for Robotics
Journal/Conference/Book TitleVDM-Verlag, ISBN 978-3-639-02110-3
ISBN/ISSNISBN 978-3-639-02110-3
Link to PDFhttp://www.amazon.de/Machine-Learning-Robotics-Methods-Skills/dp/363902110X/ref=sr_1_1?ie=UTF8&s=books&qid=1220658804&sr=8-1
Reference TypeConference Proceedings
Author(s)Peters, J.; Kober, J.; Nguyen-Tuong, D.
Year2008
TitlePolicy Learning - a unified perspective with applications in robotics
Journal/Conference/Book TitleProceedings of the European Workshop on Reinforcement Learning (EWRL)
Keywordsreinforcement learning, policy gradient, weighted regression
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/8808e934beb11e344433a6c98a68269e26f1.pdf
Reference TypeConference Proceedings
Author(s)Kober, J.; Peters, J.
Year2008
TitleReinforcement Learning of Perceptual Coupling for Motor Primitives
Journal/Conference/Book TitleProceedings of the European Workshop on Reinforcement Learning (EWRL)
Reference TypeJournal Article
Author(s)Peters, J.
Year2008
TitleMachine Learning for Motor Skills in Robotics
Journal/Conference/Book TitleKuenstliche Intelligenz
Keywordsmotor control, motor primitives, motor learning
AbstractAutonomous robots that can adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and the cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks of future robots. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator and humanoid robotics and usually scaling was only achieved in precisely pre-structured domains. We have investigated the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.
Number3
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/KuenstlicheIntelligenz-2008-Peters_[0].pdf
Reference TypeConference Paper
Author(s)Nguyen Tuong, D.; Peters, J.; Seeger, M.; Schoelkopf, B.
Year2008
TitleLearning Robot Dynamics for Computed Torque Control using Local Gaussian Processes Regression
Journal/Conference/Book TitleProceedings of the ECSIS Symposium on Learning and Adaptive Behavior in Robotic Systems, LAB-RS 2008
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/nguyen-ecsis.pdf
Reference TypeJournal Article
Author(s)Peters, J.; Schaal, S.
Year2008
TitleNatural Actor-Critic
Journal/Conference/Book TitleNeurocomputing
Keywordsreinforcement learning, policy gradient, natural actor-critic, natural gradients
AbstractIn this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
Volume71
Number7-9
Pages1180-1190
Short TitleNatural Actor-Critic
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NEUCOM-D-07-00618-1_[0].pdf
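The critic's regression step in this abstract admits a compact illustration. In the episodic variant, regressing episode returns on the summed score features ∇θ log π yields the natural gradient as the regression coefficients, with the constant term acting as a baseline. The sketch below (names are mine, not the paper's) shows only that linear-regression step, not the full actor-critic loop.

```python
import numpy as np

def enac_gradient(score_features, returns):
    """Episodic natural-gradient estimate via linear regression.

    score_features: (N, d) matrix whose rows are sum_t grad log pi(a_t|s_t)
                    for each of N sampled episodes
    returns:        (N,) vector of episode returns

    Regressing returns on the score features plus a constant baseline
    gives the natural policy gradient as the feature coefficients.
    """
    X = np.hstack([np.asarray(score_features, float),
                   np.ones((len(returns), 1))])       # append baseline column
    beta, *_ = np.linalg.lstsq(X, np.asarray(returns, float), rcond=None)
    return beta[:-1]   # drop the baseline coefficient
```

On synthetic data where the returns really are linear in the score features, the regression recovers the underlying coefficient vector exactly, which is a convenient sanity check for an implementation.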
Reference TypeJournal Article
Author(s)Peters, J.; Schaal, S.
Year2008
TitleLearning to control in operational space
Journal/Conference/Book TitleInternational Journal of Robotics Research (IJRR)
Keywordsoperational space control, learning, EM ALGORITHM, redundancy resolution, reinforcement learning
AbstractOne of the most general frameworks for phrasing control problems for complex, redundant robots is operational space control. However, while this framework is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In this paper, we suggest a learning approach for operational space control as a direct inverse model learning problem. A first important insight for this paper is that a physically correct solution to the inverse problem with redundant degrees-of-freedom does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on the insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the machine learning point of view, this learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward. We employ an expectation-maximization policy search algorithm in order to solve this problem. Evaluations on a three degrees of freedom robot arm are used to illustrate the suggested approach. The application to a physically realistic simulator of the anthropomorphic SARCOS Master arm demonstrates feasibility for complex high degree-of-freedom robots. We also show that the proposed method works in the setting of learning resolved motion rate control on a real, physical Mitsubishi PA-10 medical robotics arm.
Volume27
Pages197-212
Short TitleLearning to control in operational space
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Team/JanPeters/Learning_to_Control_in_Operational_Space.pdf
Reference TypeJournal Article
Author(s)Peters, J.; Schaal, S.
Year2008
TitleReinforcement learning of motor skills with policy gradients
Journal/Conference/Book TitleNeural Networks
KeywordsReinforcement learning, Policy gradient methods, Natural gradients, Natural Actor-Critic, Motor skills, Motor primitives
AbstractAutonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
Volume21
Number4
Pages682-97
DateMay
Short TitleReinforcement learning of motor skills with policy gradients
ISBN/ISSN0893-6080 (Print)
Accession Number18482830
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Neural-Netw-2008-21-682_4867[0].pdf
AddressMax Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tubingen, Germany; University of Southern California, 3710 S. McClintoch Ave-RTH401, Los Angeles, CA 90089-2905, USA.
Languageeng
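The likelihood-ratio ("vanilla") policy gradient at the core of this line of work can be sketched on a toy one-step problem. Everything below (the reward function, step size, and sample counts) is an illustrative assumption of mine, not taken from the paper:

```python
import numpy as np

# Illustrative sketch (not from the paper): likelihood-ratio ("vanilla")
# policy gradient steps for a one-parameter Gaussian policy on a toy
# single-step task. Reward, step size, and sample counts are assumptions.
rng = np.random.default_rng(0)

def reward(a):
    # Hypothetical reward, peaked at a = 2.0.
    return -(a - 2.0) ** 2

def reinforce_step(theta, sigma=0.5, n=2000, alpha=0.05):
    actions = theta + sigma * rng.normal(size=n)
    r = reward(actions)
    # Score of the Gaussian policy: d/dtheta log N(a; theta, sigma^2).
    score = (actions - theta) / sigma**2
    baseline = r.mean()  # simple variance-reducing baseline
    grad = np.mean(score * (r - baseline))
    return theta + alpha * grad

theta = 0.0
for _ in range(200):
    theta = reinforce_step(theta)
# theta now sits near the reward peak at 2.0
```

The baseline subtraction is the standard variance-reduction device the paper's survey portion discusses; without it the gradient estimate is still unbiased but much noisier.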
Reference TypeJournal Article
Author(s)Peters, J.;Mistry, M.;Udwadia, F. E.;Nakanishi, J.;Schaal, S.
Year2008
TitleA unifying framework for robot control with redundant DOFs
Journal/Conference/Book TitleAutonomous Robots (AURO)
Keywordsoperational space control, inverse control, dexterous manipulation, optimal control
AbstractRecently, Udwadia (Proc. R. Soc. Lond. A 2003:1783–1800, 2003) suggested to derive tracking controllers for mechanical systems with redundant degrees-of-freedom (DOFs) using a generalization of Gauss’ principle of least constraint. This method allows reformulating control problems as a special class of optimal controllers. In this paper, we take this line of reasoning one step further and demonstrate that several well-known and also novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. The suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, both with or without external constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.
Volume24
Number1
Pages1-12
Short TitleA unifying methodology for robot control with redundant DOFs
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/AR-2008final_[0].pdf
Reference TypeThesis
Author(s)Kober, J.
Year2008
TitleReinforcement Learning for Motor Primitives
Journal/Conference/Book TitleDipl-Ing Thesis, University of Stuttgart
URL(s) http://www.ias.informatik.tu-darmstadt.de/publications/DiplomaThesis-Kober_5331[0].pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/publications/DiplomaThesis-Kober_5331[0].pdf
Reference TypeJournal Article
Author(s)Peters, J.
Year2007
TitleComputational Intelligence: By Amit Konar
Journal/Conference/Book TitleThe Computer Journal
Keywordsbook review
Volume50
Number6
Pages758
Reference TypeConference Proceedings
Author(s)Peters, J.; Schaal, S.
Year2007
TitlePolicy Learning for Motor Skills
Journal/Conference/Book TitleProceedings of 14th International Conference on Neural Information Processing (ICONIP)
KeywordsMachine Learning, Reinforcement Learning, Robotics, Motor Primitives, Policy Gradients, Natural Actor-Critic, Reward-Weighted Regression
AbstractPolicy learning which allows autonomous robots to adapt to novel situations has been a long standing vision of robotics, artificial intelligence, and cognitive sciences. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to policy learning with the goal of an application to motor skill refinement in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, we study policy learning algorithms which can be applied in the general setting of motor skill learning, and, secondly, we study a theoretically well-founded general approach to representing the required control structures for task representation and execution.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICONIP2007-Peters_4869[0].pdf
Reference TypeConference Proceedings
Author(s)Wierstra, D.; Foerster, A.; Peters, J.; Schmidhuber, J.
Year2007
TitleSolving Deep Memory POMDPs with Recurrent Policy Gradients
Journal/Conference/Book TitleProceedings of the International Conference on Artificial Neural Networks (ICANN)
Keywordspolicy gradients, reinforcement learning
AbstractThis paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method creating limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) that require long-term memories of past observations. The approach involves approximating a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/icann2007.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Schaal, S.; Schoelkopf, B.
Year2007
TitleTowards Machine Learning of Motor Skills
Journal/Conference/Book TitleProceedings of Autonome Mobile Systeme (AMS)
KeywordsMotor Skill Learning, Robotics, Natural Actor-Critic, Reward-Weighted Regression
AbstractAutonomous robots that can adapt to novel situations have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this paper, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Peters_POAMS_2007.pdf
Reference TypeConference Proceedings
Author(s)Theodorou, E; Peters, J.; Schaal, S.
Year2007
TitleReinforcement Learning for Optimal Control of Arm Movements
Journal/Conference/Book TitleAbstracts of the 37th Meeting of the Society for Neuroscience
KeywordsOptimal Control,Reinforcement Learning, Arm Movements
AbstractEveryday motor behavior consists of a plethora of challenging motor skills, from discrete movements such as reaching and throwing to rhythmic movements such as walking, drumming and running. How this plethora of motor skills can be learned remains an open question. In particular, is there any unifying computational framework that could model the learning process of this variety of motor behaviors and at the same time be biologically plausible? In this work we aim to give an answer to these questions by providing a computational framework that unifies the learning mechanism of both rhythmic and discrete movements under optimization criteria, i.e., in a non-supervised trial-and-error fashion. Our suggested framework is based on Reinforcement Learning, which is mostly considered as too costly to be a plausible mechanism for learning complex limb movement. However, recent work on reinforcement learning with policy gradients combined with parameterized movement primitives allows novel and more efficient algorithms. By using the representational power of such motor primitives we show how rhythmic motor behaviors such as walking, squashing and drumming as well as discrete behaviors like reaching and grasping can be learned with biologically plausible algorithms. Using extensive simulations and by using different reward functions we provide results that support the hypothesis that Reinforcement Learning could be a viable candidate for motor learning of human motor behavior when other learning methods like supervised learning are not feasible.
Reference TypeJournal Article
Author(s)Nakanishi, J.; Mistry, M.; Peters, J.; Schaal, S.
Year2007
TitleExperimental evaluation of task space position/orientation control towards compliant control for humanoid robots
Journal/Conference/Book TitleIEEE International Conference on Intelligent Robotics Systems (IROS 2007)
Keywordsoperational space control, quaternion, task space control, resolved motion rate control, resolved acceleration, force control
AbstractCompliant control will be a prerequisite for humanoid robotics if these robots are supposed to work safely and robustly in human and/or dynamic environments. One view of compliant control is that a robot should control a minimal number of degrees-of-freedom (DOFs) directly, i.e., those relevant DOFs for the task, and keep the remaining DOFs maximally compliant, usually in the null space of the task. This view naturally leads to task space control. However, surprisingly few implementations of task space control can be found in actual humanoid robots. This paper makes a first step towards assessing the usefulness of task space controllers for humanoids by investigating which choices of controllers are available and what inherent control characteristics they have; this treatment will concern position and orientation control, where the latter is based on a quaternion formulation. Empirical evaluations on an anthropomorphic Sarcos master arm illustrate the robustness of the different controllers as well as the ease of implementing and tuning them. Our extensive empirical results demonstrate that simpler task space controllers, e.g., classical resolved motion rate control or resolved acceleration control, can be quite advantageous in the face of inevitable modeling errors in model-based control, and that well chosen formulations are easy to implement and quite robust, such that they are useful for humanoids.
Place PublishedSan Diego, CA, Oct. 29 - Nov. 2
Short TitleExperimental evaluation of task space position/orientation control towards compliant control for humanoid robots
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2007-Nakanishi_4722[0].pdf
Reference TypeThesis
Author(s)Peters, J.
Year2007
TitleMachine Learning of Motor Skills for Robotics
Journal/Conference/Book TitlePh.D. Thesis, Department of Computer Science, University of Southern California
KeywordsMachine Learning, Reinforcement Learning, Robotics, Motor Primitives, Policy Gradients, Natural Actor-Critic, Reward-Weighted Regression
AbstractAutonomous robots that can assist humans in situations of daily life have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can accomplish a multitude of different tasks, triggered by environmental context or higher level instruction. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning and human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error. However, to date, learning techniques have yet to fulfill this promise as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the new upcoming trend of humanoid robotics, and usually scaling was only achieved in precisely pre-structured domains. In this thesis, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer towards human-like performance. For doing so, we study two major components for such an approach, i.e., firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting. As a theoretical foundation, we first study a general framework to generate control laws for real robots with a particular focus on skills represented as dynamical systems in differential constraint form. We present a point-wise optimal control framework resulting from a generalization of Gauss' principle and show how various well-known robot control laws can be derived by modifying the metric of the employed cost function. 
The framework has been successfully applied to task space tracking control for holonomic systems for several different metrics on the anthropomorphic SARCOS Master Arm. In order to overcome the limiting requirement of accurate robot models, we first employ learning methods to find learning controllers for task space control. However, when learning to execute a redundant control problem, we face the general problem of the non-convexity of the solution space which can force the robot to steer into physically impossible configurations if supervised learning methods are employed without further consideration. This problem can be resolved using two major insights, i.e., the learning problem can be treated as locally convex and the cost function of the analytical framework can be used to ensure global consistency. Thus, we derive an immediate reinforcement learning algorithm from the expectation-maximization point of view which leads to a reward-weighted regression technique. This method can be used both for operational space control as well as general immediate reward reinforcement learning problems. We demonstrate the feasibility of the resulting framework on the problem of redundant end-effector tracking for both a simulated 3 degrees of freedom robot arm as well as for a simulated anthropomorphic SARCOS Master Arm. While learning to execute tasks in task space is an essential component to a general framework to motor skill learning, learning the actual task is of even higher importance, particularly as this issue is more frequently beyond the abilities of analytical approaches than execution. We focus on the learning of elemental tasks which can serve as the "building blocks of movement generation", called motor primitives. Motor primitives are parameterized task representations based on splines or nonlinear differential equations with desired attractor properties. 
While imitation learning of parameterized motor primitives is a relatively well-understood problem, the self-improvement by interaction of the system with the environment remains a challenging problem, tackled in the fourth chapter of this thesis. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm. In conclusion, in this thesis, we have contributed a general framework for analytically computing robot control laws which can be used for deriving various previous control approaches and serves as foundation as well as inspiration for our learning algorithms. We have introduced two classes of novel reinforcement learning methods, i.e., the Natural Actor-Critic and the Reward-Weighted Regression algorithm. These algorithms have been used in order to replace the analytical components of the theoretical framework by learned representations. Evaluations have been performed on both simulated and real robot arms.
Reference TypeConference Proceedings
Author(s)Peters, J.; Schaal, S.
Year2007
TitleReinforcement learning for operational space control
Journal/Conference/Book TitleInternational Conference on Robotics and Automation (ICRA2007)
Keywordsoperational space control, reinforcement learning, weighted regression, EM-Algorithm
AbstractWhile operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting supervised learning problem is ill-defined as it requires to learn an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. The important insight that many operational space control algorithms can be reformulated as optimal control problems, however, allows addressing this inverse learning problem in the framework of reinforcement learning. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICRA2007-2111_[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Schaal, S.
Year2007
TitleUsing reward-weighted regression for reinforcement learning of task space control
Journal/Conference/Book TitleProceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Keywordsreinforcement learning, cart-pole, policy gradient methods
AbstractIn this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.
Place PublishedHonolulu, Hawaii, April 1-5, 2007
Short TitleUsing reward-weighted regression for reinforcement learning of task space control
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ADPRL2007-Peters_[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Schaal, S.
Year2007
TitleApplying the episodic natural actor-critic architecture to motor primitive learning
Journal/Conference/Book TitleProceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN)
Keywordsreinforcement learning, policy gradient methods, motor primitives, natural actor-critic
AbstractIn this paper, we investigate motor primitive learning with the Natural Actor-Critic approach. The Natural Actor-Critic consists of actor updates which are achieved using natural stochastic policy gradients, while the critic obtains the natural policy gradient by linear regression. We show that this architecture can be used to learn the "building blocks of movement generation", called motor primitives. Motor primitives are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. We show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
Place PublishedBruges, Belgium, April 25-27
Short TitleApplying the episodic natural actor-critic architecture to motor primitive learning
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/es2007-125.pdf
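The critic's linear-regression step described in this abstract can be illustrated on a one-parameter Gaussian policy: regressing episodic returns on the score (grad-log-policy) features yields, in the first coefficient, an estimate of the natural policy gradient. The toy task and all constants below are illustrative assumptions of mine, not the paper's robot setting:

```python
import numpy as np

# Illustrative sketch (assumptions throughout): the linear-regression critic
# of an episodic Natural Actor-Critic for a one-parameter Gaussian policy.
rng = np.random.default_rng(1)

def enac_step(theta, sigma=0.5, n=500, alpha=0.2):
    a = theta + sigma * rng.normal(size=n)
    R = -(a - 1.0) ** 2  # hypothetical episodic returns, peaked at a = 1.0
    # Compatible features: [score, 1]; the constant absorbs the baseline.
    psi = np.column_stack([(a - theta) / sigma**2, np.ones(n)])
    w, *_ = np.linalg.lstsq(psi, R, rcond=None)
    return theta + alpha * w[0]  # w[0] approximates the natural gradient

theta = -2.0
for _ in range(100):
    theta = enac_step(theta)
# theta now sits near the return peak at 1.0
```

For a Gaussian mean parameter, the least-squares coefficient equals the vanilla gradient rescaled by the inverse Fisher information, which is exactly the natural-gradient property the abstract refers to.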
Reference TypeConference Proceedings
Author(s)Peters, J.;Schaal, S.
Year2007
TitleReinforcement learning by reward-weighted regression for operational space control
Journal/Conference/Book TitleProceedings of the International Conference on Machine Learning (ICML2007)
Keywordsreinforcement learning, operational space control, weighted regression
AbstractMany robot control problems of practical importance, including operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-based reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots.
Place PublishedCorvallis, Oregon, June 19-21
Short TitleReinforcement learning by reward-weighted regression for operational space control
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ICML2007-Peters_4493[0].pdf
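The reward-weighted regression update described in this abstract can be sketched in one dimension: the new policy mean is a weighted least-squares fit of the sampled actions, with weights given by an exponential reward transformation. The toy reward, temperature, and constants below are illustrative assumptions of mine, not the paper's setup:

```python
import numpy as np

# Illustrative sketch (assumptions throughout): EM-style reward-weighted
# regression updates of a one-dimensional Gaussian policy mean.
rng = np.random.default_rng(2)

def rwr_step(theta, sigma=0.3, n=1000, beta=2.0):
    a = theta + sigma * rng.normal(size=n)
    r = -(a - 0.5) ** 2  # hypothetical immediate reward, peaked at a = 0.5
    w = np.exp(beta * (r - r.max()))  # shifted for numerical stability
    return np.sum(w * a) / np.sum(w)  # reward-weighted mean of the actions

theta = -1.5
for _ in range(50):
    theta = rwr_step(theta)
# theta now sits near the reward peak at 0.5
```

Because each update is a weighted mean of actions the policy actually sampled, the iteration moves smoothly through solution space, which is the "no dangerous jumps" property the abstract emphasizes.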
Reference TypeConference Proceedings
Author(s)Peters, J.;Theodorou, E.;Schaal, S.
Year2007
TitlePolicy gradient methods for machine learning
Journal/Conference/Book TitleINFORMS Conference of the Applied Probability Society
Keywordspolicy gradient methods, reinforcement learning, simulation-optimization
AbstractWe present an in-depth survey of policy gradient methods as they are used in the machine learning community for optimizing parameterized, stochastic control policies in Markovian systems with respect to the expected reward. Despite having been developed separately in the reinforcement learning literature, policy gradient methods employ likelihood ratio gradient estimators as also suggested in the stochastic simulation optimization community. It is well-known that this approach to policy gradient estimation traditionally suffers from three drawbacks, i.e., large variance, a strong dependence on baseline functions and an inefficient gradient descent. In this talk, we will present a series of recent results which tackle each of these problems. The variance of the gradient estimation can be reduced significantly through recently introduced techniques such as optimal baselines, compatible function approximations and all-action gradients. However, as even the analytically obtainable policy gradients perform unnaturally slowly, it required the step from 'vanilla' policy gradient methods towards natural policy gradients in order to overcome the inefficiency of the gradient descent. This development resulted in the Natural Actor-Critic architecture, which can be shown to be very efficient in application to motor primitive learning for robotics.
Place PublishedEindhoven, Netherlands, July 9-11, 2007
Short TitlePolicy gradient methods for machine learning
Reference TypeConference Proceedings
Author(s)Riedmiller, M.;Peters, J.;Schaal, S.
Year2007
TitleEvaluation of policy gradient methods and variants on the cart-pole benchmark
Journal/Conference/Book TitleProceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
Keywordsreinforcement learning, cart-pole, policy gradient methods
AbstractIn this paper, we evaluate different versions from the three main kinds of model-free policy gradient methods, i.e., finite difference gradients, `vanilla' policy gradients and natural policy gradients. Each of these methods is first presented in its simple form and subsequently refined and optimized. By carrying out numerous experiments on the cart pole regulator benchmark we aim to provide a useful baseline for future research on parameterized policy search algorithms. Portable C++ code is provided for both plant and algorithms; thus, the results in this paper can be reevaluated, reused and new algorithms can be inserted with ease.
Place PublishedHonolulu, Hawaii, April 1-5, 2007
Short TitleEvaluation of policy gradient methods and variants on the cart-pole benchmark
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/ADPRL2007-Peters2_[0].pdf
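The finite-difference estimator, the first of the three method families this paper benchmarks, can be sketched by perturbing the parameters and regressing return differences on the perturbations. The quadratic return function below is an illustrative stand-in for the cart-pole plant, not the paper's C++ benchmark code:

```python
import numpy as np

# Illustrative sketch (assumptions throughout): finite-difference policy
# gradient estimation. J(theta) stands in for the cart-pole return.
rng = np.random.default_rng(3)
TARGET = np.array([1.0, -0.5])  # hypothetical optimal parameters

def J(theta):
    return -np.sum((theta - TARGET) ** 2)

def fd_gradient(theta, eps=0.1, n=50):
    dtheta = eps * rng.normal(size=(n, theta.size))
    dJ = np.array([J(theta + d) - J(theta) for d in dtheta])
    g, *_ = np.linalg.lstsq(dtheta, dJ, rcond=None)  # solve dJ ≈ dtheta @ g
    return g

theta = np.zeros(2)
for _ in range(100):
    theta = theta + 0.05 * fd_gradient(theta)
# theta now sits near TARGET
```

Unlike the likelihood-ratio methods, this estimator needs no knowledge of the policy's score function, which is why the paper treats it as the simplest baseline of the three families.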
Reference TypeReport
Author(s)Peters, J.
Year2007
TitleRelative Entropy Policy Search
Journal/Conference/Book TitleCLMC Technical Report: TR-CLMC-2007-2, University of Southern California
Keywordsrelative entropy, policy search, natural policy gradient
AbstractThis technical report describes a cute idea of how to create new policy search approaches. It directly relates to the Natural Actor-Critic methods but allows the derivation of one shot solutions. Future work may include the application to interesting problems.
Place PublishedLos Angeles, CA
Type of WorkCLMC Technical Report
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Peters-TR2007.pdf
Research NotesA longer and more complete version is under preparation.
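The "one shot" solution this abstract alludes to can be sketched as follows, written for the immediate-reward case and in notation of my own choosing rather than the report's (the full formulation also constrains state-feature expectations):

```latex
\max_{\pi}\ \mathbb{E}_{s \sim \mu,\, a \sim \pi(\cdot\mid s)}\!\left[ r(s,a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \mu}\!\left[ \mathrm{KL}\!\left( \pi(\cdot\mid s) \,\middle\|\, q(\cdot\mid s) \right) \right] \le \epsilon
```

Solving the Lagrangian in closed form gives the exponentially reweighted update \(\pi(a\mid s) \propto q(a\mid s)\,\exp\!\left(r(s,a)/\eta\right)\), where \(q\) is the old policy and the temperature \(\eta\) is the multiplier of the relative-entropy constraint.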
Reference TypeConference Proceedings
Author(s)Peters, J.;Schaal, S.
Year2006
TitleLearning operational space control
Journal/Conference/Book TitleRobotics: Science and Systems (RSS 2006)
Keywordsoperational space control, redundancy, forward models, inverse models, compliance, reinforcement learning, locally weighted learning
AbstractWhile operational space control is of essential importance for robotics and well-understood from an analytical point of view, it can be prohibitively hard to achieve accurate control in the face of modeling errors, which are inevitable in complex robots, e.g., humanoid robots. In such cases, learning control methods can offer an interesting alternative to analytical control algorithms. However, the resulting learning problem is ill-defined as it requires learning an inverse mapping of a usually redundant system, which is well known to suffer from the property of non-convexity of the solution space, i.e., the learning system could generate motor commands that try to steer the robot into physically impossible configurations. A first important insight for this paper is that, nevertheless, a physically correct solution to the inverse problem does exist when learning of the inverse map is performed in a suitable piecewise linear way. The second crucial component for our work is based on a recent insight that many operational space controllers can be understood in terms of a constrained optimal control problem. The cost function associated with this optimal control problem allows us to formulate a learning algorithm that automatically synthesizes a globally consistent desired resolution of redundancy while learning the operational space controller. From the view of machine learning, the learning problem corresponds to a reinforcement learning problem that maximizes an immediate reward and that employs an expectation-maximization policy search algorithm. Evaluations on a three degrees of freedom robot arm illustrate the feasibility of our suggested approach.
Place PublishedPhiladelphia, PA, Aug.16-19
PublisherCambridge, MA: MIT Press
Short TitleLearning operational space control
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/p33.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Schaal, S.
Year2006
TitleReinforcement Learning for Parameterized Motor Primitives
Journal/Conference/Book TitleProceedings of the 2006 International Joint Conference on Neural Networks (IJCNN)
Keywordsmotor primitives, reinforcement learning
AbstractOne of the major challenges in both action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation", called motor primitives. Motor primitives, as used in this paper, are parameterized control policies such as splines or nonlinear differential equations with desired attractor properties. While a lot of progress has been made in teaching parameterized motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this paper, we evaluate different reinforcement learning approaches for improving the performance of parameterized motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
Short TitleReinforcement Learning for Parameterized Motor Primitives
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Team/JanPeters/Reinforcement_Learning_for_Parameterized_Motor_Pri.pdf
Reference TypeConference Proceedings
Author(s)Ting, J.;Mistry, M.;Nakanishi, J.;Peters, J.;Schaal, S.
Year2006
TitleA Bayesian approach to nonlinear parameter identification for rigid body dynamics
Journal/Conference/Book TitleRobotics: Science and Systems (RSS 2006)
KeywordsBayesian regression linear models dimensionality reduction input noise rigid body dynamics parameter identification
AbstractFor robots of increasing complexity such as humanoid robots, conventional identification of rigid body dynamics models based on CAD data and actuator models becomes difficult and inaccurate due to the large number of additional nonlinear effects in these systems, e.g., stemming from stiff wires, hydraulic hoses, protective shells, skin, etc. Data-driven parameter estimation offers an alternative model identification method, but it is often burdened by various other problems, such as significant noise in all measured or inferred variables of the robot. The danger of physically inconsistent results also exists due to unmodeled nonlinearities or insufficiently rich data. In this paper, we address all these problems by developing a Bayesian parameter identification method that can automatically detect noise in both input and output data for the regression algorithm that performs system identification. A post-processing step ensures physically consistent rigid body parameters by nonlinearly projecting the result of the Bayesian estimation onto constraints given by positive definite inertia matrices and the parallel axis theorem. We demonstrate on synthetic and actual robot data that our technique performs parameter identification with 10 to 30% higher accuracy than traditional methods. Due to the resulting physically consistent parameters, our algorithm enables us to apply advanced control methods that algebraically require physical consistency on robotic platforms.
Place PublishedPhiladelphia, PA, Aug.16-19
PublisherCambridge, MA: MIT Press
Short TitleA Bayesian approach to nonlinear parameter identification for rigid body dynamics
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/p32.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Schaal, S.
Year2006
TitlePolicy gradient methods for robotics
Journal/Conference/Book TitleProceedings of the IEEE International Conference on Intelligent Robotics Systems (IROS 2006)
Keywordspolicy gradient methods, reinforcement learning, robotics
AbstractThe acquisition and improvement of motor skills and control policies for robotics from trial and error is of essential importance if robots are ever to leave precisely pre-structured environments. However, to date only few existing reinforcement learning methods have been scaled into the domains of high-dimensional robots such as manipulator, legged or humanoid robots. Policy gradient methods remain one of the few exceptions and have found a variety of applications. Nevertheless, the application of such methods is not without peril if done in an uninformed manner. In this paper, we give an overview on learning with policy gradient methods for robotics with a strong focus on recent advances in the field. We outline previous applications to robotics and show how the most recently developed methods can significantly improve learning performance. Finally, we evaluate our most promising algorithm in the application of hitting a baseball with an anthropomorphic arm.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2006-Peters_[0].pdf
Reference TypeConference Proceedings
Author(s)Nakanishi, J.;Cory, R.;Mistry, M.;Peters, J.;Schaal, S.
Year2005
TitleComparative experiments on task space control with redundancy resolution
Journal/Conference/Book TitleIEEE International Conference on Intelligent Robots and Systems (IROS 2005)
Keywordsmanipulator dynamics, redundant manipulators, space optimization, dynamical decoupling, humanoid robots, inverse kinematics, motor coordination, redundancy resolution, robot dynamics, seven-degree-of-freedom anthropomorphic robot arm, task space control
AbstractUnderstanding the principles of motor coordination with redundant degrees of freedom still remains a challenging problem, particularly for new research in highly redundant robots like humanoids. Even after more than a decade of research, task space control with redundancy resolution still remains an incompletely understood theoretical topic, and also lacks a larger body of thorough experimental investigation on complex robotic systems. This paper presents our first steps towards the development of a working redundancy resolution algorithm which is robust against modeling errors and unforeseen disturbances arising from contact forces. To gain a better understanding of the pros and cons of different approaches to redundancy resolution, we focus on a comparative empirical evaluation. First, we review several redundancy resolution schemes at the velocity, acceleration and torque levels presented in the literature in a common notational framework and also introduce some new variants of these previous approaches. Second, we present experimental comparisons of these approaches on a seven-degree-of-freedom anthropomorphic robot arm. Surprisingly, one of our simplest algorithms empirically demonstrates the best performance, even though, from a theoretical point of view, it does not share the same beauty as some of the other methods. Finally, we discuss practical properties of these control algorithms, particularly in light of inevitable modeling errors of the robot dynamics.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2005-Nakanishi_5051[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Vijayakumar, S.;Schaal, S.
Year2005
TitleNatural Actor-Critic
Journal/Conference/Book TitleProceedings of the 16th European Conference on Machine Learning (ECML 2005)
KeywordsReinforcement Learning, Policy Gradients, Natural Gradients
AbstractThis paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/NaturalActorCritic.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Mistry, M.;Udwadia, F. E.;Schaal, S.
Year2005
TitleA new methodology for robot control design
Journal/Conference/Book TitleThe 5th ASME International Conference on Multibody Systems, Nonlinear Dynamics, and Control (MSNDC 2005)
Keywordsrobot control, nonlinear control, gauss principle
AbstractGauss' principle of least constraint and its generalizations have provided useful insights for the development of tracking controllers for mechanical systems (Udwadia, 2003). Using this concept, we present a novel methodology for the design of a specific class of robot controllers. With our new framework, we demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic framework, and show experimental verifications on a Sarcos Master Arm robot for some of these controllers. We believe that the suggested approach unifies and simplifies the design of optimal nonlinear control laws for robots obeying rigid body dynamics equations, both with or without external constraints, holonomic or nonholonomic constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/PetMisUdwSchASME2005_5054[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Mistry, M.;Udwadia, F. E.;Cory, R.;Nakanishi, J.;Schaal, S.
Year2005
TitleA unifying framework for the control of robotics systems
Journal/Conference/Book TitleIEEE International Conference on Intelligent Robots and Systems (IROS 2005)
AbstractRecently, [1] suggested deriving tracking controllers for mechanical systems using a generalization of Gauss' principle of least constraint. This method allows us to reformulate control problems as a special class of optimal control. We take this line of reasoning one step further and demonstrate that well-known and also several novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. We believe that the suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, both with or without external constraints, with over-actuation or under-actuation, as well as open-chain and closed-chain kinematics.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2005-Peters_5052[0].pdf
Reference TypeConference Proceedings
Author(s)Schaal, S.;Peters, J.;Nakanishi, J.;Ijspeert, A.
Year2004
TitleLearning Movement Primitives
Journal/Conference/Book TitleInternational Symposium on Robotics Research (ISRR2003)
Keywordsmovement primitives, supervised learning, reinforcement learning, locomotion, phase resetting, learning from demonstration
AbstractThis paper discusses a comprehensive framework for modular motor control based on a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic control policies. Model-based control theory is used to convert the outputs of these policies into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstration. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria. We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB, and learning biped walking from demonstration in simulation, including self-improvement of the movement patterns towards energy efficiency through resonance tuning.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/Schaal2005_Chapter_LearningMovementPrimitives.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Schaal, S.
Year2004
TitleLearning Motor Primitives with Reinforcement Learning
Journal/Conference/Book TitleProceedings of the 11th Joint Symposium on Neural Computation
Keywordsnatural policy gradients, motor primitives, natural actor-critic
AbstractOne of the major challenges in action generation for robotics and in the understanding of human motor control is to learn the "building blocks of movement generation," or more precisely, motor primitives. Recently, Ijspeert et al. [1, 2] suggested a novel framework for using nonlinear dynamical systems as motor primitives. While a lot of progress has been made in teaching these motor primitives using supervised or imitation learning, the self-improvement by interaction of the system with the environment remains a challenging problem. In this poster, we evaluate how different reinforcement learning approaches can be used to improve the performance of motor primitives. For pursuing this goal, we highlight the difficulties with current reinforcement learning methods, and outline how these lead to a novel algorithm which is based on natural policy gradients [3]. We compare this algorithm to previous reinforcement learning algorithms in the context of dynamic motor primitive learning, and show that it outperforms these by at least an order of magnitude. We demonstrate the efficiency of the resulting reinforcement learning method for creating complex behaviors for autonomous robotics. The studied behaviors include both discrete, finite tasks such as baseball swings, as well as complex rhythmic patterns as they occur in biped locomotion.
Place Publishedhttp://resolver.caltech.edu/CaltechJSNC:2004.poster020
Reference TypeConference Proceedings
Author(s)Mohajerian, P.;Peters, J.;Ijspeert, A.;Schaal, S.
Year2003
TitleA unifying computational framework for optimization and dynamic systems approaches to motor control
Journal/Conference/Book TitleProceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003)
Keywordscomputational motor control, optimization, dynamic systems, formal modeling
AbstractTheories of biological motor control have been pursued from at least two separate frameworks, the "Dynamic Systems" approach and the "Control Theoretic/Optimization" approach. Control and optimization theory emphasize motor control based on organizational principles in terms of generic cost criteria like "minimum jerk", "minimum torque-change", "minimum variance", etc., while dynamic systems theory puts larger focus on principles of self-organization in motor control, like synchronization, phase-locking, phase transitions, perception-action coupling, etc. Computational formalizations in both approaches have equally differed, using mostly time-indexed desired trajectory plans in control/optimization theory, and nonlinear autonomous differential equations in dynamic systems theory. Due to these differences in philosophy and formalization, optimization approaches and dynamic systems approaches have largely remained two separate research approaches in motor control, mostly conceived of as incompatible. In this poster, we present a novel formal framework for motor control that can harmoniously encompass both optimization and dynamic systems approaches. This framework is based on the discovery that almost arbitrary nonlinear autonomous differential equations can be acquired within a standard statistical (or neural network) learning framework without the need of tedious manual parameter tuning and the danger of entering unstable or chaotic regions of the differential equations. Both rhythmic (e.g., locomotion, swimming, etc.) and discrete (e.g., point-to-point reaching, grasping, etc.) movement can be modeled, either as single degree-of-freedom or multiple degree-of-freedom systems. 
Coupling parameters to the differential equations can create typical effects of self-organization in dynamic systems, while optimization approaches can be used numerically safely to improve the attractor landscape of the equations with respect to a given cost criterion, as demonstrated in modeling studies of several of the hallmarks of dynamic systems and optimization theory. We believe that this novel computational framework will allow a first step towards unifying dynamic systems and optimization approaches to motor control, and provide a set of principled modeling tools to both communities.
Place PublishedIrvine, CA, May 2003
Short TitleA unifying computational framework for optimization and dynamic systemsapproaches to motor control
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/JSNC2003-Mohajerian_[0].pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Vijayakumar, S.;Schaal, S.
Year2003
TitleReinforcement learning for humanoid robotics
Journal/Conference/Book TitleIEEE-RAS International Conference on Humanoid Robots (Humanoids2003)
Keywordsreinforcement learning, policy gradients, movement primitives, behaviors, dynamic systems, humanoid robotics
AbstractReinforcement learning offers one of the most general frameworks to take traditional robotics towards true autonomy and versatility. However, applying reinforcement learning to high-dimensional movement systems like humanoid robots remains an unsolved problem. In this paper, we discuss different approaches of reinforcement learning in terms of their applicability in humanoid robotics. Methods can be coarsely classified into three different categories, i.e., greedy methods, `vanilla' policy gradient methods, and natural gradient methods. We argue that greedy methods are not likely to scale into the domain of humanoid robotics as they are problematic when used with function approximation. `Vanilla' policy gradient methods on the other hand have been successfully applied on real-world robots including at least one humanoid robot. We demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. A derivation of the natural policy gradient is provided, proving that the average policy gradient of Kakade (2002) is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges to the nearest local minimum of the cost function with respect to the Fisher information metric under suitable conditions. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and for learning nonlinear dynamic motor primitives for humanoid robot control. It offers a promising route for the development of reinforcement learning for truly high-dimensional continuous state-action systems.
Place PublishedKarlsruhe, Germany, Sept.29-30
Short TitleReinforcement learning for humanoid robotics
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Team/JanPeters/peters-ICHR2003.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.;Vijayakumar, S.;Schaal, S.
Year2003
TitleScaling reinforcement learning paradigms for motor learning
Journal/Conference/Book TitleProceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003)
KeywordsReinforcement learning, neurodynamic programming, actor-critic methods, policy gradient methods, natural policy gradient
AbstractReinforcement learning offers a general framework to explain reward-related learning in artificial and biological motor control. However, current reinforcement learning methods rarely scale to high-dimensional movement systems and mainly operate in discrete, low-dimensional domains like game-playing, artificial toy problems, etc. This drawback makes them unsuitable for application to human or bio-mimetic motor control. In this poster, we look at promising approaches that can potentially scale and suggest a novel formulation of the actor-critic algorithm which takes steps towards alleviating the current shortcomings. We argue that methods based on greedy policies are not likely to scale into high-dimensional domains as they are problematic when used with function approximation, a must when dealing with continuous domains. We adopt the path of direct policy gradient based policy improvements since they avoid the problems of destabilizing dynamics encountered in traditional value iteration based updates. While regular policy gradient methods have demonstrated promising results in the domain of humanoid motor control, we demonstrate that these methods can be significantly improved using the natural policy gradient instead of the regular policy gradient. Based on this, it is proved that Kakade's "average natural policy gradient" is indeed the true natural gradient. A general algorithm for estimating the natural gradient, the Natural Actor-Critic algorithm, is introduced. This algorithm converges with probability one to the nearest local minimum in Riemannian space of the cost function. The algorithm outperforms non-natural policy gradients by far in a cart-pole balancing evaluation, and offers a promising route for the development of reinforcement learning for truly high-dimensional continuous state-action systems.
URL(s) https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/petersVijayakumarSchaal_JSNC2003_5058[0].pdf
Reference TypeConference Proceedings
Author(s)Schaal, S.;Peters, J.;Nakanishi, J.;Ijspeert, A.
Year2003
TitleControl, planning, learning, and imitation with dynamic movement primitives
Journal/Conference/Book TitleWorkshop on Bilateral Paradigms on Humans and Humanoids, IEEE International Conference on Intelligent Robots and Systems (IROS 2003)
Keywordsmovement primitives, supervised learning, reinforcement learning, locomotion, phase resetting, learning from demonstration
AbstractIn both human and humanoid movement science, the topic of movement primitives has become central in understanding the generation of complex motion with high degree-of-freedom bodies. A theory of control, planning, learning, and imitation with movement primitives seems to be crucial in order to reduce the search space during motor learning and achieve a large level of autonomy and flexibility in dynamically changing environments. Movement recognition based on the same representations as used for movement generation, i.e., movement primitives, is equally intimately tied into these research questions. This paper discusses a comprehensive framework for motor control with movement primitives using a recently developed theory of dynamic movement primitives (DMP). DMPs are a formulation of movement primitives with autonomous nonlinear differential equations, whose time evolution creates smooth kinematic movement plans. Model-based control theory is used to convert such movement plans into motor commands. By means of coupling terms, on-line modifications can be incorporated into the time evolution of the differential equations, thus providing a rather flexible and reactive framework for motor planning and execution; indeed, DMPs form complete kinematic control policies, not just a particular desired trajectory. The linear parameterization of DMPs lends itself naturally to supervised learning from demonstrations. Moreover, the temporal, scale, and translation invariance of the differential equations with respect to these parameters provides a useful means for movement recognition. A novel reinforcement learning technique based on natural stochastic policy gradients allows a general approach of improving DMPs by trial and error learning with respect to almost arbitrary optimization criteria, including situations with delayed rewards.
We demonstrate the different ingredients of the DMP approach in various examples, involving skill learning from demonstration on the humanoid robot DB and an application of learning simulated biped walking from a demonstrated trajectory, including self-improvement of the movement patterns in the spirit of energy efficiency through resonance tuning.
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Publications/IROS2003-Schaal_[0].pdf
Reference TypeConference Proceedings
Author(s)Vijayakumar, S.; D'Souza, A.; Peters, J.; Conradt, J.; Rutkowski, T.; Ijspeert, A.; Nakanishi, J.; Inoue, M.; Shibata, T.; Wiryo, A.; Itti, L.; Amari, S.; Schaal, S.
Year2002
TitleReal-Time Statistical Learning for Oculomotor Control and Visuomotor Coordination
Journal/Conference/Book TitleAdvances in Neural Information Processing Systems (NIPS/NeurIPS), Demonstration Track
Reference TypeReport
Author(s)Peters, J.
Year2002
TitlePolicy Gradient Methods for Control Applications
Journal/Conference/Book TitleCLMC Technical Report: TR-CLMC-2007-1, University of Southern California
Link to PDFhttps://www.ias.informatik.tu-darmstadt.de/uploads/Member/JanPeters/techrep.pdf
Reference TypeConference Proceedings
Author(s)Burdet, E.; Tee, K.P.; Chew, C.M.; Peters, J.; Bt, V.L.
Year2001
TitleHybrid IDM/Impedance Learning in Human Movements
Journal/Conference/Book TitleFirst International Symposium on Measurement, Analysis and Modeling of Human Functions Proceedings
Keywordshuman motor control
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/burdet_ISHF_2001.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/burdet_ISHF_2001.pdf
Reference TypeConference Proceedings
Author(s)Peters, J.; Riener, R.
Year2000
TitleA real-time model of the human knee for application in virtual orthopaedic trainer
Journal/Conference/Book TitleProceedings of the 10th International Conference on Biomedical Engineering Conference (ICBME)
KeywordsBiomechanics, human motor control
URL(s) http://www.ias.informatik.tu-darmstadt.de/uploads/Publications/peters_ICBME_2000.pdf
Link to PDFhttp://www.ias.informatik.tu-darmstadt.de/uploads/Publications/peters_ICBME_2000.pdf
Reference TypeJournal Article
Author(s)Peters, J.
Year1998
TitleFuzzy Logic for Practical Applications
Journal/Conference/Book TitleKuenstliche Intelligenz (KI)
Keywordsbook review
Number4
Pages60