João Carvalho

Quick Info

Research Interests

Reinforcement Learning, Robotics, Manipulation

More Information

Publications, Google Scholar, ORCID

Contact Information

Mail. João Carvalho
TU Darmstadt, FG-IAS
Hochschulstr. 10, 64289 Darmstadt
Office. Room E325, Building S2|02

João joined the Intelligent Autonomous Systems group as a PhD student in November 2019. He received an MSc degree in Computer Science from the Albert-Ludwigs-Universität Freiburg, and previously completed a Master's degree in Electrical and Computer Engineering at the Instituto Superior Técnico of the University of Lisbon. His master's thesis was written at IAS under the supervision of Samuele Tosatto and explored an approach to obtaining an off-policy gradient with better sample efficiency.

He is currently working within the IKIDA project to develop algorithms that enable robots to learn from human input.

To solve an assembly task, robots require a particular set of skills, e.g. planning, vision and control. Usually these manipulation skills have to be hard-coded by a specialized technician rather than learned by the agent. During his PhD, João is looking into new ways to teach low-level manipulation skills to robots through machine learning, imitation learning and reinforcement learning. His research topics include variance reduction in policy gradients, exploration for robotics, residual policy learning, imitation learning and motion planning.


Motion Planning

  1. Carvalho, J.; Baierl, M.; Urain, J.; Peters, J. (2022). Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods.
  2. Vorndamme, J.; Carvalho, J.; Laha, R.; Koert, D.; Figueredo, L.; Peters, J.; Haddadin, S. (2022). Integrated Bi-Manual Motion Generation and Control shaped for Probabilistic Movement Primitives, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).

Robot Reinforcement Learning

  1. Carvalho, J.; Koert, D.; Daniv, M.; Peters, J. (2022). Adapting Object-Centric Probabilistic Movement Primitives with Residual Reinforcement Learning, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).

Measure-Valued Derivatives

  1. Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
  2. Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

Off-Policy and Offline Reinforcement Learning

  1. Tosatto, S.; Carvalho, J.; Peters, J. (in press). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
  2. Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
  3. Carvalho, J.A.C. (2019). Nonparametric Off-Policy Policy Gradient, Master Thesis.

