João Carvalho

Quick Info

Research Interests

Reinforcement Learning, Robotics, Manipulation


Contact Information

Mail. João Carvalho
TU Darmstadt, FG-IAS
Hochschulstr. 10, 64289 Darmstadt
Office. Room E325, Building S2|02

João joined the Intelligent Autonomous Systems group as a PhD student in November 2019. He received an MSc degree in Computer Science from the Albert-Ludwigs-Universität Freiburg and previously completed a Master's degree in Electrical and Computer Engineering at the Instituto Superior Técnico of the University of Lisbon. He wrote his master's thesis at IAS under the supervision of Samuele Tosatto, exploring an approach to estimating off-policy gradients with better sample efficiency.

He is currently working within the IKIDA project to develop algorithms that enable robots to learn from human input.

To solve an assembly task, robots require a particular set of skills, e.g., planning, vision, and control. Although good motion planners and computer vision methods exist, robots still lack the fine manipulation skills of humans, for example in insertion or placing tasks. These manipulation skills usually have to be hard-coded by a specialized technician rather than learned by the robot. During his PhD, João is investigating new ways to teach low-level manipulation skills to robots through imitation and reinforcement learning. His research topics include variance reduction in policy gradients, exploration for robotics, residual policy learning, and imitation learning.

Key References

Measure-Valued Derivatives

  1. Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
  2. Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).

Off-Policy and Offline Reinforcement Learning

  1. Tosatto, S.; Carvalho, J.; Peters, J. (in press). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
  2. Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
  3. Carvalho, J.A.C. (2019). Nonparametric Off-Policy Policy Gradient, Master Thesis.
