João Carvalho
Quick Info
Research Interests
Reinforcement Learning, Robotics, Manipulation
More Information
Publications
Google Scholar
ORCID
Contact Information
Mail.
João Carvalho
TU Darmstadt, FG-IAS
Hochschulstr. 10, 64289 Darmstadt
Office. Room E325, Building S2|02
Phone. +49-6151-16-25372
Email. joao.correia_carvalho@tu-darmstadt.de
Email. joao@robot-learning.de

João joined the Intelligent Autonomous Systems group as a PhD student in November 2019. He received an MSc degree in Computer Science from the Albert-Ludwigs-Universität Freiburg, and previously completed a Master's degree in Electrical and Computer Engineering at the Instituto Superior Técnico of the University of Lisbon. His master's thesis was written at IAS under the supervision of Samuele Tosatto and explored an approach to obtaining an off-policy gradient with better sample efficiency.
Currently, he is working within the IKIDA project to develop algorithms that enable robots to learn from human input.
To solve an assembly task, robots require a particular set of skills, e.g. planning, vision and control. Even though there are good motion planners and computer vision methods, robots still lack the fine manipulation skills of humans, e.g. for insertion or placing tasks. Usually, these manipulation skills have to be hard-coded by a specialized technician rather than learned by the robot. During his PhD, João is looking into new ways to teach low-level manipulation skills to robots through imitation and reinforcement learning. His research topics include variance reduction in policy gradients, exploration for robotics, residual policy learning and imitation learning.
Key References
Measure-Valued Derivatives
- Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
- Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).
Off-Policy and Offline Reinforcement Learning
- Tosatto, S.; Carvalho, J.; Peters, J. (in press). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
- Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
- Carvalho, J.A.C. (2019). Nonparametric Off-Policy Policy Gradient, Master's Thesis.
Student Supervision
- MSc Thesis, Brosseit, J., Policy Optimization with Learning and Planning in Continuous Spaces
- MSc Thesis, Keller, L., Learning to Combine Local Controllers (w/ Dorothea Koert)
- MSc Thesis, Hilt, F., Statistical Model-Based Reinforcement Learning (w/ Joe Watson)
- MSc Thesis, Baierl, M., Score-Based Imitation Learning (w/ Julen Urain De Jesus)
- MSc Thesis, Hellwig, J., Reinforcement Learning in Stable Vector Fields (w/ Julen Urain De Jesus)
- MSc Thesis, Kappes, N., Smooth Exploration
- MSc Thesis, Herrmann, P., 6D Pose Estimation and Tracking for Ubongo 3D (w/ Suman Pal)
- MSc Thesis, Weiland, C., Model-Based Reinforcement Learning
- MSc Thesis, Xue, C., Task Classification and Local Manipulation Controllers (w/ Suman Pal)
- MSc Thesis, Zhao, P., Improving Gradient Directions for Episodic Policy Search
- MSc Thesis, Kaemmerer, M., Measure-Valued Derivatives for Machine Learning
- BSc Thesis, Daniv, M., Graph-Based Model Predictive Visual Imitation Learning (w/ Suman Pal)
- RL:IP.SS22, Daniv, M., 6D Pose Estimation and Tracking (w/ Suman Pal)
- RL:IP.WS21, Kappes, N., Herrmann, P., Trust Region Optimistic Actor Critic
- RL:IP.WS21, Hellwig, J., Baierl, M., A Hierarchical Approach to Active Pose Estimation (w/ Julen Urain De Jesus)
- RL:IP.SS21, Kappes, N., Herrmann, P., Second Order Extension of Optimistic Actor Critic
- RL:IP.SS21, Hellwig, J., Baierl, M., Active Visual Search with POMDPs (w/ Julen Urain De Jesus)
- RL:IP.SS21, Hilt, F., Kolf, J., Weiland, C., Graph Neural Networks for Robotic Manipulation
- RL:IP.WS20, Hilt, F., Kolf, J., Weiland, C., Balloon Estimators for Improving and Scaling NOPG (w/ Samuele Tosatto)
- RL:IP.WS20, Musekamp, D., Rettig, M., Learning Robot Skills From Video Data (w/ Dorothea Koert)
- BP.WS20, Derr, D., Nayyar, A., Cavkic, H., Kahnna, N., Vlacic, V., Hand Gesture Recognition for Robot Control (w/ Dorothea Koert)
- Research Internship, Ji Shi (ETH Zürich), 2022
Teaching Assistant
- Computational Engineering and Robotics (SS 2020, SS 2021)
- Robot Learning (WS 2020)
- Robot Learning Integrated Project (SS 2022)