João Carvalho

João joined the Intelligent Autonomous Systems (IAS) group as a Ph.D. student in November 2019. He received an M.Sc. degree in Computer Science from the Albert-Ludwigs-Universität Freiburg, and previously completed a Master's degree in Electrical and Computer Engineering at the Instituto Superior Técnico of the University of Lisbon. His master's thesis was written at IAS under the supervision of Samuele Tosatto and explored an approach to obtaining an off-policy gradient with higher sample efficiency. Currently, he is working within the IKIDA project to develop algorithms that enable robots to work alongside humans.

During his Ph.D., João is developing learning algorithms for robotic manipulation. These include methods that leverage generative models for motion planning, reinforcement learning methods for contact-rich tasks such as insertions, and variance reduction techniques that improve policy gradient methods.


Motion Planning with Diffusion Models

    • Carvalho, J.; Le, A. T.; Baierl, M.; Koert, D.; Peters, J. (2023). Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
    • Carvalho, J.; Baierl, M.; Urain, J.; Peters, J. (2022). Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods.

Reactive Motion Planning

    • Vorndamme, J.; Carvalho, J.; Laha, R.; Koert, D.; Figueredo, L.; Peters, J.; Haddadin, S. (2022). Integrated Bi-Manual Motion Generation and Control shaped for Probabilistic Movement Primitives, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids). Best Interactive Paper Award Finalist.

Robot Learning for Contact-Rich Manipulation

    • Carvalho, J.; Koert, D.; Daniv, M.; Peters, J. (2022). Adapting Object-Centric Probabilistic Movement Primitives with Residual Reinforcement Learning, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).

Reinforcement Learning and Policy Gradients

    • Palenicek, D.; Lutter, M.; Carvalho, J.; Peters, J. (2023). Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning, International Conference on Learning Representations (ICLR).
    • Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).
    • Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
    • Tosatto, S.; Carvalho, J.; Peters, J. (2022). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 44, no. 10, pp. 5996-6010.
    • Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
    • Carvalho, J.A.C. (2019). Nonparametric Off-Policy Policy Gradient, Master's Thesis.

Supervised Theses and Projects

Thesis/Project | Student(s) | Topic | Together with
MSc Thesis | Kappes, N. | Natural Gradient Optimistic Actor Critic |
MSc Thesis | Hilt, F. | Statistical Model-Based Reinforcement Learning | Joe Watson
MSc Thesis | Keller, L. | Context-Dependent Variable Impedance Controllers With Stability Guarantees | Dorothea Koert
MSc Thesis | Herrmann, P. | 6DCenterPose: Multi-object RGB-D 6D pose tracking with synthetic training data | Suman Pal
MSc Thesis | Brosseit, J. | The Principle of Value Equivalence for Policy Gradient Search |
MSc Thesis | Baierl, M. | Score-Based Generative Models as Trajectory Priors for Motion Planning | Julen Urain De Jesus, An Thai Le
MSc Thesis | Hellwig, J. | Residual Reinforcement Learning with Stable Priors | Julen Urain De Jesus
MSc Thesis | Xue, C. | Task Classification and Local Manipulation Controllers | Suman Pal
MSc Thesis | Zhao, P. | Improving Gradient Directions for Episodic Policy Search |
MSc Thesis | Kaemmerer, M. | Measure-Valued Derivatives for Machine Learning |
BSc Thesis | Daniv, M. | Graph-Based Model Predictive Visual Imitation Learning | Suman Pal
RL:IP.WS23 | Striebel, N., Mulder, A. | Building a Framework to Solve Insertion Tasks with Residual Reinforcement Learning in the Real World |
RL:IP.SS23 | Meier, H. | 6D Pose Estimation and Tracking ? | Felix Kaiser, Arjun Vir Datta
RL:IP.WS21 | Kappes, N., Herrmann, P. | Trust Region Optimistic Actor Critic |
RL:IP.WS21 | Hellwig, J., Baierl, M. | A Hierarchical Approach to Active Pose Estimation | Julen Urain De Jesus
RL:IP.SS21 | Kappes, N., Herrmann, P. | Second Order Extension of Optimistic Actor Critic |
RL:IP.SS21 | Hellwig, J., Baierl, M. | Active Visual Search with POMDPs | Julen Urain De Jesus
RL:IP.SS21 | Hilt, F., Kolf, J., Weiland, C. | Graph Neural Networks for Robotic Manipulation |
RL:IP.WS20 | Hilt, F., Kolf, J., Weiland, C. | Balloon Estimators for Improving and Scaling NOPG | Samuele Tosatto
RL:IP.WS20 | Musekamp, D., Rettig, M. | Learning Robot Skills From Video Data | Dorothea Koert
BP.WS20 | Derr, D., Nayyar, A., Cavkic, H., Kahnna, N., Vlacic, V. | Hand Gesture Recognition for Robot Control | Dorothea Koert
Research Internship | Ji Shi (ETH Zürich) | Rapid Adaptation for Contact Rich Tasks |

Teaching Assistant

Computational Engineering and Robotics | SS 2020, SS 2021
Robot Learning | WS 2020
Robot Learning Integrated Project | SS 2022