João Carvalho


João joined the Intelligent Autonomous Systems group as a Ph.D. student in November 2019. He received an M.Sc. degree in Computer Science from the Albert-Ludwigs-Universität Freiburg, and previously completed a Master's degree in Electrical and Computer Engineering at the Instituto Superior Técnico of the University of Lisbon. His master's thesis was written at IAS under the supervision of Samuele Tosatto and explored an approach to obtaining an off-policy gradient with higher sample efficiency. He is currently working within the IKIDA project to develop algorithms that enable robots to work alongside humans.

During his Ph.D., João is developing learning algorithms for robotic manipulation. These include methods that leverage generative models for motion planning, reinforcement learning methods for contact-rich tasks such as insertion, and variance reduction techniques that improve policy gradient methods.


Publications

Imitation Learning with Deep Generative Models

    • Funk, N.; Urain, J.; Carvalho, J.; Prasad, V.; Chalvatzaki, G.; Peters, J. (submitted). ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching.

Motion Planning with Diffusion Models

    • Carvalho, J.; Le, A. T.; Baierl, M.; Koert, D.; Peters, J. (2023). Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
    • Carvalho, J.; Baierl, M.; Urain, J.; Peters, J. (2022). Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods.

Reactive Motion Planning

    • Vorndamme, J.; Carvalho, J.; Laha, R.; Koert, D.; Figueredo, L.; Peters, J.; Haddadin, S. (2022). Integrated Bi-Manual Motion Generation and Control shaped for Probabilistic Movement Primitives, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids). Best Interactive Paper Award Finalist.

Robot Learning for Contact-Rich Manipulation

    • Carvalho, J.; Koert, D.; Daniv, M.; Peters, J. (2022). Adapting Object-Centric Probabilistic Movement Primitives with Residual Reinforcement Learning, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).

Reinforcement Learning and Policy Gradients

    • Palenicek, D.; Lutter, M.; Carvalho, J.; Peters, J. (2023). Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning, International Conference on Learning Representations (ICLR).
    • Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).
    • Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
    • Tosatto, S.; Carvalho, J.; Peters, J. (2022). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 44, 10, pp. 5996–6010.
    • Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
    • Carvalho, J.A.C. (2019). Nonparametric Off-Policy Policy Gradient, Master's Thesis.

Supervised Theses and Projects

Thesis/Project | Student(s) | Topic | Together with
MSc Thesis | Striebel, N. | Bimanual Imitation Learning | Niklas Funk, Michael Drolet
MSc Thesis | Sun, Q. | Geometry-Aware Diffusion Models for Robotics | An Thai Le
MSc Thesis | Kappes, N. | Natural Gradient Optimistic Actor Critic |
MSc Thesis | Hilt, F. | Statistical Model-Based Reinforcement Learning | Joe Watson
MSc Thesis | Keller, L. | Context-Dependent Variable Impedance Controllers With Stability Guarantees | Dorothea Koert
MSc Thesis | Herrmann, P. | 6DCenterPose: Multi-object RGB-D 6D pose tracking with synthetic training data | Suman Pal
MSc Thesis | Brosseit, J. | The Principle of Value Equivalence for Policy Gradient Search |
MSc Thesis | Baierl, M. | Score-Based Generative Models as Trajectory Priors for Motion Planning | Julen Urain De Jesus, An Thai Le
MSc Thesis | Hellwig, J. | Residual Reinforcement Learning with Stable Priors | Julen Urain De Jesus
MSc Thesis | Xue, C. | Task Classification and Local Manipulation Controllers | Suman Pal
MSc Thesis | Zhao, P. | Improving Gradient Directions for Episodic Policy Search |
MSc Thesis | Kaemmerer, M. | Measure-Valued Derivatives for Machine Learning |
BSc Thesis | Daniv, M. | Graph-Based Model Predictive Visual Imitation Learning | Suman Pal

RL:IP.SS24 | Striebel, N., Mulder, A. | Reinforcement Learning of Insertion Tasks: A Comparison Between Policy Structures |
RL:IP.WS23 | Striebel, N., Mulder, A. | Building a Framework to Solve Insertion Tasks with Residual Reinforcement Learning in the Real World |
RL:IP.SS23 | Meier, H. | Model Based Multi-Object 6D Pose Estimation | Felix Kaiser, Arjun Vir Datta
RL:IP.WS21 | Kappes, N., Herrmann, P. | Trust Region Optimistic Actor Critic |
RL:IP.WS21 | Hellwig, J., Baierl, M. | A Hierarchical Approach to Active Pose Estimation | Julen Urain De Jesus
RL:IP.SS21 | Kappes, N., Herrmann, P. | Second Order Extension of Optimistic Actor Critic |
RL:IP.SS21 | Hellwig, J., Baierl, M. | Active Visual Search with POMDPs | Julen Urain De Jesus
RL:IP.SS21 | Hilt, F., Kolf, J., Weiland, C. | Graph Neural Networks for Robotic Manipulation |
RL:IP.WS20 | Hilt, F., Kolf, J., Weiland, C. | Balloon Estimators for Improving and Scaling NOPG | Samuele Tosatto
RL:IP.WS20 | Musekamp, D., Rettig, M. | Learning Robot Skills From Video Data | Dorothea Koert
BP.WS20 | Derr, D., Nayyar, A., Cavkic, H., Kahnna, N., Vlacic, V. | Hand Gesture Recognition for Robot Control | Dorothea Koert

Research Internship | Ji Shi (ETH Zürich) | Rapid Adaptation for Contact Rich Tasks |

Teaching Assistant

Lecture | Years
Computational Engineering and Robotics | SS 2020, SS 2021
Robot Learning | WS 2020
Robot Learning Integrated Project | SS 2022