João Carvalho


João is a Postdoctoral Researcher in the Intelligent Autonomous Systems group. Previously, he obtained his Ph.D. degree from TU Darmstadt, an M.Sc. degree in Computer Science from the Albert-Ludwigs-Universität Freiburg, and a Master's degree in Electrical and Computer Engineering from the Instituto Superior Técnico of the University of Lisbon. His master's thesis, supervised by Samuele Tosatto, explored an approach to estimating off-policy policy gradients with higher sample efficiency. He has worked on several research projects, such as KoBo34 and IKIDA. His research interests lie in developing machine learning and reinforcement learning algorithms for robot manipulation. These include methods that leverage generative models for motion planning and grasping, reinforcement learning methods for contact-rich tasks such as insertion, and variance reduction techniques that improve policy gradient methods.


Publications

  •  Funk, N.; Urain, J.; Carvalho, J.; Prasad, V.; Chalvatzaki, G.; Peters, J. (submitted). ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching.
  •  Carvalho, J.; Le, A. T.; Jahr, P.; Sun, Q.; Urain, J.; Koert, D.; Peters, J. (submitted). Grasp Diffusion Network: Learning Grasp Generators from Partial Point Clouds with Diffusion Models in SO(3)×R3, Submitted to the IEEE Robotics and Automation Letters (RA-L).
  •  Palenicek, D.; Lutter, M.; Carvalho, J.; Dennert, D.; Ahmad, F.; Peters, J. (submitted). Diminishing Return of Value Expansion Methods, Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
  •  Carvalho, J.; Le, A. T.; Kicki, P.; Koert, D.; Peters, J. (submitted). Motion Planning Diffusion: Learning and Adapting Robot Motion Planning with Diffusion Models, Submitted to the IEEE Transactions on Robotics (T-RO).
  •  Le, A. T.; Nguyen, K.; Vu, M. N.; Carvalho, J.; Peters, J. (submitted). Model Tensor Planning, Submitted to Transactions on Machine Learning Research (TMLR).
  •  Le, A. T.; Hansel, K.; Carvalho, J.; Watson, J.; Urain, J.; Biess, A.; Chalvatzaki, G.; Peters, J. (2025). Global Tensor Motion Planning, IEEE Robotics and Automation Letters (RA-L), 10, 7, pp. 7302-7309.
  •  Carvalho, J.; Le, A. T.; Jahr, P.; Sun, Q.; Urain, J.; Koert, D.; Peters, J. (2025). Grasp Diffusion Network: Learning Grasp Generators from Partial Point Clouds with Diffusion Models in SO(3)×R3, German Robotics Conference (GRC).
  •  Carvalho, J. (2025). Enhancing Robot Manipulation Skills through Learning, PhD Thesis.
  •  Le, A. T.; Nguyen, K.; Vu, M. N.; Carvalho, J.; Peters, J. (2025). Model Tensor Planning, ICRA @ RoboARCH: Robotics Acceleration with Computing Hardware and Systems.
  •  Le, A. T.; Hansel, K.; Carvalho, J.; Urain, J.; Biess, A.; Chalvatzaki, G.; Peters, J. (2024). Global Tensor Motion Planning, CoRL 2024 Workshop on Differentiable Optimization Everywhere.
  •  Funk, N.; Urain, J.; Carvalho, J.; Prasad, V.; Chalvatzaki, G.; Peters, J. (2024). ACTIONFLOW: Equivariant, Accurate, and Efficient Manipulation Policies with Flow Matching, CoRL 2024 Workshop on Mastering Robot Manipulation in a World of Abundant Data.
  •  Funk, N.; Urain, J.; Carvalho, J.; Prasad, V.; Chalvatzaki, G.; Peters, J. (2024). ActionFlow: Efficient, Accurate, and Fast Policies with Spatially Symmetric Flow Matching, R:SS Workshop: Structural Priors as Inductive Biases for Learning Robot Dynamics.
  •  Palenicek, D.; Lutter, M.; Carvalho, J.; Peters, J. (2023). Diminishing Return of Value Expansion Methods in Model-Based Reinforcement Learning, International Conference on Learning Representations (ICLR).
  •  Carvalho, J.; Le, A. T.; Baierl, M.; Koert, D.; Peters, J. (2023). Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
  •  Tosatto, S.; Carvalho, J.; Peters, J. (2022). Batch Reinforcement Learning with a Nonparametric Off-Policy Policy Gradient, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 44, 10, pp. 5996-6010.
  •  Carvalho, J.; Peters, J. (2022). An Analysis of Measure-Valued Derivatives for Policy Gradients, Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM).
  •  Carvalho, J.; Koert, D.; Daniv, M.; Peters, J. (2022). Adapting Object-Centric Probabilistic Movement Primitives with Residual Reinforcement Learning, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).
  •  Vorndamme, J.; Carvalho, J.; Laha, R.; Koert, D.; Figueredo, L.; Peters, J.; Haddadin, S. (2022). Integrated Bi-Manual Motion Generation and Control shaped for Probabilistic Movement Primitives, 2022 IEEE-RAS 21st International Conference on Humanoid Robots (Humanoids).
  •  Carvalho, J.; Baierl, M.; Urain, J.; Peters, J. (2022). Conditioned Score-Based Models for Learning Collision-Free Trajectory Generation, NeurIPS 2022 Workshop on Score-Based Methods.
  •  Carvalho, J.; Tateo, D.; Muratore, F.; Peters, J. (2021). An Empirical Analysis of Measure-Valued Derivatives for Policy Gradients, International Joint Conference on Neural Networks (IJCNN).
  •  Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS).
  •  Carvalho, J. A. C. (2019). Nonparametric Off-Policy Policy Gradient, Master Thesis.

Supervised Theses and Projects

Thesis/Project | Student(s) | Topic | Together with
MSc Thesis | Dierking, M. | Model Tensor Planning | An Thai Le
MSc Thesis | Zoller, L. | Generative models for motion planning leveraging control barrier functions | Kay Pompetzki
MSc Thesis | Jahr, P. | Comparing Residual Reinforcement Learning Strategies With A Stable Vector Field Base Policy |
MSc Thesis | Striebel, N. | Bimanual Robotic Manipulation through Imitation with Deep Generative Models and Expressive Representations | Niklas Funk, Michael Drolet
MSc Thesis | Sun, Q. | Grasp Diffusion Network | An Thai Le
MSc Thesis | Kappes, N. | Natural Gradient Optimistic Actor Critic |
MSc Thesis | Hilt, F. | Statistical Model-Based Reinforcement Learning | Joe Watson
MSc Thesis | Keller, L. | Context-Dependent Variable Impedance Controllers With Stability Guarantees | Dorothea Koert
MSc Thesis | Herrmann, P. | 6DCenterPose: Multi-object RGB-D 6D pose tracking with synthetic training data | Suman Pal
MSc Thesis | Brosseit, J. | The Principle of Value Equivalence for Policy Gradient Search |
MSc Thesis | Baierl, M. | Score-Based Generative Models as Trajectory Priors for Motion Planning | Julen Urain De Jesus, An Thai Le
MSc Thesis | Hellwig, J. | Residual Reinforcement Learning with Stable Priors | Julen Urain De Jesus
MSc Thesis | Xue, C. | Task Classification and Local Manipulation Controllers | Suman Pal
MSc Thesis | Zhao, P. | Improving Gradient Directions for Episodic Policy Search |
MSc Thesis | Kaemmerer, M. | Measure-Valued Derivatives for Machine Learning |
BSc Thesis | Daniv, M. | Graph-Based Model Predictive Visual Imitation Learning | Suman Pal

RL:IP.SS24 | Striebel, N., Mulder, A. | Reinforcement Learning of Insertion Tasks: A Comparison Between Policy Structures |
RL:IP.WS23 | Striebel, N., Mulder, A. | Building a Framework to Solve Insertion Tasks with Residual Reinforcement Learning in the Real World |
RL:IP.SS23 | Meier, H. | Model Based Multi-Object 6D Pose Estimation | Felix Kaiser, Arjun Vir Datta
RL:IP.WS21 | Kappes, N., Herrmann, P. | Trust Region Optimistic Actor Critic |
RL:IP.WS21 | Hellwig, J., Baierl, M. | A Hierarchical Approach to Active Pose Estimation | Julen Urain De Jesus
RL:IP.SS21 | Kappes, N., Herrmann, P. | Second Order Extension of Optimistic Actor Critic |
RL:IP.SS21 | Hellwig, J., Baierl, M. | Active Visual Search with POMDPs | Julen Urain De Jesus
RL:IP.SS21 | Hilt, F., Kolf, J., Weiland, C. | Graph Neural Networks for Robotic Manipulation |
RL:IP.WS20 | Hilt, F., Kolf, J., Weiland, C. | Balloon Estimators for Improving and Scaling NOPG | Samuele Tosatto
RL:IP.WS20 | Musekamp, D., Rettig, M. | Learning Robot Skills From Video Data | Dorothea Koert
BP.WS20 | Derr, D., Nayyar, A., Cavkic, H., Kahnna, N., Vlacic, V. | Hand Gesture Recognition for Robot Control | Dorothea Koert

Research Internship | Ji Shi (ETH Zürich) | Rapid Adaptation for Contact Rich Tasks |

Teaching Assistant

Lecture | Years
Computational Engineering and Robotics | SS 2020, SS 2021
Robot Learning | WS 2020
Robot Learning Integrated Project | SS 2022