Category Question Intent/Answer Checked by ...
1. Starters (Off-Topic) How about we start with you telling us something about yourself? Get the applicant going by talking about the topic he/she is most confident about: himself/herself. Note which story he/she starts with, i.e., what the applicant puts emphasis on. Also check to whom he/she attributes prior success.
Do you have any particular topic you would like to study in your PhD? To figure out how well structured the applicant’s future plans are. Bad answer: "I want to use deep RL together with robots (then referring to fancy videos)". Good answer: "I want to figure out how to improve the exploration in RL, given this is a clear bottleneck of the current methods."
Could you briefly describe your master’s thesis to me? The applicant should be able to explain it in less than 2 minutes in a way that another person, especially one with a technical background, can understand. If not, this is a very bad sign and you should dig deeper here. Figure out what the key challenges were. Did he/she solve them alone?
POINT_OR_PROJECT from your resume caught my attention. Can you tell me more about it? [Ask at least one detailed follow-up question] Let the applicant speak on a well-defined topic. Check if the POINT_OR_PROJECT was actually as interesting as you thought and which role the applicant played in it.
I saw in your CV that you obtained incredible grades in TOPIC. How was it? How many people were taking TOPIC, and how did your grade compare to the others'? The goal of this question is to gauge the applicant’s self-esteem and to put a relative metric such as the grade into context: being the top student out of 10 is not the same as being the top student out of 1000.
If you could instantly learn any kind of skill, what would it be? Just chatting and getting a feeling for what the applicant is interested in.
What triggered your interest in robotics? Just chatting and getting a feeling for what the applicant is interested in.
Did you see that RECENT_STORY_ABOUT_ML_OR_ROBOTICS in the news? Test if the applicant follows the news on this topic. You can also turn this into an anecdote about something you worked on.
2. General How about we start with you telling us something about yourself? Get the applicant going by talking about the topic he/she is most confident about: himself/herself. Note which story he/she starts with, i.e., what the applicant puts emphasis on. Also check to whom he/she attributes prior success.
Why did you choose to apply to IAS? This needs to be answered well by the applicant, since it is one of the most obvious questions. Vague answers like “Because of your interesting research.” need to be countered with (repeated) questions on which research and which paper in particular. The applicant should at least have read our webpage and found out that we work at the intersection of machine learning and robotics. Gauge the level of enthusiasm the applicant has.
Why should the IAS choose you? This needs to be answered well by the applicant, since it is one of the most obvious questions. Vague answers like “Because I work hard.” are not acceptable. Check if the applicant’s skills match.
What do you want to achieve during your PhD with us? This question is very open. It is OK if the applicant admits that he/she does not know. However, a convincing story is a plus. This question is more about whether the person has asked him-/herself this question before applying.
To which other places did you apply? If you could choose one of these places, which one would you choose (given that you cannot choose IAS)? It's a sneaky way to find out what they are looking for in terms of labs and topics. (“I did not apply anywhere else” is typically a lie.)
What are your favorite papers? Is there a paper that sparked your interest recently? [Ask at least one detailed follow-up question] The applicant should be able to name preferred papers and/or researchers. Watch out for vague answers, which are typically a bad sign. Drill at least one level deep and ask what is interesting about the named papers and/or researchers.
What kind of supervision do you expect? Do you want to work in a team? Find out if the applicant is willing to work in a team and is able to take constructive criticism and react positively to it. The applicant should neither expect very heavy supervision nor expect to work alone. Nearly every applicant will say that he/she wants to work in a team; take note of how strongly he/she emphasizes it.
How do you see yourself in five years? Check whether the applicant has a long-term plan, e.g., PostDoc or entrepreneur.
What are the research questions that interest you most? These should be connectable to the position he/she is applying to.
What is the paper or project you are most proud of? What were your contributions to it? See what matters to the applicant. Typically this is his/her biggest merit, so it should be somewhat impressive. It is a very bad sign if the applicant claims to have done the vast majority (or even all) of the work alone. Figure out what his/her specialty is.
How do you handle pressure? Check if the applicant can give an example of how he/she has successfully handled stress in a previous job. The candidate should not claim that he/she never, or rarely, experiences stress, since this is difficult to believe or shows that the candidate has only worked in low-pressure environments.
3. Basic Prior Knowledge What is an eigenvalue and an eigenvector of a matrix? An eigenvector of a matrix A is a nonzero vector v such that applying the linear map defined by A yields v scaled by its corresponding eigenvalue λ, i.e., Av = λv.
What is the relationship between eigenvalues, trace and determinant? The determinant is the product of the eigenvalues, and the trace is the sum of the eigenvalues (counted with algebraic multiplicity; in general they are complex).
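A quick numerical check of both answers, as a sketch with a hypothetical 2x2 matrix (any square matrix works):

```python
import numpy as np

# Hypothetical example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)  # columns of eigvecs are the eigenvectors

# Each eigenpair satisfies A v = lambda v.
for lam, v in zip(eigvals, eigvecs.T):
    assert np.allclose(A @ v, lam * v)

# Trace = sum of eigenvalues, determinant = product of eigenvalues.
print(np.isclose(np.trace(A), eigvals.sum()))        # True
print(np.isclose(np.linalg.det(A), eigvals.prod()))  # True
```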
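How to check if two vectors are linearly independent? Two vectors are linearly independent if neither is a scalar multiple of the other; equivalently, the matrix [u v] formed by the two vectors has rank 2 (for 2D vectors: its determinant is nonzero). Note that a zero dot product (orthogonality) is sufficient for nonzero vectors to be independent but not necessary, so it is not a valid test.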
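A minimal sketch of the rank test, with hypothetical vectors chosen so that a non-orthogonal pair is still independent:

```python
import numpy as np

def linearly_independent(u, v):
    """Two vectors are linearly independent iff the matrix [u v] has rank 2."""
    return np.linalg.matrix_rank(np.column_stack([u, v])) == 2

u = np.array([1.0, 0.0, 0.0])
v = np.array([1.0, 1.0, 0.0])   # not orthogonal to u (u @ v = 1), yet independent
w = np.array([2.0, 0.0, 0.0])   # w = 2u, hence dependent

print(u @ v, linearly_independent(u, v))  # 1.0 True
print(linearly_independent(u, w))         # False
```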
Do you know how to formalize an optimal control problem? The applicant should define an objective function, and (optionally) some constraints like transition dynamics, and then mention how to optimize.
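A minimal discrete-time formulation of the kind a good answer might contain; the symbols (stage cost c, terminal cost c_T, dynamics f, admissible sets 𝒳 and 𝒰) are our own notation, not prescribed by this guide:

```latex
\min_{u_0, \dots, u_{T-1}} \; c_T(x_T) + \sum_{t=0}^{T-1} c(x_t, u_t)
\quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad x_0 \text{ given}, \quad x_t \in \mathcal{X}, \; u_t \in \mathcal{U}
```

It can then be optimized, e.g., by dynamic programming or by direct trajectory optimization (shooting or collocation).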
What is the kernel of a matrix M? Do you know any applications where this concept is used? The kernel represents the null space of the linear map that is represented by the matrix, i.e., the space of vectors x for which Mx = 0. It is for example used to speed up constrained optimization by working only with a reduced set of variables in the null space of the constraint Jacobian. The kernel / null space is also used in robotics to trade off between multiple objectives. By projecting movements from one objective into the null space of another objective, one can ensure that the latter is always prioritized over the former.
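The kernel can be computed numerically from the SVD: the right-singular vectors belonging to zero singular values span it. A sketch with a hypothetical rank-1 matrix:

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis of ker(M), computed from the SVD M = U S V^T."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # right-singular vectors with (numerically) zero singular value

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # rank 1, so the kernel is 2-dimensional
K = null_space(M)
print(K.shape)                    # (3, 2)
print(np.allclose(M @ K, 0.0))    # True
```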
Are you familiar with the concept of the Lagrange multiplier in optimization? Using Lagrange multipliers, we convert a constrained optimization problem into an unconstrained one: each constraint is added to the objective, multiplied by its Lagrange multiplier, and we look for stationary points of the resulting Lagrangian (typically saddle points rather than minima).
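A small worked example, assuming the hypothetical problem "minimize x² + y² subject to x + y = 1": setting the gradient of the Lagrangian to zero gives a linear system that NumPy can solve directly.

```python
import numpy as np

# Stationarity conditions of L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1):
#   dL/dx   = 2x + lam  = 0
#   dL/dy   = 2y + lam  = 0
#   dL/dlam = x + y - 1 = 0
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(K, rhs)
print(x, y, lam)  # x = y = 0.5, lam = -1 (up to floating-point rounding)
```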
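Can you explain linear regression? The prediction is a linear combination of features weighted by parameters. The features can be nonlinear functions of the input; the model only has to be linear in the parameters. Linear regression learns the parameters by minimizing a loss, typically the squared error (supervised learning).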
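A sketch with nonlinear (polynomial) features on synthetic data; the true coefficients [1, -2, 3] and the noise level are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.standard_normal(x.shape)  # synthetic data

# Nonlinear features [1, x, x^2]; the model stays linear in the parameters w.
Phi = np.column_stack([np.ones_like(x), x, x**2])

# Minimize the squared loss ||Phi w - y||^2 (ordinary least squares).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)  # close to [1, -2, 3]
```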
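When is a linear system d/dt x(t) = A x(t) unstable? If at least one eigenvalue of A (in general a complex number) has a positive real part. If all eigenvalues have negative real parts, the system is asymptotically stable; eigenvalues with zero real part require a closer look (marginal stability).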
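The eigenvalue criterion as a short sketch; the three example matrices and the tolerance used to treat tiny real parts as zero are our own choices:

```python
import numpy as np

TOL = 1e-12  # numerical tolerance for "zero" real parts

def is_asymptotically_stable(A):
    """dx/dt = A x is asymptotically stable iff all eigenvalues have negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < -TOL))

def is_unstable_by_eigs(A):
    """Sufficient condition for instability: some eigenvalue has positive real part."""
    return bool(np.any(np.linalg.eigvals(A).real > TOL))

damped = np.array([[0.0, 1.0], [-1.0, -0.5]])    # damped oscillator: Re(lambda) = -0.25
undamped = np.array([[0.0, 1.0], [-1.0, 0.0]])   # lambda = +-i: marginally stable
saddle = np.array([[0.0, 1.0], [1.0, 0.0]])      # lambda = +-1: unstable

print(is_asymptotically_stable(damped))                                   # True
print(is_asymptotically_stable(undamped), is_unstable_by_eigs(undamped))  # False False
print(is_unstable_by_eigs(saddle))                                        # True
```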
Which problem does calculus of variations address? The calculus of variations is a field of mathematical analysis that uses variations, which are small changes in functions, to find maxima and minima of functionals (mappings from functions to real numbers). That is, we do not optimize a function w.r.t. a finite-dimensional input variable, but a functional w.r.t. a whole function, e.g., to find the curve of minimal length between two points.
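The central result an applicant might mention is the Euler–Lagrange equation, a necessary condition for a function y(x) to be a stationary point of an integral functional (notation ours). For the arc-length example, it yields a straight line:

```latex
J[y] = \int_a^b L\big(x, y(x), y'(x)\big)\, dx,
\qquad
\frac{\partial L}{\partial y} - \frac{d}{dx} \frac{\partial L}{\partial y'} = 0,
\qquad
L = \sqrt{1 + y'^2} \;\Rightarrow\; \frac{d}{dx} \frac{y'}{\sqrt{1 + y'^2}} = 0 \;\Rightarrow\; y' = \text{const}.
```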
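What does the central limit theorem say? For i.i.d. random variables with finite variance, the suitably standardized sample mean converges in distribution to a normal distribution as the sample size goes to infinity; i.e., for large samples, the sample mean is approximately normally distributed, regardless of the underlying distribution.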
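A small simulation sketch: standardized means of a clearly non-Gaussian (exponential) distribution behave approximately like N(0, 1). Sample sizes and the seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 500, 4000

# i.i.d. samples from a skewed, non-Gaussian distribution: Exponential(1), mean 1, std 1.
samples = rng.exponential(scale=1.0, size=(trials, n))
means = samples.mean(axis=1)

# Standardized sample means sqrt(n) * (mean - mu) / sigma should be roughly N(0, 1).
z = np.sqrt(n) * (means - 1.0) / 1.0
print(z.mean(), z.std())             # close to 0 and 1
print(np.mean(np.abs(z) < 1.96))     # close to 0.95
```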
Can you explain the Bellman equation? The Bellman equation is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. (Simpler) Check if the applicant mentions the recursive definition of the value (the value of a state is the immediate reward plus the discounted value of the next state) and the max operator over actions.
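Value iteration repeatedly applies the Bellman optimality backup until the values stop changing. A sketch on a hypothetical 2-state, 2-action MDP in which only staying in state 1 is rewarded:

```python
import numpy as np

gamma = 0.9
# Hypothetical MDP. P[a, s, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch state
])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman backup: V(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)
print(V, policy)  # approximately [9, 10] and [1, 0]
```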
What is Model Predictive Control (MPC)? (Sampling-based) MPC is a control framework that, at every time step, selects the next action by adapting a search distribution over action sequences based on the cost of multiple finite-horizon rollouts, which are obtained using an internal model. Only the first action is executed before replanning (receding horizon). Important points for the applicant to mention are the finite horizon and the usage of the (internal) model.
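A minimal sampling-based MPC sketch, assuming a known 1D point-mass model and using the cross-entropy method (CEM) as the rule for adapting the search distribution; MPPI or random shooting would be equally valid choices. All cost weights and hyperparameters are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
DT, H, N, N_ELITE, ITERS = 0.1, 20, 200, 20, 5

def step(x, u):
    """Internal model (assumed known): 1D point mass with state x = [position, velocity]."""
    return np.array([x[0] + DT * x[1], x[1] + DT * u])

def rollout_costs(x0, U):
    """Simulate all N candidate action sequences U (N x H) with the model and sum their costs."""
    p, v = np.full(len(U), x0[0]), np.full(len(U), x0[1])
    cost = np.zeros(len(U))
    for t in range(H):
        p, v = p + DT * v, v + DT * U[:, t]
        cost += p**2 + 0.1 * v**2 + 0.1 * U[:, t] ** 2
    return cost

def cem_plan(x, mean):
    """Cross-entropy method: refit a Gaussian over action sequences to the lowest-cost rollouts."""
    std = np.ones(H)
    for _ in range(ITERS):
        U = mean + std * rng.standard_normal((N, H))
        elites = U[np.argsort(rollout_costs(x, U))[:N_ELITE]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

x, mean = np.array([1.0, 0.0]), np.zeros(H)
for _ in range(100):
    plan = cem_plan(x, mean)
    x = step(x, plan[0])             # execute only the first action (receding horizon) ...
    mean = np.append(plan[1:], 0.0)  # ... and warm-start the next plan with the shifted sequence
print(x)  # expected to end close to the origin
```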
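What is the integrator wind-up in PID control? When the actuator saturates (e.g., because of a large tracking error), the integral term keeps accumulating the error even though the control input cannot grow any further. The integrator "winds up" to a large value and, once the error changes sign, needs a long time to unwind, which causes large overshoot. Since actuator saturation is itself a nonlinearity, this is very common on real systems; remedies (anti-windup) include clamping the integrator or back-calculation.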
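A small simulation sketch of the effect, assuming a hypothetical first-order plant dy/dt = -y + u with |u| ≤ 1, a PI controller, and simple clamping (conditional integration) as the anti-windup scheme:

```python
def simulate(anti_windup, r=0.8, kp=2.0, ki=5.0, dt=0.01, T=10.0, u_max=1.0):
    """PI control of the plant dy/dt = -y + u with a saturated actuator |u| <= u_max.
    Returns the peak output, i.e., a measure of the overshoot above the setpoint r."""
    y, integral, peak = 0.0, 0.0, 0.0
    for _ in range(int(T / dt)):
        e = r - y
        u_cmd = kp * e + ki * integral
        u = min(max(u_cmd, -u_max), u_max)   # actuator saturation
        saturated = u != u_cmd
        # Clamping anti-windup: freeze the integrator while saturated and the error
        # would drive the command further into saturation. Without it, always integrate.
        if not (anti_windup and saturated and e * u_cmd > 0):
            integral += e * dt
        y += dt * (-y + u)                   # explicit Euler step of the plant
        peak = max(peak, y)
    return peak

peak_windup = simulate(anti_windup=False)
peak_clamped = simulate(anti_windup=True)
print(peak_windup, peak_clamped)  # the wound-up integrator overshoots the setpoint 0.8 much more
```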
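What is the optimal controller for a linear dynamical system with a quadratic cost function? Where does this controller originate from? The linear-quadratic regulator (LQR), a linear state-feedback law u = -Kx. It originates from solving the Riccati equation (a backward Riccati recursion for a finite horizon, the algebraic Riccati equation for an infinite horizon).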
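A sketch of the discrete-time case: iterating the Riccati recursion to a fixed point yields the infinite-horizon gain. The discretized double integrator and the weights Q, R are hypothetical examples:

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete-time LQR by iterating the Riccati recursion to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # optimal gain for the current P
        P = Q + A.T @ P @ (A - B @ K)                        # Riccati update
    return K, P

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # discretized double integrator
B = np.array([[0.0], [dt]])
Q = np.eye(2)
R = np.array([[0.1]])

K, P = dlqr(A, B, Q, R)
# Optimal control law u = -K x; the closed loop A - B K should be stable.
print(np.abs(np.linalg.eigvals(A - B @ K)).max() < 1.0)  # True
```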
4. Robotics What are the challenging problems in robotics nowadays? Perception, planning, actuation, etc. There are multiple valid answers here. You can follow up by asking the applicant to elaborate on why one of them is challenging.
What are your prior experiences with robots? Check the applicant’s ability to work with real robots. How many robots did he/she get in contact with? How did he/she use them, i.e., only via high-level go-to commands or also at a low level?
Why do we need robot learning? Multiple possible answers: (i) because programming is expensive, (ii) because we want generalization, (iii) to better understand how humans learn.
What is the benefit of task space control? What are its caveats? The benefit is that task space control allows you to reason about collisions with objects in the work space of the robot. One caveat of task space control is that it requires dealing with kinematic singularities that can lead to catastrophic failures if not addressed.
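How does a Kalman filter work? The Kalman filter assumes a linear Gaussian dynamical system (i.e., a linear state-space model with Gaussian noise). It alternates a prediction step (propagating the belief through the dynamics) and an update step (correcting the belief with the measurement). Control view: adaptive feedback observer. Bayesian view: condition the belief on each measurement, which is a linear update for Gaussian distributions.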
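A minimal scalar sketch of the predict/update cycle, assuming a hypothetical random-walk model (process noise q) observed with Gaussian measurement noise (variance r):

```python
import numpy as np

def kalman_1d(zs, x0=0.0, p0=100.0, q=1e-4, r=0.25):
    """Scalar Kalman filter for x_{t+1} = x_t + w (var q), z_t = x_t + v (var r)."""
    x, p = x0, p0
    estimates = []
    for z in zs:
        # Predict: propagate mean and variance through the (identity) dynamics.
        p = p + q
        # Update: condition the Gaussian belief on the measurement z.
        k = p / (p + r)          # Kalman gain
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates), p

rng = np.random.default_rng(0)
zs = 5.0 + 0.5 * rng.standard_normal(200)   # noisy measurements of a constant, std 0.5 -> r = 0.25
est, p = kalman_1d(zs)
print(est[-1], p)   # estimate close to 5.0, variance much smaller than r
```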
What is motion planning? The computational problem of finding a sequence of valid configurations that moves the object/robot from a start to a target configuration. Solutions include search- or sampling-based methods such as A* or RRT, and optimization methods such as trajectory optimization (e.g., CHOMP).
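A compact sketch of a search-based planner: A* on a hypothetical 4-connected occupancy grid with a Manhattan-distance heuristic:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected occupancy grid (1 = obstacle). Returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])   # admissible Manhattan heuristic
    open_heap = [(h(start), 0, start)]
    came_from, g = {start: None}, {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:                      # reconstruct the path backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > g[cell]:
            continue                          # stale heap entry
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                new_cost = cost + 1
                if new_cost < g.get(nb, float("inf")):
                    g[nb], came_from[nb] = new_cost, cell
                    heapq.heappush(open_heap, (new_cost + h(nb), new_cost, nb))
    return None

grid = [[0, 0, 0, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
print(len(path) - 1, path)  # 8 moves around the wall
```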
Do you know the differences between path planning and optimal control methods? Path planning solves a search or optimization problem in the configuration or state space and typically does not consider the dynamics. Optimal control takes the dynamics into consideration. Optimal control is usually faster to compute and can therefore be applied online, whereas path planning is usually too slow for direct control. In robotics, we usually combine both: first path planning, then control.
What are the main reasons why we shouldn't apply Gaussian exploration to a robotic platform? In order of importance: (i) if the standard deviation is too small, the robot will effectively not move because of the actuators' backlash, (ii) independent noise at every time step produces jerky commands that cause wear and tear of the actuators, (iii) a Gaussian has unbounded support, so occasional large commands make it unsafe to use a Gaussian policy on a robot.
Can you explain what the Jacobian means in the context of robotics? What is the null space of the Jacobian? It maps joint velocities to task-space (e.g., Cartesian) velocities: dx/dt = J(q) dq/dt, with J = ∂x/∂q. Inverting this mapping typically requires the pseudo-inverse, since J is non-square if the two spaces have different dimensions. The null space of J is the space of joint velocities that are mapped to zero Cartesian velocity. It is also used to trade off between multiple objectives: by projecting movements from one objective into the null space of another objective, one can ensure that the latter is always prioritized over the former.
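A sketch of null-space prioritization for a hypothetical planar arm with three revolute joints and a 2D position task (so one redundant degree of freedom). The analytic Jacobian is checked against finite differences:

```python
import numpy as np

LINKS = np.array([1.0, 1.0, 1.0])   # hypothetical link lengths of a planar 3R arm

def fk(q):
    """End-effector position (x, y) of the planar arm."""
    angles = np.cumsum(q)
    return np.array([np.sum(LINKS * np.cos(angles)), np.sum(LINKS * np.sin(angles))])

def jacobian(q):
    """Analytic 2x3 position Jacobian J = dx/dq."""
    angles = np.cumsum(q)
    J = np.zeros((2, 3))
    for i in range(3):
        J[0, i] = -np.sum(LINKS[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(LINKS[i:] * np.cos(angles[i:]))
    return J

q = np.array([0.3, 0.5, -0.4])
J = jacobian(q)
eps = 1e-6
J_fd = np.column_stack([(fk(q + eps * e) - fk(q - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(J, J_fd, atol=1e-6)   # sanity check of the analytic Jacobian

J_pinv = np.linalg.pinv(J)
N = np.eye(3) - J_pinv @ J                    # projector onto the null space of J
xdot_des = np.array([0.1, 0.0])               # primary task: end-effector velocity
qdot_secondary = np.array([0.0, 0.0, 1.0])    # secondary objective, e.g., move the last joint
qdot = J_pinv @ xdot_des + N @ qdot_secondary

print(np.allclose(J @ qdot, xdot_des))  # True: the secondary motion does not disturb the task
```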
Given a robot with 7 revolute joints, how can you compute the necessary joint configuration to reach a desired pose in Cartesian space? Using inverse kinematics, i.e., by iteratively computing the Jacobian and using it to refine the joint positions until the end effector converges to the desired Cartesian pose. From this point, we can go further and ask questions like: Is a solution guaranteed to exist? How many joints are needed at least to reach an arbitrary 6 DoF pose?
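A simplified sketch of the iterative scheme, assuming a hypothetical planar 2-link arm and a position-only target instead of a 7-joint arm and a full 6 DoF pose; the step uses damped least squares, one common variant of the pseudo-inverse update:

```python
import numpy as np

L1, L2 = 1.0, 1.0   # hypothetical link lengths of a planar 2R arm

def fk(q):
    return np.array([L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1]),
                     L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik(x_target, q0, damping=1e-2, tol=1e-8, max_iters=200):
    """Iterative IK: repeatedly linearize with the Jacobian and take a damped least-squares step."""
    q = np.array(q0, dtype=float)
    for _ in range(max_iters):
        err = x_target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        J = jacobian(q)
        # Damped pseudo-inverse J^T (J J^T + lambda^2 I)^-1 stays bounded near singularities.
        q += J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q

q_sol = ik(np.array([1.2, 0.8]), q0=[0.1, 0.5])
print(fk(q_sol))  # approximately [1.2, 0.8]
```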
Why do we need representation learning in robotics? To define a low-dimensional state representation out of high-dimensional sensory input such as RGB, LIDAR, etc.
What is a rotation matrix in the context of robotics? How does it transform a vector from one coordinate system to another? Does it have any special properties? x_B = R_{BA} x_A maps the coordinates of a vector expressed in frame A to its coordinates in frame B. The columns of R_{BA} are the basis vectors of A expressed in B (equivalently, its rows are the basis vectors of B expressed in A). Important properties: (i) det(R) = 1, (ii) rotations are linear operations, (iii) rotations (in 3D) do not commute, (iv) R^{-1} = R^T, since the basis is orthonormal.
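A numerical check of the listed properties, using elementary rotations about the z- and x-axes with arbitrary angles:

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

R = rot_z(0.7) @ rot_x(-0.3)
print(np.isclose(np.linalg.det(R), 1.0))          # True: det(R) = 1
print(np.allclose(np.linalg.inv(R), R.T))         # True: R^-1 = R^T
print(np.allclose(rot_z(0.7) @ rot_x(-0.3),
                  rot_x(-0.3) @ rot_z(0.7)))      # False: 3D rotations do not commute
```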
5. Machine Learning Do you know what a probability density function is? (In loose words) A probability density function (PDF) distributes probability mass over a space: the probability that the random variable falls into a region is the integral of the PDF over that region. The value of the PDF at a point is a (relative) likelihood, not a probability, and can exceed 1. The PDF is nonnegative everywhere, and its integral over the entire space is equal to 1.
What is Bayes' theorem and how is it useful in machine learning? p(A | B) = p(B | A) p(A) / p(B). Let the applicant explain the terms “likelihood”, “prior”, and “evidence”. Bayes’ theorem allows us to incorporate prior knowledge and sequentially update the belief. It is for example used in regression or classification.
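A classic worked example with hypothetical numbers (prevalence 1%, sensitivity 99%, false-positive rate 5%), showing why the evidence term matters:

```python
prior = 0.01        # p(D)
likelihood = 0.99   # p(+ | D)
false_pos = 0.05    # p(+ | not D)

evidence = likelihood * prior + false_pos * (1.0 - prior)   # p(+), by marginalization
posterior = likelihood * prior / evidence                   # p(D | +)
print(round(posterior, 3))  # 0.167
```

Despite the accurate test, a positive result only raises the probability of the disease to about 1/6, because the prior is small.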
What is Bayesian inference? Bayesian inference is a method of statistical inference, in which Bayes' theorem/rule is used to update the probability for a hypothesis as more evidence or information becomes available. (In loose words) Bayesian inference updates a belief, i.e., a distribution over parameter values, given data and a generative model.
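A minimal sketch with a conjugate model, where updating the belief reduces to counting: a Beta prior over an unknown success probability and Bernoulli observations (the data are hypothetical):

```python
# Conjugate Beta-Bernoulli model for an unknown success probability theta.
# Prior Beta(a, b); each observation x in {0, 1} updates (a, b) -> (a + x, b + 1 - x).
a, b = 1.0, 1.0                      # uniform prior Beta(1, 1)
data = [1, 0, 1, 1, 1, 0, 1, 1]      # hypothetical successes (1) and failures (0)

for x in data:                       # sequential update: each posterior is the next prior
    a, b = a + x, b + (1 - x)

posterior_mean = a / (a + b)
print(a, b, posterior_mean)          # 7.0 3.0 0.7
```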
In variational inference, what does the ELBO stand for and what do the two terms represent? ELBO stands for evidence lower bound, i.e., a lower bound on the log evidence (log marginal likelihood). One term is the expected log likelihood, a.k.a. the 'accuracy' objective. The other term is the KL divergence between the approximate posterior and the prior, which is subtracted and acts as regularization. (For Bayesian linear regression, the ELBO looks a lot like MSE with weight decay.)
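For an approximate posterior q(θ), the two terms and the bound can be written as follows (notation ours); the gap between the log evidence and the ELBO is the KL divergence from q to the true posterior:

```latex
\log p(\mathcal{D}) \;\ge\; \mathrm{ELBO}(q)
= \underbrace{\mathbb{E}_{q(\theta)}\big[\log p(\mathcal{D} \mid \theta)\big]}_{\text{expected log likelihood}}
- \underbrace{\mathrm{KL}\big(q(\theta) \,\|\, p(\theta)\big)}_{\text{regularization}},
\qquad
\log p(\mathcal{D}) - \mathrm{ELBO}(q) = \mathrm{KL}\big(q(\theta) \,\|\, p(\theta \mid \mathcal{D})\big) \ge 0
```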
What is the difference between Bayesian linear regression and (nonparametric) Gaussian Processes? A Gaussian Process can be seen as Bayesian linear regression in a (possibly infinite-dimensional) feature space, which is handled implicitly through a kernel function.
What is the optimization objective for the hyperparameters of a Gaussian Process? Maximization of the log marginal likelihood.
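A compact sketch of GP regression with a squared-exponential kernel, including the log marginal likelihood used to select a hyperparameter. The data, the fixed signal and noise variances, and the crude grid search over the lengthscale are hypothetical simplifications (in practice one would use gradient-based optimization):

```python
import numpy as np

def rbf(X1, X2, lengthscale, variance):
    """Squared-exponential kernel k(x, x') = s^2 exp(-|x - x'|^2 / (2 l^2)) for 1D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_fit(X, y, lengthscale, variance, noise):
    K = rbf(X, X, lengthscale, variance) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^-1 y
    # Log marginal likelihood: -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)
    lml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * len(X) * np.log(2 * np.pi)
    return alpha, lml

def gp_predict_mean(X, alpha, Xs, lengthscale, variance):
    """Posterior mean k(Xs, X) K^-1 y: a weighted sum of kernel features, as in kernelized BLR."""
    return rbf(Xs, X, lengthscale, variance) @ alpha

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

candidates = [0.1, 0.3, 1.0, 3.0, 10.0]
lmls = [gp_fit(X, y, l, 1.0, 0.01)[1] for l in candidates]
best = candidates[int(np.argmax(lmls))]       # maximize the log marginal likelihood
alpha, _ = gp_fit(X, y, best, 1.0, 0.01)
pred = gp_predict_mean(X, alpha, np.array([np.pi / 2]), best, 1.0)
print(best, pred)  # prediction near sin(pi/2) = 1
```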
Do you know what normalizing flows are? Normalizing flows are a class of neural network models used to learn a density by transforming a simple latent density (e.g., a standard Gaussian) with an invertible, differentiable network (a diffeomorphism). The layers are designed so that the determinant of the network's Jacobian, which the change-of-variables formula requires, is cheap to compute.
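The underlying change-of-variables formula, for a flow x = f(z) = f_K ∘ … ∘ f_1(z) with z_0 = z, z_k = f_k(z_{k-1}) and z_K = x (notation ours):

```latex
\log p_x(x) = \log p_z\big(f^{-1}(x)\big) + \log \left| \det \frac{\partial f^{-1}(x)}{\partial x} \right|
= \log p_z(z_0) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right|
```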
How many parameters do non-parametric models have? This question is intentionally ambiguous. It depends on the definition of a parameter: no parameters is hard to argue for. The typical answer is “as many as data points”, i.e., each data point is considered a parameter. However, the employed kernels have (hyper)parameters, too. For example, the squared exponential kernel corresponds to an infinite-dimensional feature space and can thus be considered to have infinitely many parameters.
What is the difference between a variational autoencoder and an autoencoder? An autoencoder is used to learn a representation. A variational autoencoder is a generative model used to learn/approximate a target data distribution. Often the representation learned by variational autoencoders is used as features (e.g., disentangling). However, the two models are fundamentally different, as the autoencoder cannot do distribution matching.
What are the benefits of model-free reinforcement learning algorithms over model-based ones? Dynamics models can be wrong, and/or difficult to obtain. Model-based reinforcement learning generally suffers more from the exploration-exploitation trade-off.
What is in your opinion the current bottleneck of CHOOSE_ML_FIELD? Figure out if the applicant is aware of the problems in the desired field of research.
Can you explain the basic idea behind evolutionary algorithms? Start with a population of candidates, i.e., sets of parameters. Evaluate all of them, which yields a fitness value for each of them. Then select (the best) sets of parameters and create a new population by crossover and/or mutation.
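A minimal sketch of this loop with truncation selection, averaging crossover and Gaussian mutation; the toy objective (distance to the point [3, …, 3]) and all hyperparameters are hypothetical:

```python
import numpy as np

def fitness(theta):
    """Toy objective to minimize: squared distance to the point [3, 3, ..., 3]."""
    return np.sum((theta - 3.0) ** 2, axis=-1)

rng = np.random.default_rng(0)
pop_size, n_parents, dim, sigma = 50, 10, 5, 1.0
population = rng.standard_normal((pop_size, dim))          # initial candidate parameter sets

for generation in range(100):
    scores = fitness(population)                            # evaluate all candidates
    parents = population[np.argsort(scores)[:n_parents]]    # select the best
    # Crossover: average two random parents per child; mutation: add Gaussian noise.
    idx = rng.integers(0, n_parents, size=(pop_size, 2))
    children = 0.5 * (parents[idx[:, 0]] + parents[idx[:, 1]])
    population = children + sigma * rng.standard_normal((pop_size, dim))
    sigma *= 0.95                                           # slowly shrink the mutation step

best = population[np.argmin(fitness(population))]
print(best)  # close to [3, 3, 3, 3, 3]
```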
Can you explain the concepts of aleatoric and epistemic uncertainty? Aleatoric uncertainty, also known as statistical uncertainty, represents unknowns that differ each time we run the same experiment (inherent noise); it cannot be reduced by gathering more data. Epistemic uncertainty, also known as systematic uncertainty, is due to things one could in principle know but does not in practice. This may be because a measurement is not accurate, because the model neglects certain effects, or because particular data have been deliberately hidden. Epistemic uncertainty is connected to the model and can be reduced by gathering more/better data.
Can you describe how a K-Nearest-Neighbor classifier works on an abstract level? (i) Start from labeled data (feature vectors with class labels), (ii) select a value for K (e.g., using cross-validation), (iii) compute the distances from the query point to all data points in feature space (the distance measure is sensitive to the scaling of the features), (iv) sort by distance and keep the K nearest data points, (v) infer the label (e.g., select the class by majority vote).
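The five steps as a short NumPy sketch (K is fixed to 3 here instead of being cross-validated; the two clusters are hypothetical):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    # Pairwise Euclidean distances (queries x training points); features should be on comparable scales.
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]     # indices of the k closest training points
    votes = y_train[nearest]                   # their labels
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],    # class 0 cluster
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])   # class 1 cluster
y_train = np.array([0, 0, 0, 1, 1, 1])
pred = knn_predict(X_train, y_train, np.array([[0.1, 0.1], [1.0, 0.95]]))
print(pred)  # [0 1]
```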
6. Specific Platforms or Software What's the difference between a ROS topic, service, and action? A ROS topic uses the publish-subscribe paradigm to share data asynchronously between different ROS nodes. A service, instead, implements synchronous client-server communication, i.e., the client waits until the server responds. An action is also client-server communication, but it is asynchronous, i.e., the server signals the outcome of the action to the client when the action has been completed (or if a failure occurs); in addition, the server can send feedback during execution, and the client can cancel the goal.
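What does the backslash operator in Matlab do? A\b performs left matrix division: it solves the linear system A x = b, which corresponds to left-multiplying b by the inverse of A, but without explicitly computing the inverse. For overdetermined systems, it returns a least-squares solution. (Right division, b/A, is done by the slash operator.)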
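For applicants coming from Python, the NumPy analogues may help as a comparison point; this is a sketch with hypothetical small systems, not a description of MATLAB internals:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Square, nonsingular A: the analogue of x = A \ b solves A x = b
# via a factorization instead of forming inv(A).
x = np.linalg.solve(A, b)
print(x)  # [2. 3.]

# Overdetermined system: A \ b returns a least-squares solution; in NumPy use lstsq.
A_tall = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b_tall = np.array([1.0, 2.0, 3.0])
x_ls, *_ = np.linalg.lstsq(A_tall, b_tall, rcond=None)
print(x_ls)  # [1. 2.]
```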
Are you familiar with Linux and Git? You could double-check his/her claims from the CV by asking about some commands you know (e.g., mv, rsync, git fetch).
Have you used GPUs so far? When are they particularly useful? You could double-check his/her claims from the CV. GPUs are particularly useful for computations that can be massively parallelized, such as the matrix multiplications in neural network training.
Do you have experience with PyTorch, TensorFlow, JAX, or Flux.jl? What is the main benefit of these frameworks? The main benefit is their automatic differentiation, which simplifies first-order gradient-based optimization.
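How to select every third entry of the 2nd dimension of 3-dimensional numpy array A? A[:, ::3, :] (or equivalently A[:, ::3]). Note that A[..., ::3, ...] is invalid, since an index can contain only a single ellipsis.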
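A quick check of the expected answer on a hypothetical array of shape (2, 7, 3):

```python
import numpy as np

A = np.arange(2 * 7 * 3).reshape(2, 7, 3)

B = A[:, ::3, :]     # every third entry along the 2nd dimension (indices 0, 3, 6)
C = A[:, ::3]        # equivalent: trailing dimensions can be omitted
print(B.shape)               # (2, 3, 3)
print(np.array_equal(B, C))  # True

# A[..., ::3, ...] raises IndexError: an index can only contain a single ellipsis.
```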
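Is the list comprehension in Python faster than a for-loop? Usually it is somewhat faster than an equivalent for-loop that calls list.append, since it avoids the repeated method lookup and call, but the complexity is the same; the main reason to use it is readability (it is more pythonic).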
In PyTorch, which function of the optimizer instance should you call before doing the next iteration of gradient computation? optimizer.zero_grad(), since PyTorch accumulates gradients in the .grad attributes by default.
7. Finishers After all the work, what do you do in your free time? Let the applicant talk about something that he/she enjoys to make him/her feel comfortable.
Do you have a particular hobby? Let the applicant talk about something that he/she enjoys to make him/her feel comfortable.
What place did you like the most to travel to? Don’t tell Joao if it is not Portugal.
Do you have any (remaining) questions? [Ask this question in early interviews] The applicant should have some questions prepared, especially if this is his/her first interview! Note what the applicant is curious about.
Do you have any feedback for us on this interview? Maybe he/she has good ideas to improve our process. If the applicant was surprised that there are so many “technical questions”, ask what he/she expected from this interview.