Question List for Applicant 2179

The catalogue below includes example questions to help you lead an interview and to help all of us cover the typical topics, spread out over all lab members. To avoid asking the same questions many times, please make sure to check the ones you covered. Of course, it's wonderful if you go "off-script" and use your own questions.

Category | Question | Intent/Answer | Checked by ...
1. Starters (Off-Topic) Get the applicant going by talking about the topic he/she is the most confident about: himself/herself. Note with which story he/she starts, i.e., what the applicant puts emphasis on. Also check to whom he/she is attributing prior success. 
  To figure out how well structured the applicant’s future plans are. Bad answer: "I want to use deep RL together with robots (then referring to fancy videos)". Good answer: "I want to figure out how to improve the exploration in RL, given this is a clear bottleneck of the current methods." 
  The applicant should be able to explain it in less than 2min in a way that another person can understand it, especially if that person has a technical background. If not, this is a very bad sign and you should dig deeper here. Figure out what the key challenges have been. Did he/she solve them alone? 
  Let the applicant speak on a well-defined topic. Check if the POINT_OR_PROJECT was actually as interesting as you thought and which role the applicant played in it. 
  The goal of this question is to measure the applicant’s self-esteem and to put a relative metric such as the grade into objective terms. Being the top student of 10 is not the same as being the top student of 1000. 
  Just chatting and getting a feeling for what the applicant is interested in. 
  Just chatting and getting a feeling for what the applicant is interested in. 
  Test if the applicant follows the news on this topic. You can also turn this into an anecdote about something you worked on. 
    
2. General Get the applicant going by talking about the topic he/she is the most confident about: himself/herself. Note with which story he/she starts, i.e., what the applicant puts emphasis on. Also check to whom he/she is attributing prior success. 
  This needs to be answered well by the applicant, since it is one of the most obvious questions. Vague answers like “Because of your interesting research.” need to be countered with (repeated) questions on which research and which paper in particular. The applicant should at least have read our webpage and found out that we work in the intersection of machine learning and robotics. Gauge the level of enthusiasm the applicant has. 
  This needs to be answered well by the applicant, since it is one of the most obvious questions. Vague answers like “Because I work hard.” are not acceptable. Check if the applicant’s skills match. 
  This question is very open. It is ok if the applicant admits that he/she does not know. However, a convincing story is a plus. This question is more about whether the person asked him-/herself this question before applying. 
  It's a sneaky way to find out what they are looking for in terms of labs and topics. ("I did not apply anywhere else" is typically a lie.) 
  The applicant should be able to name explicit preferences and/or researchers. Watch out for vague answers, which are typically a bad sign. Drill at least one level deep and ask what is interesting about the named papers and/or researchers. 
  Find out if the applicant is willing to work in a team and is able to take constructive criticism and react positively to it. The applicant should neither expect too heavy supervision, nor to work alone. Close to every applicant will say that he/she wants to work in a team; you need to take notice of how much he/she emphasizes it. 
  Check out if the applicant has a long-term plan, e.g., PostDoc or entrepreneur. 
  These should be connectable to the position he/she is applying to. 
  See what matters to the applicant. Typically this is his/her biggest merit, so it should be somewhat impressive. It is a very bad sign if the applicant claims that he/she did the vast majority (or even everything) alone. Figure out what his/her specialty is. 
  Check if the applicant can give an example of how he/she has successfully handled stress in a previous job. The candidate should not claim that he/she never, or rarely, experiences stress, since this is difficult to believe or shows that the candidate has only worked in low-pressure environments. 
    
3. Basic Prior Knowledge An eigenvector is a vector such that if we apply to it the linear operator defined by the matrix, we obtain the vector itself scaled by its corresponding eigenvalue. 
  The determinant is the product of the eigenvalues, and the trace is the sum of the eigenvalues. 
  The dot product (a.k.a. scalar product) equals zero. 
  The applicant should define an objective function, and (optionally) some constraints like transition dynamics, and then mention how to optimize. 
  The kernel represents the null space of the linear map that is represented by the matrix, i.e., the space of vectors x for which Mx = 0. It is for example used to speed up constrained optimization by working only with a reduced set of variables in the null space of the constraint Jacobian. The kernel / null space is also used in robotics to trade off between multiple objectives. By projecting movements from one objective into the null space of another objective, one can ensure that the latter is always prioritized over the former. 
  Using Lagrange multipliers, we convert a constrained optimization problem into an unconstrained optimization problem by adding an additional cost to the objective: the constraint multiplied by the Lagrange multiplier. 
  Linear combination of features and parameters. The features can be nonlinear. Linear regression learns the parameters to minimize a loss (supervised learning). 
  If at least one eigenvalue of A (in general the eigenvalues are complex numbers) has a real part greater than zero. 
  The calculus of variations is a field of mathematical analysis that uses variations, which are small changes in functions and functionals, to find maxima and minima of functionals. That is, we are not maximizing or minimizing a function w.r.t. an input variable. 
  The sample mean is approximately normally distributed (in the limit of the sample size going to infinity). 
  The Bellman equation is a necessary condition for optimality associated with the mathematical optimization method known as dynamic programming. (Simpler:) Check if the applicant mentions the recursive definition of the next value and the max operator. 
  (Sampling-based) MPC is a control framework that selects the next optimal action to take by adapting the search distribution over actions based on the cost of multiple finite-horizon rollouts, which are obtained using an internal model. Important points for the applicant to mention are the finite horizon and the usage of the (internal) model. 
  Too large tracking error saturates the integrator part of the controller (often seen in nonlinear systems). 
  The linear-quadratic regulator (LQR). It originates from solving the Riccati equations. 
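A minimal sketch of how the LQR gain follows from the Riccati equation, assuming a scalar discrete-time system x_{t+1} = a x_t + b u_t with stage cost q x^2 + r u^2. The system parameters and all function names are hypothetical and chosen only for illustration; the general matrix case would use a linear algebra library instead.

```python
def solve_scalar_riccati(a, b, q, r, iterations=1000, tol=1e-12):
    """Iterate the scalar discrete-time Riccati recursion until it stops changing."""
    p = q
    for _ in range(iterations):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def lqr_gain(a, b, q, r):
    """Return the feedback gain k of the control law u = -k * x."""
    p = solve_scalar_riccati(a, b, q, r)
    return a * b * p / (r + b * b * p)


# Hypothetical open-loop unstable system (|a| > 1) with unit costs.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
p = solve_scalar_riccati(a, b, q, r)
k = lqr_gain(a, b, q, r)
closed_loop = a - b * k  # closed-loop pole; stable if its magnitude is below 1
```

A possible follow-up question for the applicant is why the closed-loop pole ends up inside the unit circle even though the open-loop system is unstable.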
    
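For the Bellman equation answer in this section, the recursive value definition and the max operator can be sketched with value iteration. The chain MDP below (move left or right, reward 1 for entering the last state, which is terminal) is a hypothetical toy example, not part of the original catalogue.

```python
def value_iteration(n_states=4, gamma=0.9, sweeps=100):
    """Apply the Bellman optimality backup V(s) = max_a [r + gamma * V(s')]."""
    goal = n_states - 1
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = values[:]
        for s in range(goal):  # the terminal goal state keeps value 0
            candidates = []
            for step in (-1, 1):  # actions: move left or move right
                s_next = min(max(s + step, 0), goal)
                reward = 1.0 if s_next == goal else 0.0
                candidates.append(reward + gamma * values[s_next])
            new_values[s] = max(candidates)  # the max operator
        values = new_values
    return values


values = value_iteration()
```

In this toy problem the values decay by a factor of gamma for every extra step needed to reach the goal.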
4. Robotics Perception, planning, actuation, etc. There are multiple valid answers here. You can follow up by asking the applicant to elaborate on why one of them is challenging. 
  Check the applicant’s ability to work with real robots. How many robots did he/she get in contact with? How did he/she use them, i.e., just go-to commands or low-level stuff? 
  Multiple possible answers: (i) because programming is expensive, (ii) because we want generalization, (iii) to better understand how humans learn. 
  The benefit is that task space control allows you to reason about collisions with objects in the work space of the robot. One caveat of task space control is that it requires dealing with kinematic singularities that can lead to catastrophic failures if not addressed. 
  The Kalman filter assumes a linear Gaussian dynamical system (i.e., linear state-space model). Control view: adaptive feedback observer. Bayesian view: Condition belief on each measurement, which is a linear update for Gaussian distributions. 
  A computational problem to find a sequence of valid configurations that moves the object/robot from a source to a target destination. Solution can be search methods such as RRT, A*, etc., or optimization methods such as trajectory optimization, CHOMP, etc. 
  Path planning solves an optimization problem in the state space and does not require dynamics. Optimal control takes the dynamics into consideration. Optimal control is usually faster to compute and can therefore be applied online. Path planning is usually too slow for direct control. In robotics, we usually combine both: first path planning, then control. 
  In order of importance: (i) if the std is too small, the robot will effectively not move because of the actuator's backlash, (ii) you can cause wear and tear of the actuator, (iii) it's not safe to use a Gaussian policy on a robot. 
  It is a mapping between velocities in two different coordinate systems (joint and task space). J = dx / dq. The Jacobian typically requires the pseudo-inverse (non-square if the two spaces have a different number of dimensions). The null space is the space of joint velocities that are projected to zero cartesian velocity by the Jacobian. It is also used to trade off between multiple objectives. By projecting movements from one objective into the null space of another objective, one can ensure that the latter is always prioritized over the former. 
  Using inverse kinematics, i.e., by iteratively computing the Jacobian and thereby refining the joint positions until converging to the desired Cartesian position. From this point, we can go further and ask questions like: Is there a guaranteed solution? How many joints do we need at least to achieve any 6-DoF pose? 
  To define a low-dimensional state representation out of high-dimensional sensory input such as RGB, LIDAR, etc. 
  x_B = R_{BA} x_A. The columns of R_{BA} are the basis vectors of A (old system) expressed in the basis of B (new system). Important properties: (i) det(R) = 1, (ii) all rotations are linear operations, (iii) rotations are not commutative, (iv) R^{-1} = R^T due to the orthonormal basis. 
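The rotation matrix properties (i), (iii), and (iv) listed above can be checked numerically. The sketch below uses plain Python lists for 3x3 matrices; the helper names and the choice of two 90 degree rotations are assumptions made for illustration.

```python
import math


def rot_z(theta):
    """Rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(theta):
    """Rotation about the x-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rz, rx = rot_z(math.pi / 2), rot_x(math.pi / 2)

det_is_one = abs(det3(rz) - 1.0) < 1e-12                        # property (i)
commutes = close(matmul(rz, rx), matmul(rx, rz))                # property (iii)
inverse_is_transpose = close(matmul(transpose(rz), rz), IDENTITY)  # property (iv)
```

Here `commutes` evaluates to False, which matches the statement that rotations in 3D do not commute in general.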
    
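The iterative inverse kinematics answer in this section can be sketched for a planar two-link arm. The link lengths, the initial guess, the step size, and the function names are hypothetical; the 2x2 Jacobian is inverted directly, which assumes the arm stays away from kinematic singularities (sin(q2) = 0).

```python
import math

LINK_1, LINK_2 = 1.0, 1.0  # hypothetical link lengths


def forward(q1, q2):
    """Forward kinematics: joint angles to end-effector position."""
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def jacobian(q1, q2):
    """J = dx / dq for the planar two-link arm."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return ((-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12),
            (LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12))


def inverse_kinematics(target, q=(0.2, 1.2), step=0.5, iterations=200, tol=1e-10):
    """Refine the joint angles with damped Newton steps on the position error."""
    q1, q2 = q
    for _ in range(iterations):
        x, y = forward(q1, q2)
        ex, ey = target[0] - x, target[1] - y
        if math.hypot(ex, ey) < tol:
            break
        (a, b), (c, d) = jacobian(q1, q2)
        det = a * d - b * c  # equals LINK_1 * LINK_2 * sin(q2); zero at a singularity
        q1 += step * (d * ex - b * ey) / det
        q2 += step * (-c * ex + a * ey) / det
    return q1, q2


solution = inverse_kinematics(target=(1.0, 1.0))
reached = forward(*solution)
```

For a redundant arm the Jacobian is no longer square, and the direct inverse would be replaced by a pseudo-inverse.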
5. Machine Learning (In loose words) A probability density function (PDF) distributes probability mass over a space. A PDF provides a (relative) likelihood that the corresponding random variable equals a given value. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to 1. 
  p(A | B) = p(B | A) p(A) / p(B). Let the applicant explain the terms “likelihood”, “prior”, and “evidence”. Bayes’ theorem allows us to incorporate prior knowledge and sequentially update the belief. It is for example used in regression or classification. 
  Bayesian inference is a method of statistical inference, in which Bayes' theorem/rule is used to update the probability for a hypothesis as more evidence or information becomes available. (In loose words) Bayesian inference updates a belief, i.e., distributions over parameter values, given data and a generative model. 
  ELBO stands for evidence lower bound. One term is the expected log likelihood, a.k.a. the 'accuracy' objective. The other term is the KL divergence against the prior and acts as regularization. (For Bayesian linear regression the ELBO looks a lot like MSE with weight decay.) 
  Gaussian Processes are Bayesian linear regression in an infinite feature space, which is achieved by using a kernel function. 
  Maximization of the log marginal likelihood. 
  Normalizing flows are a specific type of neural network, used to learn a density by combining a simple latent density with a diffeomorphic network (special layers that simplify the calculation of the determinant of the network's Jacobian). 
  This question is intentionally ambiguous. It depends on the definition of a parameter: No parameters is hard to argue for. The typical answer is “as many as data points”, then each data point is considered a parameter. However, the employed kernels have parameters, too. For example, exponential kernels can be considered to have infinitely many parameters. 
  An autoencoder is used to learn a representation. A variational autoencoder is a generative model used to learn/approximate a target data distribution. Often the representation learned by variational autoencoders is used as features (e.g., disentangling). However, the two models are fundamentally different, as the autoencoder cannot do distribution matching. 
  Dynamics models can be wrong, and/or difficult to obtain. Model-based reinforcement learning generally suffers more from the exploration-exploitation trade-off. 
  Figure out if the applicant is aware of the problems in the desired field of research. 
  Start with a population of candidates, i.e., sets of parameters. Evaluate all of them, which yields a fitness value for each of them. Then select (the best) sets of parameters and create a new population by crossover and/or mutation. 
  Aleatoric uncertainty is also known as statistical uncertainty, and is representative of unknowns that differ each time we run the same experiment. Epistemic uncertainty is also known as systematic uncertainty, and is due to things one could in principle know but does not in practice. This may be because a measurement is not accurate, because the model neglects certain effects, or because particular data have been deliberately hidden. Epistemic uncertainty is connected to the model and can be reduced by gathering more/better data. 
  (i) Have labeled data (e.g., categories with values leading to a label), (ii) select a value for K (e.g., using cross-validation), (iii) compute the distances in category space (the distance measure is sensitive to the scaling of the units), (iv) sort according to the distance and consider the K nearest data points, (v) infer (e.g., select the class by majority vote). 
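Steps (iii) to (v) of the K-nearest-neighbors answer above can be sketched as follows. The toy data set and the function name are hypothetical; K is fixed instead of being selected by cross-validation, and the features are assumed to be already on comparable scales.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # (iii) compute the Euclidean distance to every labeled point, (iv) sort
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # (v) infer the class by majority vote
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, query=(0.15, 0.15), k=3)
```

A natural follow-up is to ask what happens when one feature is measured in much larger units than the others.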
    
6. Specific Platforms or Software A ROS topic uses the publish-subscribe paradigm to share data asynchronously between different ROS nodes. A service, in contrast, implements synchronous client-server communication, i.e., the client will wait until the server responds. An action is also client-server communication, but it is asynchronous, i.e., the server will signal the outcome of the action to the client when the action has been completed (or if a failure occurs). 
  It divides matrices, i.e., left or right multiplies one matrix by the inverse of another. This is done by solving a system of equations. 
  You could double-check his/her claims from the CV by asking about some commands you know (e.g., mv, rsync, git fetch). 
  You could double-check his/her claims from the CV. GPUs are particularly useful when there is computation which can be parallelized. 
  The main benefit is their automatic differentiation, simplifying first order gradient-based optimization. 
  A[..., ::3, ...] 
  No, but it is more pythonic. 
  optimizer.zero_grad() 
    
7. Finishers Let the applicant talk about something that he/she enjoys to make him/her feel comfortable. 
  Let the applicant talk about something that he/she enjoys to make him/her feel comfortable. 
  Don’t tell Joao if it is not Portugal. 
  The applicant should have some questions prepared. Especially if this is his/her first interview! Note what the applicant is curious about. 
  Maybe he/she has good ideas to improve our process. If the applicant was surprised that there are so many “technical questions”, ask them back what they expected from this interview. 
Next Interviewer Email Address:

  
