Notation and Symbols used for Robot Learning
- Vectors will always be written in bold font and lower case, i.e., $\boldsymbol{a}$.
- A vector always denotes a column vector, i.e., $\boldsymbol{a} = [a_1, a_2, \dots, a_n]^T$. A row vector is denoted as $\boldsymbol{a}^T$.
- Matrices will always be written in bold font and upper case, i.e., $\boldsymbol{A}$.
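- Gradients are always defined as row vectors, i.e., $\frac{d f(\boldsymbol{x})}{d \boldsymbol{x}} = \left[\frac{\partial f(\boldsymbol{x})}{\partial x_1}, \dots, \frac{\partial f(\boldsymbol{x})}{\partial x_n}\right]$.
- The gradient of a vector-valued function $\boldsymbol{f}(\boldsymbol{x}) = [f_1(\boldsymbol{x}), \dots, f_m(\boldsymbol{x})]^T$ is the $m \times n$ matrix whose $i$-th row is the gradient of $f_i$, i.e., $\frac{d \boldsymbol{f}(\boldsymbol{x})}{d \boldsymbol{x}} = \begin{bmatrix} \partial f_1 / \partial x_1 & \cdots & \partial f_1 / \partial x_n \\ \vdots & \ddots & \vdots \\ \partial f_m / \partial x_1 & \cdots & \partial f_m / \partial x_n \end{bmatrix}$.
- The expectation of a function $f(\boldsymbol{x})$ with respect to a distribution $p(\boldsymbol{x})$ will be written as $\mathbb{E}_{p(\boldsymbol{x})}[f(\boldsymbol{x})] = \int p(\boldsymbol{x}) f(\boldsymbol{x}) \, d\boldsymbol{x}$.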
- $\boldsymbol{q}$ ... joint positions, $\dot{\boldsymbol{q}}$ ... joint velocities, $\ddot{\boldsymbol{q}}$ ... joint accelerations
- $\boldsymbol{u}$ ... motor command, controls
- $\boldsymbol{\tau}$ ... 1. torques (a motor command), 2. trajectory, or 3. temporal scaling parameter for movement primitives ($\tau$)
- $\boldsymbol{a}$ ... action (often $\boldsymbol{a} = \boldsymbol{u}$, so the two can be interchanged)
- $\boldsymbol{s}$ ... state of the agent (used in most RL literature)
- $\boldsymbol{x}$ ... 1. state of the system (used in control literature; often $\boldsymbol{x} = \boldsymbol{s}$, so the two can be interchanged), 2. task space coordinates (for example, end-effector coordinates), 3. input sample for supervised learning methods
- $\boldsymbol{y}$ ... 1. state of a dynamical movement primitive, 2. output sample for supervised learning methods
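- $f$ ... 1. $\boldsymbol{x} = f(\boldsymbol{q})$ ... forward kinematics, 2. $\ddot{\boldsymbol{q}} = f(\boldsymbol{q}, \dot{\boldsymbol{q}}, \boldsymbol{u})$ (or similar notation for state and control) ... forward dynamics
- $\boldsymbol{J}(\boldsymbol{q}) = \frac{d f(\boldsymbol{q})}{d \boldsymbol{q}}$ ... Jacobian (of the forward kinematics)
- $\boldsymbol{\theta}$ ... 1. parameter vector, 2. (occasionally) joint angles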
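- $\boldsymbol{\phi}(\boldsymbol{x})$ ... feature vector of a single sample $\boldsymbol{x}$
- $\boldsymbol{\Phi}$ ... feature matrix containing the feature vectors of all samples (each row is a transposed feature vector $\boldsymbol{\phi}(\boldsymbol{x}_i)^T$)
- $\lambda$ ... regularization constant (= precision of the prior over the parameters)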
- $\epsilon$ ... measurement noise
- $\boldsymbol{X}$ ... matrix of all input vectors (each row is one sample)
- $\boldsymbol{Y}$ ... matrix of all output vectors (each row is one sample)
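The row conventions above (gradients as row vectors, one row per output in the gradient of a vector-valued function, one row per sample in the feature, input, and output matrices) can be checked numerically. The sketch below assumes NumPy; `numerical_jacobian`, the polynomial `features` map, and `ridge_regression` are hypothetical helpers written for illustration. The regression uses the standard closed form of regularized least squares, $\boldsymbol{\theta} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \lambda \boldsymbol{I})^{-1}\boldsymbol{\Phi}^T\boldsymbol{Y}$.

```python
import numpy as np


def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference gradient of f: R^n -> R^m.

    Row i is the (row-vector) gradient of output f_i, so the result has
    shape (m, n); for a scalar f it is a single row of shape (1, n).
    """
    x = np.asarray(x, dtype=float)
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        # perturbing input dimension j fills column j of the matrix
        J[:, j] = (np.atleast_1d(f(x + dx)) - np.atleast_1d(f(x - dx))) / (2 * eps)
    return J


def features(x):
    """Hypothetical feature vector phi(x) = [1, x, x^2] (elementwise square)."""
    x = np.asarray(x, dtype=float)
    return np.concatenate(([1.0], x, x ** 2))


def ridge_regression(X, Y, lam):
    """Regularized least squares with one sample per row of X and Y."""
    Phi = np.vstack([features(x) for x in X])  # row i is phi(x_i)^T
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Y)  # solve instead of forming the inverse


X = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)  # 21 scalar inputs, one per row
Y = 1.0 + 2.0 * X + 3.0 * X ** 2               # matching outputs, one per row
theta = ridge_regression(X, Y, lam=1e-8)       # approximately [[1], [2], [3]]
```

Solving the linear system with `np.linalg.solve` avoids explicitly inverting $\boldsymbol{\Phi}^T\boldsymbol{\Phi} + \lambda \boldsymbol{I}$, which is both cheaper and numerically safer.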
Optimal Decision Making
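- $\boldsymbol{a} = \pi(\boldsymbol{s})$ ... deterministic policy
- $\pi(\boldsymbol{a}|\boldsymbol{s})$ ... stochastic policy
- $\mu^\pi(\boldsymbol{s})$ ... state visit distribution of policy $\pi$
- $\mu_0(\boldsymbol{s})$ ... initial state distribution
- $r(\boldsymbol{s}, \boldsymbol{a})$ ... reward function
- $J_\pi$ ... expected long-term reward of policy $\pi$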
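- $V^\pi(\boldsymbol{s})$ ... value function of policy $\pi$
- $Q^\pi(\boldsymbol{s}, \boldsymbol{a})$ ... state-action value function of policy $\pi$
- $V^*(\boldsymbol{s})$ ... optimal value function
- $Q^*(\boldsymbol{s}, \boldsymbol{a})$ ... optimal state-action value function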
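- $\pi_{\boldsymbol{\theta}}(\boldsymbol{u}|\boldsymbol{x})$ ... lower-level policy for controlling the robot (stochastic)
- $\boldsymbol{u} = \pi_{\boldsymbol{\theta}}(\boldsymbol{x})$ ... lower-level policy for controlling the robot (deterministic)
- $\boldsymbol{\theta}$ ... parameter vector of the lower-level policy
- $\pi_{\boldsymbol{\omega}}(\boldsymbol{\theta})$ ... upper-level policy (for choosing the parameters of the lower-level policy)
- $\boldsymbol{\omega}$ ... parameter vector of the upper-level policy
- $J_{\boldsymbol{\theta}}$ or $J_{\boldsymbol{\omega}}$ ... expected return as a function of the parameters of the lower-level policy (left) or the upper-level policy (right)
- $\nabla_{\boldsymbol{\theta}} J_{\boldsymbol{\theta}}$ or $\nabla_{\boldsymbol{\omega}} J_{\boldsymbol{\omega}}$ ... gradient with respect to the lower-level policy parameters (left) or the upper-level policy parameters (right)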
- $R^{[i]}$ ... return of the $i$-th executed episode
- $Q_t^{[i]}$ ... reward to come for time step $t$ in the $i$-th executed episode
- $\boldsymbol{F}$ ... Fisher information matrix (FIM)
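The episode-based quantities at the end of this list can be computed directly from sampled rollouts. The sketch below assumes undiscounted, finite-horizon episodes, so the return is $R^{[i]} = \sum_{t=0}^{T} r_t^{[i]}$ and the reward to come is $Q_t^{[i]} = \sum_{h=t}^{T} r_h^{[i]}$ (time steps are 0-based here). The expected return is estimated by averaging returns over episodes, and the Fisher information matrix by the sample average of outer products of score vectors $\nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(\boldsymbol{u}|\boldsymbol{x})$, which the caller is assumed to supply. All function names are hypothetical.

```python
from typing import List, Sequence


def episode_return(rewards: Sequence[float]) -> float:
    """R^[i]: undiscounted sum of all rewards collected in one episode."""
    return float(sum(rewards))


def reward_to_come(rewards: Sequence[float]) -> List[float]:
    """Q_t^[i] = sum_{h=t}^{T} r_h for every (0-based) time step t."""
    q = [0.0] * len(rewards)
    running = 0.0
    # walk backwards so each entry reuses the sum of all later rewards
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        q[t] = running
    return q


def estimate_expected_return(episodes: Sequence[Sequence[float]]) -> float:
    """Monte Carlo estimate of the expected return: mean of R^[i] over episodes."""
    returns = [episode_return(ep) for ep in episodes]
    return sum(returns) / len(returns)


def empirical_fisher(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample estimate F ~= (1/N) sum_i g_i g_i^T from N score vectors g_i."""
    n = len(scores[0])
    F = [[0.0] * n for _ in range(n)]
    for g in scores:
        for a in range(n):
            for b in range(n):
                F[a][b] += g[a] * g[b] / len(scores)  # outer product, averaged
    return F


rewards = [1.0, 0.0, 2.0]
print(episode_return(rewards))   # 3.0
print(reward_to_come(rewards))   # [3.0, 2.0, 2.0]
```

Accumulating the reward to come backwards gives all $Q_t^{[i]}$ of an episode in a single pass, instead of re-summing the tail for every time step.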