I have a couple of standard systems that I use for evaluating reinforcement learning algorithms. These systems are fairly straightforward to understand. I intend to make these benchmarks public, and I therefore use the RLBench framework suggested by Drew Bagnell and John Langford so that people can access them. Currently, I have the following benchmarks online.
DLQR: Discrete-Time Linear Quadratic Regulation problems are among the best-understood optimal control problems. We can evaluate them in various ways and even find analytical solutions for the policy gradients, state distributions, etc. There is a technical report available [pdf].
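To illustrate why DLQR problems are so well understood, here is a minimal sketch (not the RLBench code, and simplified to the scalar case) of computing the optimal feedback gain by iterating the discrete-time Riccati equation to its fixed point:

```python
# Scalar discrete-time LQR: dynamics x_{t+1} = a*x_t + b*u_t,
# cost sum_t (q*x_t^2 + r*u_t^2). All names here are illustrative.
def dlqr_gain(a, b, q, r, iters=1000):
    """Iterate the discrete algebraic Riccati equation to a fixed point
    and return the optimal feedback gain k, so that u_t = -k * x_t."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

# Example: with a = b = q = r = 1 the closed loop x_{t+1} = (a - b*k)*x_t
# contracts, since |a - b*k| < 1 for the optimal k.
k = dlqr_gain(a=1.0, b=1.0, q=1.0, r=1.0)
```

Because the optimal gain, value function, and state distribution are all available in closed form like this, any RL algorithm's output can be compared against the exact answer.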
CLQR: Using Continuous-Time Linear Quadratic Regulation problems as a benchmark seems unintuitive at first. However, in this case the RL program simply yields the gains and the time for which these gains are applied.
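The idea of a policy as "gains plus durations" can be sketched as follows; this is a hypothetical evaluation routine for a scalar continuous-time system, not the benchmark's actual interface, with all parameter values chosen for illustration:

```python
# Evaluate a schedule of (gain, duration) pairs on the scalar system
# xdot = a*x + b*u with feedback u = -k*x, accumulating the quadratic
# cost integral of (q*x^2 + r*u^2) by simple Euler integration.
def evaluate_schedule(schedule, a=-0.5, b=1.0, q=1.0, r=0.1,
                      x0=1.0, dt=1e-3):
    x, cost = x0, 0.0
    for k, duration in schedule:
        for _ in range(int(duration / dt)):
            u = -k * x
            cost += (q * x * x + r * u * u) * dt
            x += (a * x + b * u) * dt
    return cost

# Apply gain 0.5 for one time unit, then gain 1.0 for two time units.
c = evaluate_schedule([(0.5, 1.0), (1.0, 2.0)])
```

An RL method for this benchmark thus searches over a small, interpretable parameter space rather than a raw action sequence.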
Cartpole: A physically realistic cart-pole simulation.
Robot Arm Movements: A physically realistic robot arm simulation.
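For readers unfamiliar with the cart-pole task, the sketch below shows the standard cart-pole dynamics in the form popularized by Barto, Sutton, and Anderson; the actual benchmark's parameters and integrator may differ, and all defaults here are the conventional textbook values:

```python
import math

# One Euler step of the standard cart-pole dynamics. State: cart
# position x, cart velocity x_dot, pole angle theta (0 = upright),
# pole angular velocity theta_dot; the action is a horizontal force.
def cartpole_step(x, x_dot, theta, theta_dot, force,
                  g=9.8, m_cart=1.0, m_pole=0.1, length=0.5, dt=0.02):
    total = m_cart + m_pole
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + m_pole * length * theta_dot ** 2 * sin_t) / total
    theta_acc = (g * sin_t - cos_t * temp) / (
        length * (4.0 / 3.0 - m_pole * cos_t ** 2 / total))
    x_acc = temp - m_pole * length * theta_acc * cos_t / total
    return (x + dt * x_dot, x_dot + dt * x_acc,
            theta + dt * theta_dot, theta_dot + dt * theta_acc)
```

The upright state with zero force is an (unstable) equilibrium: stepping from it changes nothing, while any small tilt produces an angular acceleration away from upright, which is what makes the balancing task nontrivial.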
These benchmarks can be downloaded here. Note