Policy Evaluation with Temporal Differences: A Survey and Comparison

Many reinforcement learning algorithms rely on value functions. Estimating value functions from observed agent interactions is therefore an important problem, one dominated by temporal difference methods due to their sample efficiency. We survey these methods and compare their performance in a comprehensive experimental study.
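
To illustrate the basic idea behind all of the surveyed methods, the sketch below runs tabular TD(0) policy evaluation on a toy five-state random walk. This is a minimal, self-contained example only, not code from the paper's repository; the chain, the step size alpha, and the discount gamma are assumed here for illustration.

    import numpy as np

    # Tabular TD(0) policy evaluation on a hypothetical 5-state random walk.
    # States 0 and 4 are terminal; the fixed policy moves left or right with
    # equal probability. Reward is +1 on entering the right terminal, else 0.

    n_states = 5
    gamma = 0.99   # discount factor (assumed)
    alpha = 0.1    # constant step size (assumed)
    V = np.zeros(n_states)
    rng = np.random.default_rng(0)

    for episode in range(2000):
        s = 2  # start in the middle state
        while s not in (0, 4):
            s_next = s + rng.choice([-1, 1])  # sample a transition under the policy
            r = 1.0 if s_next == 4 else 0.0   # reward on entering the right terminal
            # bootstrapped target; terminal states contribute no future value
            target = r + gamma * V[s_next] * (s_next not in (0, 4))
            V[s] += alpha * (target - V[s])   # TD(0) update toward the target
            s = s_next

    print(V)  # estimated state values under the random policy

Each update moves the estimate V[s] a small step toward the one-step bootstrapped target, which is what makes temporal difference methods more sample-efficient than waiting for full returns.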

This website contains further material accompanying the article.

Source Code for Experiments

All experiments and methods have been implemented in Python. The source code is available at https://github.com/chrodan/tdlearn. The repository includes ready-to-run scripts for reproducing the experimental results and figures of the paper. The README.md file provides a list of contents and installation instructions.

Linearized System Dynamics of the n-link Pole Around the Balance Point

The derivation can be found in Attach:nLinkPendulum.pdf.
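
For orientation before opening the PDF: linearizing rigid-body dynamics around an equilibrium typically takes the standard form sketched below. The notation (M, c, g, G, tau) is the usual manipulator-equation convention and is an assumption here, not taken from the attached derivation.

    The dynamics of an n-link pendulum can be written in manipulator form
    \[
      M(q)\,\ddot{q} + c(q,\dot{q}) + g(q) = \tau,
    \]
    where $q$ collects the joint angles, $M$ is the inertia matrix, $c$ the
    Coriolis and centripetal terms, $g$ the gravity vector, and $\tau$ the
    applied torques. Around the balance point $(q,\dot{q}) = (0,0)$ the
    Coriolis terms vanish to first order, so with
    $G = \partial g / \partial q \,|_{q=0}$ the linearized state-space
    dynamics read
    \[
      \frac{d}{dt}\begin{pmatrix} q \\ \dot{q} \end{pmatrix}
      \approx
      \begin{pmatrix} 0 & I \\ -M(0)^{-1} G & 0 \end{pmatrix}
      \begin{pmatrix} q \\ \dot{q} \end{pmatrix}
      +
      \begin{pmatrix} 0 \\ M(0)^{-1} \end{pmatrix} \tau .
    \]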