Many reinforcement learning algorithms rely on value functions. Estimating value functions from observed agent interactions is therefore an important problem, one dominated by temporal difference methods due to their sample efficiency. We survey these methods in the article and evaluate their performance in a comprehensive experimental study.
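As a concrete illustration of the value-estimation setting the article addresses (this is a generic sketch of tabular TD(0), not code from the accompanying repository), consider estimating V from a stream of observed transitions:

```python
def td0(transitions, num_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = [0.0] * num_states
    for s, r, s_next in transitions:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

# Toy two-state chain: state 0 always yields reward 1 and moves to the
# absorbing state 1, which self-loops with reward 0.
data = [(0, 1.0, 1), (1, 0.0, 1)] * 500
V = td0(data, num_states=2, alpha=0.1, gamma=0.9)
```

Each update moves the estimate only a step of size `alpha` toward the bootstrapped target, which is what makes TD methods usable online, one transition at a time.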
This website contains further material accompanying the article.
All experiments and methods have been implemented in Python. The source code is available at https://github.com/chrodan/tdlearn. The repository includes ready-to-run scripts that reproduce the experimental results and figures of the paper; the README.md file lists the contents and gives installation instructions.
The derivation for the n-link pendulum benchmark can be found at Attach:nLinkPendulum.pdf.