An Intuitive Introduction to Reinforcement Learning Algorithms

In this series of blog posts, I aim to explain reinforcement learning (RL) algorithms by visualizing their inner workings. We will take basic RL algorithms and go through their pseudo-code, line by line, to show how code translates to learning.

What this is not.

There is a lot of material on reinforcement learning out there. Here I outline the many things that this introduction aims not to be, mainly because they are already explained elsewhere in great detail. This introduction is not...

  • ... a general introduction to the field of RL. That is, I will not ask why RL could be useful or how RL is defined. It is assumed that the reader has a basic interest in or familiarity with the subject. I think that Sutton & Barto's Intro to RL book provides a great introduction to the field. I will follow the basic structure of this book and use the same notation.
  • ... a mathematical analysis of RL algorithms. That is, we will not discuss convergence properties or similar things. I found Shimkin's lecture notes a useful reference for an introductory treatment of convergence in RL.
  • ... a documented code repo with ready-to-use implementations.
  • ... a tutorial on how to implement state-of-the-art algorithms in highly complex domains.

What is it then?

The goal, instead, is to learn the very basics of reinforcement learning. There are a thousand ways of explaining algorithms to someone unfamiliar with the field. Some prefer to see some code first, others prefer to directly see the underlying maths, and yet others benefit from reading an extensive verbal description first.

Here, I hope to build intuition by explaining the pseudocode of an algorithm using interactive visualizations of what is being learned. To this aim, we need a simple environment where 1) it is possible to have an overview of everything that happens during a single step, and 2) the learning process is fast enough that the learning behavior can be understood by going through a few iterations of an algorithm.

Unfortunately, neither of these conditions is met in exciting RL domains such as ATARI games, Poker, or Go. We therefore have to be content with a very simple and very boring toy environment: the "gridworld". The gridworld environment is explained in detail in this blog post. Gridworlds are also used in many other introductory materials on RL such as, for example, Sutton & Barto's Intro to RL book or Andrej Karpathy's excellent reinforce.js library.
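To make the idea concrete, here is a minimal sketch of what such a toy environment might look like in code. This is my own illustrative version, not the gridworld used later in the series: I assume a small grid of (row, col) cells, four deterministic actions, a reward of -1 per step, and an episode that ends when the agent reaches a goal cell.

```python
class GridWorld:
    """A minimal, hypothetical gridworld: deterministic moves, -1 reward per step."""

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, rows=4, cols=4, start=(0, 0), goal=(3, 3)):
        self.rows, self.cols = rows, cols
        self.start, self.goal = start, goal
        self.state = start

    def reset(self):
        """Put the agent back at the start cell and return the initial state."""
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply one action; return (next_state, reward, done)."""
        dr, dc = self.ACTIONS[action]
        r, c = self.state
        # Moves that would leave the grid keep the agent in place.
        nr = min(max(r + dr, 0), self.rows - 1)
        nc = min(max(c + dc, 0), self.cols - 1)
        self.state = (nr, nc)
        done = self.state == self.goal
        return self.state, -1, done
```

Because every state is just a cell, the whole value function or policy can be drawn directly on the grid, which is exactly what makes this environment suitable for the visualizations in this series.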


There will be posts about the following approaches to reinforcement learning:

  • dynamic programming (to do)
  • Monte Carlo approaches (to do)
  • temporal difference (TD) learning (to do)
  • TD learning using function approximation (to do)
  • and maybe more...

All posts will be linked from this page.

© 2020