JANUARY 5, 2020 · JAN MALTE LICHTENBERG

Q-learning

Rules

The agent (red sphere) navigates the gridworld by choosing one of four actions at each time step; the four actions correspond to the cardinal directions "north", "east", "south", and "west". After each move, the agent receives a reward that depends on the type of cell it enters.
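To make these rules concrete, here is a minimal sketch of a single environment step. It assumes a rectangular grid with row 0 as the northern edge, and that a move off the grid leaves the agent in place. The cell types (`empty`, `goal`, `trap`), their reward values, the function name `step`, and `example_grid` are all hypothetical placeholders, not the values or layout used in the interactive demo.

```python
from typing import Dict, List, Tuple

# The four actions from the rules, as (row delta, column delta).
# Assumption: row 0 is the northern edge, so "north" decreases the row index.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "north": (-1, 0),
    "east": (0, 1),
    "south": (1, 0),
    "west": (0, -1),
}

# Hypothetical cell types and reward values (not taken from the post).
REWARDS: Dict[str, float] = {
    "empty": -1.0,
    "goal": 10.0,
    "trap": -10.0,
}

Grid = List[List[str]]
State = Tuple[int, int]


def step(grid: Grid, state: State, action: str) -> Tuple[State, float]:
    """Move the agent one cell and return (next_state, reward).

    Assumption: a move that would leave the grid keeps the agent where it is,
    and the reward is then that of the cell it remains in.
    """
    d_row, d_col = ACTIONS[action]
    row, col = state[0] + d_row, state[1] + d_col
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        row, col = state  # blocked by the grid boundary
    # The reward depends only on the type of the cell that is entered.
    return (row, col), REWARDS[grid[row][col]]


# Hypothetical 2x3 layout for illustration.
example_grid: Grid = [
    ["empty", "empty", "goal"],
    ["empty", "trap", "empty"],
]

print(step(example_grid, (0, 1), "east"))   # ((0, 2), 10.0)
print(step(example_grid, (0, 0), "north"))  # ((0, 0), -1.0)
```

Keeping the reward a pure function of the entered cell's type mirrors the rule stated above; a Q-learning agent only ever sees the `(next_state, reward)` pair that `step` returns.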

© 2021