Download the Reinforcement Learning (RL) Toolbox and the example files. See ToolboxTutorial.pdf for a tutorial on the RL Toolbox. A similar example can be found in the folder cliffworld to help you get started.
Consider the gridworld shown in Figure 2. Implement this environment with the RL Toolbox as an undiscounted (γ = 1), episodic task with the start state and the goal state as marked in Figure 2. The actions move the agent up, down, left, and right; if the agent bumps into a wall, its position is unchanged. The reward is negative on all normal transitions, with an additional penalty for bumping into a wall, and 0 at the bonus state marked in Figure 2.
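The dynamics described above can be sketched in plain Python to clarify what the environment must do. This is only an illustrative sketch, not the RL Toolbox implementation the exercise asks for: the grid size, start/goal positions, wall and bonus locations, and all reward values below are placeholder assumptions that must be replaced by the values given in Figure 2.

```python
class GridWorld:
    """Sketch of the episodic gridworld task. All layout and reward
    numbers are placeholders; read the real ones off Figure 2."""

    ACTIONS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # up, down, left, right

    def __init__(self, rows=6, cols=9, start=(0, 0), goal=(5, 8),
                 walls=frozenset(), bonus=frozenset(),
                 step_reward=-1.0, wall_reward=-1.0, bonus_reward=0.0):
        self.rows, self.cols = rows, cols
        self.start, self.goal = start, goal
        self.walls, self.bonus = walls, bonus
        self.step_reward = step_reward    # reward on a normal transition
        self.wall_reward = wall_reward    # reward when bumping into a wall
        self.bonus_reward = bonus_reward  # reward 0 at the bonus state

    def reset(self):
        self.pos = self.start
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r, c = self.pos
        nr, nc = r + dr, c + dc
        if not (0 <= nr < self.rows and 0 <= nc < self.cols) \
                or (nr, nc) in self.walls:
            reward = self.wall_reward     # bumped: position unchanged
        else:
            self.pos = (nr, nc)
            reward = self.bonus_reward if self.pos in self.bonus \
                else self.step_reward
        done = self.pos == self.goal      # episode ends at the goal state
        return self.pos, reward, done
```

In the actual exercise, this environment is set up through the RL Toolbox's gridworld configuration file rather than coded by hand.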
Use Q-Learning and SARSA without eligibility traces to learn policies for this task. Use ε-greedy action selection with a constant ε. Measure and plot the online performance of both learning algorithms (i.e., the average reward per episode), and also sketch the policies that the algorithms find. Explain any differences in the performance of the algorithms. Are the learned policies optimal? Try this exercise again with ε gradually reduced after every episode and explain what you find. Submit your code and the gridworld configuration file.
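For reference, the two update rules being asked for can be sketched in generic tabular form. The exercise itself should be done with the RL Toolbox; the tiny ChainEnv stand-in environment, the parameter values (α, ε, number of episodes), and the per-episode eps_decay schedule below are illustrative assumptions, not part of the exercise.

```python
from collections import defaultdict
import numpy as np

class ChainEnv:
    """Tiny stand-in environment (a 1-D corridor, reward -1 per step);
    replace with the Figure 2 gridworld for the actual exercise."""
    def __init__(self, n=5):
        self.n = n
    def reset(self):
        self.s = 0
        return self.s
    def step(self, a):  # a: 0 = left, 1 = right
        self.s = max(0, self.s - 1) if a == 0 else min(self.n - 1, self.s + 1)
        return self.s, -1.0, self.s == self.n - 1

def epsilon_greedy(Q, s, eps, n_actions, rng):
    """With probability eps pick a random action, else the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def run(env, algo="q", n_actions=2, episodes=200, alpha=0.1,
        gamma=1.0, eps=0.1, eps_decay=1.0, seed=0):
    """Tabular Q-Learning ('q') or SARSA ('sarsa'), no eligibility traces.
    Returns the Q-table and the per-episode returns (online performance)."""
    rng = np.random.default_rng(seed)
    Q = defaultdict(lambda: np.zeros(n_actions))
    returns = []
    for _ in range(episodes):
        s = env.reset()
        a = epsilon_greedy(Q, s, eps, n_actions, rng)
        total, done = 0.0, False
        while not done:
            s2, r, done = env.step(a)
            total += r
            a2 = epsilon_greedy(Q, s2, eps, n_actions, rng)
            if algo == "sarsa":   # on-policy: bootstrap from the action taken
                target = r if done else r + gamma * Q[s2][a2]
            else:                 # Q-Learning: bootstrap from the greedy action
                target = r if done else r + gamma * np.max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
        eps *= eps_decay          # set eps_decay < 1 to reduce eps per episode
        returns.append(total)
    return Q, returns
```

Plotting the returned per-episode returns for both algorithms gives the requested online-performance comparison; reading off argmax over actions in each state gives the learned policy.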