
Function approximation [3* P]

Implement the Mountain Car example from the Sutton and Barto book (Example 8.2). A similar task (swinging up a pendulum with V-function learning using RBFs) can be found in the folder mountaincar to help you get started. The mountain car model is already implemented (see cmountaincarmodel.cpp). This example is more advanced than the previous one, and it might be necessary for you to become more familiar with the RL Toolbox (see Manual.pdf). Learn to reach the goal on top of the hill with the SARSA($\lambda$) algorithm and linear function approximation. Use the following learning parameters: $\lambda=0.9$, $\epsilon=0$, $\alpha=0.1$. Initialize the action values to zero; since all rewards in this task are negative, this is an optimistic initialization and ensures exploration. Measure the number of steps needed to reach the goal to evaluate the success of your learning algorithm.
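The core of the task is the SARSA($\lambda$) update with a linear architecture, $Q(s,a) = \theta_a^T \phi(s)$. The following is a minimal, self-contained C++ sketch of this update with accumulating eligibility traces. It deliberately does not use the RL Toolbox API, so all names here (LinearSarsaLambda, selectAction, update) are hypothetical; $\phi(s)$ is the feature vector produced by the tile coding of task a) or the RBF network of task b).

#include <vector>
#include <algorithm>

const int NUM_ACTIONS = 3;                    // mountain car: throttle back, zero, forward
const double LAMBDA = 0.9, ALPHA = 0.1;
const double GAMMA = 1.0;                     // episodic, undiscounted task

struct LinearSarsaLambda {
    int n;                                    // number of features
    std::vector<std::vector<double>> theta;   // weights, one vector per action
    std::vector<std::vector<double>> e;       // eligibility traces

    LinearSarsaLambda(int numFeatures)
        : n(numFeatures),
          theta(NUM_ACTIONS, std::vector<double>(numFeatures, 0.0)),  // zero = optimistic
          e(NUM_ACTIONS, std::vector<double>(numFeatures, 0.0)) {}

    // Q(s,a) = theta_a . phi(s)
    double q(const std::vector<double>& phi, int a) const {
        double sum = 0.0;
        for (int i = 0; i < n; ++i) sum += theta[a][i] * phi[i];
        return sum;
    }

    // epsilon = 0: purely greedy; exploration comes from the optimistic values.
    int selectAction(const std::vector<double>& phi) const {
        int best = 0;
        for (int a = 1; a < NUM_ACTIONS; ++a)
            if (q(phi, a) > q(phi, best)) best = a;
        return best;
    }

    // One step for the transition s --a--> s' with reward r; aNext is the
    // action that will actually be taken in s' (on-policy update).
    void update(const std::vector<double>& phi, int a, double r,
                const std::vector<double>& phiNext, int aNext, bool terminal) {
        double delta = r - q(phi, a);
        if (!terminal) delta += GAMMA * q(phiNext, aNext);
        for (int b = 0; b < NUM_ACTIONS; ++b)
            for (int i = 0; i < n; ++i) {
                e[b][i] *= GAMMA * LAMBDA;           // decay all traces
                if (b == a) e[b][i] += phi[i];       // accumulate on the visited features
                theta[b][i] += ALPHA * delta * e[b][i];
            }
        if (terminal)                                // clear traces between episodes
            for (int b = 0; b < NUM_ACTIONS; ++b)
                std::fill(e[b].begin(), e[b].end(), 0.0);
    }
};

Note that with binary tile-coding features several features are active at once, so the effective step size grows with the number of tilings; implementations often divide $\alpha$ by the number of tilings, but the sketch uses the exercise's value $\alpha=0.1$ as given.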

a)
Use 5 grid-tilings of size $9 \times 9$ to discretize the state space. Show in a plot how the number of steps needed to reach the goal evolves during learning (a sketch of such a tiling appears after this list).
b)
Use RBF function approximation with 30 evenly spaced RBF centers in each dimension (i.e. $30 \times 30 = 900$ centers in total). Set the widths in every dimension such that one RBF roughly spans 1-2 tiles (see the RBF feature sketch after this list).
c)
Submit the code of your model and the learning algorithms.
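For task a), the grid-tilings can be sketched as follows. This is a hypothetical stand-alone implementation (the RL Toolbox provides its own discretization classes); the state bounds are the standard mountain car ranges from Sutton and Barto, position in $[-1.2, 0.6]$ and velocity in $[-0.07, 0.07]$. Each state activates exactly one tile per tiling, giving 5 active binary features out of $5 \times 81 = 405$.

#include <vector>
#include <algorithm>

const int NUM_TILINGS = 5, TILES_PER_DIM = 9;     // 5 x (9 x 9) = 405 features
const double POS_MIN = -1.2, POS_MAX = 0.6;       // standard mountain car bounds
const double VEL_MIN = -0.07, VEL_MAX = 0.07;

// Index of the active tile in each tiling; tiling t is shifted by t/NUM_TILINGS
// of a tile width in both dimensions.
std::vector<int> activeTiles(double pos, double vel) {
    double w = (POS_MAX - POS_MIN) / TILES_PER_DIM;
    double h = (VEL_MAX - VEL_MIN) / TILES_PER_DIM;
    std::vector<int> tiles(NUM_TILINGS);
    for (int t = 0; t < NUM_TILINGS; ++t) {
        double off = (double)t / NUM_TILINGS;     // offset in tile units
        int ix = std::min(std::max((int)((pos - POS_MIN) / w + off), 0), TILES_PER_DIM - 1);
        int iy = std::min(std::max((int)((vel - VEL_MIN) / h + off), 0), TILES_PER_DIM - 1);
        tiles[t] = t * TILES_PER_DIM * TILES_PER_DIM + iy * TILES_PER_DIM + ix;
    }
    return tiles;
}

// Dense binary feature vector phi(s) for the SARSA(lambda) sketch above.
std::vector<double> tileFeatures(double pos, double vel) {
    std::vector<double> phi(NUM_TILINGS * TILES_PER_DIM * TILES_PER_DIM, 0.0);
    for (int idx : activeTiles(pos, vel)) phi[idx] = 1.0;
    return phi;
}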
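For task b), a corresponding stand-alone sketch of the $30 \times 30$ Gaussian RBF features follows. Setting the width per dimension to about one cell of the coarse $9 \times 9$ grid is one possible reading of the "1-2 tiles" requirement; the constant may need tuning.

#include <vector>
#include <cmath>

const int RBF_PER_DIM = 30;                       // 30 x 30 = 900 centers
const double P_MIN = -1.2, P_MAX = 0.6;           // same state bounds as above
const double V_MIN = -0.07, V_MAX = 0.07;
// Width of about one cell of the coarse 9 x 9 grid per dimension;
// tune within the "1-2 tiles" range required by the exercise.
const double SIGMA_P = (P_MAX - P_MIN) / 9.0;
const double SIGMA_V = (V_MAX - V_MIN) / 9.0;

// 900-dimensional Gaussian feature vector phi(s).
std::vector<double> rbfFeatures(double pos, double vel) {
    std::vector<double> phi(RBF_PER_DIM * RBF_PER_DIM);
    for (int i = 0; i < RBF_PER_DIM; ++i) {
        double cp = P_MIN + i * (P_MAX - P_MIN) / (RBF_PER_DIM - 1);  // evenly spaced
        for (int j = 0; j < RBF_PER_DIM; ++j) {
            double cv = V_MIN + j * (V_MAX - V_MIN) / (RBF_PER_DIM - 1);
            double dp = (pos - cp) / SIGMA_P;
            double dv = (vel - cv) / SIGMA_V;
            phi[i * RBF_PER_DIM + j] = std::exp(-0.5 * (dp * dp + dv * dv));
        }
    }
    return phi;
}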


Haeusler Stefan 2009-01-19