
Policy Gradient Methods

At least for the finite difference method, you may normalize the gradient to a unit vector before the weight update, i.e.

$\displaystyle \theta_{h+1} = \theta_{h} + \alpha \frac{\nabla_{\theta} J(\theta)}{\vert\nabla_{\theta} J(\theta)\vert}.$

This usually improves the learning speed. For a more detailed description of the methods and the equations, see the lecture slides.
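As a minimal sketch, one normalized finite-difference update could look as follows. Here episodeReward(theta) is a hypothetical stand-in for the routine in the Matlab package that runs the swimmer with parameters theta and returns the summed reward of that episode; alpha and epsilon are the step size and the perturbation size.

% One normalized finite-difference gradient update (sketch).
% episodeReward is a hypothetical placeholder for the package's
% rollout/return function.
function theta = fdUpdate(theta, alpha, epsilon)
    grad = zeros(size(theta));
    J0 = episodeReward(theta);             % reward at the current parameters
    for i = 1:numel(theta)                 % perturb one parameter at a time
        dtheta = zeros(size(theta));
        dtheta(i) = epsilon;
        grad(i) = (episodeReward(theta + dtheta) - J0) / epsilon;
    end
    theta = theta + alpha * grad / norm(grad);   % normalized update
end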

All policy gradient methods should be compared with respect to their learning speed. To this end, create a performance curve (x-axis: number of episodes seen by the algorithm; y-axis: summed reward obtained with the current parameter values) for each algorithm, as sketched below. To get a reliable estimate, average over at least 10 trials for each curve.
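A sketch of such an averaged performance curve is given below. trainOnce is a hypothetical stand-in for one complete learning run that returns the summed reward after every episode, and numEpisodes is an assumed episode budget; substitute the corresponding routine and budget from your setup.

% Performance curve averaged over independent trials (sketch).
numTrials = 10;
numEpisodes = 500;                         % assumed episode budget
R = zeros(numTrials, numEpisodes);
for t = 1:numTrials
    R(t, :) = trainOnce(numEpisodes);      % one independent learning run
end
plot(1:numEpisodes, mean(R, 1));           % average curve over all trials
xlabel('number of episodes');
ylabel('average summed reward');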

