
Matlab Package Description

In order to work with the swimmer model, add the folder model to your Matlab path. The model environment is stored in a structure, which is created with the command $E = initE()$. For implementing the various policy gradient methods, the most important function is $E.J$, which performs a single rollout:

$\displaystyle [perf, \epsilon, \phi, rewards, \dots] = E.J(E, \theta, \sigma)$

The function takes the model structure $E$, the policy parameters $\theta$ (i.e. the linear parameters $\mathbf{b}$ in our case) and the variance of the stochastic policy $\sigma$ as arguments. It simulates the swimmer for $200$ time steps ($dt = 0.01\,s$, resulting in a simulation time of $2\,s$) using $\theta$ as the parameters of the policy, and returns the summed reward $perf$ for this episode ($\sum_t r_t$). In addition, it returns the noise vector $\epsilon$ used at each time step (so $\epsilon$ is a $2 \times 200$ matrix) and the features $\phi(x_t)$ for each time step (a $6 \times 200$ matrix). The individual rewards for each time step can also be obtained ($rewards$). The $E.J$ function has further output values that return the visited trajectory, the applied torques and the state variables of the DMP ($y$ and $\dot{y}$); see evaluate.m for details.
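As an illustration, a minimal REINFORCE-style update built on top of $E.J$ might look as follows. This is only a sketch, not part of the package: it assumes the stochastic policy has the Gaussian form $a_t = \theta\,\phi(x_t) + \sigma\,\epsilon_t$ (so that the likelihood-ratio gradient of one rollout is $\sum_t \epsilon_t \phi(x_t)^{\top} / \sigma^2$), that $E.J$ accepts $\theta$ as a $(E.d-1) \times E.NumRBFs$ matrix, and the values of $\sigma$, the step size and the number of rollouts are arbitrary choices.

addpath('model');                    % make the swimmer model available
E = initE();                         % create the model structure

theta = zeros(E.d - 1, E.NumRBFs);   % 2 x 6 linear policy parameters
sigma = 0.1;                         % exploration noise (assumed value)
alpha = 0.01;                        % learning rate (assumed value)
N     = 20;                          % rollouts per gradient estimate

grad = zeros(size(theta));
for n = 1:N
    [perf, epsilon, phi] = E.J(E, theta, sigma);
    % likelihood-ratio gradient under the assumed Gaussian policy:
    % sum_t epsilon_t * phi(x_t)' / sigma^2, weighted by the return
    grad = grad + perf * (epsilon * phi') / sigma^2;
end
theta = theta + alpha * grad / N;    % plain gradient ascent step

In practice one would subtract a baseline from $perf$ to reduce the variance of this estimator; whether $E.J$ expects $\theta$ as a matrix or a flattened vector is not fixed by this description, so check evaluate.m.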

To visualize a policy, use $plotPolicy(E, \theta)$.
Finally, some general remarks: the policy $\theta$ has $(E.d-1) \times E.NumRBFs$ parameters, where $E.d = 3$ denotes the number of links of the swimmer and the number of Gaussian kernel functions is given by the model and set to $E.NumRBFs = 6$ (i.e. $2 \times 6 = 12$ parameters in total). Make sure that the parameters stay within the interval $[-4, +4]$, for instance as sketched below.
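One simple way to respect this constraint (a sketch; the package does not prescribe how to enforce it) is to clip the parameters element-wise after every update, again assuming $\theta$ is stored as a plain matrix:

% keep all policy parameters inside the interval [-4, 4]
theta = max(min(theta, 4), -4);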

