8. The function takes the model structure, the policy parameters (i.e. the linear parameters in our case) and the variance of the stochastic policy as arguments. It simulates the swimmer with the given policy parameters for a fixed number of time steps and returns the summed reward (perf) for this episode. In addition, it returns the noise vector used at each time step (so the returned noise is a matrix) and the features for each time step (also a matrix). The individual rewards for each time step can also be obtained (rewards). The E.J function has additional output values that return the visited trajectory, the applied torques and the state variables of the DMP; see evaluate.m for further details.
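To make the shape of these return values concrete, here is a minimal, self-contained sketch of one episode. It does not call the swimmer simulation or evaluate.m: the sizes, the toy reward, the kernel width and all variable names other than perf and rewards are assumptions chosen for illustration. It also assumes that the exploration noise is added to the linear parameters at every time step.

```matlab
% Toy stand-in for one episode. All sizes and names here are hypothetical;
% the real model structure and reward come from the provided code.
nParams = 4;                    % number of linear policy parameters (assumed)
nSteps  = 50;                   % number of time steps per episode (assumed)
theta   = zeros(nParams, 1);    % policy parameters
sigma2  = 0.1;                  % variance of the stochastic policy

noise    = zeros(nParams, nSteps);   % one noise vector per time step
features = zeros(nParams, nSteps);   % one feature vector per time step
rewards  = zeros(1, nSteps);         % individual reward at each time step

centers = linspace(0, 1, nParams)';  % kernel centers over normalized time

for t = 1:nSteps
    % Gaussian kernel features evaluated at the normalized time t/nSteps
    phi = exp(-((t / nSteps) - centers).^2 / 0.05);

    % Draw zero-mean Gaussian noise with variance sigma2 (assumed to act
    % on the parameters, not on the action)
    noise_t = sqrt(sigma2) * randn(nParams, 1);

    % Linear policy with perturbed parameters
    u = phi' * (theta + noise_t);

    rewards(t)     = -u^2;      % toy reward, NOT the swimmer reward
    noise(:, t)    = noise_t;
    features(:, t) = phi;
end

perf = sum(rewards);            % summed reward of the episode
```

After running the script, size(noise) and size(features) are both [nParams nSteps], and perf equals the sum over rewards, which mirrors the relationship between the outputs described above.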
To visualize a policy, use
.
Finally, some general remarks: the policy
has
parameters, where
denotes the number of links of the swimmer. The number of Gaussian kernel functions is given by the model and set to
. Make sure that the parameters are within the interval
.
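One simple way to enforce such a bound is to check the parameter vector and clip it before passing it to the simulation. The sketch below uses placeholder bounds lo and hi and an example vector theta; these values are hypothetical and must be replaced by the interval given for the exercise.

```matlab
% Placeholder bounds and parameters; substitute the interval from the exercise.
lo = -1;                         % hypothetical lower bound
hi =  1;                         % hypothetical upper bound
theta = [-2.5; 0.3; 1.7; -0.4];  % example parameter vector

% Check whether every parameter already lies inside [lo, hi]
inRange = all(theta >= lo & theta <= hi);

% Clip each entry to the interval; entries already inside are unchanged
thetaClipped = min(max(theta, lo), hi);
```

Clipping is only one option; rejecting or rescaling an out-of-range update may suit a particular learning algorithm better.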