|
Returns the distribution of actions from which a stochastic policy
samples.
The function receives the current state, all available actions, and
the Q-values of the actions as a double array (these can be any kind
of value that rates an action). Usually only the Q-values are used
to compute the distribution; the state is needed only by special
exploration policies. The function must overwrite the Q-values in
the double array with the distribution values.
Implements CActionDistribution.
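The in-place overwrite described above can be illustrated with a minimal sketch. It assumes a soft-max (Boltzmann) distribution as the example; the function name softMaxDistribution, its parameter list, and the beta parameter are hypothetical and do not reproduce the library's actual member signature. The state and action-set arguments are omitted because this example distribution depends only on the values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-alone helper (not the library's real signature).
// On entry, values[i] holds the Q-value of action i; on return, it holds
// the probability of selecting action i under a soft-max distribution.
// beta is an assumed inverse-temperature parameter.
void softMaxDistribution(double *values, std::size_t numActions, double beta)
{
    assert(values != nullptr);
    if (numActions == 0) {
        return;
    }

    // Subtract the maximum before exponentiating to avoid overflow.
    const double maxValue = *std::max_element(values, values + numActions);

    double sum = 0.0;
    for (std::size_t i = 0; i < numActions; ++i) {
        values[i] = std::exp(beta * (values[i] - maxValue));
        sum += values[i];
    }

    // Normalize so that the entries sum to one.
    for (std::size_t i = 0; i < numActions; ++i) {
        values[i] /= sum;
    }
}
```

After the call, the caller can sample an action index directly from the array, since the original Q-values have been replaced by probabilities.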
|