CActionDistribution Class Reference
Action Distribution
classes define the distributions of stochastic Policies.
#include <cpolicies.h>
Detailed Description
Action Distribution classes define the distributions of stochastic
Policies.
An action distribution calculates the probability distribution from
which an action is sampled; the sampling itself is done by the class
CStochasticPolicy. The distribution usually depends on some kind of
Q-Value of the actions and is calculated in the function
getDistribution. The function receives the current state, all
available actions, and the Q-Values of the actions as a double array
(the values can actually be any kind of rating for an action).
Usually only these Q-Values are used for the distribution; the state
is only used for special exploration policies. The function has to
overwrite the Q-Values in the double array with the distribution
values.
Additionally, some algorithms need a differentiable distribution.
Since not all distributions are differentiable, the interface
provides the functions isDifferentiable and getGradientFactors.
getGradientFactors calculates the gradient
dP(usedAction | actionFactors) / d(actionFactors). The action
factors are again some kind of rating for the actions. The result
has to be written to the output vector gradientFactors, which always
has the same size as the actionFactors array (that is, the number of
actions). Only the SoftMax distribution supports calculating this
gradient.
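The sampling step that follows the distribution calculation can be
sketched as below. This is only an illustration, not the code of
CStochasticPolicy: the helper name sampleActionIndex and the use of
std::discrete_distribution are assumptions, and the probabilities
array stands for the contents of the double array after
getDistribution has overwritten it.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Hypothetical helper (not part of the library): draws an action index
// from a probability array such as the one a distribution class leaves
// in the actionFactors array.
std::size_t sampleActionIndex(const double *probabilities,
                              std::size_t numActions,
                              std::mt19937 &rng)
{
    assert(probabilities != nullptr && numActions > 0);
    // discrete_distribution picks index i with probability proportional
    // to probabilities[i]; entries of zero are never selected.
    std::discrete_distribution<std::size_t> pick(probabilities,
                                                 probabilities + numActions);
    return pick(rng);
}
```

The returned index would then be mapped back to the corresponding
entry of the available action set.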
Member Function Documentation
virtual void CActionDistribution::getDistribution (
    CStateCollection * state,
    CActionSet * availableActions,
    double * actionFactors
) [pure virtual]
Returns the distribution of the actions that is sampled by a
stochastic policy.
The function receives the current state, all available actions, and
the Q-Values of the actions as a double array (the values can
actually be any kind of rating for an action). Usually only these
Q-Values are used for the distribution; the state is only used for
special exploration policies. The function has to overwrite the
Q-Values in the double array with the distribution values.
Implemented in CSoftMaxDistribution,
CAbsoluteSoftMaxDistribution,
CGreedyDistribution,
and CEpsilonGreedyDistribution.
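The in-place overwrite described above can be sketched with a
softmax over the action ratings. This is a minimal sketch, not the
implementation of CSoftMaxDistribution: the free function
softMaxInPlace, the explicit numActions argument, and the inverse
temperature beta are assumptions made for illustration, and the
state and action set arguments are left out because this sketch does
not use them.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a getDistribution implementation: overwrites
// actionFactors (one rating per available action) with probabilities.
// beta is an assumed inverse-temperature parameter.
void softMaxInPlace(double *actionFactors, std::size_t numActions, double beta)
{
    assert(actionFactors != nullptr && numActions > 0);
    // Subtract the largest rating before exponentiating to avoid overflow;
    // this does not change the resulting probabilities.
    const double maxFactor =
        *std::max_element(actionFactors, actionFactors + numActions);
    double sum = 0.0;
    for (std::size_t i = 0; i < numActions; ++i) {
        actionFactors[i] = std::exp(beta * (actionFactors[i] - maxFactor));
        sum += actionFactors[i];
    }
    // Normalize so that the entries sum to one.
    for (std::size_t i = 0; i < numActions; ++i) {
        actionFactors[i] /= sum;
    }
}
```

After the call the array no longer holds ratings, so a caller that
still needs the original Q-Values has to copy them beforehand.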
virtual void CActionDistribution::getGradientFactors (
    CStateCollection * state,
    CAction * usedAction,
    CActionSet * actions,
    double * actionFactors,
    ColumnVector * gradientFactors
) [virtual]
Calculates the derivative of the probability of choosing the
specified action.
The function calculates the gradient
dP(usedAction | actionFactors) / d(actionFactors). The action
factors are again some kind of rating for the actions. The result
has to be written to the output vector gradientFactors, which always
has the same size as the actionFactors array (that is, the number of
actions). Only the SoftMax distribution supports calculating this
gradient.
Reimplemented in CSoftMaxDistribution.
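For a softmax distribution this gradient has a closed form. The
sketch below assumes P(a) = exp(beta * q_a) / sum_j exp(beta * q_j),
which gives dP(a)/dq_i = beta * P(a) * ((i == a ? 1 : 0) - P(i)).
The parameterization with beta, the function name softMaxGradient,
the index-based usedAction argument, and the std::vector return
value (in place of the ColumnVector output) are all assumptions and
may differ from what CSoftMaxDistribution actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: gradient of P(usedAction) with respect to every
// action factor, for a softmax with assumed inverse temperature beta.
std::vector<double> softMaxGradient(const double *actionFactors,
                                    std::size_t numActions,
                                    std::size_t usedAction,
                                    double beta)
{
    assert(actionFactors != nullptr && usedAction < numActions);
    // Compute the softmax probabilities in a numerically stable way.
    const double maxFactor =
        *std::max_element(actionFactors, actionFactors + numActions);
    std::vector<double> p(numActions);
    double sum = 0.0;
    for (std::size_t i = 0; i < numActions; ++i) {
        p[i] = std::exp(beta * (actionFactors[i] - maxFactor));
        sum += p[i];
    }
    for (std::size_t i = 0; i < numActions; ++i) {
        p[i] /= sum;
    }
    // One gradient entry per action, matching the size of actionFactors.
    std::vector<double> gradient(numActions);
    for (std::size_t i = 0; i < numActions; ++i) {
        const double delta = (i == usedAction) ? 1.0 : 0.0;
        gradient[i] = beta * p[usedAction] * (delta - p[i]);
    }
    return gradient;
}
```

Because the probabilities always sum to one, the entries of the
gradient sum to zero: raising the rating of the used action
increases its probability, and raising any other rating decreases
it.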
virtual bool CActionDistribution::isDifferentiable ( ) [inline, virtual]
The documentation for this class was generated from the following
file: