CStochasticPolicy Class Reference
Class for modeling a stochastic policy.
#include <cpolicies.h>
Public Member Functions
CStochasticPolicy(CActionSet *actions, CActionDistribution *distribution)
    Creates a stochastic policy which can choose from the actions in "actions".
~CStochasticPolicy()
virtual void getActionProbabilities(CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)
    Virtual function for retrieving the action probability distribution.
virtual CAction * getNextAction(CStateCollection *state, CActionDataSet *dataset, CActionStatistics *stat)
    Chooses an action according to the distribution from getActionProbabilities.
virtual void getActionValues(CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)=0
    Interface function for calculating the action ratings; has to be implemented by the subclasses.
virtual bool isDifferentiable()
virtual void getActionProbabilityGradient(CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
virtual void getActionProbabilityLnGradient(CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
virtual void getActionGradient(CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
    Interface function for calculating the derivative of an action factor.
Protected Member Functions
virtual void getActionStatistics(CStateCollection *, CAction *, CActionStatistics *)
    Virtual function for getting the action statistics for the chosen action.
Protected Attributes
double * actionValues
    Array to store the current action probabilities.
CActionDistribution * distribution
ColumnVector * gradientFactors
CFeatureList * actionGradientFeatures
CActionSet * availableActions
Detailed Description
Class for modeling a stochastic policy.
Many algorithms need more than just a specific action for a
specific state. Especially when the policy is stochastic, the
distribution used for choosing an action is often needed as well.
This is modeled by CStochasticPolicy. The policy chooses an action
according to a given probability distribution, which you specify in
the constructor with the CActionDistribution object. In the
getNextAction method, an action is chosen according to the
distribution returned by getActionProbabilities. The
getActionProbabilities method has to call the getDistribution
method of the CActionDistribution object with the action ratings as
input. How these action ratings are calculated has to be
implemented by the subclasses; usually the values come from a
Q-Function (see CQStochasticPolicy). Some algorithms, such as the
policy gradient algorithm, need a differentiable action
distribution. CStochasticPolicy therefore also provides an
interface for differentiating the distribution with respect to the
policy weights (the weights of the Q-Function).
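The division of labour described above (subclasses supply the action ratings, the distribution object turns them into probabilities) can be sketched in a few lines. This is a standalone illustration, not the library's code: SoftMaxSketch, StochasticPolicySketch, TabularQPolicySketch and the beta parameter are hypothetical names, states are plain integers, and a softmax is only assumed as one possible distribution.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for CActionDistribution: converts ratings into
// probabilities in place. The softmax form is an assumption of this sketch.
struct SoftMaxSketch {
    double beta;  // inverse temperature (assumed parameter)
    void getDistribution(std::vector<double> &values) const {
        if (values.empty()) return;
        const double maxValue = *std::max_element(values.begin(), values.end());
        double sum = 0.0;
        for (double &v : values) {
            v = std::exp(beta * (v - maxValue));  // shift for numerical stability
            sum += v;
        }
        for (double &v : values) v /= sum;
    }
};

// Hypothetical base class mirroring the documented split of responsibilities.
class StochasticPolicySketch {
public:
    explicit StochasticPolicySketch(SoftMaxSketch distribution)
        : distribution_(distribution) {}
    virtual ~StochasticPolicySketch() = default;

    // Counterpart of getActionValues: one rating per available action.
    virtual void getActionValues(int state, std::vector<double> &values) const = 0;

    // Counterpart of getActionProbabilities: ratings first, then the distribution.
    void getActionProbabilities(int state, std::vector<double> &values) const {
        getActionValues(state, values);
        distribution_.getDistribution(values);
    }

private:
    SoftMaxSketch distribution_;
};

// Example subclass: the ratings come from a small tabular Q-function.
class TabularQPolicySketch : public StochasticPolicySketch {
public:
    TabularQPolicySketch(SoftMaxSketch distribution,
                         std::vector<std::vector<double>> q)
        : StochasticPolicySketch(distribution), q_(std::move(q)) {}

    void getActionValues(int state, std::vector<double> &values) const override {
        assert(state >= 0 && static_cast<std::size_t>(state) < q_.size());
        values = q_[static_cast<std::size_t>(state)];
    }

private:
    std::vector<std::vector<double>> q_;
};
```

With this layout a new kind of policy only has to override getActionValues; the conversion to probabilities stays in the base class.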
The gradient calculation of the policy is already implemented.
You can calculate either dP(action | state)/dweights or the
logarithmic gradient, which is the same as
dP(action | state)/dweights * 1 / P(action | state). Calculating
the gradient of the action ratings (e.g. dQ(s,a)/dw for
Q-Functions) has to be implemented in the function
getActionGradient if the stochastic policy is supposed to be
differentiable. Differentiable policies also have to override the
function isDifferentiable, which always returns false in the base
class. Whether the policy is differentiable depends on the kind of
action ratings and on the distribution; both have to be
differentiable. The class also provides the possibility to get a
statistics object for the action which was chosen. This is done by
the virtual function getActionStatistics, which is called by
getNextAction if a statistics object is requested.
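As an illustration of how the two gradients relate, the sketch below computes both for one concrete case. It assumes a softmax distribution with inverse temperature beta over the ratings Q(s,b); the function names and the Vector alias are hypothetical and not part of the class. Under that assumption, d ln P(a|s)/dw = beta * (dQ(s,a)/dw - sum_b P(b|s) * dQ(s,b)/dw), and the plain gradient is that value multiplied by P(a|s).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vector = std::vector<double>;

// Logarithmic gradient d ln P(a|s)/dw for an assumed softmax distribution.
// probs[b]       : P(b|s)
// ratingGrads[b] : dQ(s,b)/dw, all vectors of the same length
Vector lnGradientSketch(const Vector &probs,
                        const std::vector<Vector> &ratingGrads,
                        std::size_t action, double beta) {
    assert(action < probs.size() && probs.size() == ratingGrads.size());
    const std::size_t numWeights = ratingGrads[action].size();
    Vector grad(numWeights, 0.0);
    for (std::size_t i = 0; i < numWeights; ++i) {
        double expected = 0.0;  // sum_b P(b|s) * dQ(s,b)/dw_i
        for (std::size_t b = 0; b < probs.size(); ++b) {
            assert(ratingGrads[b].size() == numWeights);
            expected += probs[b] * ratingGrads[b][i];
        }
        grad[i] = beta * (ratingGrads[action][i] - expected);
    }
    return grad;
}

// Plain gradient: dP(a|s)/dw = P(a|s) * d ln P(a|s)/dw.
Vector probabilityGradientSketch(const Vector &probs,
                                 const std::vector<Vector> &ratingGrads,
                                 std::size_t action, double beta) {
    Vector grad = lnGradientSketch(probs, ratingGrads, action, beta);
    for (double &g : grad) g *= probs[action];
    return grad;
}
```

Because the probabilities sum to one, the plain gradients of all actions sum to zero in every weight component, which gives a convenient sanity check.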
Constructor & Destructor Documentation
CStochasticPolicy::CStochasticPolicy(CActionSet *actions, CActionDistribution *distribution)
Creates a stochastic policy which can choose from the actions in
"actions".
CStochasticPolicy::~CStochasticPolicy()
Member Function Documentation
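virtual void CStochasticPolicy::getActionGradient(CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
Interface function for calculating the derivative of an action
factor.
The function has to calculate d_actionratings(action)/dw, which
is, for example, dQ(s,a)/dw.
Reimplemented in CQStochasticPolicy and CVMStochasticPolicy.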
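For a concrete picture of what such a rating gradient can look like, the sketch below assumes a linear Q-function, Q(s,a) = sum_i w[a * numFeatures + i] * phi_i(s). The derivative with respect to a weight is then simply the matching state feature. FeatureListSketch is a hypothetical stand-in for CFeatureList, and the weight layout is an assumption made for illustration; the actual subclasses may organise their weights differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for CFeatureList: index -> value, stored sparsely.
using FeatureListSketch = std::map<std::size_t, double>;

// Assumed linear Q-function: Q(s,a) = sum_i w[a * numFeatures + i] * phi_i(s).
// dQ(s,a)/dw[a * numFeatures + i] equals phi_i(s); every other entry is
// zero and is therefore left out of the sparse gradient list.
void getActionGradientSketch(const FeatureListSketch &stateFeatures,
                             std::size_t action, std::size_t numFeatures,
                             FeatureListSketch &gradient) {
    gradient.clear();
    for (const auto &feature : stateFeatures) {
        assert(feature.first < numFeatures);
        gradient[action * numFeatures + feature.first] = feature.second;
    }
}
```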
virtual void CStochasticPolicy::getActionProbabilities(CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)
Virtual function for retrieving the action probability
distribution.
For each action in the availableActions action set, the function
has to calculate the probability and write it into the double array
actionValues. The function first calculates the action ratings with
the function getActionValues and then calculates the action
distribution with the action distribution object.
virtual void CStochasticPolicy::getActionStatistics(CStateCollection *, CAction *, CActionStatistics *)
Virtual function for getting the action statistics for the chosen
action.
The class also provides the possibility to get a statistics
object for the action which was chosen. This is done by the virtual
function getActionStatistics, which is called by getNextAction if a
statistics object is requested.
Reimplemented in CQStochasticPolicy.
virtual CAction * CStochasticPolicy::getNextAction(CStateCollection *state, CActionDataSet *dataset, CActionStatistics *stat)
Chooses an action according to the distribution from
getActionProbabilities.
First, the available actions for the current state are
determined, and then the probabilities for these available actions
are calculated. Finally, an action is chosen from the available
action set according to the distribution.
Reimplemented from CAgentStatisticController.
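The final step, choosing an action according to the distribution, can be sketched as a walk over the cumulative probabilities. This is a minimal standalone illustration and not necessarily how the class does it internally; sampleActionIndex is a hypothetical helper, and the uniform random number u is passed in by the caller so the function stays deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the chosen action for a uniform random number
// u in [0, 1), by walking the cumulative distribution of probs.
std::size_t sampleActionIndex(const std::vector<double> &probs, double u) {
    assert(!probs.empty() && u >= 0.0 && u < 1.0);
    double cumulative = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i];
        if (u < cumulative) return i;
    }
    // Guard against rounding when the probabilities sum to slightly below 1.
    return probs.size() - 1;
}
```

In practice u could be drawn from std::uniform_real_distribution<double>(0.0, 1.0) with any standard random engine.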
virtual bool CStochasticPolicy::isDifferentiable() [inline, virtual]
Member Data Documentation
double * CStochasticPolicy::actionValues
Array to store the current action probabilities.