Reinforcement Learning Toolbox 2.0

CStochasticPolicy Class Reference

Class for modeling a stochastic policy. More...

#include <cpolicies.h>

Inheritance diagram for CStochasticPolicy:

Classes shown in the diagram: CAgentStatisticController, CAgentController, CActionObject, CParameterObject, CParameters, CQStochasticPolicy, CContinuousTimeVMPolicy, CQStochasticExplorationPolicy, CVMStochasticPolicy.


Public Member Functions

  CStochasticPolicy (CActionSet *actions, CActionDistribution *distribution)
  Creates a stochastic policy which can choose from the actions in "actions".

  ~CStochasticPolicy ()
virtual void  getActionProbabilities (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)
  Virtual function for retrieving the action probability distribution.

virtual CAction *  getNextAction (CStateCollection *state, CActionDataSet *dataset, CActionStatistics *stat)
  Chooses an action according to the distribution from getActionProbabilities.

virtual void  getActionValues (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)=0
  Interface function for calculating the action ratings; it has to be implemented by the subclasses.

virtual bool  isDifferentiable ()
virtual void  getActionProbabilityGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
virtual void  getActionProbabilityLnGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
virtual void  getActionGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
  Interface function for calculating the derivative of an action factor.



Protected Member Functions

virtual void  getActionStatistics (CStateCollection *, CAction *, CActionStatistics *)
  Virtual function for getting the action statistics for the chosen action.



Protected Attributes

double *  actionValues
  Array to store the current action probabilities.

CActionDistribution *  distribution
ColumnVector *  gradientFactors
CFeatureList *  actionGradientFeatures
CActionSet *  availableActions

Detailed Description

Class for modeling a stochastic policy.

Many algorithms need more than a single action for a given state. With a stochastic policy in particular, the distribution from which the action is drawn is often required as well. CStochasticPolicy models this. The policy chooses an action according to a given probability distribution, which you specify in the constructor through the CActionDistribution object. In getNextAction, an action is chosen according to the distribution returned by getActionProbabilities. getActionProbabilities in turn has to call the getDistribution method of the CActionDistribution object with the action ratings as input. How these ratings are calculated has to be implemented by the subclasses; usually the values come from a Q-function (see CQStochasticPolicy).

Some algorithms, such as the policy gradient algorithm, need a differentiable action distribution. CStochasticPolicy therefore also provides an interface for differentiating the distribution with respect to the policy weights (the weights of the Q-function).
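The split between action ratings and the distribution object can be sketched in a few lines of self-contained C++. The names below (SketchDistribution, SketchSoftMax, SketchPolicy, TableRatingPolicy) are hypothetical stand-ins and not toolbox classes, states are reduced to plain integers, and the soft-max is only an assumed example of what a distribution object might compute.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for CActionDistribution: turns ratings into probabilities in place.
struct SketchDistribution {
    virtual ~SketchDistribution() = default;
    virtual void getDistribution(std::vector<double>& values) const = 0;
};

// Assumed example distribution: soft-max with inverse temperature beta.
struct SketchSoftMax : SketchDistribution {
    explicit SketchSoftMax(double beta) : beta(beta) {}
    void getDistribution(std::vector<double>& values) const override {
        if (values.empty()) return;
        const double maxValue = *std::max_element(values.begin(), values.end());
        double sum = 0.0;
        for (double& v : values) {
            v = std::exp(beta * (v - maxValue));  // subtract the maximum for numerical stability
            sum += v;
        }
        for (double& v : values) v /= sum;
    }
    double beta;
};

// Hypothetical stand-in for CStochasticPolicy.
class SketchPolicy {
public:
    explicit SketchPolicy(const SketchDistribution* distribution) : distribution(distribution) {}
    virtual ~SketchPolicy() = default;

    // Subclasses supply the action ratings (the role of getActionValues).
    virtual void getActionValues(int state, std::vector<double>& values) const = 0;

    // Ratings first, then the distribution object (the role of getActionProbabilities).
    void getActionProbabilities(int state, std::vector<double>& values) const {
        getActionValues(state, values);
        distribution->getDistribution(values);
    }

private:
    const SketchDistribution* distribution;
};

// Subclass whose ratings come from a fixed table, standing in for a Q-function.
class TableRatingPolicy : public SketchPolicy {
public:
    TableRatingPolicy(const SketchDistribution* d, std::vector<std::vector<double>> q)
        : SketchPolicy(d), q(std::move(q)) {}
    void getActionValues(int state, std::vector<double>& values) const override {
        values = q[static_cast<std::size_t>(state)];
    }

private:
    std::vector<std::vector<double>> q;
};
```

In this sketch, a state whose two ratings are equal yields probabilities of 0.5 each, and replacing SketchSoftMax with another SketchDistribution changes the exploration behaviour without touching the rating code.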

The gradient calculation of the policy is already implemented. You can calculate either dP(action | state)/dweights or the logarithmic gradient, which equals dP(action | state)/dweights * 1/P(action | state). Calculating the gradient of the action ratings (e.g. dQ(a,s)/dw for Q-functions) has to be implemented in the function getActionGradient if the stochastic policy is supposed to be differentiable. Differentiable policies also have to override the function isDifferentiable, which always returns false in the base class. Whether the policy is differentiable depends on both the kind of action ratings and the distribution; both have to be differentiable.

The class also provides the possibility to get a statistics object for the action that was chosen. This is done by the virtual function getActionStatistics, which is called by getNextAction if a statistics object is requested.
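Written out, the relationship between the two gradients is the usual log-derivative identity. In the block below, \pi(a \mid s) stands for P(action | state), w for the policy weights, and r_b(s) for the rating of action b; these symbols are notation introduced here for illustration, not toolbox names. The second line is the chain rule through the action ratings, which is why both the distribution and the ratings have to be differentiable.

```latex
\begin{align}
\frac{\partial \ln \pi(a \mid s)}{\partial w}
  &= \frac{1}{\pi(a \mid s)} \, \frac{\partial \pi(a \mid s)}{\partial w}, \\
\frac{\partial \pi(a \mid s)}{\partial w}
  &= \sum_{b} \frac{\partial \pi(a \mid s)}{\partial r_b(s)} \, \frac{\partial r_b(s)}{\partial w}.
\end{align}
```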


Constructor & Destructor Documentation

CStochasticPolicy::CStochasticPolicy (CActionSet *actions, CActionDistribution *distribution)
 

Creates a stochastic policy which can choose from the actions in "actions".

CStochasticPolicy::~CStochasticPolicy ()
 

Member Function Documentation

virtual void CStochasticPolicy::getActionGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState) [virtual]
 

Interface function for calculating the derivative of an action factor.

The function has to calculate d_actionratings(action)/dw, which is for example dQ(s,a)/dw.

Reimplemented in CQStochasticPolicy, and CVMStochasticPolicy.
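As an illustration of what an implementation might compute, consider a Q-function that is linear in sparse state features, Q(s,a) = sum_i w[a][i] * phi_i(s). Its derivative dQ(s,a)/dw is phi_i(s) for the weights belonging to action a and zero everywhere else. The sketch below is self-contained and uses hypothetical names (SketchFeature, linearQ, linearQGradient) and an assumed flat weight layout; it does not use the toolbox's CFeatureList API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sparse entry: index of an active feature and its activation phi_i(s).
struct SketchFeature {
    std::size_t index;
    double factor;
};

// Q(s,a) for weights stored flat as w[action * numFeatures + featureIndex] (assumed layout).
double linearQ(const std::vector<double>& w, std::size_t numFeatures,
               std::size_t action, const std::vector<SketchFeature>& phi) {
    double q = 0.0;
    for (const SketchFeature& f : phi) q += w[action * numFeatures + f.index] * f.factor;
    return q;
}

// Sparse gradient dQ(s,a)/dw: one entry per active feature, placed in the weight block of the given action.
std::vector<SketchFeature> linearQGradient(std::size_t numFeatures, std::size_t action,
                                           const std::vector<SketchFeature>& phi) {
    std::vector<SketchFeature> gradient;
    gradient.reserve(phi.size());
    for (const SketchFeature& f : phi)
        gradient.push_back({action * numFeatures + f.index, f.factor});
    return gradient;
}
```

Returning only the non-zero entries keeps the cost proportional to the number of active features rather than to the total number of weights.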

virtual void CStochasticPolicy::getActionProbabilities (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet = NULL) [virtual]
 

Virtual function for retrieving the action probability distribution.

For each action in the availableActions action set, the function has to calculate the probability and write it to the double array actionValues. The function first calculates the action ratings with getActionValues and then converts them into the action distribution with the action distribution object.

virtual void CStochasticPolicy::getActionProbabilityGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState) [virtual]
 
virtual void CStochasticPolicy::getActionProbabilityLnGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState) [virtual]
 
virtual void CStochasticPolicy::getActionStatistics (CStateCollection *, CAction *, CActionStatistics *) [inline, protected, virtual]
 

Virtual function for getting the action statistics for the chosen action.

The class also provides the possibility to get a statistics object for the action that was chosen. This is done by the virtual function getActionStatistics, which is called by getNextAction if a statistics object is requested.

Reimplemented in CQStochasticPolicy.

virtual void CStochasticPolicy::getActionValues (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet = NULL) [pure virtual]
 

Interface function for calculating the action ratings; it has to be implemented by the subclasses.

Implemented in CQStochasticExplorationPolicy, and CQStochasticPolicy.

virtual CAction* CStochasticPolicy::getNextAction (CStateCollection *state, CActionDataSet *dataset, CActionStatistics *stat) [virtual]
 

Chooses an action according to the distribution from getActionProbabilities.

First the available actions for the current state are determined, then the probabilities for these available actions are calculated. Finally an action is chosen from the set of available actions according to the distribution.

Reimplemented from CAgentStatisticController.
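The final step, drawing an action from the probabilities, can be sketched as inverse-CDF sampling over the indices of the available actions. The helper name sampleIndex and the convention of passing in a uniform random number u in [0, 1) are assumptions made here to keep the example deterministic; the toolbox's own sampling code is not shown in this reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the first index whose cumulative probability exceeds u, with u in [0, 1).
// Falls back to the last index to absorb rounding error in the cumulative sum.
std::size_t sampleIndex(const std::vector<double>& probabilities, double u) {
    assert(!probabilities.empty());
    double cumulative = 0.0;
    for (std::size_t i = 0; i < probabilities.size(); ++i) {
        cumulative += probabilities[i];
        if (u < cumulative) return i;
    }
    return probabilities.size() - 1;
}
```

In practice u would typically be drawn from a generator such as std::uniform_real_distribution<double>(0.0, 1.0), and the returned index would select the corresponding entry of the available action set.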

virtual bool CStochasticPolicy::isDifferentiable () [inline, virtual]
 

Reimplemented in CQStochasticPolicy, and CVMStochasticPolicy.


Member Data Documentation

CFeatureList* CStochasticPolicy::actionGradientFeatures [protected]
 
double* CStochasticPolicy::actionValues [protected]
 

Array to store the current action probabilities.

CActionSet* CStochasticPolicy::availableActions [protected]
 
CActionDistribution* CStochasticPolicy::distribution [protected]
 
ColumnVector* CStochasticPolicy::gradientFactors [protected]
 
