Reinforcement Learning Toolbox 2.0

CQStochasticPolicy Class Reference

Stochastic policy that computes its action probabilities from the Q-values of the given Q-function.

#include <cpolicies.h>

Inheritance diagram for CQStochasticPolicy:

[Diagram not reproduced. Classes shown: CStochasticPolicy, CAgentStatisticController, CAgentController, CActionObject, CParameterObject, CParameters, CContinuousTimeVMPolicy, CQStochasticExplorationPolicy, CVMStochasticPolicy]


Public Member Functions

  CQStochasticPolicy (CActionSet *actions, CActionDistribution *distribution, CAbstractQFunction *qfunction)
  ~CQStochasticPolicy ()
virtual void  getActionValues (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet=NULL)
  Calculates the action ratings (here, the Q-values) and writes them to actionValues. Implements the interface function declared in CStochasticPolicy.

virtual void  getActionGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState)
  Interface function for calculating the derivative of an action factor.

virtual bool  isDifferentiable ()
virtual CAbstractQFunction *  getQFunction ()


Protected Member Functions

virtual void  getActionStatistics (CStateCollection *state, CAction *action, CActionStatistics *stat)
  Returns the action statistics object from the Q-function.



Protected Attributes

CAbstractQFunction *  qfunction
  QFunction of the policy, needed for action decision.


Detailed Description

Stochastic policy that computes its action probabilities from the Q-values of the given Q-function.

This stochastic policy calculates its action ratings from the given Q-function. The getActionValues function writes the Q-values into the actionValues array. Q-stochastic policies also support gradient calculation: the policy is differentiable if both the distribution and the Q-function are differentiable. The gradient d_actionratings(action)/dw calculated by getActionGradient is then the same as dQ(s,a)/dw.
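
As an illustration of how action ratings can become selection probabilities, the following minimal sketch applies a soft-max to a vector of Q-values. It does not use the toolbox classes: the function name softMaxProbabilities and the inverse-temperature parameter beta are assumptions made for this example, and soft-max is only one possible choice of distribution.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an action distribution: turns action ratings
// (here, Q-values) into selection probabilities with a soft-max.
// `beta` is an assumed inverse-temperature parameter, not a toolbox name.
std::vector<double> softMaxProbabilities(const std::vector<double> &qValues,
                                         double beta)
{
    assert(!qValues.empty());

    // Subtract the largest rating before exponentiating so that exp()
    // cannot overflow for beta >= 0; the probabilities are unchanged.
    const double maxQ = *std::max_element(qValues.begin(), qValues.end());

    std::vector<double> probabilities(qValues.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < qValues.size(); ++i) {
        probabilities[i] = std::exp(beta * (qValues[i] - maxQ));
        sum += probabilities[i];
    }

    // Normalize so the probabilities sum to one.
    for (double &p : probabilities) {
        p /= sum;
    }
    return probabilities;
}
```

With equal Q-values every action receives the same probability; a larger beta concentrates the probability mass on the action with the highest Q-value.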


Constructor & Destructor Documentation

CQStochasticPolicy::CQStochasticPolicy (CActionSet *actions, CActionDistribution *distribution, CAbstractQFunction *qfunction)

CQStochasticPolicy::~CQStochasticPolicy ()
 

Member Function Documentation

virtual void CQStochasticPolicy::getActionGradient (CStateCollection *state, CAction *action, CActionData *data, CFeatureList *gradientState) [virtual]
 

Interface function for calculating the derivative of an action factor.

The function has to calculate d_actionratings(action)/dw; for this class that is dQ(s,a)/dw.

Reimplemented from CStochasticPolicy.

Reimplemented in CVMStochasticPolicy.
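
To make dQ(s,a)/dw concrete, here is a minimal sketch for an assumed linear Q-function, Q(s,a) = sum_i w[a * numFeatures + i] * phi_i(s). The FeatureList alias, the function linearQGradient, and the weight layout are hypothetical; they only mirror the role of CFeatureList and are not the toolbox implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sparse feature list: (weight index, value) pairs.
// It only mirrors the role of CFeatureList; it is not the toolbox class.
using FeatureList = std::vector<std::pair<std::size_t, double>>;

// Assumed linear Q-function: Q(s,a) = sum_i w[a * numFeatures + i] * phi_i(s).
// Its derivative with respect to w[a * numFeatures + i] is phi_i(s), and it
// is zero for the weights that belong to any other action.
void linearQGradient(const FeatureList &stateFeatures, std::size_t action,
                     std::size_t numFeatures, FeatureList &gradient)
{
    gradient.clear();
    for (const auto &feature : stateFeatures) {
        assert(feature.first < numFeatures);
        // Shift the feature index into the weight block of `action`.
        gradient.emplace_back(action * numFeatures + feature.first,
                              feature.second);
    }
}
```

Only the active features of the state appear in the result, so the gradient stays as sparse as the state representation.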

virtual void CQStochasticPolicy::getActionStatistics (CStateCollection *state, CAction *action, CActionStatistics *stat) [protected, virtual]
 

Returns the action statistics object from the Q-function.

Reimplemented from CStochasticPolicy.

virtual void CQStochasticPolicy::getActionValues (CStateCollection *state, CActionSet *availableActions, double *actionValues, CActionDataSet *actionDataSet = NULL) [virtual]
 

Calculates the action ratings (here, the Q-values) and writes them to actionValues.

Implements CStochasticPolicy.

Reimplemented in CQStochasticExplorationPolicy.
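
The output-array pattern suggested by the double *actionValues parameter can be sketched as follows. This is an illustration under stated assumptions, not the toolbox code: the tabular QTable type and the function fillActionValues are hypothetical, and the sketch assumes that the caller provides one array entry per available action, in the same order as the action list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tabular Q-function: qTable[state][action].
using QTable = std::vector<std::vector<double>>;

// Writes Q(state, a) for each available action into the caller's buffer.
// Assumption: actionValues has room for availableActions.size() entries and
// entry i corresponds to availableActions[i].
void fillActionValues(const QTable &qTable, std::size_t state,
                      const std::vector<std::size_t> &availableActions,
                      double *actionValues)
{
    assert(actionValues != nullptr);
    assert(state < qTable.size());
    for (std::size_t i = 0; i < availableActions.size(); ++i) {
        assert(availableActions[i] < qTable[state].size());
        actionValues[i] = qTable[state][availableActions[i]];
    }
}
```

A caller would allocate the array, call the function, and then hand the filled ratings to whatever distribution selects the action.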

virtual CAbstractQFunction* CQStochasticPolicy::getQFunction () [inline, virtual]

virtual bool CQStochasticPolicy::isDifferentiable () [virtual]
 

Reimplemented from CStochasticPolicy.

Reimplemented in CVMStochasticPolicy.


Member Data Documentation

CAbstractQFunction* CQStochasticPolicy::qfunction [protected]
 

QFunction of the policy, needed for action decision.


The documentation for this class was generated from the following file: