Reinforcement Learning Toolbox 2.0

CActionDistribution Class Reference

Action distribution classes define the distributions of stochastic policies.

#include <cpolicies.h>

Inheritance diagram for CActionDistribution:

Base classes: CParameterObject, CParameters.
Derived classes: CAbsoluteSoftMaxDistribution, CEpsilonGreedyDistribution, CGreedyDistribution, CSoftMaxDistribution.


Public Member Functions

virtual void  getDistribution (CStateCollection *state, CActionSet *availableActions, double *actionFactors)=0
  Returns the distribution of the actions that is sampled by a stochastic policy.

virtual bool  isDifferentiable ()
virtual void  getGradientFactors (CStateCollection *state, CAction *usedAction, CActionSet *actions, double *actionFactors, ColumnVector *gradientFactors)
  Calculates the derivative of the probability of choosing the specified action.


Detailed Description

Action distribution classes define the distributions of stochastic policies.

An action distribution calculates the distribution from which an action is sampled; the sampling itself is done by the class CStochasticPolicy. The distribution usually depends on some kind of Q-value of the actions and is calculated in the function getDistribution. This function receives the current state, all available actions, and the Q-values of the actions as a double array (in fact any kind of value that rates an action can be used). Usually only these Q-values are used for the distribution; the state is only needed for special exploration policies. The function has to overwrite the Q-values in the double array with the distribution values.

Additionally, some algorithms need a differentiable distribution. Since not all distributions are differentiable, the interface provides the function isDifferentiable and the function getGradientFactors. getGradientFactors calculates the gradient dP(usedAction | actionFactors) / d(actionFactors), where the action factors are again some kind of rating for the actions. The result has to be written to the output vector gradientFactors, which always has the same size as the actionFactors array (that is, the number of actions). Only the SoftMax distribution supports calculating this gradient.
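The in-place contract described above (ratings go in, probabilities come out in the same array) can be illustrated with a minimal, standalone sketch. It does not use the toolbox classes: the free function softMaxInPlace and its beta parameter are hypothetical stand-ins, and the softmax formula p_i = exp(beta * q_i) / sum_j exp(beta * q_j) is the textbook form, which is assumed here and may differ in detail from what CSoftMaxDistribution computes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a getDistribution-style routine (not toolbox API):
// overwrites the action ratings in actionFactors with softmax probabilities.
// beta is an assumed inverse-temperature parameter and must be non-negative.
void softMaxInPlace(double *actionFactors, int numActions, double beta)
{
    assert(actionFactors != nullptr && numActions > 0);
    assert(beta >= 0.0);

    // Subtract the largest rating before exponentiating to avoid overflow;
    // this does not change the resulting probabilities.
    const double maxValue =
        *std::max_element(actionFactors, actionFactors + numActions);

    double sum = 0.0;
    for (int i = 0; i < numActions; ++i) {
        actionFactors[i] = std::exp(beta * (actionFactors[i] - maxValue));
        sum += actionFactors[i];
    }

    // Normalize so that the entries sum to one.
    for (int i = 0; i < numActions; ++i) {
        actionFactors[i] /= sum;
    }
}
```

After the call, the caller's array no longer holds ratings, so a caller that still needs the original Q-values has to copy them beforehand.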


Member Function Documentation

virtual void CActionDistribution::getDistribution (CStateCollection *state, CActionSet *availableActions, double *actionFactors) [pure virtual]
 

Returns the distribution of the actions that is sampled by a stochastic policy.

The function receives the current state, all available actions, and the Q-values of the actions as a double array (in fact any kind of value that rates an action can be used). Usually only these Q-values are used for the distribution; the state is only needed for special exploration policies. The function has to overwrite the Q-values in the double array with the distribution values.
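As a second illustration of overwriting the ratings with distribution values, the following standalone sketch fills the array with an epsilon-greedy distribution. The function name epsilonGreedyInPlace is hypothetical, and the convention used (every action gets epsilon / n, the best-rated action additionally gets 1 - epsilon) is one common definition that is assumed here; CEpsilonGreedyDistribution may split the probability mass differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in (not toolbox API): overwrites the action ratings in
// actionFactors with an epsilon-greedy distribution over the actions.
void epsilonGreedyInPlace(double *actionFactors, int numActions, double epsilon)
{
    assert(actionFactors != nullptr && numActions > 0);
    assert(epsilon >= 0.0 && epsilon <= 1.0);

    // Index of the highest-rated action; on ties the first one is chosen.
    const int greedy = static_cast<int>(
        std::max_element(actionFactors, actionFactors + numActions) -
        actionFactors);

    // Every action receives an equal share of the exploration mass epsilon,
    std::fill(actionFactors, actionFactors + numActions, epsilon / numActions);
    // and the greedy action additionally receives the remaining 1 - epsilon.
    actionFactors[greedy] += 1.0 - epsilon;
}
```

The greedy index has to be determined before the array is overwritten, because the ratings are lost once the probabilities are written.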

Implemented in CSoftMaxDistribution, CAbsoluteSoftMaxDistribution, CGreedyDistribution, and CEpsilonGreedyDistribution.

virtual void CActionDistribution::getGradientFactors (CStateCollection *state, CAction *usedAction, CActionSet *actions, double *actionFactors, ColumnVector *gradientFactors) [virtual]
 

Calculates the derivative of the probability of choosing the specified action.

The function calculates the gradient dP(usedAction | actionFactors) / d(actionFactors), where the action factors are again some kind of rating for the actions. The result has to be written to the output vector gradientFactors, which always has the same size as the actionFactors array (that is, the number of actions). Only the SoftMax distribution supports calculating this gradient.
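For a softmax distribution this gradient has a simple closed form, which the following standalone sketch computes. It assumes the textbook softmax with an inverse temperature beta, for which dP(a) / d q_j = beta * P(a) * ([j == a] - P(j)). The names softMaxProbabilities and softMaxGradientFactors are hypothetical, and std::vector<double> stands in for the ColumnVector output; the actual CSoftMaxDistribution implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed helper (not toolbox API): softmax probabilities of the ratings.
std::vector<double> softMaxProbabilities(const std::vector<double> &actionFactors,
                                         double beta)
{
    assert(!actionFactors.empty() && beta >= 0.0);
    // Shift by the largest rating for numerical stability.
    const double maxValue =
        *std::max_element(actionFactors.begin(), actionFactors.end());

    std::vector<double> probabilities(actionFactors.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < actionFactors.size(); ++i) {
        probabilities[i] = std::exp(beta * (actionFactors[i] - maxValue));
        sum += probabilities[i];
    }
    for (double &p : probabilities) {
        p /= sum;
    }
    return probabilities;
}

// Returns one gradient entry per action:
// gradientFactors[j] = dP(usedAction | actionFactors) / d actionFactors[j]
//                    = beta * P(usedAction) * ((j == usedAction ? 1 : 0) - P(j))
std::vector<double> softMaxGradientFactors(const std::vector<double> &actionFactors,
                                           std::size_t usedAction, double beta)
{
    assert(usedAction < actionFactors.size());
    const std::vector<double> p = softMaxProbabilities(actionFactors, beta);

    std::vector<double> gradientFactors(actionFactors.size());
    for (std::size_t j = 0; j < actionFactors.size(); ++j) {
        const double indicator = (j == usedAction) ? 1.0 : 0.0;
        gradientFactors[j] = beta * p[usedAction] * (indicator - p[j]);
    }
    return gradientFactors;
}
```

Because the probabilities always sum to one, the gradient entries sum to zero: raising the rating of the used action increases its probability, while raising any other rating decreases it.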

Reimplemented in CSoftMaxDistribution.

virtual bool CActionDistribution::isDifferentiable () [inline, virtual]
 

Reimplemented in CSoftMaxDistribution and CAbsoluteSoftMaxDistribution.


The documentation for this class was generated from the following file: