Reinforcement Learning Toolbox 2.0

CSoftMaxDistribution Class Reference

Soft Max Distribution for Stochastic Policies.

#include <cpolicies.h>

Inheritance diagram for CSoftMaxDistribution:

CActionDistribution, CParameterObject, CParameters


Public Member Functions

  CSoftMaxDistribution (double beta)
virtual void  getDistribution (CStateCollection *state, CActionSet *availableActions, double *values)
  Returns the distribution of the actions that is sampled by a stochastic policy.

virtual bool  isDifferentiable ()
virtual void  getGradientFactors (CStateCollection *state, CAction *usedAction, CActionSet *actions, double *actionFactors, ColumnVector *gradientFactors)
  Calculates the derivative of the probability of choosing the specified action.


Detailed Description

Soft Max Distribution for Stochastic Policies.

This class implements the well-known softmax distribution (sometimes called the Gibbs distribution). The softmax distribution is differentiable and can therefore be used for policy gradient algorithms. The distribution depends on the parameter "SoftMaxBeta", which specifies the "greediness" of the distribution.

The class CSoftMaxDistribution has the following Parameters:

  • "SoftMaxBeta" : Greediness of the distribution
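
For reference, the softmax (Gibbs) distribution is commonly written as shown below. This is the textbook form and rests on an assumption: that "SoftMaxBeta" acts as an inverse temperature β multiplying the action values Q(s, a_i). The exact scaling used inside this class is not stated on this page.

```latex
P(a_i \mid s) = \frac{\exp\bigl(\beta\, Q(s, a_i)\bigr)}{\sum_{j=1}^{n} \exp\bigl(\beta\, Q(s, a_j)\bigr)}
```

Under this form, β = 0 makes all n available actions equally likely, and larger β concentrates the probability mass on the highest-valued action, which matches the description of the parameter as "greediness".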

Constructor & Destructor Documentation

CSoftMaxDistribution::CSoftMaxDistribution (double beta)
 

Member Function Documentation

virtual void CSoftMaxDistribution::getDistribution (CStateCollection *state, CActionSet *availableActions, double *values) [virtual]
 

Returns the distribution of the actions that is sampled by a stochastic policy.

The function receives the current state, all available actions, and the Q-values of those actions as a double array (in fact, any kind of value that rates an action can be used). Usually only these Q-values are used to compute the distribution; the state is needed only for special exploration policies. The function must overwrite the Q-values in the double array with the distribution values.

Implements CActionDistribution.
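
The in-place contract described above (ratings go in, probabilities come out in the same array) can be sketched without the toolbox. The code below is a minimal illustration, not the class's actual implementation: `softmaxInPlace` is a hypothetical free function, and it assumes the probability of action i is proportional to exp(beta * values[i]).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of the toolbox: overwrites the n action
// ratings in `values` with softmax probabilities, in place.
// Assumption: P(a_i) is proportional to exp(beta * values[i]).
void softmaxInPlace(double *values, std::size_t n, double beta) {
    if (n == 0) {
        return;
    }
    // Shift by the largest rating before exponentiating. The shift cancels
    // in the normalization and avoids overflow for beta >= 0.
    const double maxValue = *std::max_element(values, values + n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        values[i] = std::exp(beta * (values[i] - maxValue));
        sum += values[i];
    }
    // Normalize so that the entries sum to one.
    for (std::size_t i = 0; i < n; ++i) {
        values[i] /= sum;
    }
}
```

After the call, the array no longer holds Q-values, so a caller that still needs them has to copy the array first.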

virtual void CSoftMaxDistribution::getGradientFactors (CStateCollection *state, CAction *usedAction, CActionSet *actions, double *actionFactors, ColumnVector *gradientFactors) [virtual]
 

Calculates the derivative of the probability of choosing the specified action.

The function calculates the gradient dP(usedAction | actionFactors) / d(actionFactors). The action factors are again some kind of rating for the actions. The result must be written to the output vector gradientFactors, which always has the same size as the actionFactors array (that is, the number of actions). Only the softmax distribution supports calculating this gradient.

Reimplemented from CActionDistribution.
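
As a rough illustration of the gradient described above, the sketch below computes dP(usedAction)/d(actionFactors[j]) for a softmax distribution. It is not the toolbox code: both function names are hypothetical, `std::vector<double>` stands in for the `double *` input and the `ColumnVector` output, and the distribution is assumed to have the form P(a_i) proportional to exp(beta * f_i). Under that assumption the derivative is beta * P(used) * ((j == used ? 1 : 0) - P(j)).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: softmax probabilities for the given action ratings,
// assuming P(a_i) is proportional to exp(beta * factors[i]).
std::vector<double> softmaxProbabilities(const std::vector<double> &factors,
                                         double beta) {
    std::vector<double> p(factors.size());
    if (factors.empty()) {
        return p;
    }
    // Shifting by the maximum cancels in the normalization and avoids
    // overflow for beta >= 0.
    const double maxFactor = *std::max_element(factors.begin(), factors.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < factors.size(); ++i) {
        p[i] = std::exp(beta * (factors[i] - maxFactor));
        sum += p[i];
    }
    for (double &pi : p) {
        pi /= sum;
    }
    return p;
}

// Hypothetical helper: entry j of the result is
// dP(usedAction) / d factors[j] = beta * P(used) * (indicator(j == used) - P(j)).
std::vector<double> softmaxGradientFactors(const std::vector<double> &factors,
                                           std::size_t usedAction,
                                           double beta) {
    const std::vector<double> p = softmaxProbabilities(factors, beta);
    std::vector<double> gradient(factors.size());
    for (std::size_t j = 0; j < factors.size(); ++j) {
        const double indicator = (j == usedAction) ? 1.0 : 0.0;
        gradient[j] = beta * p[usedAction] * (indicator - p[j]);
    }
    return gradient;
}
```

The result has one entry per action, as the text above requires, and its entries sum to zero because adding the same constant to every rating leaves the probabilities unchanged.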

virtual bool CSoftMaxDistribution::isDifferentiable () [inline, virtual]
 

Reimplemented from CActionDistribution.


The documentation for this class was generated from the following file: