CSoftMaxDistribution Class Reference
Soft Max Distribution for Stochastic Policies.
#include <cpolicies.h>
Detailed Description
Soft Max Distribution for Stochastic Policies.
This class implements the well-known softmax distribution
(sometimes called the Gibbs distribution). The softmax distribution
is differentiable and can therefore be used with policy gradient
algorithms. The distribution depends on the parameter "SoftMaxBeta",
which sets the "greediness" of the distribution.
The class CSoftMaxDistribution has the following parameters:
- "SoftMaxBeta" : Greediness of the distribution
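For reference, the softmax (Gibbs) distribution is commonly written as shown below, where x_i is the value rating action a_i and \beta stands for "SoftMaxBeta". Treating \beta as a multiplier on the values (an inverse temperature) is an assumption made here because it matches the "greediness" description; the exact form used by the class should be checked against its source.

```latex
P(a_i) = \frac{\exp(\beta \, x_i)}{\sum_{j} \exp(\beta \, x_j)}
```

Under this form, \beta = 0 gives every action the same probability, and increasing \beta concentrates the probability on the highest-valued action.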
Constructor & Destructor Documentation
CSoftMaxDistribution::CSoftMaxDistribution(double beta)
Member Function Documentation
virtual void CSoftMaxDistribution::getDistribution(
    CStateCollection *state,
    CActionSet *availableActions,
    double *values) [virtual]
Returns the distribution over actions from which a stochastic
policy samples.
The function takes as input the current state, all available
actions, and the Q-values of the actions as a double array (any
kind of value that rates an action can be used). Usually only
these Q-values are used to compute the distribution; the state is
only used for special exploration policies. The function must
overwrite the Q-values in the double array with the distribution
values.
Implements CActionDistribution.
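The in-place contract described above (ratings in, probabilities out, in the same array) can be sketched as follows. This is a minimal illustration, not the toolbox's implementation: softMaxInPlace is a hypothetical free function, the array length is passed explicitly instead of being taken from a CActionSet, and beta is assumed to multiply the values and to be non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper, not part of the toolbox API.
// Overwrites values[0..numActions) with softmax probabilities.
// Assumption: P(i) is proportional to exp(beta * values[i]), beta >= 0.
void softMaxInPlace(double *values, std::size_t numActions, double beta) {
    assert(values != nullptr && numActions > 0 && beta >= 0.0);

    // Shift by the maximum so every exponent is <= 0 and exp cannot overflow.
    // The shift cancels out when the weights are normalized.
    const double maxValue = *std::max_element(values, values + numActions);

    double sum = 0.0;
    for (std::size_t i = 0; i < numActions; ++i) {
        values[i] = std::exp(beta * (values[i] - maxValue));
        sum += values[i];
    }

    // Normalize so the entries sum to one.
    for (std::size_t i = 0; i < numActions; ++i) {
        values[i] /= sum;
    }
}
```

Calling softMaxInPlace on an array of Q-values leaves a probability for each action in the slot that previously held its rating, which is the behaviour the description asks of getDistribution.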
virtual void CSoftMaxDistribution::getGradientFactors(
    CStateCollection *state,
    CAction *usedAction,
    CActionSet *actions,
    double *actionFactors,
    ColumnVector *gradientFactors) [virtual]
Calculates the derivative of the probability of choosing the
specified action.
The function calculates the gradient
dP(usedAction|actionFactors)/d(actionFactors). The action factors
are, again, some kind of rating for the actions. The result must be
written to the output vector gradientFactors. This vector always
has the same size as the actionFactors array (that is, the number
of actions). Only the softmax distribution supports calculating
this gradient.
Reimplemented from CActionDistribution.
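As an illustration of the gradient described above, the sketch below assumes the distribution has the form P(i) proportional to exp(beta * x_i), where x_i is the i-th action factor. Under that assumption, dP(u)/dx_j = beta * P(u) * ((u == j ? 1 : 0) - P(j)). The function name softMaxGradientFactors, the use of std::vector in place of ColumnVector, and the index-based choice of the used action are all hypothetical simplifications, not the toolbox API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the toolbox API.
// Returns dP(used) / d actionFactors[j] for every action j, assuming
// P(i) is proportional to exp(beta * actionFactors[i]) with beta >= 0.
std::vector<double> softMaxGradientFactors(
        const std::vector<double> &actionFactors,
        std::size_t usedIndex,
        double beta) {
    assert(usedIndex < actionFactors.size() && beta >= 0.0);
    const std::size_t n = actionFactors.size();

    // Compute the softmax probabilities, shifted by the maximum for stability.
    const double maxFactor =
        *std::max_element(actionFactors.begin(), actionFactors.end());
    std::vector<double> p(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = std::exp(beta * (actionFactors[i] - maxFactor));
        sum += p[i];
    }
    for (std::size_t i = 0; i < n; ++i) {
        p[i] /= sum;
    }

    // One gradient entry per action, so the result has the same size
    // as the input array.
    std::vector<double> gradient(n);
    for (std::size_t j = 0; j < n; ++j) {
        const double delta = (j == usedIndex) ? 1.0 : 0.0;
        gradient[j] = beta * p[usedIndex] * (delta - p[j]);
    }
    return gradient;
}
```

The entries of the returned vector sum to zero, because raising every action factor by the same amount leaves the probabilities unchanged.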
virtual bool CSoftMaxDistribution::isDifferentiable() [inline, virtual]
The documentation for this class was generated from the following
file: