CDynamicProgramming Class Reference
Collection of static functions for dynamic programming.
#include <cdynamicprogramming.h>
Static Public Member Functions
static double getActionValue(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, CAction *action, double gamma)
Calculates the action value of the given state-action pair.
static double getBellmanValue(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
Calculates the Bellman value, which is the best value achievable in the current state, given a value function and a reward function.
static double getBellmanError(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
Calculates the Bellman error of the value function in the given state.
Detailed Description
Collection of static functions for dynamic programming.
Provides functions for calculating the action value, the Bellman value and the Bellman error for a given state, given a theoretical model, a V-function and a reward function.
Member Function Documentation
static double CDynamicProgramming::getActionValue(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, CAction *action, double gamma)
Calculates the action value of the given state-action pair.
The action value of an action a in state s is defined as
Q(s,a) = sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s')).
The probabilities P(s'|s,a) come from the forward transitions of the model for the given state-action pair. The function iterates over these forward transitions, computes the discounted return R(s,a,s') + gamma * V(s') for each one, weights it by P(s'|s,a), and sums the results; this expectation is the action value.
For the semi-MDP case the formula is slightly more complex:
Q(s,a) = sum_{s',N} P(s',N|s,a) * (R(s,a,s') + gamma^N * V(s')).
This formula is used if the specified action is a multi-step action; the transition objects are then CSemiMDPTransition objects, which also store the probabilities of the durations N.
R(s,a,s') comes from the reward function, which has to be a feature reward function because the rewards for the feature (discrete state) transitions are needed. The given state has to be a discrete state, and the action has to be a member of the model.
See also: CTransition, CSemiMDPTransition
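The expectation over forward transitions described for getActionValue can be sketched in a few lines. This is a minimal, self-contained illustration and not the library's implementation: the Transition struct, its fields and the free function actionValue are hypothetical stand-ins, since the real CTransition and CSemiMDPTransition interfaces are not shown in this reference. State values are assumed to be stored in a plain vector indexed by discrete state number.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one forward transition of the model.
// It is NOT the library's CTransition or CSemiMDPTransition class.
struct Transition {
    std::size_t nextState;  // index of the successor state s'
    double probability;     // P(s'|s,a), or P(s',N|s,a) for a multi-step action
    int duration;           // N, number of primitive steps (1 for a single-step action)
    double reward;          // R(s,a,s')
};

// Q(s,a) = sum_{s',N} P(s',N|s,a) * (R(s,a,s') + gamma^N * V(s')).
// If every transition has duration 1, this reduces to the ordinary MDP formula.
double actionValue(const std::vector<Transition>& transitions,
                   const std::vector<double>& stateValues,
                   double gamma) {
    assert(gamma >= 0.0 && gamma <= 1.0);
    double q = 0.0;
    for (const Transition& t : transitions) {
        assert(t.duration >= 1);
        // Discount the successor value by gamma^N for a transition lasting N steps.
        const double discount = std::pow(gamma, t.duration);
        q += t.probability * (t.reward + discount * stateValues.at(t.nextState));
    }
    return q;
}
```

For instance, with gamma = 0.5, state values {2, 4}, and two transitions (to state 0 with probability 0.5, duration 1, reward 1; to state 1 with probability 0.5, duration 2, reward 0), the sketch evaluates 0.5 * (1 + 0.5 * 2) + 0.5 * (0 + 0.25 * 4) = 1.5.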
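static double CDynamicProgramming::getBellmanError(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
Calculates the Bellman error of the value function in the given state.
The Bellman error is the Bellman value minus the value that the V-function assigns to the current state.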
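As a small worked illustration of this definition, the sketch below computes the Bellman error from a list of action values that is assumed to be already available. The function name bellmanError and its parameters are hypothetical and chosen for this example only; the library function works on model, reward function and V-function objects instead.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// actionValues holds Q(s,a) for every action available in state s
// (assumed to be precomputed); currentValue is V(s) from the V-function.
double bellmanError(const std::vector<double>& actionValues, double currentValue) {
    assert(!actionValues.empty());
    // Bellman value: the best action value in this state.
    const double bellmanValue =
        *std::max_element(actionValues.begin(), actionValues.end());
    // Bellman error: Bellman value minus the current estimate V(s).
    return bellmanValue - currentValue;
}
```

With action values {1.5, 0.75} and V(s) = 1.0, the Bellman value is 1.5 and the error is 0.5. An error of zero in every state means the V-function is a fixed point of the Bellman optimality equation.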
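static double CDynamicProgramming::getBellmanValue(CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
Calculates the Bellman value, which is the best value achievable in the current state, given a value function and a reward function.
Since the Bellman value is the best value achievable, it is the best action value. The function therefore calculates V^*(s) = max_a Q(s,a), where the action values come from getActionValue. The given state has to be a discrete state.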
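The maximization over actions can be sketched as follows. This is an illustration under simplifying assumptions, not the library's code: the Outcome struct and both free functions are hypothetical, only single-step actions are considered, and the successor values V(s') are assumed to have been looked up beforehand.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical description of one possible outcome of an action.
struct Outcome {
    double probability;  // P(s'|s,a)
    double reward;       // R(s,a,s')
    double nextValue;    // V(s'), looked up in advance
};

// Q(s,a) for a single-step action, given all of its outcomes.
double actionValue(const std::vector<Outcome>& outcomes, double gamma) {
    double q = 0.0;
    for (const Outcome& o : outcomes) {
        q += o.probability * (o.reward + gamma * o.nextValue);
    }
    return q;
}

// V^*(s) = max_a Q(s,a); outcomesPerAction holds one outcome list per action.
double bellmanValue(const std::vector<std::vector<Outcome>>& outcomesPerAction,
                    double gamma) {
    assert(!outcomesPerAction.empty());
    double best = actionValue(outcomesPerAction.front(), gamma);
    for (const std::vector<Outcome>& outcomes : outcomesPerAction) {
        best = std::max(best, actionValue(outcomes, gamma));
    }
    return best;
}
```

With gamma = 0.5, a first action that surely yields reward 1 and a successor value of 2 has Q = 2, while a second action with two equally likely outcomes (reward 0 and successor value 4, or reward 4 and successor value 0) has Q = 3, so the sketch returns 3.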