Reinforcement Learning Toolbox 2.0

CDynamicProgramming Class Reference

Collection of static functions for dynamic programming.

#include <cdynamicprogramming.h>




Static Public Member Functions

static double  getActionValue (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, CAction *action, double gamma)
  Calculates the action value of the given state-action pair.

static double  getBellmanValue (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
  Calculates the Bellman value, which is the best value achievable in the current state, given a value function and a reward function.

static double  getBellmanError (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma)
  Calculates the Bellman error of the value function in the given state.


Detailed Description

Collection of static functions for dynamic programming.

Provides functions for calculating the action value, the Bellman value and the Bellman error of a given state, given a theoretical model, a V-function and a reward function.


Member Function Documentation

static double CDynamicProgramming::getActionValue (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, CAction *action, double gamma) [static]

Calculates the action value of the given state-action pair.

The action value of an action a in state s is defined as Q(s,a) = sum_{s'} P(s'|s,a) * (R(s,a,s') + gamma * V(s')). The probabilities P(s'|s,a) come from the model's forward transitions for the given state-action pair. The function iterates over these forward transitions, computes the discounted return R(s,a,s') + gamma * V(s') for each one, and weights it by P(s'|s,a); the sum of these terms is the expectation, which is the action value.

For the semi-MDP case the formula is slightly more complex: Q(s,a) = sum_{s',N} P(s',N|s,a) * (R(s,a,s') + gamma^N * V(s')), where N is the duration of the action. This formula is used if the specified action is a multi-step action; the transition objects are then CSemiMDPTransition objects, which also store the probabilities of the durations.

R(s,a,s') comes from the reward function, which has to be a feature reward function because the rewards for the feature (discrete state) transitions are needed. The given state has to be a discrete state, and the action has to be a member of the model.

See also:
CTransition

CSemiMDPTransition
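
The expectation described for getActionValue can be sketched in a few lines. This is only an illustration under stated assumptions, not the toolbox's implementation: the Transition struct and the actionValue function are hypothetical stand-ins for CTransition/CSemiMDPTransition and the value function, and states are assumed to be plain indices into a vector of values. A single-step action uses duration 1, which reduces gamma^N to gamma.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one forward transition (not the toolbox's class).
struct Transition {
    std::size_t nextState;  // index of the successor state s'
    double probability;     // P(s'|s,a), or P(s',N|s,a) for multi-step actions
    double reward;          // R(s,a,s')
    int duration;           // N; 1 for a single-step action
};

// Q(s,a) = sum over transitions of P * (R + gamma^N * V(s')).
double actionValue(const std::vector<Transition>& transitions,
                   const std::vector<double>& values, double gamma) {
    double q = 0.0;
    for (const Transition& t : transitions) {
        // The successor must be a valid discrete state index.
        assert(t.nextState < values.size());
        q += t.probability *
             (t.reward + std::pow(gamma, t.duration) * values[t.nextState]);
    }
    return q;
}
```

Treating the single-step case as duration 1 lets one loop cover both formulas; whether the toolbox shares a code path in this way is not stated in this reference.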

static double CDynamicProgramming::getBellmanError (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma) [static]

Calculates the Bellman error of the value function in the given state.

The Bellman error is the Bellman value of the state minus the value that the V-function currently assigns to that state.
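
As a minimal sketch of this subtraction, assuming the action values Q(s,a) of the state have already been computed: the bellmanError helper below is hypothetical and takes plain numbers rather than the toolbox's model, reward function and V-function objects.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the toolbox): Bellman error of one state,
// given its action values Q(s,a) and its current estimate V(s).
double bellmanError(const std::vector<double>& actionValues, double currentValue) {
    // At least one action must be available in the state.
    assert(!actionValues.empty());
    // Bellman value: the best action value in this state.
    double bellmanValue =
        *std::max_element(actionValues.begin(), actionValues.end());
    return bellmanValue - currentValue;
}
```

A positive result means the current V-function underestimates the state relative to a one-step lookahead, and a negative result means it overestimates it.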

static double CDynamicProgramming::getBellmanValue (CAbstractFeatureStochasticModel *model, CFeatureRewardFunction *rewardFunc, CAbstractVFunction *vFunction, CState *discState, double gamma) [static]

Calculates the Bellman value, which is the best value achievable in the current state, given a value function and a reward function.

Since the Bellman value is the best value achievable, it equals the best action value. The function therefore calculates V^*(s) = max_a Q(s,a), where the action values come from getActionValue. The given state has to be a discrete state.
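
The maximisation over actions might look like the following sketch. It is an assumption-laden illustration rather than the toolbox's code: Outcome, ActionModel and bellmanValue are hypothetical names, only single-step actions are handled, and the model of a state is assumed to be a list of outcomes per action.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical single-step outcome of taking an action in state s.
struct Outcome {
    std::size_t nextState;  // index of s'
    double probability;     // P(s'|s,a)
    double reward;          // R(s,a,s')
};

// One entry per action: the possible outcomes of that action in state s.
using ActionModel = std::vector<std::vector<Outcome>>;

// V*(s) = max_a Q(s,a), with Q(s,a) = sum_{s'} P * (R + gamma * V(s')).
double bellmanValue(const ActionModel& actions,
                    const std::vector<double>& values, double gamma) {
    assert(!actions.empty());
    double best = -std::numeric_limits<double>::infinity();
    for (const auto& outcomes : actions) {
        double q = 0.0;  // action value of the current action
        for (const Outcome& o : outcomes) {
            assert(o.nextState < values.size());
            q += o.probability * (o.reward + gamma * values[o.nextState]);
        }
        best = std::max(best, q);
    }
    return best;
}
```

Starting the maximum at negative infinity keeps the result correct when every action value is negative.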


The documentation for this class was generated from the following file: