Reinforcement Learning Toolbox 2.0

CAdvantageUpdating Class Reference

#include <cadvantagelearning.h>

Inheritance diagram for CAdvantageUpdating:

CTDLearner, CSemiMDPRewardListener, CErrorSender, CSemiMDPListener, CParameterObject, CParameters


Public Member Functions

  CAdvantageUpdating (CRewardFunction *rewardFunction, CAbstractQFunction *qfunction, CAbstractVFunction *vFunction, double dt)
virtual  ~CAdvantageUpdating ()


Protected Member Functions

virtual double  getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Calculates the temporal difference.

virtual void  addETraces (CStateCollection *oldState, CStateCollection *newState, CAction *action)
  Adds the current state to the ETraces.

virtual void  learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Updates the Q-Function and manages the Etraces.



Protected Attributes

CAbstractVFunction *  vFunction
CAbstractVETraces *  vETraces

Constructor & Destructor Documentation

CAdvantageUpdating::CAdvantageUpdating (CRewardFunction * rewardFunction,
CAbstractQFunction * qfunction,
CAbstractVFunction * vFunction,
double  dt)

virtual CAdvantageUpdating::~CAdvantageUpdating ()  [virtual]
 

Member Function Documentation

virtual void CAdvantageUpdating::addETraces (CStateCollection * oldState,
CStateCollection * newState,
CAction * action)
[protected, virtual]
 

Adds the current state to the ETraces.

Reimplemented from CTDLearner.

virtual double CAdvantageUpdating::getTemporalDifference (CStateCollection * oldState,
CAction * action,
double  reward,
CStateCollection * nextState)
[protected, virtual]
 

Calculates the temporal difference.

Reimplemented from CTDLearner.

virtual void CAdvantageUpdating::learnStep (CStateCollection * oldState,
CAction * action,
double  reward,
CStateCollection * nextState)
[protected, virtual]
 

Updates the Q-Function and manages the Etraces.

The learnStep function updates the Q-function according to the step sample. It is called by the nextStep event. First, the last estimated action (a_{t+1}) is compared with the action that was actually executed. If the two actions differ, the ETraces have to be reset: the agent did not follow the policy being learned, so using the ETraces of older states would falsify the Q-values. If the two actions are equal, the ETraces are multiplied by lambda*gamma. After that, the ETrace of the current state-action pair is added to the ETraces object, and the next estimated action is calculated by the given policy. The temporal difference can then be calculated as R(s_t, a_t, s_{t+1}) + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t), or as R(s_t, a_t, s_{t+1}) + gamma^N * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) for multi-step actions. Given the temporal difference, all states in the ETraces are updated by the updateQFunction method of the Q-ETraces object.
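
The sequence described above can be illustrated with a small tabular sketch. This is not the toolbox implementation and it does not use any toolbox classes: the type TabularTDSketch, its members, the learning rate alpha, the accumulating-trace rule (adding 1 to the visited pair) and the reading of N as the duration of a multi-step action are all assumptions made for illustration. The next estimated action is passed in by the caller here instead of being queried from a policy object.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular stand-in for the Q-function and its ETraces.
// None of these names belong to the toolbox API.
struct TabularTDSketch {
    std::size_t numActions;
    double alpha;   // learning rate (assumed; not named in the text above)
    double gamma;   // discount factor
    double lambda;  // trace decay factor
    std::vector<double> q;      // Q(s, a), stored row-major
    std::vector<double> trace;  // e(s, a), same layout as q

    TabularTDSketch(std::size_t numStates, std::size_t actions,
                    double alpha_, double gamma_, double lambda_)
        : numActions(actions), alpha(alpha_), gamma(gamma_), lambda(lambda_),
          q(numStates * actions, 0.0), trace(numStates * actions, 0.0) {}

    std::size_t index(std::size_t s, std::size_t a) const {
        return s * numActions + a;
    }

    // R + gamma^N * Q(s', a') - Q(s, a); N = 1 for a single-step action.
    double temporalDifference(std::size_t s, std::size_t a, double reward,
                              std::size_t sNext, std::size_t aNext,
                              int duration = 1) const {
        assert(duration >= 1);
        return reward + std::pow(gamma, duration) * q[index(sNext, aNext)]
               - q[index(s, a)];
    }

    // One learning step, in the order given in the description.
    // estimatedAction is the action predicted for state s in the previous
    // step; executedAction is the action that was actually taken.
    double learnStep(std::size_t s, std::size_t estimatedAction,
                     std::size_t executedAction, double reward,
                     std::size_t sNext, std::size_t nextEstimatedAction,
                     int duration = 1) {
        if (estimatedAction != executedAction) {
            // The agent left the policy being learned: drop old traces.
            for (double &e : trace) e = 0.0;
        } else {
            // On-policy step: decay every trace by lambda * gamma.
            for (double &e : trace) e *= lambda * gamma;
        }
        // Add the trace of the current state-action pair.
        trace[index(s, executedAction)] += 1.0;

        const double td = temporalDifference(s, executedAction, reward,
                                             sNext, nextEstimatedAction,
                                             duration);
        // Update every traced pair in proportion to its trace.
        for (std::size_t i = 0; i < q.size(); ++i) {
            q[i] += alpha * td * trace[i];
        }
        return td;
    }
};
```

Calling learnStep with matching estimated and executed actions lets earlier state-action pairs share in the update through their decayed traces, while a mismatch restricts the update to the current pair only.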

Reimplemented from CTDLearner.


Member Data Documentation

CAbstractVETraces* CAdvantageUpdating::vETraces [protected]
 
CAbstractVFunction* CAdvantageUpdating::vFunction [protected]
 

The documentation for this class was generated from the following file: