Reinforcement Learning Toolbox 2.0

CTDResidualLearner Class Reference

#include <ctdlearner.h>

Inheritance diagram for CTDResidualLearner:

Classes shown in the diagram: CTDGradientLearner, CTDLearner, CSemiMDPRewardListener, CErrorSender, CSemiMDPListener, CParameterObject, CParameters, CAdvantageLearner.


Public Member Functions

  CTDResidualLearner (CRewardFunction *rewardFunction, CGradientQFunction *qfunction, CAgentController *agent, CResidualFunction *residual, CResidualGradientFunction *residualGradient, CAbstractBetaCalculator *betaCalc)
  ~CTDResidualLearner ()
void  newEpisode ()
  Resets the Etraces.

virtual void  addETraces (CStateCollection *oldState, CStateCollection *newState, CAction *action, double td)
CGradientQETraces *  getResidualETraces ()


Protected Member Functions

virtual void  learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Updates the Q-Function and manages the Etraces.



Protected Attributes

CGradientQETraces *  residualGradientTraces
CGradientQETraces *  directGradientTraces
CGradientQETraces *  residualETraces
CAbstractBetaCalculator *  betaCalculator

Constructor & Destructor Documentation

CTDResidualLearner::CTDResidualLearner (CRewardFunction *rewardFunction,
  CGradientQFunction *qfunction,
  CAgentController *agent,
  CResidualFunction *residual,
  CResidualGradientFunction *residualGradient,
  CAbstractBetaCalculator *betaCalc)

CTDResidualLearner::~CTDResidualLearner ()
 

Member Function Documentation

virtual void CTDResidualLearner::addETraces (CStateCollection *oldState,
  CStateCollection *newState,
  CAction *action,
  double td)  [virtual]
 

Reimplemented in CAdvantageLearner.

CGradientQETraces* CTDResidualLearner::getResidualETraces ()  [inline]
 
virtual void CTDResidualLearner::learnStep (CStateCollection *oldState,
  CAction *action,
  double reward,
  CStateCollection *nextState)  [protected, virtual]
 

Updates the Q-Function and manages the Etraces.

The learnStep function updates the Q-Function according to the step sample. It is called by the nextStep event. First, the last estimated action (a_{t+1}) is compared with the action that was actually executed. If the two actions differ, the ETraces have to be reset: the agent did not follow the policy being learned, so using the etraces of older states would falsify the Q-values. If the two actions are equal, the ETraces are multiplied by lambda*gamma. Next, the etrace of the current state-action pair is added to the ETraces object, and the next estimated action is calculated by the given policy. The temporal difference can then be calculated as R(s_t, a_t, s_{t+1}) + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t), or as R(s_t, a_t, s_{t+1}) + gamma^N * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) for multi-step actions. Given the temporal difference, all states in the ETraces are updated by the updateQFunction method of the Q-ETraces object.

Reimplemented from CTDLearner.
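
The sequence of steps described for learnStep can be illustrated with a small, self-contained sketch. This is not the toolbox implementation: TabularTDSketch and all of its members are hypothetical names, the Q-Function is reduced to a plain table, the policy is assumed to be greedy, and N is assumed to be the number of time steps a multi-step action lasted. The residual gradient traces and the beta calculator of CTDResidualLearner are not modelled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular stand-in for a Q-Function with etraces.
// None of these names belong to the toolbox.
struct TabularTDSketch {
    int numStates;
    int numActions;
    double alpha;   // learning rate (assumed parameter)
    double gamma;   // discount factor
    double lambda;  // etrace decay factor
    std::vector<double> q;       // Q(s, a), row-major
    std::vector<double> traces;  // e(s, a), same layout as q
    int estimatedAction;         // a_{t+1} from the previous step, -1 if none

    TabularTDSketch(int states, int actions, double alpha_, double gamma_, double lambda_)
        : numStates(states), numActions(actions),
          alpha(alpha_), gamma(gamma_), lambda(lambda_),
          q(static_cast<std::size_t>(states) * actions, 0.0),
          traces(static_cast<std::size_t>(states) * actions, 0.0),
          estimatedAction(-1) {}

    std::size_t index(int s, int a) const {
        assert(s >= 0 && s < numStates && a >= 0 && a < numActions);
        return static_cast<std::size_t>(s) * numActions + a;
    }

    // Assumed policy: greedy, ties broken in favour of the lowest action index.
    int greedyAction(int s) const {
        int best = 0;
        for (int a = 1; a < numActions; ++a) {
            if (q[index(s, a)] > q[index(s, best)]) best = a;
        }
        return best;
    }

    // Resets the etraces at the start of an episode.
    void newEpisode() {
        std::fill(traces.begin(), traces.end(), 0.0);
        estimatedAction = -1;
    }

    // One learning step; returns the temporal difference.
    double learnStep(int s, int a, double reward, int sNext, int duration = 1) {
        if (estimatedAction != -1 && estimatedAction != a) {
            // The executed action differs from the estimated one: reset.
            std::fill(traces.begin(), traces.end(), 0.0);
        } else {
            // On-policy step: decay all etraces by lambda * gamma.
            for (double &e : traces) e *= lambda * gamma;
        }
        // Add the etrace of the current state-action pair.
        traces[index(s, a)] += 1.0;
        // Estimate the next action with the policy.
        estimatedAction = greedyAction(sNext);
        // td = R + gamma^N * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)
        const double td = reward
            + std::pow(gamma, duration) * q[index(sNext, estimatedAction)]
            - q[index(s, a)];
        // Update every entry in proportion to its etrace.
        for (std::size_t i = 0; i < q.size(); ++i) q[i] += alpha * td * traces[i];
        return td;
    }
};
```

Calling learnStep once per observed transition and newEpisode at every episode boundary mirrors the call order described above; a function approximator would replace the table and the per-entry update with a gradient step.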

void CTDResidualLearner::newEpisode ()  [virtual]
 

Resets the Etraces.

Reimplemented from CTDLearner.


Member Data Documentation

CAbstractBetaCalculator* CTDResidualLearner::betaCalculator [protected]
 
CGradientQETraces* CTDResidualLearner::directGradientTraces [protected]
 
CGradientQETraces* CTDResidualLearner::residualETraces [protected]
 
CGradientQETraces* CTDResidualLearner::residualGradientTraces [protected]
 

The documentation for this class was generated from the following file: