Reinforcement Learning Toolbox 2.0

CVFunctionLearner Class Reference

TD Learner for Value Function learning.

#include <cvfunctionlearner.h>

Inheritance diagram for CVFunctionLearner:

Base classes shown in the diagram: CSemiMDPRewardListener, CErrorSender, CSemiMDPListener, CParameterObject, CParameters.
Derived classes: CVFunctionGradientLearner, CVFunctionResidualLearner.


Public Member Functions

  CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction, CAbstractVETraces *eTraces)
  Creates a V-Function Learner which uses the given etraces for the V-Function.

  CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction)
  Creates a V-Function Learner which uses the standard etraces for the V-Function.

virtual  ~CVFunctionLearner ()
virtual double  getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Calculates the temporal difference.

virtual void  updateVFunction (CStateCollection *oldState, CStateCollection *newState, int duration, double td)
  Updates the V-Function, calls the update V-Function method of the etrace object.

virtual void  nextStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Calls updateVFunction with the calculated temporal difference.

virtual void  intermediateStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Updates the V-Function for an intermediate step (only for hierarchical MDPs).

virtual void  newEpisode ()
  Resets the etraces.

CAbstractVFunction *  getVFunction ()
  Returns the used V-Function.

double  getLearningRate ()
void  setLearningRate (double learningRate)
CAbstractVETraces *  getVETraces ()
  Returns the used ETraces for the VFunction.



Protected Member Functions

virtual void  addETraces (CStateCollection *oldState, CStateCollection *newState, int duration)
  Adds the current state to the etrace object.



Protected Attributes

CAbstractVFunction *  vFunction
  The learned V-Function.

CAbstractVETraces *  eTraces
  Etraces of the Value Function.

bool  bExternETraces
  Whether externally supplied etraces are used.


Detailed Description

TD Learner for Value Function learning.

The value function is learned by a normal TD update, similar to the TD learner for Q-Learning. The temporal difference is calculated each step with the formula td = r_t + gamma * V(s_{t+1}) - V(s_t). The class CVFunctionLearner uses a CVEtraces object to speed up learning. Each step, the etraces update the V-Function with the temporal difference value, which is multiplied by the learning rate (parameter "VLearningRate") before the update. Also each step, the etraces are multiplied by the usual attenuation factor (lambda * gamma) and the current step is added to the etraces. When a new episode starts, the etraces are reset.

Value function learners work well in combination with a dynamic model for the policy (see CVMStochasticPolicy). When you use a dynamic model for your policy, learning performance is usually considerably better than with Q-Learning.

CVFunctionLearner has the following parameters:

  • inherits all Parameters from the V-Function
  • inherits all Parameters from the ETraces
  • "VLearningRate", 0.2 : learning rate of the algorithm
  • "DiscountFactor", 0.95 : discount factor of the learning problem

Constructor & Destructor Documentation

CVFunctionLearner::CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction, CAbstractVETraces *eTraces)
 

Creates a V-Function Learner which uses the given etraces for the V-Function.

CVFunctionLearner::CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction)
 

Creates a V-Function Learner which uses the standard etraces for the V-Function.

virtual CVFunctionLearner::~CVFunctionLearner ()  [virtual]
 

Member Function Documentation

virtual void CVFunctionLearner::addETraces (CStateCollection *oldState, CStateCollection *newState, int duration)  [protected, virtual]

Adds the current state to the etrace object.

Reimplemented in CVFunctionGradientLearner.

double CVFunctionLearner::getLearningRate ()
 
virtual double CVFunctionLearner::getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)  [virtual]
 

Calculates the temporal difference.

The temporal difference for the given step is td = r_t + gamma * V(s_{t+1}) - V(s_t) or, for multi-step actions lasting N steps, td = r_t + gamma^N * V(s_{t+1}) - V(s_t).
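The two formulas can be folded into one by treating a primitive action as a multi-step action with N = 1. The following is a minimal sketch of that arithmetic only; `temporalDifference` is a hypothetical free function that works on plain numbers and is not part of the toolbox API.

```cpp
#include <cmath>

// Hypothetical helper, not a toolbox function.
// vOld  = V(s_t), vNext = V(s_{t+1}),
// duration = N, the number of steps the action lasted (1 for a primitive action).
double temporalDifference(double reward, double gamma, int duration,
                          double vOld, double vNext) {
    // td = r_t + gamma^N * V(s_{t+1}) - V(s_t)
    return reward + std::pow(gamma, duration) * vNext - vOld;
}
```

With duration 1 the discount term reduces to gamma, which gives the single-step formula.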

Reimplemented in CVFunctionGradientLearner.

CAbstractVETraces* CVFunctionLearner::getVETraces ()
 

Returns the used ETraces for the VFunction.

CAbstractVFunction* CVFunctionLearner::getVFunction ()
 

Returns the used V-Function.

virtual void CVFunctionLearner::intermediateStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)  [virtual]
 

Updates the V-Function for an intermediate step (only for hierarchical MDPs).

Since the intermediate steps aren't really members of the hierarchical episode, they need special treatment for etraces. The state of the intermediate step is added to the ETraces object as usual, but the attenuation of all other etraces is cancelled and the V-Function isn't updated with the whole ETraces object; only the current V-Value of the intermediate state is updated. This is done because the intermediate step isn't directly reachable from the past states, and updating all intermediate steps via etraces would distort the V-Values, since the same step would get updated several times.

Reimplemented from CSemiMDPRewardListener.

virtual void CVFunctionLearner::newEpisode ()  [virtual]
 

Resets the etraces.

Reimplemented from CSemiMDPListener.

Reimplemented in CVFunctionResidualLearner.

virtual void CVFunctionLearner::nextStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)  [virtual]
 

Calls updateVFunction with the calculated temporal difference.

Reimplemented from CSemiMDPRewardListener.

void CVFunctionLearner::setLearningRate (double learningRate)
 
virtual void CVFunctionLearner::updateVFunction (CStateCollection *oldState, CStateCollection *newState, int duration, double td)  [virtual]
 

Updates the V-Function, calls the update V-Function method of the etrace object.

First the etraces are multiplied by the attenuation factor (lambda * gamma)^duration, then the etrace of the current step is added, and then the V-Function is updated by the update function of the etrace object. The update factor is td * learningrate.
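The order of these three operations can be illustrated with a small tabular sketch. `TabularTraceLearner` is a hypothetical stand-in, not the toolbox implementation: the lambda value is assumed (no default is given here), and an accumulating trace is assumed for the "add" step.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular stand-in for the V-Function plus its etraces.
struct TabularTraceLearner {
    std::vector<double> v;      // V-values, one per state
    std::vector<double> trace;  // eligibility traces, one per state
    double learningRate = 0.2;  // "VLearningRate" default from the parameter list
    double gamma = 0.95;        // "DiscountFactor" default from the parameter list
    double lambda = 0.9;        // assumed value, not taken from the documentation

    explicit TabularTraceLearner(std::size_t numStates)
        : v(numStates, 0.0), trace(numStates, 0.0) {}

    void update(std::size_t oldState, int duration, double td) {
        // 1. Attenuate all traces by (lambda * gamma)^duration.
        const double decay = std::pow(lambda * gamma, duration);
        for (double &e : trace) e *= decay;
        // 2. Add the trace of the current step (accumulating trace assumed).
        trace[oldState] += 1.0;
        // 3. Update every state by td * learningRate, weighted by its trace.
        const double factor = td * learningRate;
        for (std::size_t s = 0; s < v.size(); ++s) v[s] += factor * trace[s];
    }

    // Resets the traces at the start of an episode.
    void newEpisode() { trace.assign(trace.size(), 0.0); }
};
```

Because the traces are attenuated before the current state is added, the state just left always enters the update with full weight, while earlier states receive a share that shrinks with each step.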

Reimplemented in CVFunctionResidualLearner.


Member Data Documentation

bool CVFunctionLearner::bExternETraces [protected]
 

Whether externally supplied etraces are used.

CAbstractVETraces* CVFunctionLearner::eTraces [protected]
 

Etraces of the Value Function.

CAbstractVFunction* CVFunctionLearner::vFunction [protected]
 

The learned V-Function.


The documentation for this class was generated from the following file: