Reinforcement Learning Toolbox 2.0

CAdaptiveParameterFromAverageRewardCalculator Class Reference

Adaptive Parameter Calculator which calculates the parameter's value from the current average reward.

#include <cagentlistener.h>

Inheritance diagram for CAdaptiveParameterFromAverageRewardCalculator (classes shown in the diagram):

CAdaptiveParameterBoundedValuesCalculator, CSemiMDPRewardListener, CAdaptiveParameterCalculator, CSemiMDPListener, CParameterObject, CParameters


Public Member Functions

  CAdaptiveParameterFromAverageRewardCalculator (CParameters *targetObject, string targetParameter, CRewardFunction *reward, int nStepsPerUpdate, int functionKind, double paramMin, double paramMax, double targetMin, double targetMax, double alpha)
  ~CAdaptiveParameterFromAverageRewardCalculator ()
virtual void  nextStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *newState)
  Virtual function, to be implemented by subclasses (reimplemented here from CSemiMDPRewardListener).

virtual void  onParametersChanged ()
  Updates all data members that represent parameters.

virtual void  resetCalculator ()
  Reset the targetValue.



Protected Attributes

double  alpha
double  targetValue
int  nSteps
int  nStepsPerUpdate

Detailed Description

Adaptive Parameter Calculator which calculates the parameter's value from the current average reward.

The target value in this class is the current average reward. The target value is reset to the minimum expected reward when a new learning trial starts. This adaptive parameter has to be added to the agent's listener list so that it can calculate the average reward.

The average reward is calculated incrementally with the formula

    averagereward_{t+1} = averagereward_t * alpha + reward_{t+1} * (1 - alpha)

Alpha can be set with the parameter "APRewardUpdateRate" and defines the update rate of the average reward: the closer alpha is to 1, the more slowly the average follows new rewards. Alpha should be chosen close to 0.99 to get good results. The average reward is not reset when a new episode begins. For more details see the superclass.

Parameters of CAdaptiveParameterFromNStepsCalculator:
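The update rule described above can be sketched in isolation. The class below is a hypothetical stand-in, not the toolbox implementation: the name AverageRewardTracker, its members, and the choice to reset to a caller-supplied initial value are assumptions made for illustration. It only mirrors the quoted formula and does not model nStepsPerUpdate or the mapping from target value to parameter value.

```cpp
#include <cassert>

// Hypothetical helper (not part of the RL Toolbox): tracks an
// exponentially weighted average reward using the formula from the
// class description.
class AverageRewardTracker {
public:
    // alpha: update rate in [0, 1]; the description suggests ~0.99.
    // initialValue: value the average starts from and returns to on
    // reset (assumed to play the role of the minimum expected reward).
    AverageRewardTracker(double alpha, double initialValue)
        : alpha_(alpha), initialValue_(initialValue), average_(initialValue) {
        assert(alpha >= 0.0 && alpha <= 1.0);
    }

    // averagereward_{t+1} = averagereward_t * alpha + reward_{t+1} * (1 - alpha)
    void nextStep(double reward) {
        average_ = average_ * alpha_ + reward * (1.0 - alpha_);
    }

    // Return to the initial value, e.g. when a new learning trial starts.
    void reset() { average_ = initialValue_; }

    double average() const { return average_; }

private:
    double alpha_;
    double initialValue_;
    double average_;
};
```

With alpha = 0.99 each new reward contributes only 1% to the average, so the value changes slowly and reflects roughly the last few hundred steps. Calling nextStep once per agent step and reset only at the start of a trial matches the description's statement that the average is kept across episode boundaries.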


Constructor & Destructor Documentation

CAdaptiveParameterFromAverageRewardCalculator::CAdaptiveParameterFromAverageRewardCalculator (CParameters *targetObject, string targetParameter, CRewardFunction *reward, int nStepsPerUpdate, int functionKind, double paramMin, double paramMax, double targetMin, double targetMax, double alpha)
 
CAdaptiveParameterFromAverageRewardCalculator::~CAdaptiveParameterFromAverageRewardCalculator ()
 

Member Function Documentation

virtual void CAdaptiveParameterFromAverageRewardCalculator::nextStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *newState) [virtual]
 

Virtual function, to be implemented by subclasses.

Reimplemented from CSemiMDPRewardListener.

virtual void CAdaptiveParameterFromAverageRewardCalculator::onParametersChanged () [virtual]
 

Updates all data members that represent parameters.

Reimplemented from CAdaptiveParameterBoundedValuesCalculator.

virtual void CAdaptiveParameterFromAverageRewardCalculator::resetCalculator () [virtual]
 

Reset the targetValue.

This function resets state such as the step count or the number of episodes when learning is restarted (used for parameter evaluation).

Implements CAdaptiveParameterCalculator.


Member Data Documentation

double CAdaptiveParameterFromAverageRewardCalculator::alpha [protected]
 
int CAdaptiveParameterFromAverageRewardCalculator::nSteps [protected]
 
int CAdaptiveParameterFromAverageRewardCalculator::nStepsPerUpdate [protected]
 
double CAdaptiveParameterFromAverageRewardCalculator::targetValue [protected]
 

The documentation for this class was generated from the following file: