CAdvantageUpdating Class Reference
#include <cadvantagelearning.h>
Public Member Functions
CAdvantageUpdating (CRewardFunction *rewardFunction, CAbstractQFunction *qfunction, CAbstractVFunction *vFunction, double dt)
virtual ~CAdvantageUpdating ()
Protected Member Functions
virtual double getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
    Calculates the temporal difference.
virtual void addETraces (CStateCollection *oldState, CStateCollection *newState, CAction *action)
    Adds the current state to the ETraces.
virtual void learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
    Updates the Q-Function and manages the ETraces.
Protected Attributes
CAbstractVFunction * vFunction
CAbstractVETraces * vETraces
Constructor & Destructor Documentation
virtual CAdvantageUpdating::~CAdvantageUpdating () [virtual]
Member Function Documentation
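virtual void CAdvantageUpdating::addETraces (CStateCollection *oldState, CStateCollection *newState, CAction *action) [protected, virtual]
Adds the current state to the ETraces.
Reimplemented from CTDLearner.

virtual double CAdvantageUpdating::getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState) [protected, virtual]
Calculates the temporal difference.
Reimplemented from CTDLearner.
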
virtual void CAdvantageUpdating::learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState) [protected, virtual]
Updates the Q-Function and manages the ETraces.
The learnStep function updates the Q-Function according to the step
sample. It is called by the nextStep event. First, the last
estimated action (a_{t+1}) is compared with the action that was
actually executed. If the two actions differ, the ETraces have to be
reset: the agent did not follow the policy being learned, so using
the ETraces of older states would falsify the Q-Values. If the two
actions are equal, the ETraces are multiplied by lambda * gamma.
After that, the ETrace of the current state-action pair is added to
the ETraces object, and the next estimated action is calculated by
the given policy. The temporal difference is then calculated as
R(s_t, a_t, s_{t+1}) + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t),
or, for multi-step actions of duration N, as
R(s_t, a_t, s_{t+1}) + gamma^N * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t).
Given the temporal difference, all states in the ETraces are updated
by the updateQFunction method of the Q-ETraces object.
Reimplemented from CTDLearner.
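
The steps described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: TabularQ, Traces, temporalDifference and this free-standing learnStep are hypothetical stand-ins for the toolbox classes, states and actions are reduced to integer indices, and accumulating traces and a learning rate alpha are assumptions that the description above does not state.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <utility>

// Hypothetical stand-ins for illustration only; not the toolbox classes.
using StateAction = std::pair<int, int>;  // (state index, action index)

struct TabularQ {
    std::map<StateAction, double> values;
    // Unvisited state-action pairs are treated as having value 0.
    double get(int s, int a) const {
        auto it = values.find(StateAction(s, a));
        return it == values.end() ? 0.0 : it->second;
    }
};

struct Traces {
    std::map<StateAction, double> e;
    void reset() { e.clear(); }
    void decay(double factor) {
        for (auto &kv : e) kv.second *= factor;
    }
    // Assumption: accumulating traces (the trace is incremented by 1).
    void add(int s, int a) { e[StateAction(s, a)] += 1.0; }
};

// R + gamma^N * Q(s', a') - Q(s, a); duration N is 1 for single-step actions.
double temporalDifference(const TabularQ &q, int s, int a, double reward,
                          int sNext, int aNext, double gamma,
                          int duration = 1) {
    assert(duration >= 1);
    return reward + std::pow(gamma, duration) * q.get(sNext, aNext)
           - q.get(s, a);
}

// One learning step in the order given by the description above.
void learnStep(TabularQ &q, Traces &traces, int s, int a, double reward,
               int sNext, int aNextEstimated, int lastEstimatedAction,
               double alpha, double gamma, double lambda) {
    if (lastEstimatedAction != a) {
        traces.reset();                // executed action left the policy
    } else {
        traces.decay(lambda * gamma);  // on-policy: decay the old traces
    }
    traces.add(s, a);                  // trace for the current pair
    const double td =
        temporalDifference(q, s, a, reward, sNext, aNextEstimated, gamma);
    // Update every pair that still carries a trace.
    for (const auto &kv : traces.e) {
        q.values[kv.first] += alpha * td * kv.second;
    }
}
```

In this sketch the caller passes the previously estimated action as lastEstimatedAction and the policy's choice for the next state as aNextEstimated. In the class documented here, that bookkeeping is described as happening inside learnStep itself.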
Member Data Documentation
The documentation for this class was generated from the following
file: