CTDResidualLearner Class Reference
#include <ctdlearner.h>
Inheritance diagram for CTDResidualLearner:
Public Member Functions
CTDResidualLearner (CRewardFunction *rewardFunction, CGradientQFunction *qfunction, CAgentController *agent, CResidualFunction *residual, CResidualGradientFunction *residualGradient, CAbstractBetaCalculator *betaCalc)
~CTDResidualLearner ()
void newEpisode ()
    Resets the Etraces.
virtual void addETraces (CStateCollection *oldState, CStateCollection *newState, CAction *action, double td)
CGradientQETraces * getResidualETraces ()
Protected Member Functions
virtual void learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
    Updates the Q-Function and manages the Etraces.
Protected Attributes
CGradientQETraces * residualGradientTraces
CGradientQETraces * directGradientTraces
CGradientQETraces * residualETraces
CAbstractBetaCalculator * betaCalculator
Constructor & Destructor Documentation
CTDResidualLearner::~CTDResidualLearner ()
Member Function Documentation
virtual void CTDResidualLearner::learnStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState) [protected, virtual]
Updates the Q-Function and manages the Etraces.
The learnStep function updates the Q-Function according to the step
sample. It is called by the nextStep event. First, the last estimated
action (a_{t+1}) is compared to the action that was actually executed.
If the two actions differ, the ETraces have to be reset: the agent did
not follow the policy being learned, so using the ETraces of older
states would falsify the Q-Values. If the two actions are equal, the
ETraces are multiplied by lambda*gamma. After that, the ETrace of the
current state-action pair is added to the ETraces object, and the next
estimated action is calculated by the given policy. The temporal
difference can now be calculated as
R(s_t, a_t, s_{t+1}) + gamma * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t),
or, for multi-step actions, as
R(s_t, a_t, s_{t+1}) + gamma^N * Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t).
With the temporal difference available, all states in the ETraces are
updated by the updateQFunction method of the Q-ETraces object.
Reimplemented from CTDLearner.
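The step sequence described above can be sketched with a small tabular Q-function. This is only an illustration under stated assumptions: the struct TabularSarsaLambda, its members, the greedy policy, the accumulating traces, the learning rate alpha and the duration parameter (standing in for N) are all hypothetical and not part of the toolbox. The real class operates on CGradientQFunction and CGradientQETraces objects, and the residual-gradient handling suggested by its protected attributes is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabular stand-in for the Q-function and its ETraces.
// Not the toolbox implementation; names and layout are assumptions.
struct TabularSarsaLambda {
    std::size_t numActions;
    double alpha;   // learning rate (assumed; not named in the reference)
    double gamma;   // discount factor
    double lambda;  // trace decay factor
    std::vector<double> q;       // Q(s, a), stored row by row
    std::vector<double> traces;  // e(s, a), same layout as q
    int estimatedAction;         // a_{t+1} from the previous step, -1 if none

    TabularSarsaLambda(std::size_t states, std::size_t actions,
                       double alpha_, double gamma_, double lambda_)
        : numActions(actions), alpha(alpha_), gamma(gamma_), lambda(lambda_),
          q(states * actions, 0.0), traces(states * actions, 0.0),
          estimatedAction(-1) {}

    std::size_t index(int state, int action) const {
        return static_cast<std::size_t>(state) * numActions
             + static_cast<std::size_t>(action);
    }

    // Resets the traces at the start of an episode.
    void newEpisode() {
        std::fill(traces.begin(), traces.end(), 0.0);
        estimatedAction = -1;
    }

    // Greedy policy used as a stand-in for "the given policy".
    int greedyAction(int state) const {
        int best = 0;
        for (std::size_t a = 1; a < numActions; ++a) {
            if (q[index(state, static_cast<int>(a))] > q[index(state, best)]) {
                best = static_cast<int>(a);
            }
        }
        return best;
    }

    // One learning step; returns the temporal difference.
    double learnStep(int oldState, int action, double reward, int nextState,
                     int duration = 1) {
        assert(duration >= 1);
        if (estimatedAction != action) {
            // The executed action differs from the estimate: reset traces.
            std::fill(traces.begin(), traces.end(), 0.0);
        } else {
            // Same action: decay all traces by lambda * gamma.
            for (double &e : traces) e *= lambda * gamma;
        }
        // Add the trace of the current state-action pair.
        traces[index(oldState, action)] += 1.0;
        // Estimate the next action with the policy.
        estimatedAction = greedyAction(nextState);
        // td = R + gamma^N * Q(s', a') - Q(s, a)
        const double td = reward
            + std::pow(gamma, duration) * q[index(nextState, estimatedAction)]
            - q[index(oldState, action)];
        // Update every entry in proportion to its trace.
        for (std::size_t i = 0; i < q.size(); ++i) {
            q[i] += alpha * td * traces[i];
        }
        return td;
    }
};
```

Calling learnStep once per observed transition and newEpisode at every episode boundary mirrors the call pattern described above. A table is used only to keep the sketch short; with a gradient-based Q-function the trace would hold gradients rather than visit counts.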
void CTDResidualLearner::newEpisode () [virtual]
Resets the Etraces.
Reimplemented from CTDLearner.
Member Data Documentation
The documentation for this class was generated from the following
file: