CVFunctionLearner Class Reference
TD Learner for Value Function learning.
#include <cvfunctionlearner.h>
Public Member Functions
- CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction, CAbstractVETraces *eTraces)
  Creates a V-Function learner that uses the given etraces for the V-Function.
- CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction)
  Creates a V-Function learner that uses the standard etraces for the V-Function.
- virtual ~CVFunctionLearner ()
- virtual double getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Calculates the temporal difference.
- virtual void updateVFunction (CStateCollection *oldState, CStateCollection *newState, int duration, double td)
  Updates the V-Function by calling the V-Function update method of the etraces object.
- virtual void nextStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Calls updateVFunction with the calculated temporal difference.
- virtual void intermediateStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState)
  Updates the V-Function for an intermediate step (only for hierarchical MDPs).
- virtual void newEpisode ()
  Resets the etraces.
- CAbstractVFunction *getVFunction ()
  Returns the V-Function used by the learner.
- double getLearningRate ()
- void setLearningRate (double learningRate)
- CAbstractVETraces *getVETraces ()
  Returns the ETraces used for the V-Function.
Protected Member Functions
- virtual void addETraces (CStateCollection *oldState, CStateCollection *newState, int duration)
  Adds the current state to the etraces object.
Protected Attributes
- CAbstractVFunction *vFunction
  The learned V-Function.
- CAbstractVETraces *eTraces
  Etraces of the value function.
- bool bExternETraces
  Whether externally supplied etraces are used.
Detailed Description
TD Learner for Value Function learning.
The value function is learned by a standard TD update, similar to
the TD learner for Q-learning. The temporal difference is
calculated at each step with the formula td = r_t + gamma *
V(s_{t+1}) - V(s_t). CVFunctionLearner uses a CVEtraces object to
speed up learning: at each step, the etraces update the V-Function
with the temporal difference, which is multiplied by the learning
rate (parameter "VLearningRate") before the update. Each step, the
etraces are first multiplied by the usual attenuation factor
(lambda * gamma), and then the current state is added to the
etraces. When a new episode starts, the etraces are reset.
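The per-step procedure above can be sketched on a tiny tabular problem. This is a minimal illustration, assuming a discrete state space stored in plain vectors and accumulating traces; `TabularTDLambda`, `step` and the `lambda` decay field are hypothetical names, not the toolbox's CAbstractVFunction/CAbstractVETraces API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tabular stand-in for the learner described above.
// Not the toolbox API: names and data layout are illustrative only.
struct TabularTDLambda {
    std::vector<double> v;      // V(s) for each discrete state
    std::vector<double> trace;  // eligibility trace e(s) for each state
    double learningRate;        // plays the role of "VLearningRate"
    double gamma;               // plays the role of "DiscountFactor"
    double lambda;              // trace decay (assumed ETraces parameter)

    TabularTDLambda(std::size_t numStates, double alpha, double discount,
                    double decay)
        : v(numStates, 0.0), trace(numStates, 0.0),
          learningRate(alpha), gamma(discount), lambda(decay) {}

    // Counterpart of newEpisode(): reset all eligibility traces.
    void newEpisode() { std::fill(trace.begin(), trace.end(), 0.0); }

    // Counterpart of nextStep(): one transition s -> sNext with reward r.
    double step(std::size_t s, double r, std::size_t sNext) {
        assert(s < v.size() && sNext < v.size());
        // td = r_t + gamma * V(s_{t+1}) - V(s_t)
        const double td = r + gamma * v[sNext] - v[s];
        // Attenuate all traces by lambda * gamma, then add the current state.
        for (double& e : trace) e *= lambda * gamma;
        trace[s] += 1.0;
        // Every state moves by learningRate * td * e(s).
        for (std::size_t i = 0; i < v.size(); ++i)
            v[i] += learningRate * td * trace[i];
        return td;
    }
};
```

With the documented defaults one would construct `TabularTDLambda learner(numStates, 0.2, 0.95, lambda)`. Because the traces are decayed before the current state is added, the state just visited always receives the full `learningRate * td` step, while earlier states receive geometrically smaller shares.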
Value function learners work well in combination with a dynamic
model for the policy (see CVMStochasticPolicy). When your policy
uses a dynamic model, learning performance is typically
considerably better than with Q-learning.
CVFunctionLearner has the following parameters:
- all parameters inherited from the V-Function
- all parameters inherited from the ETraces
- "VLearningRate", 0.2 : learning rate of the algorithm
- "DiscountFactor", 0.95 : discount factor of the learning problem
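Constructor & Destructor Documentation
CVFunctionLearner::CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction, CAbstractVETraces *eTraces)
Creates a V-Function learner that uses the given etraces for the
V-Function.
CVFunctionLearner::CVFunctionLearner (CRewardFunction *rewardFunction, CAbstractVFunction *vFunction)
Creates a V-Function learner that uses the standard etraces for
the V-Function.
virtual CVFunctionLearner::~CVFunctionLearner () [virtual]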
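The two constructors differ in where the etraces come from, and the bExternETraces flag records which case applies. A plausible reading, stated here as an assumption rather than documented behaviour, is that externally supplied etraces remain owned by the caller while standard etraces created by the learner are released by its destructor. The sketch below uses hypothetical `Traces` and `LearnerSketch` types to show that ownership rule.

```cpp
#include <cassert>

// Hypothetical stand-in for an etraces object; illustrative only.
struct Traces {
    virtual ~Traces() = default;
};

// Sketch of the ownership rule suggested by bExternETraces (an assumption):
// caller-supplied traces stay owned by the caller, while traces the
// learner creates itself are deleted by the learner.
class LearnerSketch {
public:
    explicit LearnerSketch(Traces* external)
        : traces_(external), externTraces_(true) {
        assert(external != nullptr);
    }
    LearnerSketch() : traces_(new Traces()), externTraces_(false) {}
    ~LearnerSketch() {
        if (!externTraces_) delete traces_;  // only free what we created
    }
    // Copying would double-delete owned traces, so forbid it.
    LearnerSketch(const LearnerSketch&) = delete;
    LearnerSketch& operator=(const LearnerSketch&) = delete;

    Traces* getTraces() const { return traces_; }
    bool usesExternTraces() const { return externTraces_; }

private:
    Traces* traces_;
    bool externTraces_;
};
```

In new code, the owned case would more idiomatically be held in a `std::unique_ptr`; the raw pointer plus flag shown here only mirrors the attribute layout listed above.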
Member Function Documentation
double CVFunctionLearner::getLearningRate ()
virtual double CVFunctionLearner::getTemporalDifference (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState) [virtual]
Calculates the temporal difference.
The temporal difference for the given step is td = r_t + gamma *
V(s_{t+1}) - V(s_t), or td = r_t + gamma^N * V(s_{t+1}) - V(s_t)
for multistep actions that last N steps.
Reimplemented in CVFunctionGradientLearner.
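Both forms of the formula fit in one function, since the one-step case is simply N = 1. This is a minimal sketch on plain numbers; `temporalDifference` and its parameter names are hypothetical, with `vOld` standing for V(s_t), `vNext` for V(s_{t+1}) and `duration` for N.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the temporal-difference formula above (hypothetical helper).
// vOld = V(s_t), vNext = V(s_{t+1}); duration is N, the number of steps
// the (possibly multistep) action lasted, so duration == 1 gives the
// one-step form td = r_t + gamma * V(s_{t+1}) - V(s_t).
double temporalDifference(double reward, double vOld, double vNext,
                          double discount, int duration) {
    assert(duration >= 1);
    return reward + std::pow(discount, duration) * vNext - vOld;
}
```

For example, with reward 1, V(s_t) = 2, V(s_{t+1}) = 3 and gamma = 0.5, a one-step action gives td = 1 + 1.5 - 2 = 0.5, while a two-step action gives td = 1 + 0.75 - 2 = -0.25.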
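CAbstractVETraces *CVFunctionLearner::getVETraces ()
Returns the ETraces used for the V-Function.
CAbstractVFunction *CVFunctionLearner::getVFunction ()
Returns the V-Function used by the learner.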
virtual void CVFunctionLearner::intermediateStep (CStateCollection *oldState, CAction *action, double reward, CStateCollection *nextState) [virtual]
Updates the V-Function for an intermediate step (only for
hierarchical MDPs).
Since intermediate steps are not directly members of the
hierarchical episode, they need special treatment for the etraces.
The state of the intermediate step is added to the ETraces object as
usual, but the attenuation of all other etraces is skipped, and the
V-Function is not updated with the whole ETraces object; only the
V-value of the intermediate state itself is updated. This is done
because the intermediate step is not directly reachable from the
past states, and updating all intermediate steps via the etraces
would falsify the V-values, since the same step would be updated
several times.
Reimplemented from CSemiMDPRewardListener.
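The special treatment of intermediate steps can be contrasted with the ordinary update in a few lines. This sketch assumes a tabular V-Function and that the intermediate state's value moves by `learningRate * td`, by analogy with the normal update factor; `intermediateUpdate` and its arguments are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the intermediate-step rule described above on a tabular
// V-Function; intermediateUpdate and its arguments are hypothetical.
void intermediateUpdate(std::vector<double>& v, std::vector<double>& trace,
                        std::size_t s, double td, double learningRate) {
    assert(s < v.size() && v.size() == trace.size());
    // Add the intermediate state to the traces as usual, but do NOT
    // attenuate the traces of the other states.
    trace[s] += 1.0;
    // Update only V(s) of the intermediate state, not every traced state.
    v[s] += learningRate * td;
}
```

Other states keep both their trace values and their V-values unchanged, which is exactly what prevents the same step from being credited several times through the etraces.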
virtual void CVFunctionLearner::newEpisode () [virtual]
void CVFunctionLearner::setLearningRate (double learningRate)
virtual void CVFunctionLearner::updateVFunction (CStateCollection *oldState, CStateCollection *newState, int duration, double td) [virtual]
Updates the V-Function by calling the V-Function update method of
the etraces object.
First the etraces are multiplied by the attenuation factor
(lambda * gamma)^duration, then the etrace of the current step is
added, and then the V-Function is updated by the update function of
the etraces object. The update factor is td * learningRate.
Reimplemented in CVFunctionResidualLearner.
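The three steps of updateVFunction, including the duration exponent for multistep actions, can be sketched on a tabular V-Function. The vector layout, the `decay` (lambda) argument and the choice of `s` as the state the transition started from (oldState) are assumptions for illustration; `updateWithTraces` is a hypothetical helper, not a toolbox function.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of updateVFunction's three steps on a tabular V-Function.
// s is assumed to be the state the transition started from (oldState);
// decay plays the role of lambda, discount the role of gamma.
void updateWithTraces(std::vector<double>& v, std::vector<double>& trace,
                      std::size_t s, int duration, double td,
                      double learningRate, double discount, double decay) {
    assert(s < v.size() && v.size() == trace.size() && duration >= 1);
    // 1. Attenuate every trace by (lambda * gamma)^duration.
    const double attenuation = std::pow(decay * discount, duration);
    for (double& e : trace) e *= attenuation;
    // 2. Add the trace of the current state.
    trace[s] += 1.0;
    // 3. Update all values with factor td * learningRate, weighted by e(s).
    const double factor = td * learningRate;
    for (std::size_t i = 0; i < v.size(); ++i) v[i] += factor * trace[i];
}
```

With lambda = gamma = 0.5 and a two-step action, an existing trace of 1 shrinks to (0.25)^2 = 0.0625 before the current state's trace is added, so a long multistep action weakens the credit passed back to earlier states accordingly.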
Member Data Documentation
CAbstractVETraces *CVFunctionLearner::eTraces [protected]
Etraces of the value function.