Reinforcement Learning Toolbox 2.0

CActorFromActionValue Class Reference

Actor class which can only decide between two different actions, depending on the action value of the current state.

#include <cactorcritic.h>

Inheritance diagram for CActorFromActionValue (image not reproduced). Classes shown in the diagram: CAgentController, CActor, CSemiMDPListener, CActionObject, CErrorListener, CParameterObject, CParameters.


Public Member Functions

  CActorFromActionValue (CAbstractVFunction *vFunction, CAction *action1, CAction *action2)
  ~CActorFromActionValue ()
virtual void  receiveError (double critic, CStateCollection *oldState, CAction *Action, CActionData *data=NULL)
  Adapt the action values according to the critic.

virtual CAction * getNextAction (CStateCollection *state, CActionDataSet *data=NULL)
  Virtual function that returns the action for the specified state; it must be implemented by all subclasses.

virtual void  newEpisode ()
  Resets the eligibility traces object (eTraces).



Protected Attributes

CAbstractVFunction * vFunction
CAbstractVETraces * eTraces

Detailed Description

Actor class which can only decide between two different actions, depending on the action value of the current state.

This is an implementation of the simple actor-critic algorithm used by Barto, Sutton, and Anderson in their cart-pole example. The actor can only decide between two actions, and which one is taken depends on the action value of the current state. If this value is negative, the first action is more likely to be chosen, and vice versa. The probability of choosing the first action is calculated as 1.0 / (1.0 + exp(actionvalue(s))), so an action value of 0 gives both actions equal probability.

The action value is represented by a V-function, which is updated through an eligibility traces (e-traces) object. The current state is added to the e-traces with a positive factor if the second action was chosen, and with a negative factor otherwise. When a new episode begins, the e-traces are reset. Algorithms of this kind usually need a very high learning rate; for this class, the default value of the "ActorLearningRate" parameter is 1000.0.

This class directly implements the CAgentController interface, so it can be used as a controller.


Constructor & Destructor Documentation

CActorFromActionValue::CActorFromActionValue (CAbstractVFunction *vFunction, CAction *action1, CAction *action2)
CActorFromActionValue::~CActorFromActionValue ()

Member Function Documentation

virtual CAction* CActorFromActionValue::getNextAction (CStateCollection *state, CActionDataSet *data = NULL) [virtual]

Virtual function that returns the action for the specified state; it must be implemented by all subclasses.

Implements CAgentController.

virtual void CActorFromActionValue::newEpisode () [virtual]

Resets the eligibility traces object (eTraces).

Reimplemented from CSemiMDPListener.

virtual void CActorFromActionValue::receiveError (double critic, CStateCollection *oldState, CAction *Action, CActionData *data = NULL) [virtual]

Adapt the action values according to the critic.

Implements CActor.


Member Data Documentation

CAbstractVETraces* CActorFromActionValue::eTraces [protected]
CAbstractVFunction* CActorFromActionValue::vFunction [protected]

The documentation for this class was generated from the following file: