Reinforcement Learning Toolbox 2.0

CActorFromQFunctionAndPolicy Class Reference

Actor which uses a Q-Function and its policy for the update.

#include <cactorcritic.h>

Inheritance diagram for CActorFromQFunctionAndPolicy:

Classes shown in the diagram: CActorFromQFunction, CActor, CSemiMDPListener, CErrorListener, CParameterObject, CParameters.


Public Member Functions

  CActorFromQFunctionAndPolicy (CAbstractQFunction *qFunction, CStochasticPolicy *policy)
  Creates the actor object. The policy must choose its actions using the specified Q-Function.

virtual  ~CActorFromQFunctionAndPolicy ()
virtual void  receiveError (double critic, CStateCollection *state, CAction *Action, CActionData *data=NULL)
  Updates the Q-Function.

CStochasticPolicy *  getPolicy ()


Protected Attributes

CStochasticPolicy *  policy
double *  actionValues

Detailed Description

Actor which uses a Q-Function and its policy for the update.

The only difference from CActorFromQFunction is the update of the Q-Function. The update is Q(s_t,a_t)_new = Q(s_t,a_t)_old + beta * td * (1 - pi(s_t, a_t)), where pi(s_t, a_t) is the probability that the actor's softmax policy assigns to action a_t in state s_t. This method is recommended by Sutton and Barto.
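The update rule above can be sketched in a few lines of standalone C++. This is only an illustration and not the toolbox's implementation: the function names `softmaxPolicy` and `updateActionValue` are hypothetical, the action values of one state are held in a plain `std::vector<double>`, the softmax is assumed to use a temperature of 1, and `beta` and `td` are assumed to be a step size and the error value passed in by the critic.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax (Boltzmann) distribution over the action values of one state.
// Assumption: temperature 1; the toolbox's CStochasticPolicy may differ.
std::vector<double> softmaxPolicy(const std::vector<double>& actionValues) {
    assert(!actionValues.empty());
    // Subtract the maximum before exponentiating for numerical stability.
    const double maxValue =
        *std::max_element(actionValues.begin(), actionValues.end());
    std::vector<double> probabilities(actionValues.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < actionValues.size(); ++i) {
        probabilities[i] = std::exp(actionValues[i] - maxValue);
        sum += probabilities[i];
    }
    for (double& p : probabilities) {
        p /= sum;
    }
    return probabilities;
}

// Applies Q(s_t,a_t) += beta * td * (1 - pi(s_t,a_t)) to the chosen action
// and returns the new value. Hypothetical helper, not part of the toolbox.
double updateActionValue(std::vector<double>& actionValues, std::size_t action,
                         double beta, double td) {
    assert(action < actionValues.size());
    // Probabilities are computed from the values before the update.
    const std::vector<double> pi = softmaxPolicy(actionValues);
    actionValues[action] += beta * td * (1.0 - pi[action]);
    return actionValues[action];
}
```

Only the value of the chosen action changes. The other actions' probabilities shift indirectly, because the softmax is recomputed from the updated values the next time the policy is queried.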


Constructor & Destructor Documentation

CActorFromQFunctionAndPolicy::CActorFromQFunctionAndPolicy (CAbstractQFunction * qFunction, CStochasticPolicy * policy)
 

Creates the actor object. The policy must choose its actions using the specified Q-Function.

virtual CActorFromQFunctionAndPolicy::~CActorFromQFunctionAndPolicy ()  [virtual]
 

Member Function Documentation

CStochasticPolicy* CActorFromQFunctionAndPolicy::getPolicy ()
 
virtual void CActorFromQFunctionAndPolicy::receiveError (double critic, CStateCollection * state, CAction * Action, CActionData * data = NULL)  [virtual]
 

Updates the Q-Function.

Does the following update: Q(s_t,a_t)_new = Q(s_t,a_t)_old + beta * td * (1 - pi(s_t, a_t))

Reimplemented from CActorFromQFunction.
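A short worked example may help show the effect of the factor (1 - pi(s_t, a_t)). The numbers are illustrative assumptions, not values from the toolbox: beta = 0.1 and td = 1.0, compared for an action the policy currently picks with probability 0.5 and one it picks with probability 0.9.

```latex
\begin{align*}
\Delta Q(s_t, a_t) &= \beta \cdot td \cdot \bigl(1 - \pi(s_t, a_t)\bigr) \\
\pi(s_t, a_t) = 0.5 &: \quad \Delta Q = 0.1 \cdot 1.0 \cdot (1 - 0.5) = 0.05 \\
\pi(s_t, a_t) = 0.9 &: \quad \Delta Q = 0.1 \cdot 1.0 \cdot (1 - 0.9) = 0.01
\end{align*}
```

The step shrinks as the action becomes more probable, so actions the policy already favors strongly are adjusted more cautiously than rarely chosen ones.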


Member Data Documentation

double* CActorFromQFunctionAndPolicy::actionValues [protected]
 
CStochasticPolicy* CActorFromQFunctionAndPolicy::policy [protected]
 

The documentation for this class was generated from the following file: