CActorFromQFunctionAndPolicy Class Reference
Actor which uses a Q-Function and its policy for the update.
#include <cactorcritic.h>
Detailed Description
Actor which uses a Q-Function and its policy for the update.
The only difference from CActorFromQFunction is the
update of the Q-Function. The update is Q(s_t,a_t)_new =
Q(s_t,a_t)_old + beta * td * (1 - pi(s_t, a_t)), where beta is the
learning rate, td is the temporal-difference error, and pi(s_t, a_t)
is the probability of a_t in s_t under the actor's softmax policy.
This method is recommended by Sutton and Barto.
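The update rule can be illustrated with a small standalone sketch. It does not use the library's classes: the helper names softmaxProbabilities and updateActionValue are hypothetical, and the sketch assumes the action values of one state are held in a plain std::vector<double> and that the softmax policy has no temperature parameter.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the library): softmax probabilities
// pi(s, a) over the action values of a single state.
std::vector<double> softmaxProbabilities(const std::vector<double>& qValues) {
    std::vector<double> probs(qValues.size());
    if (qValues.empty()) {
        return probs;
    }
    // Subtract the maximum before exponentiating for numerical stability.
    const double maxQ = *std::max_element(qValues.begin(), qValues.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < qValues.size(); ++i) {
        probs[i] = std::exp(qValues[i] - maxQ);
        sum += probs[i];
    }
    for (double& p : probs) {
        p /= sum;
    }
    return probs;
}

// Hypothetical helper: applies
//   Q(s_t,a_t)_new = Q(s_t,a_t)_old + beta * td * (1 - pi(s_t,a_t))
// in place to the entry of the chosen action only.
void updateActionValue(std::vector<double>& qValues, std::size_t action,
                       double beta, double td) {
    const std::vector<double> pi = softmaxProbabilities(qValues);
    qValues[action] += beta * td * (1.0 - pi[action]);
}
```

For two actions with equal values of 0, pi is 0.5 for each, so with beta = 0.1 and td = 0.5 the chosen action's value rises by 0.1 * 0.5 * 0.5 = 0.025. The factor (1 - pi) shrinks the step for actions the policy already selects with high probability.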
Constructor & Destructor Documentation
Creates the actor object. The policy has to choose its actions
using the specified Q-Function.
virtual CActorFromQFunctionAndPolicy::~CActorFromQFunctionAndPolicy() [virtual]
Member Function Documentation
Updates the Q-Function.
Does the following update: Q(s_t,a_t)_new = Q(s_t,a_t)_old +
beta * td * (1 - pi(s_t, a_t))
Reimplemented from CActorFromQFunction.