CSarsaLearner Class Reference
Class for Sarsa Learning.
#include <ctdlearner.h>
Detailed Description
Class for Sarsa Learning.
The other possibility for choosing the action a_{t+1} is to
always choose the action that is actually executed by the agent.
This method is called SARSA learning (the update uses a
(S)tate-(A)ction-(R)eward-(S)tate-(A)ction tuple). This
method learns the policy of the agent directly. Which method (Q-Learning or
SARSA learning) works better depends on the learning problem.
Generally, SARSA learning is safer if some states carry a
high negative reward, since SARSA learning takes the exploration
policy of the agent into account.
Since the SARSA algorithm needs to know what the agent will do
in the next step, it gets a pointer to the agent. The agent serves
as a deterministic controller, saving the action coming from its
controller. The learner can use the agent's getNextAction method to
get the next estimated action. Because the estimation
policy is the policy of the agent, the ETraces of the Sarsa
learner only have to be reset when a new episode begins. This can
lead to better performance than the Q-Learning algorithm.
The Sarsa learner assumes a deterministic controller as
estimation policy, which is usually the agent or a hierarchical
MDP.
Constructor & Destructor Documentation
CSarsaLearner::~CSarsaLearner ( )