
RL theory II [3 P]

Prove Corollary 1.3 (p. 9) from the script Theory of Reinforcement Learning 7:

Every policy $ \pi$ for which $ V^{\pi}$ satisfies the Bellman optimality equations

$\displaystyle V^{\pi}(s) = \max_{a \in A_s} Q^{\pi}(s, a) \quad \forall s \in S$

is optimal.
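
Before writing the proof, it can help to check the statement numerically on a small example. The sketch below is only a sanity check, not a proof and not part of the script. It assumes a hypothetical two-state, two-action deterministic MDP with discount factor 0.9. The table `P` and the names `evaluate`, `satisfies_bellman_optimality` and `is_optimal` are all invented for illustration. It enumerates every deterministic policy, tests whether $V^{\pi}$ satisfies the Bellman optimality equations, and compares against brute-force optimality.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}


def q_value(V, s, a):
    """One-step lookahead: expected reward plus discounted value of the successor."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation; converges geometrically because GAMMA < 1."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: q_value(V, s, policy[s]) for s in P}
    return V


def all_policies():
    """All deterministic policies, each as a dict state -> action."""
    states = sorted(P)
    return [dict(zip(states, acts)) for acts in product(*(sorted(P[s]) for s in states))]


def satisfies_bellman_optimality(policy, tol=1e-6):
    """Check V^pi(s) = max_a Q^pi(s, a) for every state s."""
    V = evaluate(policy)
    return all(abs(V[s] - max(q_value(V, s, a) for a in P[s])) < tol for s in P)


def is_optimal(policy, tol=1e-6):
    """Brute force: no other deterministic policy is better in any state."""
    V = evaluate(policy)
    return all(
        evaluate(other)[s] <= V[s] + tol for other in all_policies() for s in P
    )


if __name__ == "__main__":
    for pi in all_policies():
        print(pi, satisfies_bellman_optimality(pi), is_optimal(pi))
```

In this toy example the two checks agree for every policy, which is what the corollary predicts in one direction. The exercise still requires a general argument for arbitrary MDPs.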



Haeusler Stefan 2009-01-19