RL theory I [3 P]

Prove Corollary 1.1 (p. 7) of the lecture notes Theory of Reinforcement Learning:

For every policy $\pi$ there exists a deterministic policy $\pi'$ such that $\pi' \geq \pi$. As a special case: if there exists a stochastic optimal policy $\pi$, then there also exists a deterministic optimal policy $\pi'$ such that $\pi' \geq \pi$.
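
Before attempting the proof, it can help to check the statement numerically on a small example. The following is a minimal sketch, not a proof and not necessarily the construction used in the lecture notes. It assumes a finite MDP with discount factor $\gamma < 1$, and it assumes that $\pi' \geq \pi$ means $V^{\pi'}(s) \geq V^{\pi}(s)$ for every state $s$. The deterministic policy is obtained by acting greedily with respect to $Q^{\pi}$ (the standard policy improvement step). All names (`policy_value`, `greedy_deterministic`, the random MDP) are hypothetical and chosen for illustration only.

```python
import numpy as np


def policy_value(P, R, policy, gamma):
    """Solve V = R_pi + gamma * P_pi V exactly for a (possibly stochastic) policy.

    P[s, a, t] is the transition probability from s to t under action a,
    R[s, a] is the expected immediate reward,
    policy[s, a] is the probability of choosing action a in state s.
    """
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", policy, P)  # state-to-state transitions under the policy
    R_pi = np.sum(policy * R, axis=1)          # expected one-step reward under the policy
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)


def greedy_deterministic(P, R, V, gamma):
    """Return the one-hot (deterministic) policy that is greedy w.r.t. Q computed from V."""
    Q = R + gamma * (P @ V)                    # Q[s, a], shape (n_states, n_actions)
    n_states, n_actions = Q.shape
    greedy = np.zeros((n_states, n_actions))
    greedy[np.arange(n_states), np.argmax(Q, axis=1)] = 1.0
    return greedy


# Assumption: a small random MDP, used purely as a test case.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 3, 0.9
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)              # normalise each row to a distribution
R = rng.random((n_states, n_actions))

# An arbitrary stochastic policy pi.
pi = rng.random((n_states, n_actions))
pi /= pi.sum(axis=1, keepdims=True)

V_pi = policy_value(P, R, pi, gamma)
pi_det = greedy_deterministic(P, R, V_pi, gamma)
V_det = policy_value(P, R, pi_det, gamma)

# Check pi_det >= pi state by state, with a small tolerance for round-off.
improves = bool(np.all(V_det >= V_pi - 1e-10))
print("deterministic policy dominates:", improves)
```

The check compares the two value functions state by state. The proof itself has to show why the inequality holds for every MDP and every $\pi$, which a single random instance cannot establish.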

Haeusler Stefan 2009-01-19