
RL game [3* P]

Consider the following game: a random number generator produces in every round an integer from $1$ to $3$ with equal probability. You play three rounds and, in each round, must decide at which still-empty position of a three-digit number to place the digit just drawn. Your goal is to form the largest possible (decimal) number. Formulate this game as a Markov decision process and find an optimal policy. Also analyze the case where the numbers are drawn without replacement, i.e. if the digit $3$ appears in the first round, it cannot appear again in the remaining two rounds.
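One possible way to check a hand-derived solution is backward induction over a small MDP. The sketch below is not a reference solution. It assumes a state made of the still-empty positions, the digits that can still be drawn, and the digit just drawn; an action is the choice of an empty position. It also assumes an immediate reward of digit times place value, which sums to the final number and is therefore equivalent to a single terminal reward. The names `state_value`, `best_action`, `next_digits` and `PLACE_VALUES` are illustrative only.

```python
from functools import lru_cache

DIGITS = (1, 2, 3)
PLACE_VALUES = (100, 10, 1)  # index 0 = hundreds, 1 = tens, 2 = units


def next_digits(digits, d, with_replacement):
    """Digits that can still be drawn after digit d has been drawn."""
    return digits if with_replacement else tuple(x for x in digits if x != d)


@lru_cache(maxsize=None)
def state_value(free, digits, with_replacement):
    """Expected return before the next draw, given the empty positions `free`."""
    if not free:
        return 0.0  # all positions filled: the episode is over
    # Average the greedy Q-value over the uniformly drawn next digit.
    total = sum(best_action(free, digits, d, with_replacement)[0] for d in digits)
    return total / len(digits)


def best_action(free, digits, d, with_replacement):
    """Return (Q-value, position) of the best placement for the drawn digit d."""
    rest = next_digits(digits, d, with_replacement)
    candidates = []
    for pos in free:
        remaining_free = tuple(p for p in free if p != pos)
        # Assumed reward model: immediate reward d * place value, plus future value.
        q = d * PLACE_VALUES[pos] + state_value(remaining_free, rest, with_replacement)
        candidates.append((q, pos))
    return max(candidates, key=lambda c: c[0])  # ties go to the first candidate


if __name__ == "__main__":
    for with_replacement in (True, False):
        value = state_value((0, 1, 2), DIGITS, with_replacement)
        first_round = {d: best_action((0, 1, 2), DIGITS, d, with_replacement)[1]
                       for d in DIGITS}
        print(with_replacement, value, first_round)
```

Under these assumptions the expected final number is 277 with replacement and 321 without replacement. In both cases the greedy first-round action places a $1$ in the units position, a $2$ in the tens position and a $3$ in the hundreds position. Without replacement the digits form a permutation of 1, 2, 3, so the optimal policy always reaches 321.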



Haeusler Stefan 2009-01-19