
RL game [3* P]

Consider the following game: a random number generator produces, in every round, an integer from to with equal probability. You play 3 rounds and, in each round, have to decide at which free position of a 3-digit number you want to place the random digit. Your goal is to form the largest possible (decimal) number. Formulate this game as a Markov decision process and find an optimal policy. Also analyze the case where the numbers are drawn without replacement, i.e., if a digit appears in the first round, it cannot appear again in the remaining two rounds.
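One possible way to set up the MDP is sketched below; it is an illustration under stated assumptions, not a reference solution. The state is taken to be the set of still-free positions together with the pool of digits that can still be drawn, the action is the position chosen for the current digit, and the only reward is the place value gained by that placement. Since the range of the generator is not legible in the text above, the sketch assumes the digits 0 to 9. All names (`DIGITS`, `WEIGHTS`, `state_value`, `action_value`, `best_position`) are hypothetical. The optimal policy is found by backward induction over the three rounds, using exact fractions for the expectations.

```python
from fractions import Fraction
from functools import lru_cache

# ASSUMPTION: the generator draws uniformly from the digits 0..9.
DIGITS = tuple(range(10))
# Place value of position 0 (hundreds), 1 (tens), 2 (units).
WEIGHTS = (100, 10, 1)


def next_pool(pool, digit, replacement):
    """Digits available in the next round."""
    if replacement:
        return pool
    return tuple(d for d in pool if d != digit)


@lru_cache(maxsize=None)
def state_value(free, pool, replacement):
    """Expected value still to be gained, before the next digit is drawn.

    free: tuple of open positions, pool: tuple of drawable digits.
    """
    if not free:
        return Fraction(0)
    total = sum(
        (max(action_value(free, pool, replacement, d, p) for p in free)
         for d in pool),
        Fraction(0),
    )
    return total / len(pool)


def action_value(free, pool, replacement, digit, pos):
    """Immediate reward of placing `digit` at `pos` plus expected future value."""
    rest = tuple(p for p in free if p != pos)
    future = state_value(rest, next_pool(pool, digit, replacement), replacement)
    return digit * WEIGHTS[pos] + future


def best_position(free, pool, replacement, digit):
    """Greedy action with respect to the optimal action values."""
    return max(free, key=lambda p: action_value(free, pool, replacement, digit, p))


if __name__ == "__main__":
    for replacement in (True, False):
        print("with replacement" if replacement else "without replacement")
        print("  expected number:", float(state_value((0, 1, 2), DIGITS, replacement)))
        for d in DIGITS:
            print("  first digit", d, "-> position",
                  best_position((0, 1, 2), DIGITS, replacement, d))
```

Running the script prints the first-round decision for every digit in both variants; later rounds can be queried by calling `best_position` with the reduced tuple of free positions and, in the case without replacement, the reduced digit pool. Ties between equally good positions are broken in favour of the leftmost free position, which is an arbitrary choice.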

Haeusler Stefan 2013-01-16