Seminar Computational Intelligence D (708.114)

Research Topics in Reinforcement Learning

SS 2007

Institut für Grundlagen der Informationsverarbeitung (708)

O.Univ.-Prof. Dr. Wolfgang Maass

Office hours: by appointment (via e-mail)


Location: IGI-seminar room, Inffeldgasse 16b/I, 8010 Graz
Date: Tuesday, 16:15-18:15
starting on March 6, 2007 (organizational meeting; you can also reserve a talk in advance by e-mail)

Content of the seminar:

The goal of the Computational Intelligence Seminar D is to present the most important and promising new ideas in Reinforcement Learning, i.e. in that area of machine learning where agents learn to act without a supervisor who tells the agent what to do at each step. Rather, the learning agent has to find out autonomously which sequences of movements turn out to be useful for reaching a goal. The policy which the agent learns could be a winning strategy in a game, or a sequence of motor commands for a robot. We will focus on the latter task, more precisely on learning strategies for controlling the humanoid robot HOAP 2, on which we are working in a joint research project with the EPFL in Lausanne.
This is quite innovative (and urgently needed), since so far very little learning has been used for the control of humanoid robots (or other real-world robots, such as the robots in the midsize league of RoboCup).
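The trial-and-error setting described above can be made concrete with a minimal tabular Q-learning sketch. The toy corridor environment and all parameters below are illustrative assumptions, not taken from the seminar material:

```python
import random

# Toy corridor: states 0..4, reward only at the rightmost state.
# The agent must discover, without a supervisor, which action
# sequence reaches the goal.
N_STATES = 5
ACTIONS = [-1, +1]            # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment dynamics: clipped move, reward 1 at the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    r = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, r

random.seed(0)
for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # temporal-difference (Q-learning) update
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Greedy policy extracted from the learned Q-values.
policy = {s: max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)}
```

After training, the greedy policy moves right in every state, i.e. the agent has found the useful action sequence without ever being told the correct action at any step.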

In addition, we discuss a new research result that provides a method which appears to enable biological neural circuits to learn via reinforcement learning. Research on this topic provides new ideas about the emergence of "intelligence" in biological organisms.

This seminar is intended for master students who want to learn how to condense a research paper to its essential core and give an understandable talk about it (talks will be 40 minutes long). In addition, this seminar provides opportunities for selecting a topic for a project or master thesis (suggestions for suitable topics are given below after some of the papers, marked with *).

We expect that most students who attend this seminar have previously taken the course Machine Learning B. However, in cases where this was not possible, a student can read on his/her own the introduction to reinforcement learning (see Course Material).

Those among the papers listed below that are not publicly available can be found in our pdf-archive (students who are not working at our institute can get a copy from Angelika Zehetner).


27.03.2007  Helmut Hauser
The humanoid robot HOAP 2 as real-world challenge for learning motor control
(If you want the whole presentation, including the HOAP-2 videos (huge files), please contact him directly. He would be happy to make the files accessible to you via ftp.)

18.05.2007  Gerhard Neumann
Graph-based reinforcement learning with local controllers
Presentation: PDF

22.05.2007  Georg Holzmann
Echo State Networks for Signal Processing
Presentation: PDF

22.05.2007  Roland Unterberger
Online Reservoir Adaptation by Intrinsic Plasticity
Presentation: PDF

05.06.2007  Elmar Rückert
Control of humanoid robots with motion primitives (paper by Helmut Hauser)
Presentation: PDF
05.06.2007  Helmut Kreuzer
Planning and acting in uncertain environments using probabilistic inference by Verma and Rao
Presentation: PDF

14.06.2007  Gregor Hörzer and Andreas Töscher
Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling by Izhikevich
as well as new theoretical results by Robert Legenstein
Presentations: PDF and PDF

Intrinsically motivated reinforcement learning

Intrinsically motivated RL tries to learn new skills without an external reward signal. In the skill-acquisition phase, the agent learns how to reach task-independent subgoals of increasing complexity. These skills can then be used to learn different tasks in the environment faster than with flat RL architectures.
Literature:
* This topic is also suitable for projects or subsequent master theses on the autonomous definition and exploration of subgoals, for example in a robotics context. For more information see our projects page.
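One common way to make the intrinsic-reward idea concrete is a count-based novelty bonus that replaces the absent external reward. The sketch below is a hypothetical illustration, not an algorithm from the cited literature:

```python
from collections import defaultdict

# Intrinsic motivation sketch: the agent receives no external reward;
# instead it rewards itself for visiting unfamiliar states, which
# drives exploration and the discovery of candidate subgoals.
visit_counts = defaultdict(int)

def intrinsic_reward(state):
    """Novelty bonus: high for rarely visited states, decays with familiarity."""
    visit_counts[state] += 1
    return 1.0 / visit_counts[state]

r_first = intrinsic_reward("room_A")    # first visit: full bonus 1.0
r_second = intrinsic_reward("room_A")   # second visit: bonus drops to 0.5
```

Plugged into any standard RL update in place of the external reward, such a bonus pushes the agent toward states it has rarely seen, independently of any particular task.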

Applications of ESNs (echo state networks) in reinforcement learning

Typically, RL problems need to satisfy the Markov assumption. In many real-world problems, however, the state of the system is not fully observable, so we have to consider reinforcement learning algorithms for non-Markovian tasks. Usually this is done by integrating observations from the past into the current state signal. One promising new approach is to use a special type of recurrent neural network, the Echo State Network (ESN), in order to store the history of the system in the internal state of the network.
Literature:
* This topic is also suitable for projects or a subsequent master thesis on an innovative use of artificial neural networks in robotics (possibly for the HOAP 2). For more information see our projects page.
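A minimal sketch of the ESN idea, assuming the standard reservoir update equation; the sizes and scaling factors below are illustrative, not taken from any of the listed papers:

```python
import numpy as np

# Echo state network sketch: a fixed random recurrent reservoir whose
# state integrates the observation history, giving an RL agent an
# (approximately) Markovian state for a partially observable task.
rng = np.random.default_rng(0)
N_RES, N_IN = 50, 1                     # reservoir and input sizes (illustrative)

W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W = rng.uniform(-0.5, 0.5, (N_RES, N_RES))
# Scale the recurrent weights to spectral radius < 1, so the influence
# of old inputs fades (the "echo state property").
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def reservoir_step(x, u):
    """One update: the new state mixes the current input with the fading history."""
    return np.tanh(W_in @ u + W @ x)

x = np.zeros(N_RES)
for obs in [0.0, 1.0, 0.0, 0.0]:        # an observation sequence
    x = reservoir_step(x, np.array([obs]))
```

The reservoir weights stay fixed; only a linear readout, or a value function for an RL algorithm, is trained on the reservoir state x, which acts as a compressed memory of the observation history.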

Online reservoir adaptation by intrinsic plasticity for backpropagation-decorrelation and echo state learning *
by Jochen J. Steil (see also the slides from his talk)
* suitable for project or master thesis on an innovative use of artificial neural networks in robotics (possibly for the HOAP 2)

Reinforcement learning in biological organisms

In a biological organism, supervision signals are sparse. Therefore, reinforcement learning is regarded as a major candidate for learning in biological neural systems. The following paper provides a method that appears to enable biological neural circuits to learn via reinforcement learning.
Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling. *
by Izhikevich, published in Cerebral Cortex, doi:10.1093/cercor/bhl152
* suitable for a project or master thesis on reinforcement learning in models for biological neural systems
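The core mechanism of the paper, a slowly decaying synaptic eligibility trace gated by a delayed dopamine signal, can be sketched as follows (simple Euler integration; the constants are illustrative, not the values used by Izhikevich):

```python
# Distal-reward sketch: an STDP event does not change the synaptic
# weight directly; it sets a slowly decaying eligibility trace c.
# The weight only changes when a (possibly delayed) dopamine pulse d
# arrives while c is still nonzero.
TAU_C = 1.0        # eligibility-trace time constant in seconds (illustrative)
DT = 0.1           # simulation step in seconds

c, w, d = 0.0, 0.5, 0.0
trace_log = []

for t in range(50):                   # 5 seconds of simulated time
    if t == 0:
        c += 1.0                      # STDP event: tag the synapse
    if t == 20:
        d = 1.0                       # reward (dopamine) arrives 2 s later
    w += DT * c * d                   # weight changes only when c * d > 0
    c -= DT * c / TAU_C               # trace decays exponentially
    d = 0.0                           # dopamine is a brief pulse
    trace_log.append(c)
```

Because the trace outlives the STDP event by seconds, the weight update can credit a synapse for a reward that arrives only after a delay, which is exactly the distal reward problem the paper addresses.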

Additional talks can be organized for students who are interested in some other reinforcement-learning-related project under supervision at our institute. A list of available project and master thesis topics at our institute can be found on our projects page.

last update: 2007-05-30