R. Legenstein, N. Wilbert, and L. Wiskott
Humans and animals are able to learn complex behaviors based on a massive
stream of sensory information from different modalities. Early animal studies
have identified learning mechanisms based on reward and punishment: animals
tend to avoid actions that lead to punishment, whereas rewarded actions are
reinforced. However, most algorithms for reward-based
learning are only applicable if the dimensionality of the state-space is
sufficiently small or its structure is sufficiently simple. This raises the
question of how the brain solves the problem of learning from high-dimensional
data. In this article we propose a biologically plausible generic two-stage
learning system that can be applied directly to raw
high-dimensional input streams. The system is composed of a hierarchical slow
feature analysis (SFA) network for preprocessing and a simple neural network
on top that is trained based on rewards. We demonstrate by computer
simulations that this generic architecture is able to learn quite demanding
reinforcement learning tasks on high-dimensional visual input streams in a
time that is comparable to the time needed when an explicit highly
informative low-dimensional state-space representation is given instead of
the high-dimensional visual input. The learning speed of the proposed
architecture in a task similar to the Morris water maze task is comparable to
that found in experimental studies with rats. This study thus supports the
hypothesis that slowness learning is one important unsupervised learning
principle utilized in the brain to form efficient state representations for
behavioral learning.
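To make the preprocessing stage more concrete, here is a minimal sketch of a single linear slow feature analysis step in Python with NumPy. It is an illustration under simplifying assumptions and not the authors' implementation. A hierarchical SFA network stacks many SFA nodes, usually with a nonlinear (e.g. quadratic) expansion before each node. The hypothetical helper `linear_sfa` performs only one linear step: center the input, whiten it, and keep the directions whose finite-difference time derivative has the smallest variance. The toy signals and mixing matrix in the usage lines are also made up for illustration.

```python
import numpy as np

def linear_sfa(x, n_features):
    """Hypothetical minimal linear SFA: return the n_features slowest outputs.

    x has shape (T, D): T time steps of a D-dimensional input signal.
    """
    # 1. Center the data so every input channel has zero mean.
    x = x - x.mean(axis=0)
    # 2. Whiten: rotate and rescale so the output covariance is the identity.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    z = x @ (eigvec / np.sqrt(eigval))
    # 3. Approximate the time derivative with finite differences.
    dz = np.diff(z, axis=0)
    # 4. Directions with the smallest derivative variance vary most slowly;
    #    eigh returns eigenvalues in ascending order, so take the first columns.
    _, slow_dirs = np.linalg.eigh(np.cov(dz, rowvar=False))
    return z @ slow_dirs[:, :n_features]

# Toy example (assumed data): a slow and a fast sine wave, linearly mixed.
t = np.linspace(0, 2 * np.pi, 1000)
slow, fast = np.sin(t), np.sin(11 * t)
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])
x = np.column_stack([slow, fast]) @ mixing.T

y = linear_sfa(x, n_features=1)
# The slowest SFA output should match the slow source up to sign and scale.
print(abs(np.corrcoef(y[:, 0], slow)[0, 1]))
```

In this toy case the two sources are uncorrelated, so whitening leaves them related to the whitened signals by a rotation. The derivative-variance step then singles out the slow sine wave without any labels or rewards. In the two-stage system, outputs like `y` would serve as the low-dimensional state representation for the reward-trained network on top.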