Reward-modulated Hebbian Learning of Decision Making
M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass
Abstract:
We introduce a framework for decision making in which the learning of decision
making is reduced to its simplest and biologically most plausible form:
Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule
as reinforcement learning in which certain decisions are rewarded, and prove
that each synaptic weight will on average converge exponentially fast to the
log-odds of receiving a reward when its pre- and postsynaptic neurons are
active. In our simple architecture, a particular action is selected from the
set of candidate actions by a winner-take-all operation. The global reward
assigned to this action then modulates the update of each synapse. Apart from
this global reward signal our reward-modulated Bayesian Hebb rule is a pure
Hebb update that depends only on the co-activation of the pre- and
postsynaptic neurons, and not on the weighted sum of all presynaptic inputs
to the postsynaptic neuron, as in the perceptron learning rule or the
Rescorla-Wagner rule. This simple approach to action-selection learning
requires that information about sensory inputs be presented to the Bayesian
decision stage in a suitably pre-processed form resulting from other adaptive
processes (acting on a longer time scale) that detect salient dependencies
among input features. Hence our proposed framework for fast learning of
decisions also provides interesting new hypotheses regarding neural codes and
computational goals of cortical areas that provide input to the final
decision stage.
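
As an illustration of the convergence claim, the following minimal sketch simulates a single synapse whose pre- and postsynaptic neurons are both active on every trial. The abstract does not state the update formula, so the specific rule used here (an increment of eta * (1 + exp(-w)) on reward and a decrement of eta * (1 + exp(w)) otherwise) is an assumption, chosen because its expected update is zero exactly at the log-odds of reward. The function names, the learning rate eta, and the simulation parameters are likewise hypothetical.

```python
import math
import random


def bayesian_hebb_update(w, reward, eta):
    """One update for a synapse whose pre- and postsynaptic neurons were both active.

    Assumed rule (not quoted from the abstract): the expected change is
    eta * (p * (1 + exp(-w)) - (1 - p) * (1 + exp(w))), which vanishes
    at w = log(p / (1 - p)), where p is the reward probability.
    """
    if reward:
        return w + eta * (1.0 + math.exp(-w))
    return w - eta * (1.0 + math.exp(w))


def simulate(p_reward, steps=40000, eta=0.005, seed=0):
    """Return the weight averaged over the second half of the simulated trials."""
    rng = random.Random(seed)
    w = 0.0
    tail = []
    for t in range(steps):
        reward = rng.random() < p_reward  # global binary reward signal
        w = bayesian_hebb_update(w, reward, eta)
        if t >= steps // 2:
            tail.append(w)
    return sum(tail) / len(tail)


if __name__ == "__main__":
    p = 0.8
    print("target log-odds:", math.log(p / (1.0 - p)))
    print("learned weight :", simulate(p))
```

Note that the update uses only the current weight and the global reward, not the weighted sum of the other inputs. The weight keeps fluctuating around its target because each trial is rewarded stochastically, so the sketch reports a time average rather than the final value.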
Reference: M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass.
Reward-modulated Hebbian Learning of Decision Making.
Neural Computation, 2009, in press.