Reward-modulated Hebbian Learning of Decision Making

M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass


We introduce a framework for decision making in which the learning of decision making is reduced to its simplest and biologically most plausible form: Hebbian learning on a linear neuron. We cast our Bayesian-Hebb learning rule as reinforcement learning in which certain decisions are rewarded, and prove that each synaptic weight will on average converge exponentially fast to the log-odds of receiving a reward when its pre- and postsynaptic neurons are active. In our simple architecture, a particular action is selected from the set of candidate actions by a winner-take-all operation. The global reward assigned to this action then modulates the update of each synapse. Apart from this global reward signal, our reward-modulated Bayesian Hebb rule is a pure Hebb update that depends only on the coactivation of the pre- and postsynaptic neurons, not on the weighted sum of all presynaptic inputs to the postsynaptic neuron as in the perceptron learning rule or the Rescorla-Wagner rule. This simple approach to action-selection learning requires that information about sensory inputs be presented to the Bayesian decision stage in a suitably preprocessed form resulting from other adaptive processes (acting on a larger timescale) that detect salient dependencies among input features. Hence our proposed framework for fast learning of decisions also provides interesting new hypotheses regarding neural codes and computational goals of cortical areas that provide input to the final decision stage.
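
The convergence claim above can be illustrated with a minimal sketch for a single synapse. The abstract does not state the update equation, so the form used here is an assumption: whenever the pre- and postsynaptic neurons are coactive, the weight w changes by +eta * (1 + exp(-w)) if the action was rewarded and by -eta * (1 + exp(w)) otherwise. This form is chosen because its expected update vanishes exactly at w = log(p / (1 - p)), where p is the reward probability. The function names, the learning rate eta, and the simulation setup are hypothetical, and the winner-take-all action selection is omitted.

```python
import math
import random


def bayesian_hebb_step(w, reward, eta):
    """One reward-modulated update for a synapse whose pre- and
    postsynaptic neurons are both active (assumed form, see lead-in)."""
    if reward:
        return w + eta * (1.0 + math.exp(-w))
    return w - eta * (1.0 + math.exp(w))


def simulate_synapse(p_reward, eta=0.001, steps=20000, seed=0):
    """Apply the update repeatedly with Bernoulli(p_reward) rewards.

    Every step is treated as a coactivation event; the weight starts at 0.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    w = 0.0
    for _ in range(steps):
        reward = rng.random() < p_reward
        w = bayesian_hebb_step(w, reward, eta)
    return w


if __name__ == "__main__":
    p = 0.8
    print("learned weight:", simulate_synapse(p))
    print("log-odds      :", math.log(p / (1.0 - p)))
```

With a small learning rate the learned weight fluctuates around the log-odds of reward rather than settling exactly on it; a smaller eta narrows the fluctuation at the cost of slower convergence.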

Reference: M. Pfeiffer, B. Nessler, R. Douglas, and W. Maass. Reward-modulated Hebbian Learning of Decision Making. Neural Computation, 22:1399-1444, 2010.