Reward-based stochastic self-configuration of neural circuits
Experimental data suggest that neural circuits configure their synaptic
connectivity for a given computational task. They also point to
dopamine-gated stochastic spine dynamics as an important underlying
mechanism, and they show that the stochastic component of synaptic plasticity
is surprisingly strong. We propose a model that elucidates how task-dependent
self-configuration of neural circuits can emerge through these mechanisms.
The Fokker-Planck equation allows us to relate local stochastic processes at
synapses to the stationary distribution of network configurations, and
thereby to computational properties of the network. This framework suggests a
new model for reward-gated network plasticity, where one replaces the common
policy gradient paradigm by continuously ongoing stochastic policy search
(sampling) from a posterior distribution of network configurations. This
posterior integrates priors that encode, for example, previously attained
knowledge and structural constraints. This model can explain the
experimentally found capability of neural circuits to configure themselves for
a given task, and to compensate automatically for changes in the network or
task. We also show that experimental data on dopamine-modulated spine
dynamics can be modeled within this theoretical framework, and that a strong
stochastic component of synaptic plasticity is essential for its performance.
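The core idea can be illustrated with a minimal numerical sketch (our own illustration, not the authors' code): a stochastic differential equation whose drift follows the gradient of the log-posterior and whose noise term keeps the dynamics sampling from that posterior rather than converging to a single optimum. For simplicity the target posterior is assumed here to be a standard Gaussian over a single parameter; the temperature and learning-rate constants are illustrative choices.

```python
import numpy as np

# Sketch of stochastic parameter dynamics whose stationary distribution is a
# target posterior p*(theta). The drift follows d(log p*)/d(theta); the
# Wiener-noise term (scaled by the temperature T) makes the dynamics sample
# from p* instead of settling into a single configuration.
# Assumption for this sketch: p*(theta) = N(0, 1), so d(log p*)/d(theta) = -theta.

rng = np.random.default_rng(0)
beta, T, dt = 1.0, 1.0, 0.01   # learning rate, temperature, time step
theta = 0.0
samples = []
for step in range(200_000):
    grad_log_p = -theta                       # gradient of log N(0, 1)
    noise = rng.normal() * np.sqrt(2 * beta * T * dt)
    theta += beta * grad_log_p * dt + noise   # Euler-Maruyama update
    if step > 20_000:                         # discard burn-in
        samples.append(theta)

samples = np.asarray(samples)
print(samples.mean(), samples.std())
```

With the noise term removed, the same dynamics would simply climb to the posterior mode; with it, the empirical mean and standard deviation of the visited parameter values approach those of the target distribution, which is the sense in which ongoing stochastic plasticity implements posterior sampling rather than gradient ascent.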
Reference: D. Kappel, R. Legenstein, S. Habenschuss, M. Hsieh, and W. Maass.
Reward-based stochastic self-configuration of neural circuits.