Suppose you had a neural network with only linear activation
functions, one hidden and one output layer. Assume W is the weight matrix from the
input to the hidden layer and v is the weight vector from the hidden to the
output layer; neither layer uses any bias terms. Write down the
equation of the output value y as a function of the input vector x
and the weights of the
network without explicitly mentioning the output of the hidden layer.
Prove or disprove that this 2-layered neural network can be
represented by a single linear unit. What can be said about a network with an
arbitrary number of hidden layers with linear activation functions?
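The collapse claimed in the exercise can be checked numerically; a minimal sketch (assuming the names W and v from the problem statement, with 4 inputs and 3 hidden units chosen arbitrarily) verifies that y = v^T (W x) equals w^T x for the single weight vector w = W^T v:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # input-to-hidden weight matrix (4 inputs, 3 hidden units)
v = rng.standard_normal(3)        # hidden-to-output weight vector
x = rng.standard_normal(4)        # an arbitrary input vector

two_layer = v @ (W @ x)           # y = v^T (W x), the two-layer linear network
single_unit = (W.T @ v) @ x       # y = w^T x with w = W^T v, a single linear unit

assert np.isclose(two_layer, single_unit)
```

The same argument applies with any number of hidden layers: the product of the weight matrices is again a single matrix, so the network stays linear.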

Consider a 2-layer feedforward neural network with 2 hidden units,
2 inputs and 1 linear output unit. The activation function of the
hidden units is an arbitrary
sigmoidal activation function σ. Derive a gradient descent learning
rule for the weights from the input to the hidden layer and the
weights from the hidden to the output layer which minimizes the
mean squared error (mse) of a single example
(x, t).
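The derived rules can be checked against finite differences; a minimal sketch, assuming the logistic function for σ (so σ' = σ(1-σ)), W for the 2×2 input-to-hidden weights, v for the hidden-to-output weights, and E = ½(y - t)² as the single-example error:

```python
import numpy as np

def sigma(a):
    """Logistic sigmoid, one possible sigmoidal activation."""
    return 1.0 / (1.0 + np.exp(-a))

def grads(W, v, x, t):
    """Analytic gradients of E = 1/2 (y - t)^2 for y = v^T sigma(W x)."""
    h = sigma(W @ x)                 # hidden activations
    y = v @ h                        # linear output
    delta = y - t                    # output error
    dv = delta * h                   # dE/dv_j = (y - t) h_j
    dW = np.outer(delta * v * h * (1 - h), x)  # dE/dW_ji via the chain rule
    return dW, dv

# Compare the analytic gradient of v against a finite-difference estimate.
rng = np.random.default_rng(1)
W = rng.standard_normal((2, 2))
v = rng.standard_normal(2)
x = rng.standard_normal(2)
t = 0.5
dW, dv = grads(W, v, x, t)

def loss(W, v):
    return 0.5 * (v @ sigma(W @ x) - t) ** 2

eps = 1e-6
num_dv = np.zeros_like(v)
for j in range(2):
    vp = v.copy()
    vp[j] += eps
    num_dv[j] = (loss(W, vp) - loss(W, v)) / eps
assert np.allclose(dv, num_dv, atol=1e-4)
```

A gradient descent step with learning rate η is then W ← W - η·dW and v ← v - η·dv.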