In what sense do we "spread" the weights?
Good morning. Xavier initialization suggests drawing the weights from the interval [-x, x], where x decreases as the number of outputs increases. So in what sense does "the higher the number of outputs, the higher the need to spread weights" hold? If the number of outputs is very high, the weights will all be close to 0, and then we run into the problem that the sigmoid function is approximately linear around 0.
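
To make the question concrete, here is a minimal sketch, assuming the commonly quoted uniform form of Xavier initialization, x = sqrt(6 / (n_in + n_out)). The helper name `xavier_bound` and the layer sizes are my own choices for illustration, not taken from any particular library. It only shows how the bound x shrinks as the number of outputs grows.

```python
import math


def xavier_bound(n_in: int, n_out: int) -> float:
    """Half-width x of the interval [-x, x], assuming the uniform Xavier form."""
    return math.sqrt(6.0 / (n_in + n_out))


# Hypothetical layer: fixed number of inputs, growing number of outputs.
n_in = 100
bounds = {n_out: xavier_bound(n_in, n_out) for n_out in (10, 100, 1000, 10000)}

for n_out, x in bounds.items():
    # A uniform distribution on [-x, x] has variance x**2 / 3.
    print(f"n_out={n_out:>5}  x={x:.4f}  weight variance={x * x / 3:.6f}")
```

With the number of inputs held fixed, the printed x gets smaller on every row, which is the behaviour I am asking about.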