Resolved: 2:22 clarification
At 2:22, it is mentioned "a static output of the activations minimises the gradient, while the algo is not really trained"
I do not quite understand what the above sentence means. Is it that, at extremely large or small weight values, the gradient becomes so small that small changes in the weights no longer produce any noticeable decrease in the loss, so the learning rate cannot drive meaningful updates and the algorithm is not able to learn how to minimise the loss?
Thanks
1 answer (1 marked as helpful)
Hi Ryan,
The phrase "a static output of the activations minimizes the gradient, while the algorithm is not really trained" suggests that when the activations are "static" or stuck, the gradient becomes very small or approaches zero. So your intuition is in the right direction.
If the activations become static due to extreme weight values, the gradient shrinks toward zero (the so-called vanishing gradient problem), preventing the algorithm from properly updating the weights and, hence, from learning how to minimize the loss.
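To make this concrete, here is a minimal numerical sketch (not from the lecture) of a single sigmoid unit with a squared-error loss and a hypothetical target of 0. As the weight grows, the activation saturates near 1 and stays "static", and the gradient with respect to the weight collapses toward zero even though the loss is still large:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid w.r.t. its input

x = 1.0  # a fixed input, chosen arbitrarily for illustration
y = 0.0  # hypothetical target; the saturated unit outputs ~1, far from it

for w in [0.5, 5.0, 50.0]:  # moderate vs. extreme weight values
    z = w * x
    a = sigmoid(z)                        # the activation ("static" when saturated)
    loss = 0.5 * (a - y) ** 2             # squared-error loss for this one example
    dL_da = a - y                         # gradient of the loss w.r.t. the activation
    dL_dw = dL_da * sigmoid_grad(z) * x   # chain rule back to the weight
    print(f"w={w:6.1f}  activation={a:.6f}  loss={loss:.4f}  dL/dw={dL_dw: .2e}")

# With w = 50 the activation is effectively stuck at 1.0 and dL/dw is ~0,
# so a gradient-descent update w -= lr * dL_dw barely changes the weight:
# the algorithm stops learning even though the loss is nowhere near minimized.
```

This is the situation the lecture describes: the gradient is "minimized" (near zero) only because the activation is saturated, not because the network has actually learned anything.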
Best,
Ned