
Scale Loss and Deltas


I noticed an interesting behavior. Could you explain further why the model diverges if the loss or errors are not scaled by the number of observations?

1 Answer

365 Team

Hello, Madhu!
Happy to have you here!
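The update rule subtracts the learning rate times the dot product of the inputs and the deltas, so each update to w is a function of three things: the learning rate, the inputs, and the deltas. If the deltas are not divided by the number of observations, that dot product sums over every observation, so the step grows with the size of the dataset and can overshoot the minimum. That overshooting is the divergence you observed.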
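To make those three ingredients concrete, here is a minimal sketch of a single weight update in NumPy. The toy data, the zero starting weights, and the learning rate of 0.01 are assumptions made up for illustration; they are not the values from the lecture.

```python
import numpy as np

# Hypothetical toy data: 4 observations, 2 input features.
inputs = np.array([[1.0, 2.0],
                   [3.0, -1.0],
                   [0.0, 4.0],
                   [-2.0, 1.0]])
targets = np.array([[1.0], [2.0], [3.0], [4.0]])

weights = np.zeros((2, 1))  # assumed starting point
learning_rate = 0.01        # assumed value

outputs = inputs @ weights
deltas = outputs - targets  # one delta per observation

# The update is built from the learning rate, the inputs, and the deltas.
update = learning_rate * np.dot(inputs.T, deltas)
new_weights = weights - update

# Doubling the learning rate and doubling the deltas give the same step,
# which is why rescaling the deltas acts like changing the learning rate.
update_double_lr = (2 * learning_rate) * np.dot(inputs.T, deltas)
update_double_deltas = learning_rate * np.dot(inputs.T, 2 * deltas)

print(new_weights.ravel())
print(np.allclose(update_double_lr, update_double_deltas))
```

Because the learning rate and the deltas enter the update as a product, scaling one is interchangeable with scaling the other.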
Rescaling is usually done for faster/easier learning (or modularization).
1. We can rescale the inputs. That’s something we actually do later on in the course, but not in this lecture.
2. We can change the learning rate. That’s something we play with in the exercises.
3. We can rescale the deltas (that’s what we do in the lecture).
Often we combine all three. In this course, though, we are showing you things one at a time. 
In this case, rescaling was done to make it easier to choose the learning rate and the number of iterations.
This rescaling trick gives us the “deltas per observation”, so the size of an update no longer depends on how many points we have. Thus the learning rate we use for 10 or 10,000 points can be the same. That’s a very useful and important property.
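A quick way to see this property is to compare the size of the update term for 10 and for 10,000 points, with and without dividing the deltas by the number of observations. This is only a sketch: the target relationship, the input range, and the seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def gradient_norms(observations):
    """Return (unscaled, scaled) sizes of inputs.T @ deltas at zero weights."""
    inputs = rng.uniform(-10, 10, size=(observations, 2))
    # Hypothetical target relationship, for illustration only.
    targets = 2 * inputs[:, [0]] - 3 * inputs[:, [1]] + 5
    weights = np.zeros((2, 1))
    deltas = inputs @ weights - targets
    unscaled = np.linalg.norm(inputs.T @ deltas)
    scaled = np.linalg.norm(inputs.T @ (deltas / observations))
    return unscaled, scaled

small = gradient_norms(10)
large = gradient_norms(10_000)

print("unscaled:", small[0], large[0])  # grows roughly with the number of points
print("scaled:  ", small[1], large[1])  # stays on the same order of magnitude
```

With unscaled deltas, a learning rate that works for 10 points would take a step roughly a thousand times larger on 10,000 points. With scaled deltas the step size stays comparable.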
In order to fully understand the issue, the best thing for you to do would be the following:
You already know the values of the weights and biases at which you are aiming (and in fact you can print them at each iteration to see how the training goes). You also have the value of the loss, so that’s another point of reference (the more fundamental one). So:
1. Don’t rescale the deltas. Leave them as they are. 
The easiest way to do that would be to change:
deltas_scaled = deltas / observations
to:
deltas_scaled = deltas
The rest of the code will then remain unchanged.
2. See if the algorithm converges.
3. Change the learning rate.
4. Repeat steps 2 and 3 until you find a satisfactory result (you’ll know when).
5. If the algorithm converges, find how many iterations it needs to converge.
6. Repeat until you find a good combination of learning rate and number of iterations.
Then compare that whole experience with the solution in the lecture – rescaling the deltas. 
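If you want a starting point for that experiment, the sketch below wraps the training loop in a function with a switch for the rescaling. It is not the lecture code: the `train` helper, the data-generating relationship, the seeds, and the learning rate of 0.02 are all assumptions made for illustration. Only `deltas`, `deltas_scaled`, and `observations` follow the names used above.

```python
import numpy as np

def train(inputs, targets, learning_rate, scale_deltas, iterations=100):
    """Gradient descent on a linear model; returns the loss at each iteration."""
    observations = inputs.shape[0]
    rng = np.random.default_rng(1)  # same starting weights on every call
    weights = rng.uniform(-0.1, 0.1, size=(inputs.shape[1], 1))
    bias = rng.uniform(-0.1, 0.1)
    losses = []
    # A diverging run overflows; silence the warnings and let the loss hit inf/nan.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(iterations):
            deltas = inputs @ weights + bias - targets
            losses.append(float(np.sum(deltas ** 2) / 2 / observations))
            deltas_scaled = deltas / observations if scale_deltas else deltas
            weights = weights - learning_rate * (inputs.T @ deltas_scaled)
            bias = bias - learning_rate * np.sum(deltas_scaled)
    return losses

# Hypothetical data: 1,000 points from a noisy linear relationship.
rng = np.random.default_rng(0)
observations = 1000
inputs = rng.uniform(-10, 10, size=(observations, 2))
noise = rng.uniform(-1, 1, size=(observations, 1))
targets = 2 * inputs[:, [0]] - 3 * inputs[:, [1]] + 5 + noise

scaled = train(inputs, targets, 0.02, scale_deltas=True)
unscaled = train(inputs, targets, 0.02, scale_deltas=False)
unscaled_small_lr = train(inputs, targets, 0.02 / observations, scale_deltas=False)

print("scaled, lr = 0.02:           ", scaled[-1])
print("unscaled, lr = 0.02:         ", unscaled[-1])
print("unscaled, lr = 0.02 / 1000:  ", unscaled_small_lr[-1])
```

Under these assumptions, the unscaled run with the same learning rate blows up, while dividing the learning rate by the number of observations reproduces the scaled run. In other words, without rescaling you would have to retune the learning rate every time the dataset size changes.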
The best way to learn machine learning is to play around with the algorithm.
The 365 Team
