Resolved: Scaling of deltas

Question

Hello,
In the notebook attached to this lecture, I noticed a trick you mention:
'another small trick is to scale the deltas the same way as the loss function', where loss = np.sum(deltas ** 2) / 2 / observations
I can't understand this one. If we do so, deltas_scaled will always have a positive value, so when we update the weights with 'weights = weights - learning_rate * np.dot(inputs.T,deltas_scaled)', the new weights will always decrease, regardless of whether the output is greater or smaller than the target. I am really stuck here.
I hope you got my point.
Thank you in advance
Kind Regards.

Answer 1

Hi Hady,

You are right to point out that scaling the deltas should (in theory) cause the algorithm to converge. However, since the update rule depends on the learning rate and on the dot product of the inputs and the deltas, the updates to the weights are a function of (learning rate, inputs, deltas). Rescaling is usually done for faster or easier learning (or for modularization). There are three things we can adjust:

1. We can rescale the inputs. That's something we actually do later in the course, but not in this lecture.
2. We can change the learning rate. That's something we play with in the exercises.
3. We can rescale the deltas. That's what we do in the lecture.

Often we combine all three. In this course, though, we are showing you things one at a time.

Rescaling in this case was done to simplify the choice of the learning rate and of the number of iterations. The trick divides the deltas by the number of observations (deltas_scaled = deltas / observations), which gives the "deltas per observation". Thus the learning rate we use for 10 or 10,000 points will be the same. That's a very useful and important property. Note that this division does not square the deltas, so each delta keeps its sign.

To fully understand the issue, the best thing to do is the following experiment. You already know the values of the weights and biases you are aiming for (in fact, you can print them at each iteration to see how the training goes). You also have the value of the loss, which is another (and more fundamental) point of reference. So:

1. Don't rescale the deltas. Leave them as they are. The easiest way to do that is to change
   deltas_scaled = deltas / observations
   to
   deltas_scaled = deltas
   The rest of the code then remains unchanged.
2. See if the algorithm converges.
3. Change the learning rate.
4. Repeat 2 and 3 until you find a satisfactory result (you'll know when).
5. If the algorithm converges, find how many iterations it needs to converge.
6. Repeat until you find some optimal values.

Then compare that whole experience with the solution in the lecture, which rescales the deltas. The best way to learn machine learning is to play around with the algorithm. Please give me feedback on how that went. If you get stuck, don't hesitate to continue this discussion :)

Best,
The 365 Team
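
For anyone following this thread, here is a minimal sketch of the experiment described in the answer. It is not the lecture notebook: the toy data (targets = 2*x - 3*z + 5 plus noise), the helper name train, the seed, the iteration count, and the learning rate of 0.01 are all assumptions chosen for illustration. Only the update lines and the loss formula are taken from the code quoted in the thread.

```python
import numpy as np


def train(observations, learning_rate, scale_deltas, iterations=5000, seed=0):
    """Fit a linear model by gradient descent; return (weights, biases, loss)."""
    rng = np.random.default_rng(seed)
    # Assumed toy data (not from the thread): targets = 2*x - 3*z + 5 + noise.
    inputs = rng.uniform(-10, 10, size=(observations, 2))
    noise = rng.uniform(-1, 1, size=(observations, 1))
    targets = inputs @ np.array([[2.0], [-3.0]]) + 5.0 + noise

    weights = rng.uniform(-0.1, 0.1, size=(2, 1))
    biases = rng.uniform(-0.1, 0.1, size=1)

    # Ask NumPy not to warn on overflow, so a diverging run simply ends in inf/nan.
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(iterations):
            outputs = inputs @ weights + biases
            # Signed error: positive where output > target, negative otherwise.
            deltas = outputs - targets
            # Dividing by a positive count changes the size, never the sign.
            deltas_scaled = deltas / observations if scale_deltas else deltas
            weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
            biases = biases - learning_rate * np.sum(deltas_scaled)
        outputs = inputs @ weights + biases
        # The squaring happens only here, in the loss, not in the update.
        loss = np.sum((outputs - targets) ** 2) / 2 / observations
    return weights, biases, loss


# Scaled deltas: one learning rate for very different dataset sizes.
for n in (10, 10_000):
    w, b, loss = train(n, learning_rate=0.01, scale_deltas=True)
    print(f"scaled,   n={n}: weights={w.ravel()}, bias={b}, loss={loss}")

# Unscaled deltas with the same learning rate on 1,000 observations.
_, _, loss_raw = train(1000, learning_rate=0.01, scale_deltas=False)
print("unscaled, n=1000, rate=0.01:   loss =", loss_raw)

# Unscaled deltas with the learning rate divided by the number of observations.
w_small, b_small, loss_small = train(1000, learning_rate=0.01 / 1000, scale_deltas=False)
print("unscaled, n=1000, rate=0.00001: loss =", loss_small)
```

In this sketch, scaling the deltas is equivalent to dividing the learning rate by the number of observations, which is why the last run lands on the same weights as a scaled run with rate 0.01. Without that adjustment, the same rate that works on scaled deltas makes the unscaled run blow up.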
