
Scaling of deltas

Hello,
I noticed in the notebook attached to this lecture a trick you mentioned:
‘another small trick is to scale the deltas the same way as the loss function’: loss = np.sum(deltas ** 2) / 2 / observations
I can’t understand this one: if we do so, deltas_scaled will always have a positive value, so when we come to update the weights with weights = weights – learning_rate * np.dot(inputs.T, deltas_scaled), the new weights will always decrease regardless of whether the output is greater or smaller than the target. I am really stuck here.
I hope you got my point.
Thank you in advance
Kind Regards.

1 Answer

Top Answer

365 Team

Hi Hady,
You are right to examine whether scaling the deltas affects convergence – in theory, it should not. Note that deltas_scaled is not always positive: the squaring happens only inside the loss, while the deltas themselves keep their sign (output minus target), and dividing them by the positive number of observations preserves that sign. But there is more to it.
Since the update rule depends on the learning rate and the dot product of the inputs and the deltas, the updates to the weights are a function of (learning rate, inputs, deltas).
Rescaling is usually done for faster/easier learning (or modularization).
1. We can rescale the inputs (something we actually do later on in the course, but not in this lecture).
2. We can change the learning rate. That’s something we play with in the exercises.
3. We can rescale the deltas (that’s what we do in the lecture).
Often we combine all three. In this course, though, we are showing you things one at a time.
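To see concretely why the rescaling cannot flip the direction of an update: observations is a positive constant, so dividing the deltas by it is the same as shrinking the learning rate, and each delta keeps its sign. A minimal sketch in NumPy (the data and learning rate here are made up for illustration; only the variable names follow the lecture's code):

```python
import numpy as np

np.random.seed(365)  # made-up data, just for illustration
observations = 1000
inputs = np.random.uniform(-10, 10, (observations, 2))
targets = 2 * inputs[:, 0] - 3 * inputs[:, 1] + 5
outputs = np.zeros(observations)  # an untrained model's outputs

deltas = outputs - targets             # individual deltas can be negative
deltas_scaled = deltas / observations  # dividing by a positive constant

# Only the loss is always positive (it squares the deltas);
# the scaled deltas keep both signs:
assert (deltas_scaled < 0).any() and (deltas_scaled > 0).any()

learning_rate = 0.02
update_scaled = learning_rate * np.dot(inputs.T, deltas_scaled)
update_unscaled = (learning_rate / observations) * np.dot(inputs.T, deltas)

# Rescaling the deltas is the same as shrinking the learning rate;
# the direction of the update is unchanged:
assert np.allclose(update_scaled, update_unscaled)
```

So the weights do not "always decrease": whenever an output is below its target, that delta is negative and pushes the corresponding weights up.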
Rescaling in this case was done to simplify the choice of the learning rate and the number of iterations.
This rescaling trick gives us the “deltas per observation”. Thus, the learning rate we use for 10 points and for 10,000 points can be the same. That’s a very useful and important property.
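That property can be checked directly: for data drawn from the same distribution, the rescaled update has roughly the same magnitude for a small and a large dataset, so one learning rate fits both. A rough sketch with synthetic data (not the lecture's exact setup):

```python
import numpy as np

np.random.seed(42)  # synthetic data for illustration

def scaled_update_magnitude(observations):
    """|inputs.T @ deltas / observations| for a freshly sampled dataset."""
    inputs = np.random.uniform(-10, 10, (observations, 2))
    targets = 13 * inputs[:, 0] + 7 * inputs[:, 1] - 12
    deltas = np.zeros(observations) - targets  # untrained model: outputs = 0
    deltas_scaled = deltas / observations
    return np.abs(np.dot(inputs.T, deltas_scaled))

small = scaled_update_magnitude(100)
large = scaled_update_magnitude(100_000)

# Thanks to the 1/observations factor, both updates are on the same order
# of magnitude, so the same learning rate works for either dataset size:
print(small, large)
```

Without that factor, the second update would be about a thousand times larger than the first, and any learning rate tuned on the small dataset would blow up on the large one.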
In order to fully understand the issue, the best thing for you to do would be the following:
You already know the values of the weights and biases at which you are aiming (and in fact you can print them at each iteration to see how the training goes). You also have the value of the loss, so that’s another point of reference (the more fundamental one). So:
1. Don’t rescale the deltas. Leave them as they are.
The easiest way to do that would be to change:
deltas_scaled = deltas / observations
to
deltas_scaled = deltas
The rest of the code will then remain unchanged.
2. See if the algorithm converges.
3. Change the learning rate.
4. Repeat 2 and 3 until you find a satisfactory result (you’ll know when).
5. If the algorithm converges, find how many iterations it needs to converge.
6. Repeat until you find some optimal values.
Then compare that whole experience with the solution in the lecture – rescaling the deltas.
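The steps above can be sketched as one training loop with a toggle for the rescaling. The data, model, and hyperparameters below are made up for illustration (not the lecture's notebook); the point is that turning the rescaling off forces you to compensate with a roughly observations-times smaller learning rate:

```python
import numpy as np

def train(rescale, learning_rate, iterations=100):
    """Plain gradient descent on a linear model, with or without delta rescaling."""
    np.random.seed(365)  # same data and initialization for every run
    observations = 1000
    inputs = np.random.uniform(-10, 10, (observations, 2))
    noise = np.random.uniform(-1, 1, (observations, 1))
    targets = 2 * inputs[:, [0]] - 3 * inputs[:, [1]] + 5 + noise

    weights = np.random.uniform(-0.1, 0.1, (2, 1))
    biases = np.random.uniform(-0.1, 0.1, 1)

    for _ in range(iterations):
        outputs = np.dot(inputs, weights) + biases
        deltas = outputs - targets
        loss = np.sum(deltas ** 2) / 2 / observations
        deltas_scaled = deltas / observations if rescale else deltas
        weights = weights - learning_rate * np.dot(inputs.T, deltas_scaled)
        biases = biases - learning_rate * np.sum(deltas_scaled)
    return weights, biases, loss

# Without rescaling, the gradient is ~1000x larger, so the learning rate
# must shrink by the same factor to keep the updates stable:
_, _, loss_unscaled = train(rescale=False, learning_rate=0.00002)
# With rescaling, a dataset-size-independent learning rate works:
_, _, loss_rescaled = train(rescale=True, learning_rate=0.02)
print(loss_unscaled, loss_rescaled)  # the two runs are mathematically equivalent
```

Try rescale=False with learning_rate=0.02 as well: the updates overshoot and the loss explodes, which is exactly the behaviour steps 2–4 ask you to hunt down.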
The best way to learn machine learning is to play around with the algorithm.
Please give me feedback on how that went. If you get stuck, don’t hesitate to continue this discussion 🙂
Best,
The 365 Team