Interpretation of Loss function in 'Train the Model' block.
Is it by convention that we divide the loss by 2 and then by the number of samples?
If the objective is to get better results in fewer iterations, why can't we divide it by, let's say, 8, 5, or 1000?
1 answer (0 marked as helpful)
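That is why there is a learning rate. Dividing the loss by a different constant only rescales the gradient by that constant, which in plain gradient descent has the same effect as changing the learning rate.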
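To illustrate the point about the divisor and the learning rate, here is a minimal sketch in plain NumPy. It is not the course's code: the `train` function, the toy data, and the particular divisors and learning rates are all made up for this example, and it assumes plain gradient descent on a one-variable linear model with a squared-error loss.

```python
import numpy as np

def train(inputs, targets, loss_divisor, learning_rate, steps=100):
    """Plain gradient descent on sum((w*x + b - t)**2) / loss_divisor.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0
    for _ in range(steps):
        deltas = w * inputs + b - targets
        # Gradients of sum(deltas**2) / loss_divisor with respect to w and b.
        grad_w = 2.0 * np.sum(deltas * inputs) / loss_divisor
        grad_b = 2.0 * np.sum(deltas) / loss_divisor
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data (assumed): targets follow t = 3x + 1 with no noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
t = 3.0 * x + 1.0
n = x.size

# Run 1: the usual divisor 2 * n with learning rate 0.1.
w1, b1 = train(x, t, loss_divisor=2 * n, learning_rate=0.1)

# Run 2: divisor 8, with the learning rate rescaled by 8 / (2 * n)
# so that learning_rate / loss_divisor is the same as in run 1.
w2, b2 = train(x, t, loss_divisor=8, learning_rate=0.1 * 8 / (2 * n))

print(w1, b1)
print(w2, b2)
```

Both runs take the same update at every step (up to floating-point rounding), because only the ratio of learning rate to divisor enters the update. Dividing by 2 and by the number of samples is a convenience: the 2 cancels the factor from differentiating the square, and dividing by the sample count keeps the gradient scale roughly independent of the dataset size, so one learning rate works across different dataset sizes.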