Interpretation of Loss function in 'Train the Model' block.
Is it by convention that we divide by 2 and then by the number of samples?
If the objective is to get better results in fewer iterations, why can't we divide by (let's say) 8, 5, or 1000?
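One way to look at it: dividing by the number of samples makes the loss an average, so its scale does not grow with the dataset, and the 1/2 simply cancels the 2 that appears when the square is differentiated. Any other positive constant only rescales the gradient, which has the same effect as changing the learning rate. The sketch below tries to show that with plain gradient descent on a one-feature linear model. The data, the names `gradients` and `train`, and the learning rates are all made up for illustration and are not taken from the 'Train the Model' block.

```python
def gradients(w, b, xs, ys, divisor):
    """Gradient of sum((w*x + b - y)**2) / divisor with respect to w and b."""
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / divisor
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / divisor
    return dw, db


def train(xs, ys, divisor, lr, steps):
    """Plain gradient descent starting from w = 0, b = 0."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys, divisor)
        w -= lr * dw
        b -= lr * db
    return w, b


# Hypothetical toy data, generated from y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m = len(xs)

# Conventional scaling: divide by 2m.
w_a, b_a = train(xs, ys, divisor=2 * m, lr=0.05, steps=200)

# Divide by 1000 instead, and rescale the learning rate by 1000 / (2m).
w_b, b_b = train(xs, ys, divisor=1000, lr=0.05 * 1000 / (2 * m), steps=200)

# Both runs follow the same path, up to floating-point rounding.
print(abs(w_a - w_b), abs(b_a - b_b))
```

So dividing by 8, 5, or 1000 would still work, but the learning rate would have to be retuned each time, and the loss value would no longer read as a per-sample average.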
L1 vs L2 code difference: need details on how the code differs. How the loss value impacts the weights and bi
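A rough sketch of how the two might differ in code, assuming "L1" and "L2" here mean the mean absolute error and the (halved) mean squared error losses for a one-feature linear model. The function names and the toy data are hypothetical. The loss value itself never enters the update; it is the gradient of the loss that moves the parameters, and that is where the two differ: L1 uses only the sign of each error, while L2 uses the error itself, so large errors pull harder.

```python
def sign(e):
    """Return -1, 0, or 1 depending on the sign of e."""
    return (e > 0) - (e < 0)


def l1_loss_and_grad(w, b, xs, ys):
    """Mean absolute error and its (sub)gradient with respect to w and b."""
    m = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(abs(e) for e in errs) / m
    dw = sum(sign(e) * x for e, x in zip(errs, xs)) / m
    db = sum(sign(e) for e in errs) / m
    return loss, dw, db


def l2_loss_and_grad(w, b, xs, ys):
    """Halved mean squared error and its gradient with respect to w and b."""
    m = len(xs)
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errs) / (2 * m)
    dw = sum(e * x for e, x in zip(errs, xs)) / m
    db = sum(errs) / m
    return loss, dw, db


# Hypothetical toy data, evaluated at w = 0, b = 0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

l1 = l1_loss_and_grad(0.0, 0.0, xs, ys)
l2 = l2_loss_and_grad(0.0, 0.0, xs, ys)
print(l1)  # (6.0, -2.5, -1.0)
print(l2)  # (20.5, -17.5, -6.0)
```

In both cases a gradient descent step would then be `w -= lr * dw` and `b -= lr * db`; only the way `dw` and `db` are computed changes.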