First of all I would like to thank you for your robust content that really helped me.
Secondly, concerning the business case example: after running the code, I tried getting the weights for the first hidden layer. I know it's a very big tensor [10*50], but the problem is that each time I run the code I get a completely different set of weights. At first, I thought the weights should converge over the iterations, as in the minimal example, until they stay constant, which would mean there is only one possible array of weights for each example. But I guess this is not the case for non-linear models, because there seem to be infinitely many weight matrices that can lead to a decent test accuracy. Am I right? Should I focus on validation and test accuracy regardless of the weights obtained? Should different people get different weights for the same case, with all of them acceptable?
I am asking this because I spent a lot of time trying to obtain the same weight matrix as in an online research paper, but I couldn't.
So sorry for the inconvenience and thank you in advance.
It is completely normal to experience this.
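The main reason is that each time you run the code, all weights are randomly initialized. Moreover, the hidden units have no fixed meaning or order: swapping two hidden units, together with their incoming and outgoing weights, gives a different weight matrix but exactly the same model. So the hidden layer's weights will vary every time you run the code. Overall, the runs converge to comparable outcomes (similar loss and accuracy), yet they neither follow the same path nor arrive at the same weights. So yes, judge the model by its validation and test accuracy rather than by the specific weights obtained.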
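To make this concrete, here is a minimal NumPy sketch (not the course's code). The layer sizes (10 inputs, 50 hidden units, 2 outputs), the ReLU activation, the uniform initialization range, and the helper names `init_params`, `forward`, and `permute_hidden` are all assumptions chosen for illustration. It shows that different seeds give different starting weights, that fixing the seed makes the initialization repeatable, and that reordering the hidden units changes the weight matrices without changing the model's outputs.

```python
import numpy as np

def init_params(seed, n_in=10, n_hidden=50, n_out=2):
    """Randomly initialize a one-hidden-layer network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.uniform(-0.1, 0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.uniform(-0.1, 0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Compute the network outputs, using ReLU in the hidden layer."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

def permute_hidden(params, perm):
    """Reorder the hidden units, moving their in/out weights with them."""
    return {
        "W1": params["W1"][:, perm],
        "b1": params["b1"][perm],
        "W2": params["W2"][perm, :],
        "b2": params["b2"],
    }

# Different seeds give different starting weights; the same seed repeats them.
a = init_params(seed=0)
b = init_params(seed=1)
a_again = init_params(seed=0)
different_seeds_differ = not np.allclose(a["W1"], b["W1"])
same_seed_matches = np.array_equal(a["W1"], a_again["W1"])

# Shuffling the hidden units changes W1 but leaves the outputs unchanged.
x = np.random.default_rng(42).normal(size=(5, 10))
perm = np.random.default_rng(7).permutation(50)
shuffled = permute_hidden(a, perm)
weights_differ = not np.allclose(a["W1"], shuffled["W1"])
outputs_match = np.allclose(forward(a, x), forward(shuffled, x))

print(different_seeds_differ, same_seed_matches, weights_differ, outputs_match)
```

With 50 hidden units there are already 50! such reorderings of any one solution, which is one reason matching a published weight matrix exactly is not a realistic target.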
For simpler models like linear regression (an NN with no hidden layers), the loss has a single minimum with a closed-form solution, which can be computed directly without gradient descent. Thus each new run of the model usually yields the same result.
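As a rough illustration of the linear case, the sketch below fits the same synthetic data in two ways: with a direct least-squares solve and with gradient descent started from two different random initializations. The data, the learning rate, the number of steps, and the name `gradient_descent` are assumptions made up for this example.

```python
import numpy as np

# Synthetic regression data (made up for this example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

# Direct least-squares solution, no gradient descent involved.
w_closed, *_ = np.linalg.lstsq(X, y, rcond=None)

def gradient_descent(seed, lr=0.1, steps=2000):
    """Minimize the mean squared error starting from random weights."""
    w = np.random.default_rng(seed).normal(size=3)
    for _ in range(steps):
        grad = 2.0 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad
    return w

# Two runs with different random starting points.
w_run1 = gradient_descent(seed=1)
w_run2 = gradient_descent(seed=2)

print(np.allclose(w_run1, w_run2), np.allclose(w_run1, w_closed))
```

Here both runs end at the same weights as the direct solution, because the squared-error loss of a linear model is convex and, for this data, has a single minimum.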
Hope this helps!