Should the scaling method and the activation function of the output layer be compatible?
For instance, when I scale the data with min-max scaling, (x - xmin) / (xmax - xmin), must I use a sigmoid in the output layer so that both the target and the output lie in the same range [0, 1] and can therefore be compared directly? Does that make sense?
What if I use standardization instead, (x - mean) / std? Must I then use a specific activation function?
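To make the difference between the two scalings concrete, here is a small sketch (the sample array is just an illustration): min-max scaling always lands in [0, 1], while standardization produces an unbounded range.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 10.0])

# Min-max scaling: maps the data into [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())

# Standardization: zero mean, unit variance; the range is unbounded
standard = (x - x.mean()) / x.std()

print(minmax)    # all values lie in [0, 1]
print(standard)  # values can be negative or exceed 1
```

This is why the question arises: a sigmoid output can only produce values in (0, 1), which matches the first scaling but not the second.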
Thank you in advance.
No, for neural networks that is not a prerequisite. The output layer's activation does not need a range that matches the targets: a linear (identity) output can learn to produce values in any range, including [0, 1]. Matching the ranges (e.g. a sigmoid for min-max-scaled targets) can be convenient, but the target and the output activation need not share the same natural domain. The one genuine constraint runs the other way: an activation whose range cannot cover the targets (e.g. a sigmoid with standardized targets, which can be negative or exceed 1) will make some targets unreachable.
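As a minimal sketch of this point (a single linear neuron trained by gradient descent, with made-up data), a linear output fits targets that lie well outside [0, 1], which a sigmoid output could never reach:

```python
import numpy as np

rng = np.random.default_rng(0)

# Targets span values far outside [0, 1], as standardized data can.
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + 0.5

# One linear neuron (identity activation) trained by gradient descent
w, b = 0.0, 0.0
for _ in range(500):
    pred = w * X[:, 0] + b
    grad_w = 2 * np.mean((pred - y) * X[:, 0])
    grad_b = 2 * np.mean(pred - y)
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

print(w, b)  # converges near the true w = 3.0, b = 0.5
```

With a sigmoid output the same training could only ever predict values in (0, 1), so most of these targets would be unreachable; the identity output has no such restriction.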