Resolved: Regression on 1.04. Real-life example.csv
I wanted to run a regression on the 1.04. Real-life example.csv file without removing the Model column, so I created dummies for all variables after removing the empty values. Before the regression I ran F Regression to find out which variables are useful. For 25% of the Model dummies the p-value was less than 0.05, indicating that they are significant for the regression, but for the other 75% of the Model dummies the p-value was greater than 0.05. Does this indicate that the Model field is not essential for the regression?
1 answer (0 marked as helpful)
Hi Adit,
If a dummy variable for Model is insignificant, this does not mean that the dummy is useless.
Say that one such dummy is Audi A3. Let's also assume you have dropped the dummy for Mercedes A-Class. Therefore, Mercedes A-Class is the baseline category. All other dummies are compared to it.
If the dummy for Audi A3 is significant, it means that an Audi A3 is statistically different from a Mercedes A-Class when it comes to predicting the outcome with the model.
If the dummy for Audi A3 is insignificant, it means that its effect cannot be distinguished from that of the baseline category (Mercedes A-Class), so it has practically the same effect.
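To make the baseline idea concrete, here is a minimal sketch with made-up numbers (the log prices and the third model, BMW X5, are hypothetical and not taken from the course file). It assumes a regression that contains only an intercept and the Model dummies. In that case the intercept equals the mean of the baseline category, and each dummy coefficient equals the difference between that model's mean and the baseline mean:

```python
from statistics import mean

# Hypothetical toy data (NOT from the course CSV): log prices per car model.
prices = {
    "Mercedes A-Class": [9.0, 9.2, 9.1],    # baseline: its dummy is dropped
    "Audi A3": [9.1, 9.0, 9.2],             # behaves like the baseline
    "BMW X5": [10.4, 10.6, 10.5],           # clearly different from the baseline
}

baseline = "Mercedes A-Class"

# With only an intercept and the Model dummies, the OLS intercept is the
# baseline group's mean ...
intercept = mean(prices[baseline])

# ... and each dummy coefficient is (group mean - baseline mean).
coefs = {
    model: mean(values) - intercept
    for model, values in prices.items()
    if model != baseline
}

for model, coef in coefs.items():
    print(f"{model}: difference from {baseline} = {coef:.2f}")
```

In this toy data the Audi A3 coefficient is zero (the dummy would be insignificant), while the BMW X5 coefficient is large. Both are statements about the distance from the Mercedes A-Class, not about whether the Model variable matters.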
What this means for your model is that the 75% of the dummies that are insignificant are practically the same as the baseline category in terms of effect on the outcome. The remaining 25%, however, are different. Therefore, all dummies should stay in the model.
***
Note that if all dummies (no exceptions) are insignificant, this means that the whole variable is insignificant.
If even one dummy is significant, all dummies should remain in the model.
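One way to check the variable as a whole, rather than dummy by dummy, is a joint F-test that compares the model with all Model dummies against the model without them. The sketch below is only an illustration on hypothetical toy data (not the course file), and it assumes there are no other predictors, so the restricted model is just the overall mean. It computes the F statistic only; turning it into a p-value needs an F-distribution table or a statistics library.

```python
# Hypothetical toy data (NOT from the course CSV): log prices per car model.
prices = {
    "Mercedes A-Class": [9.0, 9.2, 9.1],
    "Audi A3": [9.1, 9.0, 9.2],
    "BMW X5": [10.4, 10.6, 10.5],
}


def rss(values, fitted):
    """Residual sum of squares when every value is predicted by `fitted`."""
    return sum((y - fitted) ** 2 for y in values)


all_values = [y for ys in prices.values() for y in ys]
n = len(all_values)   # number of observations
k = len(prices)       # parameters in the full model: intercept + (k - 1) dummies
q = k - 1             # number of dummies tested jointly

# Restricted model: intercept only, so every car is predicted by the grand mean.
rss_restricted = rss(all_values, sum(all_values) / n)

# Full model: intercept + dummies, so every car is predicted by its model's mean.
rss_full = sum(rss(ys, sum(ys) / len(ys)) for ys in prices.values())

# Joint F statistic for "all Model dummies are zero".
f_stat = ((rss_restricted - rss_full) / q) / (rss_full / (n - k))
print(f"F({q}, {n - k}) = {f_stat:.1f}")
```

Here the statistic comes out at about 196, far above the 5% critical value of roughly 5.14 for (2, 6) degrees of freedom, so the Model variable as a whole is significant even though the Audi A3 dummy on its own is not.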
Best,
Iliya