Resolved: Course Exam - Number 4
Please explain more clearly what was done in Question 4 of the Course exam:
How did we arrive at 2.8 for the tuning parameter?
Hi Jonathan!
Thanks for reaching out!
In Question 4, your assignment is to conduct cross-validation using the RepeatedKFold validator, which is defined by three parameters: n_splits, n_repeats, and random_state. You are already provided with the following values: n_repeats=3 and random_state=1. n_splits is unknown.
The objective is to experiment with the given options for n_splits to identify the one that yields a tuning parameter of 2.8.
After initializing the cross-validator, you are expected to create the Ridge regression model and fit it to the training data. Once the model is fitted, its alpha_ attribute holds the selected tuning parameter.
n_splits=3 (meaning that the dataset will be divided into 3 folds) leads to the expected result:
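import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold
# x_train and y_train come from the train/test split earlier in the notebook
cv = RepeatedKFold(n_splits=3, n_repeats=3, random_state=1)
ridge_test = RidgeCV(alphas=np.arange(0.1, 10, 0.1), cv=cv)
ridge_test.fit(x_train, y_train)
print("Ridge tuning parameter:", ridge_test.alpha_)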
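To make the "experiment with the given options" step concrete, here is a minimal sketch that loops over candidate n_splits values and prints the alpha each one selects. It is an illustration only: the data comes from make_regression rather than the course dataset, and the candidate list (2, 3, 5) is hypothetical, so the selected alphas will not match the 2.8 from the exam.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import RepeatedKFold

# Synthetic stand-in for the course data (assumption: not the exam dataset).
X_demo, y_demo = make_regression(
    n_samples=100, n_features=5, noise=20.0, random_state=0
)

candidate_splits = [2, 3, 5]      # hypothetical answer options
alphas = np.arange(0.1, 10, 0.1)  # same alpha grid as in the thread

selected = {}
for n_splits in candidate_splits:
    # Only n_splits changes; n_repeats and random_state stay as given.
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=3, random_state=1)
    model = RidgeCV(alphas=alphas, cv=cv).fit(X_demo, y_demo)
    selected[n_splits] = float(model.alpha_)
    print(f"n_splits={n_splits}: alpha={model.alpha_:.1f}")
```

On the real exam data, the option whose printed alpha equals 2.8 is the answer.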
Hope this helps.
Best,
Ivan
Hello Ivan,
See what my output is like:
What could I be getting wrong?
Hi Jonathan!
Here's how it looks on my end:
My suggestion is to check if you are doing the same splitting when dividing the data into train and test sets earlier. More precisely, this part:
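from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X = df_nonull.drop('Price', axis=1)
y = df_nonull['Price']
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Fit the scaler on the training data only, then apply it to the test data
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)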
The test_size and random_state values are expected to remain the same as they are in the initial downloadable notebook. Changing either of them produces a different training set and, as a result, a different tuning parameter.
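The effect of random_state on the split can be checked directly. The sketch below uses a small made-up array instead of df_nonull (an assumption for illustration) and compares the training rows produced by the same seed and by a different seed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the notebook's features and target (assumption).
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.arange(100)

# Same settings as in the notebook: test_size=0.3, random_state=0.
a_train, _, _, _ = train_test_split(X_toy, y_toy, test_size=0.3, random_state=0)
b_train, _, _, _ = train_test_split(X_toy, y_toy, test_size=0.3, random_state=0)
# A different seed (42 is arbitrary) shuffles the rows differently.
c_train, _, _, _ = train_test_split(X_toy, y_toy, test_size=0.3, random_state=42)

same_seed_matches = np.array_equal(a_train, b_train)
other_seed_matches = np.array_equal(a_train, c_train)
print("same seed, same rows:", same_seed_matches)
print("different seed, same rows:", other_seed_matches)
```

Because RidgeCV selects alpha from whatever rows end up in x_train, a different split can easily lead to a different tuning parameter.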
Hope this helps.
Best,
Ivan
Thank you for the help. We always need that second eye.
I was making two errors here:
1. I had changed my target variable to "Year of sale" to address the latter part of the quiz, so I was running the code with "Year of sale" as the target instead of "Price".
2. My code read (past tense):
ridge = RidgeCV(alphas=(0.1, 10, 0.1), cv=cv)
and I just realized that my alphas argument was incorrect: the tuple (0.1, 10, 0.1) offers only 0.1 and 10 as candidate values, not a range. It should have been:
...alphas=np.arange(0.1, 10, 0.1)...
Thank you for helping to spot my error once again.
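For anyone who runs into the same issue, the difference between the two alphas arguments is easy to see by inspecting them. This short check uses only NumPy, and the variable names are illustrative.

```python
import numpy as np

# A plain tuple is taken literally: these three numbers are the candidates.
tuple_alphas = (0.1, 10, 0.1)

# np.arange(start, stop, step) generates an evenly spaced grid instead.
grid_alphas = np.arange(0.1, 10, 0.1)

print(len(tuple_alphas), sorted(set(tuple_alphas)))
print(len(grid_alphas), grid_alphas[:3], grid_alphas[-1])

# 2.8 can only be selected if it is among the candidates.
in_tuple = any(np.isclose(a, 2.8) for a in tuple_alphas)
in_grid = bool(np.isclose(grid_alphas, 2.8).any())
print("2.8 in tuple:", in_tuple)
print("2.8 in grid:", in_grid)
```

With the tuple, RidgeCV can only ever pick 0.1 or 10, so 2.8 is out of reach regardless of n_splits.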
It's great that you spotted the errors and finalized the code. Good job on working so thoroughly on the task!
Feel free to post another question should you encounter other difficulties!
Best,
Ivan