Execution of cells in the ipynb file does not match the video (recent updates maybe?)
The get_dummies method gave me boolean values instead of 1s and 0s. I changed that by adding the parameter...
dtype = 'int'
... to the method call.
Also, the heatmap requires corr() to be called on 'df_hitters_num_nonull', not on 'df_hitters' as it is in the downloadable course resources.
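A minimal sketch of why the numeric-only frame matters, using an invented toy frame rather than the course data (the column values and the names `toy` and `corr_matrix` are assumptions for illustration). In recent pandas versions, corr() no longer drops text columns silently, so selecting the numeric columns first keeps the call working:

```python
import pandas as pd

# Toy stand-in for the course data; values are invented for illustration only.
toy = pd.DataFrame({
    "Hits": [81, 130, 141, 87],
    "Salary": [475.0, 480.0, 500.0, 91.5],
    "League": ["A", "N", "A", "N"],  # text column that corr() cannot use
})

# Keep only numeric columns before computing the correlation matrix.
numeric_only = toy.select_dtypes(include="number")
corr_matrix = numeric_only.corr()

print(corr_matrix)
# The resulting matrix is what would be passed to the heatmap,
# e.g. sns.heatmap(corr_matrix) if seaborn is installed.
```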
Hi Alastair!
Great to have you in this course and thanks for reaching out!
Indeed, recent updates to libraries like pandas can change how functions behave. For example, since pandas 2.0, pd.get_dummies() returns boolean columns by default. You can get the required behaviour exactly as you suggested, by passing dtype=int to pd.get_dummies():
df_hitters_num = pd.get_dummies(df_hitters, columns = ['League', 'Division', 'NewLeague'], drop_first=True, dtype=int)
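To see the effect in isolation, here is a small sketch on an invented toy frame (the data and the names `toy` and `dummies` are assumptions, not part of the course files). It only illustrates that an explicit dtype=int yields 0/1 columns:

```python
import pandas as pd

# Toy stand-in for df_hitters; the values are invented for illustration only.
toy = pd.DataFrame({
    "League": ["A", "N", "A"],
    "Division": ["E", "W", "W"],
    "Hits": [81, 130, 141],
})

# An explicit dtype=int gives 0/1 columns regardless of the pandas default.
dummies = pd.get_dummies(
    toy, columns=["League", "Division"], drop_first=True, dtype=int
)

print(dummies.dtypes)
print(dummies)
```

Setting the dtype explicitly also keeps the notebook's output stable if the default changes again in a later release.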
Seaborn's functions can cause similar trouble after version updates. In such cases, an alternative approach can sidestep the error.
For instance, instead of using sns.displot(df_hitters_num_nonull['Salary']) to check the distribution, we can rely on Matplotlib and its hist() function: plt.hist(df_hitters_num_nonull['Salary']).
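A self-contained sketch of the Matplotlib route, again on invented salary values (the `salaries` series and the choice of 5 bins are assumptions for illustration, not the course data):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented salary values standing in for df_hitters_num_nonull['Salary'].
salaries = pd.Series(
    [475.0, 480.0, 500.0, 91.5, 750.0, 70.0, 100.0, 75.0], name="Salary"
)

fig, ax = plt.subplots()
# hist() returns the per-bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = ax.hist(salaries, bins=5)
ax.set_xlabel("Salary")
ax.set_ylabel("Count")

print(counts, bin_edges)
```

In a Jupyter notebook the backend line can be dropped, since the figure renders inline.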
In the course, we demonstrate essential techniques that underpin the analysis, but these can be executed in various ways. The specific methods we present are the ones we deemed suitable when creating the course; alternative approaches are available, and students are encouraged to explore and use different techniques.
Thank you for spotting these issues and finding a suitable solution. We highly appreciate your feedback.
Best,
Ivan