I'm doing the course on Python + SQL + Tableau. I'm from an Oracle background. Is the pandas dummies approach a way to group data, i.e. is it equivalent to SQL GROUP BY functions? If this were done in Oracle, we would simply use a GROUP BY query and get the count for each group of reasons.
I just feel like the Python way is more complicated. Why would this be done in Python rather than with database queries? Would it be the performance of SQL on big data?
Thanks for reaching out!
In Python, and in pandas in particular, there is a .groupby() method that corresponds to the GROUP BY clause used in SQL (MySQL and Oracle alike). Here’s a link to the documentation for this method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
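As a rough illustration of that correspondence, here is a minimal sketch. The DataFrame, the "reason" column name, and the reason codes are made up for the example and are not the course dataset.

```python
import pandas as pd

# Hypothetical data: one row per absence, each with a reason code.
df = pd.DataFrame({"reason": [1, 1, 7, 23, 23, 23]})

# Roughly equivalent to:
#   SELECT reason, COUNT(*) AS n FROM absences GROUP BY reason;
counts = df.groupby("reason").size().reset_index(name="n")
print(counts)
```

The result has one row per distinct reason, just like the output of a GROUP BY query.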
Dummies are something different. The term refers to the econometric technique of converting some of your regressors/independent variables (in our case, columns of the DataFrame we use) into dummy variables, i.e. variables that take only one of two values, 0 or 1, to indicate the absence or presence of a certain phenomenon.
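To make the contrast with GROUP BY concrete, here is a small sketch using pd.get_dummies. The reason codes are invented for the example. Note that the number of rows does not change: nothing is aggregated, each row simply gets a 0/1 flag per reason.

```python
import pandas as pd

# Hypothetical reason codes, one per observation.
reasons = pd.Series([1, 7, 23, 1], name="reason")

# One 0/1 column per distinct reason; the row count stays the same.
dummies = pd.get_dummies(reasons, dtype=int)
print(dummies)
```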
So, in this scenario, we use the same term “grouping” for lack of a better one. What we actually do is combine certain reasons for absence into a few broader groups. This is not related to filtering or aggregating your data in any way. We are on the opposite side, so to speak: we are organising/preprocessing the data to bring it into a format suitable for analysis.
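One way this kind of "grouping" of dummy columns could look is sketched below. The reason codes and the split into group_a and group_b are assumptions made for the example, not necessarily the grouping used in the course.

```python
import pandas as pd

# Hypothetical reason codes, one per observation.
reasons = pd.Series([1, 2, 7, 8, 1], name="reason")
dummies = pd.get_dummies(reasons, dtype=int)

# Hypothetical grouping: reasons 1-2 form group_a, reasons 7-8 form group_b.
# max(axis=1) gives 1 if any dummy in the group is 1 for that row.
grouped = pd.DataFrame({
    "group_a": dummies[[1, 2]].max(axis=1),
    "group_b": dummies[[7, 8]].max(axis=1),
})
print(grouped)
```

The output still has one row per original observation, which is why this is preprocessing rather than a GROUP BY style aggregation.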
Hope this helps but please feel free to get back to us should you need further assistance.