Python Pandas

Question

I'm doing the course on Python + SQL + Tableau, and I come from an Oracle background. Is using dummies in pandas a way to group data, i.e. is it equivalent to the SQL GROUP BY clause? If this were done in Oracle, we would simply use a GROUP BY query and get the count for each group of reasons. The Python way just feels more complicated to me. Why would this be done in Python rather than with database queries? Is it because of SQL's performance on big data?

Answer 1

Hi Satya!
Thanks for reaching out!
In Python, and particularly in pandas, there is a .groupby() method that corresponds to the GROUP BY clause used in MySQL (and in Oracle SQL). Here's a link to the documentation for this method: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
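As a minimal sketch (the DataFrame, its column names and the reason codes below are made up for illustration), counting absences per reason in pandas looks like this, with the SQL it mirrors shown in a comment:

```python
import pandas as pd

# Hypothetical absence records; column names and values are illustrative only.
absences = pd.DataFrame({
    "employee_id": [11, 36, 3, 7, 11, 3],
    "reason": [26, 0, 23, 7, 23, 26],
})

# Rough SQL equivalent:
#   SELECT reason, COUNT(*) AS n_absences
#   FROM absences
#   GROUP BY reason;
counts = absences.groupby("reason").size().rename("n_absences")
print(counts)
```

By default groupby() sorts the group keys, much like adding ORDER BY reason in SQL; calling .reset_index() on the result turns it back into a two-column table.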
Using dummies, however, is something different. It refers to the econometric technique of converting some of your regressors/independent variables (which, in our case, are columns of the DataFrame we use) into dummies, i.e. variables that can take only one of two values, 0 or 1, to indicate the absence or presence of a certain phenomenon.
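To see the difference, here is a small sketch using pd.get_dummies on a hypothetical column of reason codes. Instead of collapsing rows the way GROUP BY does, it keeps one row per absence and adds one 0/1 column per distinct reason:

```python
import pandas as pd

# Hypothetical column of absence reason codes (one row per absence).
reasons = pd.Series([26, 0, 23, 7, 23, 26], name="reason")

# One indicator column per distinct reason code: 1 if the row has that
# reason, 0 otherwise. dtype=int requests 0/1 integers.
dummies = pd.get_dummies(reasons, dtype=int)
print(dummies)
```

Passing dtype=int matters because recent pandas versions otherwise return True/False columns. Note that the result still has six rows, one per original absence.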
So, in this scenario, we use the term "grouping" for lack of a better word. What we actually do is group, stack, or combine certain reasons for absence into specific groups. But this isn't about filtering or summarising your data the way a GROUP BY query does; we are still on the opposite side, so to speak: we are organising/preprocessing the data to bring it into a format that is suitable for analysis.
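The "grouping" of reasons can be sketched by collapsing several dummy columns into one. The group boundaries below (codes 1 to 14, 15 to 22, and 23 and above) are hypothetical, chosen only for illustration; taking the row-wise maximum gives 1 if an absence's reason falls in that group and 0 otherwise. The number of rows stays the same, which is what separates this preprocessing step from a GROUP BY aggregation:

```python
import pandas as pd

# Hypothetical reason codes, one row per absence.
reasons = pd.Series([26, 0, 23, 7, 18, 26], name="reason")
dummies = pd.get_dummies(reasons, dtype=int)

# Hypothetical grouping of reason codes; the cut-offs are illustrative only.
groups = {
    "reason_group_1": [c for c in dummies.columns if 1 <= c <= 14],
    "reason_group_2": [c for c in dummies.columns if 15 <= c <= 22],
    "reason_group_3": [c for c in dummies.columns if c >= 23],
}

# Collapse each group's dummy columns into a single 0/1 column.
# Code 0 is not in any group here, so its row ends up all zeros.
grouped = pd.DataFrame({
    name: dummies[cols].max(axis=1) for name, cols in groups.items()
})
print(grouped)
```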
Hope this helps! Please feel free to get back to us should you need further assistance.
Best,
Martin