Python Pandas


I'm doing the Python + SQL + Tableau course, and I come from an Oracle background. Are pandas dummies a way to group data, i.e. are they equivalent to SQL GROUP BY functions? If this were done in Oracle, we would simply use a GROUP BY query and get the count for each group of reasons.
I just feel like the Python way is more complicated. Why would this be done in Python rather than with database queries? Is it because of SQL's performance on big data?

1 Answer

365 Team

Hi Satya!
Thanks for reaching out!
In Python, and particularly in pandas, you do have a .groupby() method that corresponds to the GROUP BY clause used in MySQL. Here's a link to the documentation about this method:
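As a rough illustration of that correspondence, here is a minimal sketch on a hypothetical toy table (the `absences` table and its `reason` column are made up for this example). It counts rows per reason once with pandas' `groupby().size()` and once with a SQL GROUP BY query, using an in-memory SQLite database as a stand-in for Oracle or MySQL.

```python
import sqlite3

import pandas as pd

# Hypothetical toy data: one row per absence, with a numeric reason code
df = pd.DataFrame({"reason": [1, 3, 3, 7, 7, 7]})

# pandas: count rows per reason, like SELECT reason, COUNT(*) ... GROUP BY reason
pandas_counts = df.groupby("reason").size()

# The same aggregation written in SQL, run against an in-memory SQLite database
conn = sqlite3.connect(":memory:")
df.to_sql("absences", conn, index=False)
sql_counts = pd.read_sql_query(
    "SELECT reason, COUNT(*) AS n FROM absences GROUP BY reason ORDER BY reason",
    conn,
).set_index("reason")["n"]
conn.close()

print(pandas_counts.to_dict())  # {1: 1, 3: 2, 7: 3}
print(sql_counts.to_dict())     # {1: 1, 3: 2, 7: 3}
```

Both produce the same counts; the SQL version runs inside the database, while the pandas version needs the rows to be loaded into a DataFrame first.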
Using dummies is something different. The term refers to the econometric technique of converting some of your regressors (independent variables, which in our case are columns of the DataFrame we use) into dummies, i.e. variables that can take only one of two values, 0 or 1, to indicate the absence or presence of a certain phenomenon.
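To make the contrast with GROUP BY concrete, here is a minimal sketch using `pd.get_dummies` on a hypothetical `reason` column. Unlike a GROUP BY, it does not reduce the number of rows: every observation keeps its row and gains one 0/1 column per distinct reason.

```python
import pandas as pd

# Hypothetical column of categorical reason codes
df = pd.DataFrame({"reason": [1, 3, 3, 7]})

# One 0/1 column per distinct reason; dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(df["reason"], prefix="reason", dtype=int)

# Same number of rows as df, with columns reason_1, reason_3, reason_7
print(dummies)
```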
So, in this scenario, we use the same term "grouping", perhaps for lack of a better term. What we actually do is group, stack, or combine certain reasons for absence into specific groups. But this is not related to filtering your data in any way; we are still on the opposite side, so to speak: we are organising/preprocessing the data to bring it into a format that is suitable for analysis.
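The grouping step described above can be sketched by collapsing several dummy columns into one. The code ranges in `groups` below are hypothetical and chosen only for illustration; they are not the boundaries used in the course. A row gets a 1 in a group column if any of that group's dummy columns is 1. Summing those columns afterwards gives a count per group, which is the closest analogue to the GROUP BY count from the question.

```python
import pandas as pd

# Hypothetical reason codes, one row per absence record
df = pd.DataFrame({"reason": [1, 3, 12, 20, 25]})

# One 0/1 column per distinct code; the column labels are the codes themselves
dummies = pd.get_dummies(df["reason"], dtype=int)

# Hypothetical grouping of codes into broader categories (illustration only)
groups = {
    "group_1": range(1, 15),
    "group_2": range(15, 22),
    "group_3": range(22, 29),
}

grouped = pd.DataFrame(index=df.index)
for name, codes in groups.items():
    # Dummy columns whose reason code falls inside this group
    # (assumes every group has at least one matching code in the data)
    cols = [c for c in dummies.columns if c in codes]
    # 1 if the row's reason belongs to the group, 0 otherwise
    grouped[name] = dummies[cols].max(axis=1)

# Still one row per record
print(grouped)
# Summing the group columns yields counts per group
print(grouped.sum().to_dict())  # {'group_1': 3, 'group_2': 1, 'group_3': 1}
```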
Hope this helps, but please feel free to get back to us should you need further assistance.
