If cumulative frequency weren't a provided column, how would we have obtained those values?
Hi, as the title states, my question is "If cumulative frequency weren't a provided column, how would we have obtained those values?"
For context, I was a bit confused at first by how the homework asks that the bars be displayed in decreasing order of frequency:
"The specificity comes from the fact that the bars on the Pareto bar chart are displayed in decreasing order of frequency."
"The line chart on the other hand always shows the cumulative frequency on a Pareto."
I was confused by the word choice for the ordering of bars in order of frequency given that a "frequency" column exists in the data, so I originally sorted the pandas dataframe as follows:
df_complaints = df_complaints.sort_values(by='frequency', ascending=False)
df_complaints
However, when double-checking my code against the solution, I realized that the bar chart should be ordered by decreasing value of the "Number of complaints" column, not by the "frequency" column, which seems to be the cumulative frequency. Since cumulative frequency grows as the counts shrink, sorting by "frequency" would have needed "ascending=True" to give the same order. (Side note: this ordering step wasn't in the solution since the data was already in descending "Number of complaints" order, but I figured it'd be best to include this sort_values line just as practice~ :) )
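To convince myself of the ordering point, here is a minimal sketch on made-up data. The "Number of complaints" and "frequency" column names come from the homework; the "Complaint type" column and all of the values are hypothetical and only there for illustration.

```python
import pandas as pd

# Hypothetical rows, deliberately out of order (not the homework data).
df = pd.DataFrame({
    "Complaint type": ["Wrong item", "Damaged item", "Late delivery"],
    "Number of complaints": [20, 50, 30],
    "frequency": [100, 50, 80],  # cumulative percentage, as in the homework column
})

# Bars in decreasing order of the raw counts.
by_count = df.sort_values(by="Number of complaints", ascending=False)

# The cumulative column only grows, so ascending order gives the same rows.
by_cumulative = df.sort_values(by="frequency", ascending=True)

same_order = (
    by_count["Complaint type"].tolist() == by_cumulative["Complaint type"].tolist()
)
print(same_order)
```

Sorting on the raw counts is probably the safer habit, since it does not depend on a cumulative column already being present.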
Additionally, the homework assignment states:
"Cumulative frequency means the sum of frequencies occurring up to this point. So, for the first element, it will show the frequency of occurrence for the first element only. For the second element it will be the sum of frequency for the first and second element, and so on. The last cumulative frequency is 100% as it is the sum of all individual frequencies."
It wasn't immediately obvious to me from the homework description that the "frequency" column already showed the cumulative frequency, and that we didn't need to sum it up element by element, until I double-checked with the solution.
However, if we hadn't been provided a column with cumulative frequency, how would we have obtained it? Would we have had to sum up all of the "Number of complaints" values, divide each individual value by the total, and create a new df column to hold the running (cumulative) sum?
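My best guess at what that would look like is the sketch below, assuming pandas' cumsum() is the right tool here. The "Complaint type" column, the new "cumulative frequency" column name, and the counts are all hypothetical; only "Number of complaints" comes from the homework data.

```python
import pandas as pd

# Hypothetical data; the categories and counts are made up for illustration.
df_complaints = pd.DataFrame({
    "Complaint type": ["Late delivery", "Damaged item", "Wrong item", "Other"],
    "Number of complaints": [20, 50, 10, 20],
})

# Sort first, so the running total follows the bar order of the Pareto chart.
df_complaints = df_complaints.sort_values(
    by="Number of complaints", ascending=False
).reset_index(drop=True)

# Running sum of the counts, expressed as a percentage of the grand total.
total = df_complaints["Number of complaints"].sum()
df_complaints["cumulative frequency"] = (
    df_complaints["Number of complaints"].cumsum() * 100 / total
)

print(df_complaints)
```

The last row should come out at 100, which matches the homework's description that the last cumulative frequency is the sum of all individual frequencies.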
Thanks for your help! Let me know if anything isn't clear with my question.