Can't figure out the outcome of the 70th percentile
Hi everyone,
I would like to know the calculation behind the number 4.7999... shown in this lecture.
Why is 4.7999... the value of the 10.5th element, rather than 4.5 or 4.7?
Is there a formula behind it?
Thank you!
Hi Jalego, here is a common formula for finding the cut-off points:
i = ((K/n) * (d - 1)) + 1
Video example
i = ((70/100) * (15 - 1)) + 1
i = (0.7 * 14) + 1
i = (9.8 + 1)
i = 10.8
where i is the position (counting from 1) of the cut-off point in the sorted data set.
K is the percentile (or quartile, etc.) to be found.
n is the number of parts the data are divided into (100 for percentiles, 4 for quartiles).
d is the total number of elements in the data set.
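As a minimal sketch, the index formula translates directly into Python. The function name `cutoff_index` is hypothetical; its parameters are named after K, n and d above.

```python
def cutoff_index(K, n, d):
    """Hypothetical helper: 1-based position i of the cut-off point.

    K: the percentile (or quartile, ...) to find
    n: number of parts the data are divided into (100 = percentiles, 4 = quartiles)
    d: number of elements in the data set
    """
    return (K / n) * (d - 1) + 1


# Video example: 70th percentile of 15 values
i = cutoff_index(70, 100, 15)
print(round(i, 10))  # 10.8 (rounded to hide floating-point noise)
```

The same function gives quartiles too: `cutoff_index(3, 4, 15)` returns 11.5 for the third quartile.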
If i is an integer, simply locate that position in the sorted list; the value at that position is the percentile. Otherwise, round i down and up to the nearest integers; the values at these two positions are used in the next step.
list_of_values = [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 8, 9]  # 10th and 11th values (counting from 1): 4 and 5
In the video example we obtain i = 10.8, so the nearest index below is 10 and the nearest index above is 11. In our list of values, position 10 holds 4 and position 11 holds 5.
Next, calculate the difference between these two values (5 - 4), multiply it by the decimal part of the index calculated above, and add the result to the lower value:
5 - 4 = 1 (difference between the values at the upper and lower index)
0.8 (the decimal part of the index calculated above, 10.8)
1 * 0.8 = 0.8 (product of the difference and the decimal part)
4 + 0.8 = 4.8 (the lower value plus this product)
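Putting all the steps together, here is a minimal Python sketch. The name `percentile_interpolated` is hypothetical; it uses 1-based positions as in the formula above and subtracts 1 when reading from the list, since Python lists are 0-based. It also covers the case where i is an integer.

```python
import math


def percentile_interpolated(values, K, n=100):
    """Hypothetical helper: the K-th n-quantile by linear interpolation.

    Computes the 1-based index i, then interpolates between the values
    at positions floor(i) and ceil(i).
    """
    data = sorted(values)
    d = len(data)
    i = (K / n) * (d - 1) + 1      # 1-based cut-off index (about 10.8 in the video example)
    lower = math.floor(i)          # nearest index downward
    upper = math.ceil(i)           # nearest index upward
    low_val = data[lower - 1]      # subtract 1: Python lists are 0-based
    high_val = data[upper - 1]
    if lower == upper:             # i is an integer: that value is the answer
        return low_val
    fraction = i - lower           # decimal part of i (about 0.8 here)
    return low_val + (high_val - low_val) * fraction


list_of_values = [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 8, 9]
print(percentile_interpolated(list_of_values, 70))  # approximately 4.8
```

With `n=4` the same function returns quartiles, e.g. `percentile_interpolated(list_of_values, 3, n=4)` interpolates between 5 and 6 at index 11.5 and gives 5.5.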
It may look a little complicated, but you only need to follow this formula, or let a programming language or another tool do it for you :).
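For example, Python's standard library can do this with `statistics.quantiles` and `method="inclusive"`, which treats the minimum as the 0th and the maximum as the 100th percentile and interpolates linearly, so it should match the formula above (NumPy's default `np.percentile` uses the same rule). The sketch also hints at where 4.7999... most likely comes from: 0.7 has no exact binary representation, so 0.7 * 14 evaluates to slightly less than 9.8, and a tool that computes the index that way ends up just below 4.8.

```python
import statistics

list_of_values = [0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 8, 9]

# n=100 splits the data into 100 parts, giving 99 cut points;
# the 70th percentile is the 70th cut point, i.e. list index 69.
cut_points = statistics.quantiles(list_of_values, n=100, method="inclusive")
print(cut_points[69])  # 4.8

# Floating-point detail behind results like 4.7999...
print(0.7 * 14)  # 9.799999999999999
```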
Best Regards,
Are there any advantages in using np.percentile() over np.quantile() or vice versa? Or is it just a matter of preference?
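As far as I know, there is no computational difference: `np.percentile` takes `q` on a 0 to 100 scale, `np.quantile` takes it on a 0 to 1 scale, and both accept the same `method` argument (NumPy 1.22 and later). A quick check on the lecture data, assuming NumPy is installed:

```python
import numpy as np

list_of_values = np.array([0, 0, 0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6, 8, 9])

p = np.percentile(list_of_values, 70)   # q on a 0-100 scale
q = np.quantile(list_of_values, 0.70)   # q on a 0-1 scale
print(p, q)
print(np.isclose(p, q))  # True
```

So it is mostly a matter of preference: use whichever scale matches how you think about the cut-off.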