Statistics: Numerical Data Intervals
Hello, In the module Numerical Variables: Frequency Distribution Tables you say that a number is included in an interval if X is strictly greater than the lower bound and less than or equal to the upper bound. However, in the example in the video you mention that the interval 1 to 21 has a frequency of 2 (1 and 9). Why is 1 included when it is equal to (and not greater than) the lower bound? The same happens in the exercise correction (2.4): the first interval is 8 to 54 and includes "8". Is this a mistake, and in reality X should be greater than or equal to the lower bound and strictly less than the upper bound? Furthermore, from vague memories of my high school statistics, I was taught to always start the first interval at 0 (0 to 20, 20 to 40, etc.). Is that wrong? Should the first interval start at the lowest value of the dataset? Thank you for your answers
2 answers (0 marked as helpful)
Hi Lisa, The rule for counting the values in a single interval (bin), lower bound < X ≤ upper bound, holds for every interval except the first one. The first interval also includes its lower bound, so that the lowest value in your dataset is counted. That is why 1 falls in the interval 1 to 21 and 8 falls in the interval 8 to 54. Regarding your second question, you could start your intervals at 0, so that is not wrong. However, it is often much more practical to start at the lowest value in the dataset. If the lowest value happens to be much greater than 0, starting at 0 leaves you with a lot of empty intervals, and the intervals that do contain values have to be drawn much narrower to fit all the bins on the graph. We only care about the bins that contain values. Please let me know if anything is unclear. Hope this helps!
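To make the counting rule concrete, here is a minimal Python sketch. The function name `count_in_bins` and both datasets are made up for illustration; they are not the data from the video or from exercise 2.4. It counts a value in a bin when lower < X ≤ upper, and lets the first bin also include its lower bound.

```python
def count_in_bins(values, edges):
    """Count values per bin using (lower, upper], except that the
    first bin is [lower, upper] so the smallest edge is included."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            # Regular rule, plus the first-bin exception for x == lower.
            if lower < x <= upper or (i == 0 and x == lower):
                counts[i] += 1
                break
    return counts


# Hypothetical dataset; bins start at the lowest value (1), width 20.
values = [1, 9, 21, 22, 35, 41, 60, 61, 80, 101]
edges = [1, 21, 41, 61, 81, 101]
counts = count_in_bins(values, edges)
print(counts)

# Hypothetical dataset far from 0, binned two ways with width 20.
far_from_zero = [508, 512, 530, 541, 555]
from_zero = count_in_bins(far_from_zero, list(range(0, 561, 20)))
from_min = count_in_bins(far_from_zero, [508, 528, 548, 568])
print(from_zero.count(0), "empty bins when starting at 0")
print(from_min)
```

In the first dataset the value 1 is counted in the first bin even though it equals the lower bound, while 21 goes in the first bin and 41 in the second because upper bounds are inclusive. The second dataset shows why starting at 0 can be impractical: most of the bins end up empty.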