Types of data.
We can classify data in two main ways – based on its type and on its measurement level.
Let’s start with the types of data we can have. There are two: categorical and numerical.
Categorical data describes categories or groups. For example, car brands like Mercedes, BMW and Audi – they show different categories. Another instance is answers to yes and no questions. If I ask questions like:
Are you currently enrolled in a university?
Do you own a car?
Yes and no would be the two groups of answers that can be obtained.
This is categorical data.
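To make this concrete, here is a minimal Python sketch of how categorical answers are typically handled: we group and count them rather than average them. The list of answers is made up for illustration and is not real survey data.

```python
from collections import Counter

# Hypothetical answers to "Do you own a car?" (illustrative only)
answers = ["yes", "no", "yes", "yes", "no"]

# Categorical values are labels, so we count how often each one occurs
counts = Counter(answers)

print(counts)  # Counter({'yes': 3, 'no': 2})
```

Notice that taking the mean of these answers would make no sense, while counting each group does.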
Numerical data, on the other hand, as its name suggests, represents numbers. It is further divided into two subsets: discrete and continuous.
Discrete data can usually be counted in a finite manner. A good example would be the number of children that you want to have. Even if you don’t know exactly how many, you are absolutely sure that the value will be an integer such as 0, 1, 2, or even 10.
Another instance is scores on the SAT exam. You may get 1000, 1560, 1570 or 2400. What matters for a variable to be defined as discrete is that you can imagine each member of the dataset. Here, the key is knowing that SAT scores range from 600 to 2400 and that 10 points separate consecutive possible scores.
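One way to see what "you can imagine each member of the dataset" means is to list every possible score. The sketch below assumes the range and step stated above (600 to 2400, in steps of 10); the variable name is chosen for illustration.

```python
# Assumption taken from the text: scores run from 600 to 2400 in steps of 10
possible_scores = list(range(600, 2401, 10))

print(len(possible_scores))     # 181 possible values, a finite list
print(1560 in possible_scores)  # True
print(1565 in possible_scores)  # False, no score falls between the steps
```

Because the full list can be written out, the variable is discrete.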
It’s easier to understand discrete data by saying it’s the opposite of continuous data. Continuous data can take infinitely many values and is impossible to count. For instance, your weight can take on every value in some range. Let’s dig a bit deeper into this. You get on the scale and the screen shows 150 pounds, or 68.0389 kilograms. But this is just an approximation. If you gain 0.01 pounds, the figure on the scale is unlikely to change, but your new weight will be 150.01 pounds, or 68.0434 kilograms. Now think about sweating. Every drop of sweat reduces your weight by the weight of that drop, but a scale is unlikely to capture that change. The process of losing and gaining weight occurs all the time. Your exact weight is a continuous variable: it can take on an infinite number of values, no matter how many digits there are after the decimal point.
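The weight example can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the scale resolution of 0.1 pounds is an assumption made for illustration. The conversion factor is the standard definition of the pound in kilograms.

```python
LB_TO_KG = 0.45359237  # kilograms per pound (international definition)

def to_kg(pounds):
    """Convert a weight in pounds to kilograms."""
    return pounds * LB_TO_KG

def scale_reading(pounds):
    """What a scale shows, assuming it rounds to 0.1 lb (an assumption)."""
    return round(pounds, 1)

print(round(to_kg(150), 4))     # 68.0389
print(round(to_kg(150.01), 4))  # 68.0434
print(scale_reading(150.01))    # 150.0, the display does not change
```

The true weight moved, but the reading did not. The scale only reports an approximation of a continuous quantity.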
To sum up, your weight can vary by incomprehensibly small amounts and is continuous, while the number of children you want to have is directly understandable and is discrete.