Is gender really binary if there is an N/A option?
1 answer (0 marked as helpful)
Hi Chitra,
This is a very interesting question, so thanks for that.
In general, N/A = not available / not applicable.
When analyzing your data in terms of gender there are 2 approaches:
1) disregard all observations where you have N/A for gender
2) consider 3 categories: male, female, other
Depending on the purpose of your analysis, you can treat gender as binary (approach 1) or as a variable with more than two categories (approach 2).
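To make the two approaches concrete, here is a minimal sketch in plain Python. The toy records, the field names (`gender`, `spend`) and the helper functions are all hypothetical and are not taken from the course example; `None` stands in for an N/A entry.

```python
# Hypothetical toy data: None stands for an N/A gender entry (e.g. a firm).
customers = [
    {"id": 1, "gender": "male", "spend": 120.0},
    {"id": 2, "gender": "female", "spend": 80.0},
    {"id": 3, "gender": None, "spend": 15.0},
    {"id": 4, "gender": "female", "spend": 95.0},
]


def drop_missing_gender(rows):
    """Approach 1: disregard observations where gender is N/A."""
    return [row for row in rows if row["gender"] is not None]


def encode_binary(rows):
    """Encode gender as a single 0/1 dummy (1 = female).

    Assumes N/A rows have already been removed.
    """
    return [{**row, "is_female": int(row["gender"] == "female")} for row in rows]


def encode_three_categories(rows):
    """Approach 2: keep N/A as a third category, 'other'.

    Uses two dummies, with 'male' as the reference category.
    """
    encoded = []
    for row in rows:
        gender = row["gender"] if row["gender"] is not None else "other"
        encoded.append(
            {
                **row,
                "gender": gender,
                "is_female": int(gender == "female"),
                "is_other": int(gender == "other"),
            }
        )
    return encoded


binary_rows = encode_binary(drop_missing_gender(customers))
three_way_rows = encode_three_categories(customers)
print(len(binary_rows), len(three_way_rows))
```

With approach 1 the model sees one dummy variable and fewer observations; with approach 2 it keeps every observation at the cost of an extra dummy.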
Now, in many cases, gender is a significant predictor. For instance:
1) Expenditure on cosmetics -> females tend to spend more
2) Fishing equipment -> males tend to spend much more

Firms are irrelevant in that case, as they are unlikely to spend on either (so the situation is similar to the one in the example).

The recent global discussion on gender only seems to complicate this idea further. Data science can seem a bit 'rough' here, as it does not distinguish between all the possibilities. However, what we usually observe when analyzing gender is 2 major 'types of expenditure', or 'behavior'. If 99% of your data behaves like a male or a female, the remaining 1% will not give you much information about the general trend. In fact, even if only 90% of your data follows the general trend, you would still stick with it rather than complicate your model further.

This is basically the idea we applied to the N/A values in our example.

Best,
365 Team