
Why do we still have privacy issues after using data masking techniques on big data?



I was just wondering why we still have data privacy issues across big data even though the data has been hidden and cannot be manipulated.

1 Answer

365 Team

Hi Eeshwar!
Thanks for reaching out.
The main reason data privacy issues remain is that the masking technique may not have worked well, and hackers may have managed to obtain the real data, i.e. they may have managed to re-identify it. So it's basically a matter of how good a given masking technique is in a certain situation, and whether it has worked well in that particular case.
This can also be classified as a data protection issue.
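
To make the re-identification point concrete, here is a minimal sketch of a linkage attack. Everything in it is made up for illustration: the records, the field names (`zip`, `birth_year`, `gender`, `diagnosis`) and the helper `reidentify` are assumptions, not real data or a real library. The idea is that a data set with names removed can still be matched against a public list if the remaining attributes are unique enough.

```python
# Hypothetical "masked" release: names removed, other attributes left intact.
masked_records = [
    {"zip": "10001", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "10001", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
    {"zip": "94105", "birth_year": 1985, "gender": "F", "diagnosis": "flu"},
]

# Hypothetical public list that shares those attributes and includes names.
public_records = [
    {"name": "Alice", "zip": "10001", "birth_year": 1985, "gender": "F"},
    {"name": "Bob", "zip": "10001", "birth_year": 1990, "gender": "M"},
    {"name": "Carol", "zip": "60601", "birth_year": 1972, "gender": "F"},
]

# Attributes that are not names but can still single a person out.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def reidentify(masked, public, keys=QUASI_IDENTIFIERS):
    """Link masked rows to names when the attribute combination is unique."""
    index = {}
    for person in public:
        key = tuple(person[k] for k in keys)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for row in masked:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the row is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


linked = reidentify(masked_records, public_records)
for name, diagnosis in linked:
    print(name, diagnosis)
```

In this toy data two of the three masked rows link back to a single person, so removing the name column alone did not protect them.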
Hope this helps.

Thanks, Martin, for clarifying the doubt to some extent. I am still wondering what kind of masking techniques could be used to protect the data. Could you please elaborate on this with a real-time example, if possible? I am still curious to know more about this.

10 months

Hi Eeshwar! There are many techniques, and data masking is a separate and huge field of its own. The most notable among these techniques are probably encryption (where authorised users need a key to access the data), substitution (where you mimic the look of the authentic data but actually provide unauthentic data, say, data that is very close to the original), averaging (where you impute the average value in place of the different values in a certain column), character scrambling (a basic technique which works great in some situations), shuffling (about which you can learn more in our program), and more.

When you say “real-time example”, I guess you are referring to Dynamic Data Masking (DDM), which is a relatively new technology that builds on “on-the-fly” data masking. The latter is an ETL (Extract, Transform, Load) process in which you specify the source of the information to be masked (environment 1) and the location where the masked data will be loaded (environment 2). The same process applies to DDM, except that it transfers one record at a time rather than an entire data set.

Hope this helps.

Best,
Martin
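
As a rough illustration of some of the techniques listed above, here is a minimal sketch in plain Python. All function names, field names and sample values are hypothetical and chosen only for this example; they do not come from any particular masking product. Encryption is left out because it would need a proper cryptography library, and `mask_on_read` only imitates the "one record at a time" idea of DDM, it is not how a database engine implements it.

```python
import random
import statistics


def substitute(replacements, rng):
    """Substitution: return a realistic-looking value that is not the real one."""
    return rng.choice(replacements)


def average_column(values):
    """Averaging: replace every value in a column with the column mean."""
    mean = statistics.mean(values)
    return [mean] * len(values)


def scramble_characters(text, rng):
    """Character scrambling: reorder the characters within a single value."""
    chars = list(text)
    rng.shuffle(chars)
    return "".join(chars)


def shuffle_column(values, rng):
    """Shuffling: keep the same set of values but reassign them across rows."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def mask_on_read(records, rng):
    """Dynamic-style masking: mask one record at a time as it is read."""
    for record in records:
        masked = dict(record)  # copy, so the stored record stays untouched
        masked["name"] = scramble_characters(record["name"], rng)
        yield masked


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
customers = [
    {"name": "Eeshwar", "salary": 1000},
    {"name": "Martin", "salary": 2000},
    {"name": "Alice", "salary": 3000},
]

salaries = [c["salary"] for c in customers]
print(average_column(salaries))
print(shuffle_column(salaries, rng))
print(substitute(["Sam", "Kim", "Lee"], rng))
for row in mask_on_read(customers, rng):
    print(row)
```

Note that `mask_on_read` returns copies, so the source records are never changed; only the reader sees masked values.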

9 months