
Sprinklr Case: Need for anonymization of Health Data

In the digital age, patient data must be handled with great care. Often without consent or warning, and sometimes in completely surprising ways, big data analysts are tracking our every click and purchase, examining them to determine exactly who we are.[12] The expansion of ICT has also accelerated the production of Big Data – digital metadata such as records of social activity, location footprints, etc. Google is one of the largest keepers of big data, and also provides an open-access interface (Google Insights) to enable analysis.[13]

Researchers are increasingly making use of artificial intelligence-aided platforms to track the spread of coronavirus, as well as to understand the disease in a deeper, more comprehensive way,[14] helping them identify possible cures. Even before coronavirus, there was a practice of sharing patient data among research groups, but not before it had undergone de-identification or pseudonymization. Data can be identified by two broad sets of details – direct identifiers and indirect identifiers (often called quasi-identifiers). Direct identifiers include name, address, contact details, etc., while the quasi-identifiers are finer data points like clinics/hospitals visited, tests undergone, prescriptions taken, etc. Once the data is stripped of these identifiers, it is called de-identified or anonymized data. Most jurisdictions, including the EU under the GDPR, do not consider anonymized data to be personal information; hence, it falls outside the scope of the various data protection regulations.
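The distinction between direct and quasi-identifiers can be sketched in code. The snippet below is only an illustration with hypothetical field names; real de-identification pipelines are far more involved.

```python
# Illustrative sketch (hypothetical field names): producing a
# de-identified copy of a patient record by removing direct identifiers.

# Direct identifiers: fields that point to a person on their own.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "email"}

def strip_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed.

    Quasi-identifiers (clinic visited, tests undergone, prescriptions)
    are kept here; a fuller pipeline would also generalize or suppress
    those, since they too can help single out an individual.
    """
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

patient = {
    "name": "A. Sharma",        # direct identifier
    "phone": "9800000000",      # direct identifier
    "clinic": "City Hospital",  # quasi-identifier
    "test": "RT-PCR",           # quasi-identifier
}

deidentified = strip_direct_identifiers(patient)
print(deidentified)  # {'clinic': 'City Hospital', 'test': 'RT-PCR'}
```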

In Europe, we have the European Data Protection Directive 95/46/EC, which governed the processing of personal data within the European Union until the GDPR replaced it in 2018. Recital 26 reads:

“(26) Whereas the principles of protection must apply to any information concerning an identified or identifiable person; … whereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable[15]; …”

Similarly, the United States has the Privacy Rule framed under the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which states that

“Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual is not individually identifiable health information.”[16]

Presently, India has the outdated Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011 in force. Sadly, the IT Rules do not specify how corporations should deal with anonymized data. Even the Personal Data Protection Bill, 2019, currently under scrutiny by the Joint Parliamentary Committee, does not mandate either data fiduciaries or data processors to anonymize the data they receive. To put it plainly – the data collected by fiduciaries would remain labeled, making pinpointing individuals at any point an easy task, probably just a search away.

There exist multiple[17] ways to anonymize data, and how each dataset is treated depends on how it will be shared. For example, if the data will be made publicly available, the norm is usually to strip it of at least all direct identifiers and of most quasi-identifiers such as zip code, ethnicity/race, and hospital visit dates and times.
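One common technique for the quasi-identifiers mentioned above is generalization: coarsening a value rather than deleting it, so the data stays useful for research. The sketch below assumes hypothetical field names and a simple set of rules; it is not any standard's prescribed method.

```python
# Hypothetical sketch: generalizing quasi-identifiers before a public
# release (zip/PIN code, visit dates), as described above.

from datetime import date

def generalize(record: dict) -> dict:
    out = dict(record)
    # Coarsen the PIN code: keep only the first three digits.
    out["pin_code"] = record["pin_code"][:3] + "XXX"
    # Keep only the year and month of a hospital visit, dropping
    # the exact day and time.
    visit = record["visit_date"]
    out["visit_date"] = f"{visit.year}-{visit.month:02d}"
    # Suppress ethnicity/race entirely.
    out.pop("ethnicity", None)
    return out

row = {"pin_code": "110001", "visit_date": date(2020, 5, 11), "ethnicity": "..."}
print(generalize(row))  # {'pin_code': '110XXX', 'visit_date': '2020-05'}
```

How aggressively to generalize is a policy choice: coarser values lower re-identification risk but also reduce the dataset's research value.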

On matters of health information security, India currently has a draft Digital Information Security in Healthcare Act[18] which provides for the establishment of eHealth Authorities and Health Information Exchanges at central as well as state level. The draft bill provides for only anonymized data to be used for public health-related purposes or granting access even to government departments. As I have mentioned earlier, the current IT Rules are obsolete and there is an urgent need for a stringent legal framework that can deal with the sensitive personal health data of Indian citizens and places the onus on the data processor in case of a breach.

On May 11, 2020, the Ministry of Electronics and Information Technology (MeitY) notified the Aarogya Setu Data Access and Knowledge Sharing Protocol, 2020[19] (“the protocol”). Although the protocol does provide for the deletion of data collected by the National Informatics Centre (NIC) after a maximum of 180 days from the day on which it is collected, whether the data stored on the device comes under the purview of “collected by NIC” is unknown.

On matters of sharing aggregated datasets for necessary health interventions, the protocol under Clause 6.b. states:

Response data in the de-identified form may be shared with such Ministries or Departments of the Government of India or the State/Union Territory Governments, local governments, NDMA, SDMAs and such other public health institutions of the Government of India or State Governments or local governments with whom such sharing is necessary to assist in the formulation or implementation of critical health response.

Here, the contextual definition of de-identified data, as provided in the protocol, is “data which has been stripped of personally identifiable data to prevent the individual from being personally identified through such data and assigned a randomly generated ID.” This technique of assigning a random ID to data is called pseudonymisation.[20] Moreover, the protocol uses the terms ‘de-identified’ and ‘anonymized’[21] interchangeably. Pseudonymization of a dataset has its drawbacks. For instance, if an algorithm were used to assign ‘random’ IDs to data sets, even a moderately powerful computer could figure out the pattern of assignment, undoing much of the anonymization. Even under the GDPR, personal data that has undergone pseudonymization explicitly remains personal data under EU data protection laws, as it “should be considered to be information on an identifiable natural person.”[22]
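The pitfall described above can be made concrete. In the sketch below (names and scheme are hypothetical, not taken from the protocol), an ID derived deterministically from the record is recomputable by anyone who can guess the input, while a genuinely random ID is reversible only via the mapping table – which is exactly why the GDPR still treats pseudonymized data as personal data.

```python
# Sketch of a pseudonymization pitfall (hypothetical scheme): a
# "random"-looking ID derived from the record is not random at all.

import hashlib
import secrets

def weak_pseudonym(name: str) -> str:
    # Deterministic: anyone who can guess a name can recompute the ID
    # and re-link the "pseudonymous" record to the person.
    return hashlib.sha256(name.encode()).hexdigest()[:8]

def strong_pseudonym(name: str, table: dict) -> str:
    # Genuinely random ID; re-identification requires the mapping
    # table, so the table itself must be guarded like personal data.
    if name not in table:
        table[name] = secrets.token_hex(4)
    return table[name]

# The weak scheme is trivially recomputable from a guessed name.
assert weak_pseudonym("A. Sharma") == weak_pseudonym("A. Sharma")
```

Even the strong variant is only pseudonymization, not anonymization: whoever holds the table can undo it.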

To put it simply, the protocol allows the response data collected via Aarogya Setu to be shared with ministries/departments of the Union/state/local governments and other public health institutions in a form that may still be personally identifiable.

Contrary to what the term may suggest, even anonymised data is not a hundred per cent anonymous,[23] and various data points, usually publicly available in other datasets, can be corroborated and used to re-identify individuals[24] – a clear and grave threat to privacy rights, more so in sensitive matters like health. Given the powerful algorithms and high processing power available today, it is impossible to reduce the possibility of re-identification to zero even after anonymization: once multiple data points are corroborated, individuals can be re-identified.[25]
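This kind of corroboration is often called a linkage attack. The toy example below (all records invented) joins an "anonymized" health dataset with a public, voter-roll-style list on shared quasi-identifiers; a single match is enough to attach a name to a diagnosis.

```python
# Hypothetical illustration of re-identification by corroborating data
# points: linking an "anonymized" health dataset to a public list with
# names, using quasi-identifiers common to both.

health = [  # anonymized: no names, but quasi-identifiers intact
    {"pin": "110001", "birth_year": 1975, "sex": "F", "diagnosis": "COVID-19"},
]
public = [  # publicly available dataset that carries names
    {"pin": "110001", "birth_year": 1975, "sex": "F", "name": "R. Gupta"},
    {"pin": "500001", "birth_year": 1982, "sex": "M", "name": "K. Rao"},
]

def link(health_rows, public_rows, keys=("pin", "birth_year", "sex")):
    """Join the two datasets on the shared quasi-identifiers."""
    matches = []
    for h in health_rows:
        for p in public_rows:
            if all(h[k] == p[k] for k in keys):
                matches.append({"name": p["name"], "diagnosis": h["diagnosis"]})
    return matches

print(link(health, public))  # [{'name': 'R. Gupta', 'diagnosis': 'COVID-19'}]
```

The fewer people who share a given combination of quasi-identifiers, the more certain the re-identification – which is why public releases generalize or suppress those fields.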

However, anonymisation of the collected user data does provide an extra layer of security to the dataset. Care must be taken at all points when dealing with the sensitive personal data of patients. Further, it is pertinent that we have stringent data protection laws in place to deal with any kind of data breach and to make negligent data processors liable. A personal data protection framework becomes all the more necessary when data analytics companies are being given a free hand to surveil the social media chatter of citizens.[26]


[1] Harsh Bajpai and Gyan Tripathi, April 13 2020, COVID-19: The 9/11 for Privacy, Contego Humanitas, retrieved from https://contegohumanitas.com/2020/04/13/covid-19-the-9-11-for-privacy/.

[2] First Post, Coronavirus Outbreak: Prasar Bharati makes it mandatory for staff to download and use Aarogya Setu app, April 15 2020, retrieved from https://www.firstpost.com/health/coronavirus-outbreak-prasar-bharati-makes-it-mandatory-for-staff-to-download-and-use-aarogya-setu-app-8263651.html.

[3] Times NOW, Planning to visit a mall in Hyderabad? Downloading Aarogya Setu app is a must, June 8 2020, retrieved from https://www.timesnownews.com/hyderabad/article/planning-to-visit-mall-in-hyderabad-downloading-aarogya-setu-app-is-a-must/603204.

[4] Inc42, Aarogya Setu Mandatory For Metro Travellers In Delhi-NCR, 20 May 2020, retrieved from https://inc42.com/buzz/aarogya-setu-mandatory-for-metro-travellers-in-delhi-ncr/.

[5] Danielle Citron, 24 Dec 2014, BEWARE: The Dangers of Location Data, Forbes, retrieved from https://www.forbes.com/sites/daniellecitron/2014/12/24/beware-the-dangers-of-location-data/#381f961a43cb.

[6] Balu Gopalakrishnan and Ors v. State of Kerala and Ors., W.P.(C). Temp. NO.84 OF 2020

[7] Kim, Minjeong. (2010). The Right to Anonymous Association in Cyberspace: US Legal Protection for Anonymity in Name, in Face, and in Action. 7.

[8] The European Union Working Party with regards to the processing of personal data, 3 December 1997, Anonymity on the Internet – Recommendation 3/97, retrieved from https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/1997/wp6_en.pdf.

[9] Ibid.

[10] Recital 26, Regulation (EU) 2016/679, retrieved from https://gdpr-info.eu/recitals/no-26/.

[11] Indian Medical Council, Code of Medical Ethics Regulation, 2002, retrieved from https://www.mciindia.org/CMS/rules-regulations/code-of-medical-ethics-regulations-2002.

[12] Julie Brill, Comm’r, FTC, Keynote Address at the 23rd Computers, Freedom, and Privacy Conference: Reclaim Your Name 11-12 (June 26, 2013), available at http://www.ftc.gov/speeches/brill/130626computersfreedom.pdf.

[13] Scheitle, C.P. (2011), Google’s Insights for Search: A Note Evaluating the Use of Search Engine Data in Social Research*. Social Science Quarterly, 92: 285-295. doi:10.1111/j.1540-6237.2011.00768.x

[14] Jessica Kent, Health IT Analytics, Data Scientists Use Machine Learning to Discover COVID-19 Treatments, retrieved from https://healthitanalytics.com/news/data-scientists-use-machine-learning-to-discover-covid-19-treatments.

[15] Recital 26, Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, OJ 1995 L 281/31, retrieved from https://eur-lex.europa.eu/eli/dir/1995/46/oj.

[16] UNITED STATES. (2004). The Health Insurance Portability and Accountability Act (HIPAA). [Washington, D.C.], U.S. Dept. of Labor, Employee Benefits Security Administration. http://purl.fdlp.gov/GPO/gpo10291.

[17] Khaled El Emam & Luk Arbuckle, Anonymizing Health Data: Case Studies and Methods to Get You Started, 2013.

[18] Ministry of Health & Family Welfare, Digital Information Security in Healthcare, Act (draft), accessible at https://www.nhp.gov.in/NHPfiles/R_4179_1521627488625_0.pdf.

[19] MeitY, The Aarogya Setu Data Access and Knowledge Sharing Protocol, 2020, retrieved from https://meity.gov.in/writereaddata/files/Aarogya_Setu_data_access_knowledge_Protocol.pdf.

[20] GDPR, supra note 1, at Recital 26 and Article 4(5).

[21] The prevalent term is ‘de-identified’ in the American jurisdiction, while GDPR and other European legislation use the term ‘anonymised datasets’.

[22] Sophie Stalla-Bourdillon & Alison Knight, Anonymous Data v. Personal Data – False Debate: An EU Perspective on Anonymization, Pseudonymization and Personal Data, 34 Wis. Int’l L.J. 284 (2016).

[23] Alex Hern, 23 July 2019, ‘Anonymised’ data can never be totally anonymous, says study, The Guardian, retrieved from https://www.theguardian.com/technology/2019/jul/23/anonymised-data-never-be-anonymous-enough-study-finds.

[24] John Bohannon, Genealogy Databases enable Naming of Anonymous DNA Donors, 18 Jan 2013, retrieved from https://science.sciencemag.org/content/339/6117/262.abstract.

[25] Arvind Narayanan and Vitaly Shmatikov, May 21 2019, Robust de-anonymization of large sparse datasets: a decade later, retrieved from https://www.cs.princeton.edu/~arvindn/publications/de-anonymization-retrospective.pdf.

[26] The New Indian Express, 6 June 2020, ‘Sprinklr’ helping Telangana track netizens’ COVID-19 talk across social media platforms, retrieved from https://www.newindianexpress.com/states/telangana/2020/jun/06/sprinklr-helping-telangana-track-netizens-covid-19-talk-across-social-media-platforms-2152844.html.

Gyan Tripathi

Gyan is Editor, Information Technology for Metacept and has a keen interest in tech and the evolution of cyber policy and tech laws. Tweets @Gyan_Tripathi_
