Anonymization technique protects the privacy of open data
The technique, researched by a team of authors at Ho Chi Minh City Polytechnic University, aims to protect privacy while preserving as much data quality as possible for analysis and maintaining the availability and continuity of the open data system.
System interface and functions. Photo: NNC
Open data is currently an unavoidable global trend. Data generated by agencies and organizations (internal data), by individuals, and by devices is increasingly being made public and available for anyone to use, free of copyright and patent restrictions.
Open data is also regarded as one of the indicators for assessing the development level of e-government. Many countries have set up dedicated portals for sharing data, making it easy for citizens to access open data.
Vietnam is considered one of the countries with a high e-government index. Open datasets in Vietnam are grouped into categories such as education, science and technology, and natural resources and environment.
However, building and providing open data in Vietnam faces many problems, the most prominent and urgent of which is the need for solutions to security and privacy issues. Sensitive and private data must be removed or concealed before the data is made public.
Anonymization is an indispensable step before making data public. It is the key technology supporting privacy protection at different levels, meeting varied application requirements as well as differing policies and laws.
Many anonymization techniques have been researched and developed, such as compression, data reduction, attribute change and data scrambling. However, an appropriate anonymization technique must be chosen for each type of service.
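To make the idea of anonymization more concrete, here is a minimal sketch of one widely used textbook approach, k-anonymity through generalization and suppression. This is not the method from the Ho Chi Minh City Polytechnic University study, whose internals the article does not describe; the toy records, the column names (`name`, `age`, `zip`, `diagnosis`), the generalization rules and the value of `k` are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical toy records; column names and values are illustrative only.
RECORDS = [
    {"name": "An", "age": 34, "zip": "70150", "diagnosis": "flu"},
    {"name": "Binh", "age": 37, "zip": "70210", "diagnosis": "asthma"},
    {"name": "Chi", "age": 31, "zip": "70999", "diagnosis": "flu"},
    {"name": "Dung", "age": 45, "zip": "10001", "diagnosis": "diabetes"},
    {"name": "Em", "age": 48, "zip": "10234", "diagnosis": "flu"},
    {"name": "Giang", "age": 62, "zip": "55000", "diagnosis": "asthma"},
]

DIRECT_IDENTIFIERS = {"name"}          # removed outright (assumed)
QUASI_IDENTIFIERS = ("age", "zip")     # generalized, then checked for k-anonymity (assumed)


def generalize(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    low = out["age"] // 10 * 10
    out["age"] = f"{low}-{low + 9}"        # exact age -> 10-year band
    out["zip"] = out["zip"][:2] + "***"    # keep only a 2-digit postal prefix
    return out


def group_key(row):
    """Tuple of quasi-identifier values that defines an equivalence class."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def is_k_anonymous(rows, k):
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(group_key(r) for r in rows)
    return all(c >= k for c in counts.values())


def anonymize(records, k):
    """Generalize all records, then suppress rows in classes smaller than k."""
    rows = [generalize(r) for r in records]
    counts = Counter(group_key(r) for r in rows)
    return [r for r in rows if counts[group_key(r)] >= k]


if __name__ == "__main__":
    for row in anonymize(RECORDS, k=2):
        print(row)
```

With `k=2`, the names are dropped, ages and postal codes are coarsened, and the single record whose generalized group has fewer than two members is suppressed. A larger `k` gives stronger protection but coarsens or removes more data, which is the privacy versus data-quality trade-off the study aims to balance.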
In Vietnam, most datasets are published either in PDF format after the data owner's identifying information has been removed, or as statistical data; no anonymization method has yet been applied flexibly enough to protect the privacy these data owners need. It is therefore urgent to propose an anonymization technique that protects the privacy of open data in Vietnam.
In the study "Anonymization techniques to protect the privacy of open data", the team of authors at Ho Chi Minh City Polytechnic University built a suitable foundational architecture, together with data processing techniques, that protects privacy before data is made public while ensuring maximum data quality for analysis as well as system availability and continuity. The technique is compatible with the characteristics of various types of data, especially in smart city environments.
The system has been tested on datasets such as SS13ACS (results of a US Census Bureau population survey) and IHIS (results of a US population health survey), among others. These datasets were successfully anonymized, and the results were published in a catalog managed by CKAN.
ctngoc