Research Area:  Machine Learning
Privacy regulations press organisations to handle personal data with reinforced caution. Moreover, organisations are dealing with increasing amounts of Personally Identifiable Information in their systems. Thus, there is a high demand not only for privacy-preserving data processing mechanisms but also for privacy-enhancing services. As such, we propose the Personal Data Analyser, a tool that increases privacy assurances and minimises privacy risks through automated privacy-preserving data monitoring and privacy risk assessment mechanisms. Automated data monitoring is achieved with a hybrid mechanism that employs Regular Expressions, Natural Language Processing tools, and machine learning models such as Multilayer Perceptron and Random Forests. Our privacy risk assessment mechanism is based on custom-built crisp and fuzzy models that consider information such as data processor reputation, data sensitiveness and other inputs in order to assess the privacy risk associated with data transactions. Our work is integrated and validated under real use cases of the PoSeID-on platform and warns users whenever potential privacy risks are detected. Validation under PoSeID-on's pilots and its users proved beneficial not only to assess our solution but also to raise users' awareness of their data. The results of this work show that our solution is an effective Privacy Enhancing Technology that increases privacy assurances between organisations and their users.
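As a rough illustration of two ideas named in the abstract, the sketch below pairs regular-expression matching for detecting personal data with a crisp rule that combines data processor reputation and data sensitiveness into a risk level. It is not the Personal Data Analyser's implementation: the patterns, the function names `find_pii` and `crisp_risk`, the [0, 1] input scale and the thresholds are all assumptions made for this example, and the NLP, machine learning and fuzzy components are left out.

```python
import re

# Hypothetical patterns for illustration; the paper's actual expressions are not given here.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_pii(text):
    """Return the sorted names of the PII categories whose pattern matches the text."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


def crisp_risk(reputation, sensitivity):
    """Toy crisp rule (assumed): risk grows with sensitivity and shrinks with reputation.

    Both inputs are assumed to lie in [0, 1]; the thresholds are arbitrary.
    """
    score = sensitivity * (1.0 - reputation)
    if score >= 0.5:
        return "high"
    if score >= 0.2:
        return "medium"
    return "low"
```

For instance, `find_pii("Contact alice@example.com")` would flag the `email` category, and a transaction involving highly sensitive data sent to a poorly reputed processor, such as `crisp_risk(0.1, 0.9)`, would be rated `high` under these assumed thresholds.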
Privacy preservation
Machine learning
Natural language processing
Personally identifiable information
Author(s) Name:  Paulo Silva, Carolina Gonçalves, Nuno Antunes, Marilia Curado, Bogdan Walek
Journal name:  Expert Systems with Applications
Conference name:  
Publisher name:  Elsevier
DOI:  10.1016/j.eswa.2022.116867
Volume Information:  Volume 200, 15 August 2022, 116867
Paper Link: