Data Mining and Privacy: Modeling Sensitive Data with Differential Privacy

PhD Thesis on Data Mining and Privacy: Modeling Sensitive Data with Differential Privacy

Research Area:  Data Mining

Abstract:

   In the data-driven society of the 21st century, mining data to discover information about people is becoming increasingly valuable. The information can be used to learn more about society and humanity, or to build models that enable us to predict future events. Applications of data mining range from commercial endeavors to demographic and medical studies that contribute to the common good. Unfortunately, real-world considerations sometimes conflict with the goals of data mining; in particular, the privacy of the people whose data is being mined may need to be considered. This requires modifying the output of data mining algorithms to protect sensitive information without ruining the informative or predictive power of the resulting model.
   Many techniques have been developed to preserve privacy over the years, but one stands out above the rest: differential privacy. Differential privacy is an enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their personal information.
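   For reference, the guarantee described above is usually stated formally as follows. This is the standard textbook formulation rather than a quotation from the thesis; the symbols (a randomized mechanism M, neighbouring datasets D and D' that differ in one person's record, an output set S, and a privacy budget epsilon) are introduced here as assumptions.

```latex
% A randomized mechanism \mathcal{M} is \varepsilon-differentially private if,
% for all neighbouring datasets D, D' and every set of outputs S,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]
```

   A smaller epsilon forces the two output distributions closer together, so the presence or absence of any one person has less influence on what is released.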
   In this thesis, we focus on one particular data mining algorithm - decision trees - and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze the conflicts that arise when balancing privacy requirements with the utility of a model. We view "utility" as a two-sided coin; on one side there is prediction accuracy, and on the other there is knowledge discovery. Optimal results for both sides cannot be achieved at the same time, and the importance of each side is dependent on the user's needs. We explore the trade-offs that need to be made when prioritizing one side over the other.
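   To make the privacy/utility tension concrete, here is a minimal Python sketch of one common way differential privacy can touch a decision tree component: perturbing the class counts in a leaf with Laplace noise before choosing the majority label. It is an illustration only, not the method developed in the thesis. The function names, the example labels, and the assumption that neighbouring datasets differ by adding or removing one record (so the count histogram has sensitivity 1) are all hypothetical choices made for this sketch, and the floating-point sampling is not hardened for production use.

```python
import random
from collections import Counter


def laplace_noise(epsilon, rng):
    """Sample Laplace(0, 1/epsilon) as the difference of two exponentials."""
    # expovariate(epsilon) has mean 1/epsilon; the difference of two
    # independent draws is Laplace-distributed with scale 1/epsilon.
    return rng.expovariate(epsilon) - rng.expovariate(epsilon)


def noisy_leaf_counts(labels, classes, epsilon, rng):
    """Return Laplace-perturbed class counts for the records in one leaf.

    Assumption: adding or removing one record changes exactly one count
    by 1, so noise with scale 1/epsilon is added to every class count.
    `classes` must be fixed in advance, independent of the data.
    """
    counts = Counter(labels)
    return {c: counts.get(c, 0) + laplace_noise(epsilon, rng) for c in classes}


def predict_majority(noisy_counts):
    """Pick the class whose noisy count is largest."""
    return max(noisy_counts, key=noisy_counts.get)


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed only to make the demo repeatable
    leaf_labels = ["yes"] * 40 + ["no"] * 10  # hypothetical leaf contents
    for eps in (0.01, 0.1, 1.0):
        released = noisy_leaf_counts(leaf_labels, ["yes", "no"], eps, rng)
        print(eps, released, predict_majority(released))
```

   With a small epsilon the noise can swamp the true counts, which hurts prediction accuracy and also distorts what the released counts reveal about the data; with a large epsilon the counts are nearly exact but the privacy guarantee is weak.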

Name of the Researcher:  Samuel Fletcher

Name of the Supervisor(s):  Zahid Islam

Year of Completion:  2017

University:  Charles Sturt University

Thesis Link:   Home Page Url