Research Area:  Data Mining
The research presented in this thesis investigates and evaluates the use of cryptography to provide secure data analysis using a third party. The motivation is the emergence of Data Mining as a Service (DMaaS), which in turn has been motivated by cloud computing technology that offers the potential to reduce the operational cost of analyzing data by utilizing the storage and computing services of cloud service providers. DMaaS has also opened the door to collaborative data mining, whereby multiple data owners pool their data for analysis, using a cloud provider offering DMaaS, to gain some mutual benefit.
The challenge is for the data analysis to be conducted in a secure manner. Data privacy can be substantially preserved using cryptography. With the emergence of Homomorphic Encryption (HE) schemes, encrypted data can, to an extent, be securely processed without decryption. However, current HE schemes impose constraints on the computation, both in terms of the arithmetic operations provided (not all operations required by data mining algorithms are supported) and in terms of computational overhead (multiplication can become very slow). Solutions that have been introduced in the literature include: (i) resorting to Secure Multi-Party Computation (SMPC) protocols or (ii) substantial data owner involvement whenever unsupported operations are required.
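As a rough illustration of what processing encrypted data without decryption means, the following is a minimal sketch of additive homomorphism using a toy version of the well-known Paillier scheme. It is not one of the schemes proposed in the thesis, which this abstract does not name. The tiny hard-coded primes and the helper names (encrypt, decrypt, add_encrypted, scale_encrypted) are assumptions made for illustration only, and the parameters are far too small to be secure.

```python
import math
import secrets

# Toy Paillier parameters (assumption: tiny primes for illustration, NOT secure).
P, Q = 1789, 1907
N = P * Q
N2 = N * N
G = N + 1  # common simplification of the Paillier generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse of LAM mod N (Python 3.8+)


def encrypt(m):
    """Encrypt an integer 0 <= m < N under the toy public key (N, G)."""
    while True:
        r = secrets.randbelow(N - 1) + 1  # random value in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt a ciphertext using the toy private key (LAM, MU)."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % N2


def scale_encrypted(c, k):
    """Raising a ciphertext to a plaintext constant k scales the plaintext by k."""
    return pow(c, k, N2)


if __name__ == "__main__":
    total = add_encrypted(encrypt(12), encrypt(30))
    print(decrypt(total))                            # sum computed on ciphertexts
    print(decrypt(scale_encrypted(encrypt(7), 6)))   # product with a plaintext constant
```

In this sketch a third party holding only ciphertexts can add values and multiply them by known constants, but it cannot multiply two ciphertexts together or compare them. This is the kind of restricted operation set that the constraints described above refer to.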
The utility of the data proxy idea is illustrated using a collection of proposed secure data clustering and classification algorithms that operate over encrypted data. The thesis also introduces several encryption schemes designed to address the limitations of existing schemes in the context of DMaaS. Throughout the thesis two distinct DMaaS scenarios are considered: the single data owner scenario and the multiple data owner scenario.
The proposed concepts, schemes and secure data mining algorithms were evaluated using two categories of data: UCI datasets and randomly generated synthetic datasets. The synthetic datasets were used to evaluate the scalability of the proposed solutions by analyzing the runtime as the data size increases. The evaluation compared the operation of the proposed approaches with each other and with the relevant standard (insecure) algorithms.
Name of the Researcher:  Nawal Mohammed Almutairi
Name of the Supervisor(s):  Coenen
Year of Completion:  2020
University:  University of Liverpool
Thesis Link:   Home Page Url