Privacy-Preserving Machine Learning


The accuracy of a machine learning model depends on the dataset it is trained on, and large, high-quality datasets lead to better accuracy. However, in many use cases (e.g., IoT, healthcare, mobile users), the datasets we want to train on are private to end users. Western governments have also enacted laws (e.g., the European Union's GDPR) that restrict the exchange of sensitive data (e.g., genomic data) across countries. Federated learning partly addresses this problem by training a model across multiple decentralized devices that hold the private data. However, federated learning is highly compute-intensive, and decentralized training often does not reach the accuracy of centralized training. Finally, it does not provide strong privacy guarantees: in many instances, a centralized orchestrator can observe the model parameters learned on edge nodes, from which sensitive data can be inferred.

The proposed project will investigate the use of secure enclaves (e.g., Intel SGX) to enable efficient and accurate privacy-preserving federated learning. Secure enclaves are a relatively new hardware feature that guarantees user data remains invisible to all other parts of the computing system, including the operating system. This feature is now available on public cloud instances. Using secure enclaves, we can train on encrypted private data in a centralized public cloud without revealing its contents to the cloud service provider. Only the final trained model will be made public.

This project will help forge a collaboration between PIs with different expertise. Chowdhury and Madhyastha have developed FedScale, an infrastructure for evaluating federated learning solutions. Narayanasamy has developed methods to improve the security of secure enclaves and has used them to enable privacy-preserving genome-wide association studies (GWAS).

People

Satish Narayanasamy, CSE, Engineering
Mosharaf Chowdhury, CSE, Engineering
Harsha Madhyastha, CSE, Engineering


Funding

Funding: $45K (2022)
Goal: An Intel SGX-enabled, privacy-preserving ML training solution that uses private end-user data on a centralized public cloud.
Token Investors: Satish Narayanasamy, Mosharaf Chowdhury, Harsha Madhyastha


Project ID: 1054