Privacy-preserving Machine Learning and Data Analytics


Research Statement

Machine learning has been widely adopted across many applications, a success driven by algorithmic breakthroughs, improved computational resources, and especially access to large amounts of diverse training data. However, such data often contain privacy-sensitive information, such as individuals' medical and financial records. With the rise of ubiquitous sensing, personalization, and virtual assistants, users' privacy is at ever-increasing risk. Can we harness the power and utility of machine learning and data analytics while still ensuring users' privacy? Can we design privacy-preserving learning algorithms that protect privacy while maintaining high data utility? Can we design privacy-preserving data generative models that serve general downstream tasks?

Here we aim to explore novel techniques, including differential privacy, homomorphic encryption, and information-theoretic analysis, to enable privacy-preserving machine learning and data analytics in practice. Our long-term goal is both to provide practical, real-world solutions for privacy-preserving machine learning and data analytics and to deepen the theoretical understanding of data privacy in the big data era.

Recent Publications

LinkTeller: Recovering Private Edges from Graph Neural Networks via Influence Analysis

Fan Wu, Yunhui Long, Ce Zhang, Bo Li.

IEEE Symposium on Security and Privacy (Oakland), 2022


G-PATE: Scalable Differentially Private Data Generator via Private Aggregation of Teacher Discriminators

Yunhui Long*, Boxin Wang*, Zhuolin Yang, Bhavya Kailkhura, Aston Zhang, Carl A. Gunter, Bo Li.

NeurIPS 2021


DataLens: Scalable Privacy Preserving Training via Gradient Compression and Aggregation

Boxin Wang*, Fan Wu*, Yunhui Long*, Luka Rimanic, Ce Zhang, Bo Li.

CCS 2021


Application-Driven Privacy-Preserving Data Publishing with Correlated Attributes

Aria Rezaei, Chaowei Xiao, Jie Gao, Bo Li, Sirajum Munir.

Embedded Wireless Systems and Networks (EWSN 2021) (Best Paper Award)


How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning

Xinlei Pan, Weiyao Wang, Xiaoshuai Zhang, Bo Li, Jinfeng Yi, Dawn Song.

International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2019


Towards Efficient Data Valuation Based on the Shapley Value

Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Bo Li, Ce Zhang, Dawn Song, Costas Spanos.



Get Your Workload in Order: Game Theoretic Prioritization of Database Auditing

Chao Yan, Bo Li, Yevgeniy Vorobeychik, Aron Laszka, Daniel Fabbri, Bradley Malin.

ICDE 2018


Engineering Agreement: The Naming Game with Asymmetric and Heterogeneous Agents

J. Gao, B. Li, G. Schoenebeck, and F. Yu.

In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017).


Iterative Classification for Sanitizing Large-Scale Datasets

B. Li, Y. Vorobeychik, M. Li, and B. Malin.

ICDM 2015