Despite great advances in machine learning, especially deep learning, current learning systems still have severe limitations. Even a learner that performs well in the typical setting, where it is trained and tested on the same or similar data distributions, can fail in new scenarios and be misled by attacks at inference time (adversarial examples) or at training time (data poisoning attacks). As learning systems become pervasive, safeguarding their security and privacy is critical.
In particular, recent studies have shown that current learning systems are vulnerable to evasion attacks. For example, our work has shown that simply placing printed color stickers on road signs can easily fool a learner with such physical perturbations. A model may also be trained on a poisoned data set, causing it to make incorrect predictions under certain scenarios. Our recent work has demonstrated that attackers can embed "backdoors" in a learner using a poisoned data set for real-world applications such as face recognition systems. More exploration of ML model vulnerabilities can be found in threat model exploration.
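To make the evasion-attack setting concrete, the sketch below shows the classic fast gradient sign method (FGSM) applied to a toy binary logistic-regression model. This is only a minimal illustration of how a small, bounded input perturbation can flip a prediction; the model, weights, and `fgsm_perturb` helper are hypothetical and not part of our deployed systems.

```python
import numpy as np

def fgsm_perturb(x, y, w, b, eps):
    """FGSM on a binary logistic-regression model sigmoid(w.x + b).

    x: input vector, y: true label in {0, 1}, (w, b): model parameters,
    eps: L-infinity perturbation budget.
    """
    # Predicted probability of class 1
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    # Gradient of the cross-entropy loss with respect to the input x
    grad_x = (p - y) * w
    # Step in the direction that increases the loss, bounded by eps
    return x + eps * np.sign(grad_x)

# Toy model: classifies by the sign of the first coordinate
w = np.array([2.0, 0.0])
b = 0.0
x = np.array([0.3, 1.0])            # correctly classified as class 1
x_adv = fgsm_perturb(x, 1, w, b, eps=0.5)
# x_adv = [-0.2, 1.0]: the perturbed input is now classified as class 0
```

The same gradient-sign idea, applied to a vision model with a perturbation constrained to a sticker-shaped region, underlies the physical road-sign attacks mentioned above.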
Several solutions to these threats have been proposed, but they are not resilient against intelligent adversaries that respond dynamically to the deployed defenses. Thus, the question of how to improve the robustness of machine learning models against advanced adversaries remains largely unanswered. Our research aims to provide robustness guarantees for different ML models through game-theoretic analysis, certification methods for different ML paradigms, and learning-with-reasoning ML pipelines that improve robustness certification.
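One widely used family of certification methods is randomized smoothing, which classifies an input by majority vote of a base classifier over noisy copies of it; the vote margin can then be converted into a certified L2 robustness radius. The sketch below is a minimal, assumption-laden illustration of the prediction step only (the toy classifier and parameter choices are hypothetical, and the certified-radius computation is omitted).

```python
import numpy as np

def smoothed_predict(base_classifier, x, sigma, n, rng):
    """Randomized smoothing: classify x by majority vote of the base
    classifier over n Gaussian-noised copies of x."""
    counts = {}
    for _ in range(n):
        label = base_classifier(x + rng.normal(0.0, sigma, size=x.shape))
        counts[label] = counts.get(label, 0) + 1
    # The winning label's vote fraction bounds how far x can be perturbed
    # (in L2) without changing the smoothed prediction.
    return max(counts, key=counts.get)

# Toy base classifier: thresholds the first coordinate
clf = lambda v: int(v[0] > 0)
rng = np.random.default_rng(0)
label = smoothed_predict(clf, np.array([1.0, 0.0]), sigma=0.25, n=200, rng=rng)
# label = 1, since almost all noisy copies keep a positive first coordinate
```

In practice the base classifier is a neural network trained with Gaussian noise augmentation, and the vote counts feed a statistical test that yields the certified radius.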
Machine learning has been widely adopted in various applications, and such successes largely depend on a combination of algorithmic breakthroughs, computation resource improvements, and especially access to a large amount of diverse training data. However, such massive data usually contain privacy-sensitive information, such as individuals' medical and financial records. With the rise of ubiquitous sensing, personalization, and virtual assistants, users' privacy is at ever-increasing risk. Can we enable the power and utility of machine learning and data analytics while still ensuring users' privacy? Can we design privacy-preserving learning algorithms that guarantee both privacy and high data utility? Can we design privacy-preserving data generative models for general downstream tasks?
Here we aim to explore novel techniques including differential privacy, homomorphic encryption, and information-theoretic analysis to enable privacy-preserving machine learning and data analytics in practice. Our long-term goal is to provide practical solutions to privacy-preserving machine learning and data synthesis, as well as deepen the theoretical understanding of data privacy.
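As a minimal sketch of one of these techniques, the example below shows the Laplace mechanism, a basic building block of differential privacy: a query answer is released with noise scaled to the query's sensitivity divided by the privacy budget epsilon. The counting query and parameter values are illustrative, not a description of our specific systems.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with epsilon-differential privacy by adding
    Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release a count query over a dataset.
# Adding or removing one person changes a count by at most 1,
# so the sensitivity is 1.
rng = np.random.default_rng(0)
exact_count = 42
private_count = laplace_mechanism(exact_count, sensitivity=1.0,
                                  epsilon=0.5, rng=rng)
```

Smaller epsilon means stronger privacy but noisier answers; quantifying and improving this privacy-utility tradeoff is exactly the kind of question this research direction studies.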
Whereas robustness and privacy are mainly concerned with distribution shift and inference under adversarial settings, ML generalization, a decades-long pursuit of the ML community, tackles these aspects under natural distribution shifts. Thus, natural questions arise: What is the relationship between the privacy, robustness, and generalization of ML? Can we leverage the advances of one to help the others? Is there a tradeoff between robustness, privacy, and domain generalization?
Towards improving ML generalization, we focus on two perspectives: (1) uncovering the underlying connections between ML robustness, privacy, and generalization; and (2) enabling one based on the advances of the others. For instance, our work has proved that adversarial (robustness) and domain (generalization) transferability are bidirectional indicators of each other, which has significant implications for a range of applications, such as model selection. This line of research offers the potential to jointly strengthen the different trustworthiness properties of ML systems.
We would like to thank the following sponsors and funding agencies for supporting our research: NSF, Alfred P. Sloan Foundation, DARPA, NASA, ONR, Amazon, Facebook, Google, Intel, IBM, JPMorgan Chase, Microsoft Research, NVIDIA, Sony, C3AI, eBay.