A Leaderboard for Provable Training and Verification Approaches Towards Robust Neural Networks

Recently, provable (i.e. certified) adversarial robustness training and robustness verification approaches have demonstrated their importance in the adversarial learning community.

In contrast to empirical robustness evaluation and empirical adversarial attacks, (common) provable robustness training and verification approaches provide a rigorous lower bound on network robustness, such that no existing or future attack can push the model below that bound. Verification approaches, which verify the robustness bound of a given neural network model, are strongly connected with the training methods. After training on the training set, the provable robustness of a model can therefore be measured as the fraction of verifiably robust points in the test set. To better record advances in this field, we have released a repo on GitHub that keeps track of state-of-the-art provably robust models on common datasets such as ImageNet, CIFAR10, and MNIST. A categorized paper list is also included.

You are welcome to visit our repo and provide any feedback or updates!
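
As a rough illustration of the metric described above (the fraction of verifiably robust test points), the following minimal sketch uses interval bound propagation on a small ReLU network. Interval bound propagation is only one possible verifier, chosen here because it is short; it is not necessarily the method used by any entry on the leaderboard. The function names, the `(W, b)` layer format, and the toy model are assumptions made for illustration.

```python
import numpy as np

def interval_bounds(layers, x, eps):
    """Propagate the L-infinity box [x - eps, x + eps] through affine + ReLU layers.

    `layers` is a list of (W, b) pairs (hypothetical format); ReLU is applied
    after every layer except the last one.
    """
    lower, upper = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        center = (lower + upper) / 2.0
        radius = (upper - lower) / 2.0
        new_center = W @ center + b
        new_radius = np.abs(W) @ radius  # worst-case spread of the box
        lower, upper = new_center - new_radius, new_center + new_radius
        if i < len(layers) - 1:
            lower, upper = np.maximum(lower, 0.0), np.maximum(upper, 0.0)
    return lower, upper

def is_certified(layers, x, label, eps):
    """Sufficient (not necessary) check: the true logit's lower bound
    exceeds every other logit's upper bound."""
    lower, upper = interval_bounds(layers, x, eps)
    others = np.delete(upper, label)
    return bool(lower[label] > others.max())

def certified_accuracy(layers, xs, labels, eps):
    """Fraction of points whose prediction is provably unchanged within eps."""
    certified = [is_certified(layers, x, y, eps) for x, y in zip(xs, labels)]
    return sum(certified) / len(certified)

# Toy model (assumption): a single identity layer over two inputs.
toy_layers = [(np.eye(2), np.zeros(2))]
toy_xs = [np.array([1.0, 0.0]), np.array([0.55, 0.45])]
toy_labels = [0, 0]
print(certified_accuracy(toy_layers, toy_xs, toy_labels, eps=0.1))
```

Because the check is only sufficient, a point that fails it may still be robust; tighter verifiers can certify more points on the same model.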

Adversarial Objects Against LiDAR-Based Autonomous Driving Systems

Deep neural networks (DNNs) are found to be vulnerable to adversarial examples, which are carefully crafted inputs with a small magnitude of perturbation aiming to induce arbitrarily incorrect predictions. Recent studies show that adversarial examples can pose a threat to real-world security-critical applications: a "physical adversarial Stop Sign" can be synthesized such that autonomous driving cars will misrecognize it as another sign (e.g., a speed limit sign). However, these image-space adversarial examples cannot easily alter the 3D scans of the LiDAR or radar sensors widely equipped on autonomous vehicles. In this paper, we reveal the potential vulnerabilities of LiDAR-based autonomous driving detection systems by proposing an optimization-based approach, LiDAR-Adv, to generate adversarial objects that can evade the LiDAR-based detection system under various conditions. We first show the vulnerabilities using a black-box evolution-based algorithm, and then explore how much a strong adversary can do using our gradient-based approach LiDAR-Adv. We test the generated adversarial objects on the Baidu Apollo autonomous driving platform and show that such physical systems are indeed vulnerable to the proposed attacks. We also 3D-print our adversarial objects and perform physical experiments to illustrate that such vulnerability exists in the real world.

SemanticAdv: Generating Adversarial Examples via Attribute-conditional Image Editing

Deep neural networks (DNNs) have achieved great success in various applications due to their strong expressive power. However, recent studies have shown that DNNs are vulnerable to adversarial examples, which are manipulated instances that aim to mislead DNNs into making incorrect predictions. Currently, most such adversarial examples try to keep the perturbation subtle by limiting its Lp norm. In this paper, we aim to explore the impact of semantic manipulation on DNNs by manipulating semantic attributes of images and generating unrestricted adversarial examples. Such semantic-based perturbation is more practical than pixel-level manipulation. In particular, we propose SemanticAdv, which leverages disentangled semantic factors to generate adversarial perturbations by altering a single semantic attribute. We conduct extensive experiments to show that the semantic-based adversarial examples can not only attack different learning tasks such as face verification and landmark detection, but also achieve a high attack success rate against real-world black-box services such as the Azure face verification service. Such structured adversarial examples with controlled semantic manipulation can shed light on the vulnerabilities of DNNs as well as on potential defensive approaches.
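
For readers unfamiliar with the Lp-norm constraint that this abstract contrasts with semantic manipulation, here is a minimal sketch of how a pixel-level perturbation is typically kept within a norm budget. It is not part of SemanticAdv; the function name `project_lp` and the budget `eps` are hypothetical, and only the L2 and L-infinity cases are covered.

```python
import numpy as np

def project_lp(delta, eps, p):
    """Project a perturbation `delta` onto the Lp ball of radius `eps`.

    Only p = 2 and p = np.inf are handled in this sketch.
    """
    if p == np.inf:
        # Clamp every coordinate independently to [-eps, eps].
        return np.clip(delta, -eps, eps)
    if p == 2:
        norm = np.linalg.norm(delta.ravel(), ord=2)
        if norm <= eps:
            return delta
        # Rescale so the L2 norm equals eps exactly.
        return delta * (eps / norm)
    raise ValueError("this sketch supports only p = 2 and p = np.inf")

# Usage: an image x stays within the budget after adding the projected delta.
x = np.zeros(2)
delta = np.array([3.0, 4.0])
x_adv = x + project_lp(delta, eps=1.0, p=2)
```

A semantic edit, such as changing one facial attribute, can move an image far outside any small Lp ball, which is why the abstract calls its examples unrestricted.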

Characterizing Audio Adversarial Examples using Temporal Dependency

Recent studies have highlighted adversarial examples as a ubiquitous threat to different neural network models and many downstream applications. Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potential for mitigating adversarial inputs. In particular, our results reveal the importance of using the temporal dependency in audio data to gain discriminative power against adversarial examples. Tested on automatic speech recognition (ASR) tasks and three recent audio adversarial attacks, we find that (i) input transformation developed from image adversarial defense provides limited robustness improvement and is susceptible to advanced attacks; (ii) temporal dependency can be exploited to gain discriminative power against audio adversarial examples and is resistant to the adaptive attacks considered in our experiments. Our results not only show a promising means of improving the robustness of ASR systems but also offer novel insights into exploiting domain-specific data properties to mitigate the negative effects of adversarial examples.
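
One plausible way to turn temporal dependency into a detector is sketched below: transcribe only the first portion of the audio, transcribe the whole audio, and check whether the two transcriptions agree on the shared prefix. This is an illustrative reading of the abstract, not the paper's reference implementation. The `transcribe` callable, the split ratio `k`, the similarity measure, and the `threshold` value are all assumptions.

```python
from difflib import SequenceMatcher

def temporal_consistency(transcribe, audio, k=0.5):
    """Similarity between the transcription of the first k portion of
    `audio` and the matching prefix of the full transcription.

    `transcribe` is a hypothetical callable mapping an audio sequence to text.
    """
    cut = int(len(audio) * k)
    prefix_text = transcribe(audio[:cut])
    whole_text = transcribe(audio)
    whole_prefix = whole_text[:len(prefix_text)]
    return SequenceMatcher(None, prefix_text, whole_prefix).ratio()

def looks_adversarial(transcribe, audio, k=0.5, threshold=0.8):
    """Flag the input when prefix and whole-audio transcriptions disagree.
    The threshold is an arbitrary placeholder and would need tuning."""
    return temporal_consistency(transcribe, audio, k) < threshold

# Toy stand-ins for an ASR model (assumptions, for demonstration only).
def benign_asr(samples):
    return "".join(samples)

def attacked_asr(samples):
    # Mimics an attack whose target phrase appears only on the full input.
    return "open the door" if len(samples) == 11 else "".join(samples)

audio = list("hello world")
print(looks_adversarial(benign_asr, audio))
print(looks_adversarial(attacked_asr, audio))
```

The intuition is that a benign utterance transcribes consistently whether or not its tail is present, while a perturbation optimized over the whole waveform tends to lose its effect once the audio is truncated.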