Towards the Robustness of Deep Learning Systems Against Adversarial Examples in Sequential Data
Wenjie Wang (Fall 2022)
Abstract
Deep learning has achieved state-of-the-art performance in various real-world applications,
including computer vision (CV), natural language processing (NLP), speech recognition,
and clinical informatics. Although deep learning systems are powerful, they are overly
sensitive to perturbations in the input that would not fool a human observer. Recent studies
have shown that adversarial examples can be generated by applying small perturbations
to the inputs such that well-trained deep neural networks (DNNs) misclassify them. With
the increasing number of safety- and security-sensitive applications of deep learning models,
the robustness of deep learning models to adversarial inputs has become a crucial topic.
Adversarial examples in the computer vision (CV) domain have been well studied. However,
the intrinsic differences between image data and sequential data pose great challenges
to directly applying adversarial techniques from CV to other application
domains such as speech, health informatics, and natural language processing (NLP).
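As background, most gradient-based attacks in the CV domain follow the recipe of perturbing the input in the direction that increases the classifier's loss. Below is a minimal sketch of the Fast Gradient Sign Method (FGSM) in PyTorch, assuming a differentiable classifier `model`; it illustrates the general idea only, not the attacks studied in this dissertation:

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon):
        # Treat the input as a leaf tensor so autograd records d(loss)/d(x).
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        # One signed gradient step, bounded by epsilon in the L-infinity norm.
        return (x_adv + epsilon * x_adv.grad.sign()).detach()

Applying such pixel-space perturbations directly to discrete or temporally structured inputs is exactly where the difficulties discussed above arise.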
To address these gaps and challenges, my dissertation research combines multiple studies
to improve the robustness of deep learning systems against adversarial examples in sequential
inputs. First, we take the NLP and health informatics domains as examples, focusing on
understanding the characteristics of each domain and designing empirical
adversarial defense methods: 1) RADAR, an adversarial example detection method for sequential EHR
data, and 2) MATCH, which detects adversarial examples by leveraging the consistency between
multiple modalities.
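RADAR's actual detection criteria are described in Sec. 4.2.2; as a hedged illustration of the underlying idea only, the sketch below flags an input as adversarial when the reconstruction error of an autoencoder trained on clean data exceeds a threshold. The names `autoencoder` and `tau` are placeholders, not the dissertation's implementation:

    import torch

    def flag_adversarial(autoencoder, x, tau):
        # An autoencoder trained only on clean sequences reconstructs
        # in-distribution inputs accurately; adversarial inputs tend to
        # fall off the learned manifold and reconstruct poorly.
        with torch.no_grad():
            error = torch.norm(autoencoder(x) - x)
        return error.item() > tau  # True -> treat x as adversarial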
Following the empirical defense methods, our next step is to explore
certified robustness for sequential inputs, which offers provable, theory-backed guarantees. To this
end, 1) we propose WordDP, which achieves certified robustness against word substitution attacks in the NLP
domain by leveraging the connection between differential privacy and certified robustness.
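Sec. 3.2 presents the formal background; for orientation, recall ε-differential privacy and the exponential mechanism that WordDP builds on. In this setting, "neighboring" inputs are plausibly sentences within a bounded number of word substitutions of each other, though the precise construction is deferred to Sec. 6:

    % epsilon-DP: outputs on neighboring inputs x, x' are statistically close
    \Pr[\mathcal{M}(x) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(x') \in S]

    % Exponential mechanism: sample output r with probability weighted by
    % a utility score u(x, r), where \Delta u is the sensitivity of u
    \Pr[\mathcal{M}(x) = r] \;\propto\; \exp\!\Big(\frac{\epsilon\, u(x, r)}{2\,\Delta u}\Big)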
2) We studied certified robustness methods for univariate time-series data and proposed an
adversarial attack in the Wasserstein space, a metric more appropriate for measuring the
indistinguishability of time-series data.
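To make the Wasserstein notion concrete, here is a minimal sketch (not the dissertation's formulation, which appears in Sec. 7.2) comparing the 1-Wasserstein distance with the L∞ gap for a temporally shifted pattern, treating the series as mass distributions over time indices:

    import numpy as np
    from scipy.stats import wasserstein_distance

    t = np.arange(100, dtype=float)
    bump = lambda c: np.exp(-0.5 * ((t - c) / 5.0) ** 2)
    x, x_shift = bump(40), bump(43)  # same pattern, shifted by 3 steps

    # W1 treats the series as mass over time indices: a small temporal
    # shift moves little mass, even though many point values change.
    w1 = wasserstein_distance(t, t, u_weights=x, v_weights=x_shift)
    linf = np.abs(x - x_shift).max()
    print(f"W1 = {w1:.2f}, L-inf = {linf:.2f}")

A small temporal shift leaves two series perceptually indistinguishable yet changes many point values, which is why a Wasserstein ball can be a better perturbation budget for time series than an Lp ball.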
Table of Contents

1 Introduction
  1.1 Overview
  1.2 Research Contributions
    1.2.1 An Adversarial Detection Method for Sequential EHR Data (Sec. 4)
    1.2.2 Detecting Adversarial Examples Leveraging the Consistency between Multiple Modalities (Sec. 5)
    1.2.3 Certified Robustness to Word Substitution Attacks via Differential Privacy (Sec. 6)
    1.2.4 Certified Robustness for Univariate Time Series Data in the Wasserstein Space (Sec. 7)
  1.3 Organization
2 Related Works
  2.1 Adversarial Attack Algorithms
    2.1.1 White-Box Attack
    2.1.2 Black-Box Attack
    2.1.3 Real-World Attack
    2.1.4 Non-Lp-norm Attack
  2.2 Defense Algorithms
    2.2.1 Network Distillation
    2.2.2 Adversarial Training
    2.2.3 Adversarial Detection
    2.2.4 Gradient Masking
    2.2.5 State-of-the-art Defense
  2.3 Certified Robustness
  2.4 Adversarial Examples in the NLP Domain
    2.4.1 Attack Algorithms
    2.4.2 Defense Algorithms
    2.4.3 Certified Robustness
  2.5 Adversarial Examples in Clinical Research
  2.6 Differential Privacy
3 Preliminaries
  3.1 Adversarial Word Substitution and Certified Robustness
  3.2 Differential Privacy and Exponential Mechanism
4 RADAR: Recurrent Autoencoder Based Detector for Adversarial Examples on Temporal EHR
  4.1 Overview
  4.2 Method
    4.2.1 Recurrent Autoencoder Architecture
    4.2.2 RADAR Detection Criteria
    4.2.3 Enhanced Attack
  4.3 Experiments
    4.3.1 Data and Model
    4.3.2 Attack Performance
    4.3.3 Detection Performance
5 Detecting Adversarial Examples Leveraging the Consistency between Multiple Modalities
  5.1 Overview
  5.2 Methods
    5.2.1 Multi-modality Model Consistency Check
  5.3 Experiments
    5.3.1 Data Preprocessing
    5.3.2 Predictive Model Performance
    5.3.3 Attack Results
    5.3.4 Defense Results
6 Certified Robustness to Word Substitution Attacks with Differential Privacy
  6.1 Overview
  6.2 Proposed Method
    6.2.1 WordDP for Certified Robustness
    6.2.2 WordDP with Exponential Mechanism
    6.2.3 Simulated Exponential Mechanism
    6.2.4 Extension of WordDP: Empirical Defense Method
  6.3 Experiments
    6.3.1 Evaluation Metrics and Baselines
    6.3.2 Certified Results
7 Wasserstein Adversarial Examples on Univariate Time Series Data and Its Certified Robustness
  7.1 Overview
  7.2 Proposed Method
    7.2.1 Wasserstein Projection
    7.2.2 Wasserstein PGD Attack
  7.3 Experiments
    7.3.1 Experimental Setup
    7.3.2 Attack Success Rate
    7.3.3 Effectiveness of 2-step Projection
    7.3.4 Comparison with L∞ PGD
    7.3.5 Countermeasure against Wasserstein PGD
8 Conclusion and Future Work