Towards the Robustness of Deep Learning Systems Against Adversarial Examples in Sequential Data

Wenjie Wang (Fall 2022)

Permanent URL: https://etd.library.emory.edu/concern/etds/z890rv550?locale=es
Published

Abstract

Deep learning has achieved state-of-the-art performance in various real-world applications, including computer vision (CV), natural language processing (NLP), speech recognition, and clinical informatics. Although deep learning systems are powerful, they are overly sensitive to input perturbations that would not fool a human observer. Recent studies have shown that adversarial examples can be generated by applying small perturbations to the inputs such that well-trained deep neural networks (DNNs) misclassify them. With the increasing number of safety- and security-sensitive applications of deep learning models, their robustness to adversarial inputs has become a crucial topic. Adversarial examples in the CV domain have been well studied. However, the intrinsic differences between image and sequential data pose great challenges to directly applying adversarial techniques from CV to other application domains such as speech, health informatics, and NLP.

To address these gaps and challenges, my dissertation research combines multiple studies to improve the robustness of deep learning systems against adversarial examples in sequential inputs. First, we take the NLP and health informatics domains as examples, focusing on understanding the characteristics of each domain and designing empirical adversarial defense methods: 1) RADAR, an adversarial detection method for sequential EHR data, and 2) MATCH, which detects adversarial examples by leveraging the consistency between multiple modalities. Building on these empirical defenses, our next step is to explore certified robustness for sequential inputs, which is provable and theory-backed. To this end, 1) we propose WordDP, which provides certified robustness against word substitution attacks in the NLP domain by leveraging the connection between differential privacy and certified robustness, and 2) we study certified robustness for univariate time-series data and propose an adversarial attack in the Wasserstein space, which is more appropriate for measuring the indistinguishability of time series.
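The bridge WordDP draws between differential privacy and certified robustness rests on the exponential mechanism. As a minimal, generic sketch only (the candidate set and the `utility` function below are hypothetical stand-ins, not the dissertation's actual scoring), the mechanism samples an output with probability proportional to exp(ε·u / 2Δu), where u is the utility and Δu its sensitivity:

```python
import math
import random

def em_weights(candidates, utility, epsilon, sensitivity):
    # Unnormalised exponential-mechanism weights: exp(eps * u(c) / (2 * Delta_u)).
    return [math.exp(epsilon * utility(c) / (2.0 * sensitivity)) for c in candidates]

def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=random):
    # Sample one candidate with probability proportional to its weight.
    weights = em_weights(candidates, utility, epsilon, sensitivity)
    return rng.choices(candidates, weights=weights, k=1)[0]

# Illustrative use with a toy utility (hypothetical, for exposition only):
labels = ["positive", "negative"]
toy_utility = {"positive": 1.0, "negative": 0.0}.get
sampled = exponential_mechanism(labels, toy_utility, epsilon=2.0,
                                sensitivity=1.0, rng=random.Random(0))
```

Larger ε concentrates the distribution on high-utility outputs (less noise, weaker privacy); smaller ε flattens it, which is the randomness that certification arguments exploit.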

Table of Contents

1 Introduction

1.1 Overview

1.2 Research Contributions

1.2.1 An Adversarial Detection Method for Sequential EHR Data (Sec. 4)

1.2.2 Detecting Adversarial Examples Leveraging the Consistency between Multiple Modalities (Sec. 5)

1.2.3 Certified Robustness to Word Substitution Attacks via Differential Privacy (Sec. 6)

1.2.4 Certified Robustness for Univariate Time Series Data in the Wasserstein Space (Sec. 7)

1.3 Organization

2 Related Works

2.1 Adversarial Attack Algorithms

2.1.1 White-Box Attack

2.1.2 Black-Box Attack

2.1.3 Real-World Attack

2.1.4 Non-Lp-norm Attack

2.2 Defense Algorithms

2.2.1 Network Distillation

2.2.2 Adversarial Training

2.2.3 Adversarial Detection

2.2.4 Gradient Masking

2.2.5 State-of-the-Art Defense

2.3 Certified Robustness

2.4 Adversarial Examples in the NLP Domain

2.4.1 Attack Algorithms

2.4.2 Defense Algorithms

2.4.3 Certified Robustness

2.5 Adversarial Examples in Clinical Research

2.6 Differential Privacy

3 Preliminaries

3.1 Adversarial Word Substitution and Certified Robustness

3.2 Differential Privacy and Exponential Mechanism

4 RADAR: Recurrent Autoencoder Based Detector for Adversarial Examples on Temporal EHR

4.1 Overview

4.2 Method

4.2.1 Recurrent Autoencoder Architecture

4.2.2 RADAR Detection Criteria

4.2.3 Enhanced Attack

4.3 Experiments

4.3.1 Data and Model

4.3.2 Attack Performance

4.3.3 Detection Performance

5 Detecting Adversarial Examples Leveraging the Consistency between Multiple Modalities

5.1 Overview

5.2 Methods

5.2.1 Multi-modality Model Consistency Check

5.3 Experiments

5.3.1 Data Preprocessing

5.3.2 Predictive Model Performance

5.3.3 Attack Results

5.3.4 Defense Results

6 Certified Robustness to Word Substitution Attacks with Differential Privacy

6.1 Overview

6.2 Proposed Method

6.2.1 WordDP for Certified Robustness

6.2.2 WordDP with Exponential Mechanism

6.2.3 Simulated Exponential Mechanism

6.2.4 Extension of WordDP: Empirical Defense Method

6.3 Experiments

6.3.1 Evaluation Metrics and Baselines

6.3.2 Certified Results

7 Wasserstein Adversarial Examples on Univariate Time Series Data and Their Certified Robustness

7.1 Overview

7.2 Proposed Method

7.2.1 Wasserstein Projection

7.2.2 Wasserstein PGD Attack

7.3 Experiments

7.3.1 Experimental Setup

7.3.2 Attack Success Rate

7.3.3 Effectiveness of 2-Step Projection

7.3.4 Comparison with L∞ PGD

7.3.5 Countermeasure against Wasserstein PGD

8 Conclusion and Future Work

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English