Defensive Machine Learning Techniques for Countering Adversarial Attacks

Razmi Marani, Fereshteh (Spring 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/1z40kv242?locale=en

Abstract

The increasing reliance on machine learning algorithms has made them a target for adversaries who exploit vulnerabilities in these systems to launch adversarial attacks. In such attacks, the adversary manipulates the training data, the test data, or both, known respectively as a poisoning attack, an adversarial example, or a backdoor attack. These attacks primarily aim to disrupt the model's classification task; when the model is interpretable, the attacker may instead target the interpretation of the model's output.

These attacks can have significant negative impacts; therefore, it is crucial to develop effective defense methods to protect against them. Current defense methods have limitations. Outlier detectors, used to identify and mitigate poisoning attacks, require prior knowledge of the attack and clean data to train the detector. Robust defense methods show promising results in mitigating backdoor attacks, but their effectiveness comes at the cost of decreased model utility. Furthermore, few defense methods have addressed adversarial examples that target the interpretation of the model’s output.
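One of the label differential privacy ideas discussed later rests on a simple mechanism worth seeing concretely: randomizing a fraction of training labels weakens the association between a backdoor trigger and its target label. The sketch below implements k-ary randomized response, a standard label-DP mechanism; the function names and parameters are illustrative, not the dissertation's actual approach.

```python
import numpy as np

def randomized_response(labels, num_classes, epsilon, rng):
    """epsilon-label-DP via k-ary randomized response: keep the true
    label with probability p, otherwise sample a different class."""
    p = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(len(labels)) < p
    noise = rng.integers(0, num_classes - 1, size=len(labels))
    # Shift so the replacement never equals the true label.
    noise = noise + (noise >= labels)
    return np.where(keep, labels, noise)

rng = np.random.default_rng(1)
y = rng.integers(0, 10, size=10_000)
y_priv = randomized_response(y, num_classes=10, epsilon=2.0, rng=rng)
print(f"labels kept: {(y == y_priv).mean():.2f}")
```

Smaller epsilon flips more labels, giving a stronger privacy (and backdoor-dilution) guarantee at a steeper utility cost, which is exactly the trade-off the abstract refers to.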

To address these limitations, we propose defense methods that protect machine learning models from adversarial attacks. Our methods include an autoencoder-based detection approach to identify various untargeted poisoning attacks. We also provide a comprehensive comparative study of differential privacy approaches and suggest new approaches based on label differential privacy to defend against backdoor attacks. Lastly, we propose a novel attack and defense method to protect the interpretation of a healthcare-related machine learning model. These approaches represent significant progress in the field of machine learning security and have the potential to protect against a wide range of adversarial attacks. 
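To make the detection idea concrete: an autoencoder-based detector trains on (presumed-clean) data and flags points whose reconstruction error exceeds a threshold. The minimal sketch below uses a linear autoencoder (a truncated-SVD projection) as a stand-in for the thesis's classification auto-encoder; the data, threshold, and function names are all illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a linear autoencoder: the top-k right singular vectors act
    as the encoder, their transpose as the decoder."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T  # (d,) mean, (d, k) weights

def reconstruction_error(X, mu, W):
    Z = (X - mu) @ W        # encode to k dimensions
    X_hat = Z @ W.T + mu    # decode back to d dimensions
    return np.linalg.norm(X - X_hat, axis=1)

rng = np.random.default_rng(0)
# "Clean" data lies near a 2-D subspace of a 10-D space.
clean = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10))
clean += 0.01 * rng.normal(size=clean.shape)
# Poisoned points are drawn off that manifold.
poison = 3.0 * rng.normal(size=(20, 10))

mu, W = fit_linear_autoencoder(clean, k=2)
tau = np.quantile(reconstruction_error(clean, mu, W), 0.99)
flags = reconstruction_error(poison, mu, W) > tau
print(f"flagged {flags.sum()} / {len(poison)} poisoned points")
```

The same thresholding logic carries over to a nonlinear autoencoder; the key design choice is that the detector models the clean-data manifold rather than any particular attack, which is how it can remain attack-agnostic.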

Table of Contents

1 Introduction 1
  1.1 Motivation 1
    1.1.1 Adversarial Attacks 1
    1.1.2 Defense Approaches 2
    1.1.3 Gaps and Challenges 4
  1.2 Research Contributions 5
    1.2.1 Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks (Chapter 2) 6
    1.2.2 Prevention of Backdoor Attacks through Differential Privacy in Practice (Chapter 3) 7
    1.2.3 Interpretation Attacks on Interpretable Models with Electronic Health Records (Chapter 4) 9

2 Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks 11
  2.1 Problem Definition 11
  2.2 Preliminaries and Backgrounds 12
    2.2.1 Untargeted Poisoning Attacks 12
    2.2.2 Auto-encoders in Anomaly Detection 13
  2.3 Proposed Approach 15
    2.3.1 Classification Auto-Encoder (CAE) 16
    2.3.2 Enhanced Classification Auto-encoder (CAE+) 17
  2.4 Evaluation Results 19
    2.4.1 Experimental Setup 19
    2.4.2 Results 23

3 Prevention of Backdoor Attacks through Differential Privacy in Practice 32
  3.1 Problem Definition 32
  3.2 Preliminaries and Backgrounds 33
    3.2.1 Backdoor Attacks 33
    3.2.2 Differential Privacy and Label Differential Privacy 34
    3.2.3 DP and Label-DP in Deep Learning 35
    3.2.4 Robustness of DP 38
  3.3 Proposed Approach 40
  3.4 Experiments and Results 42
    3.4.1 Experimental Setup 42
    3.4.2 Experimental Roadmap 45
    3.4.3 DP against Backdoors 45
    3.4.4 Label-DP against Backdoors 50
    3.4.5 Comparison of DP and Label-DP Methods 52

4 Interpretation Attacks on Interpretable Models with Electronic Health Records 59
  4.1 Problem Definition 59
  4.2 Preliminaries and Related Work 60
    4.2.1 Attacks on Images via Model's Gradients 60
    4.2.2 Attack on EHRs via Medical Attention-based Models 61
  4.3 Proposed Approach 62
    4.3.1 Problem Setting 63
    4.3.2 Interpretation Attack Formulation 64
    4.3.3 Optimization with Dynamic Penalty 65
    4.3.4 Minimizing Detectability 66
    4.3.5 Metrics for Evaluation 68
    4.3.6 Robustness 71
  4.4 Experiments 73
    4.4.1 Experimental Setup 73
    4.4.2 Attack Performance 74
    4.4.3 Attack Detectability 76
    4.4.4 Robustness 78

5 Conclusion 80
  5.1 Summary 80
  5.2 Future Work 82

Bibliography 84

About this Dissertation

Rights statement: Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

Language: English