Defensive Machine Learning Techniques for Countering Adversarial Attacks
Razmi Marani, Fereshteh (Spring 2023)
Abstract
The increasing reliance on machine learning algorithms has made them a target for adversaries who exploit vulnerabilities in these systems to launch adversarial attacks. In these attacks, the attacker manipulates the training data, the test data, or both, in what are known as poisoning attacks, adversarial examples, and backdoor attacks, respectively. These attacks primarily aim to disrupt the model’s classification task. When the model is interpretable, the attacker may instead target the interpretation of the model’s output.
These attacks can have significant negative impacts; it is therefore crucial to develop effective defense methods to protect against them. Current defense methods, however, have notable limitations. Outlier detectors, used to identify and mitigate poisoning attacks, require prior knowledge of the attack and clean data to train the detector. Robust defense methods show promising results in mitigating backdoor attacks, but their effectiveness comes at the cost of reduced model utility. Furthermore, few defense methods address adversarial examples that target the interpretation of the model’s output.
To address these limitations, we propose defense methods that protect machine learning models from adversarial attacks. Our methods include an autoencoder-based detection approach that identifies a variety of untargeted poisoning attacks. We also provide a comprehensive comparative study of differential privacy approaches and propose new methods based on label differential privacy to defend against backdoor attacks. Lastly, we propose a novel interpretation attack on a healthcare-related machine learning model, together with a defense method that protects the model’s interpretation. These approaches represent significant progress in the field of machine learning security and have the potential to protect against a wide range of adversarial attacks.
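As a rough illustration of the idea underlying the first contribution (not the dissertation’s actual CAE/CAE+ design, which is described in Sections 2.3.1 and 2.3.2), the sketch below flags suspect training points by the reconstruction error of an auto-encoder trained on data assumed to be mostly clean; the layer sizes, threshold rule, and function names are assumptions made for this example.

```python
# Minimal sketch: reconstruction-error-based detection of anomalous
# (potentially poisoned) training samples with a plain auto-encoder.
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, in_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))


def reconstruction_errors(model: AutoEncoder, x: torch.Tensor) -> torch.Tensor:
    """Per-sample mean squared reconstruction error."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)


def flag_outliers(errors: torch.Tensor, k: float = 3.0) -> torch.Tensor:
    """Mark samples whose error exceeds mean + k * std (a common heuristic)."""
    threshold = errors.mean() + k * errors.std()
    return errors > threshold
```

The Classification Auto-Encoder (CAE) and its enhanced variant (CAE+) proposed in Chapter 2 build on this kind of detector; their exact architectures and training objectives are given in Sections 2.3.1 and 2.3.2.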
Table of Contents
1 Introduction
1.1 Motivation
1.1.1 Adversarial Attacks
1.1.2 Defense Approaches
1.1.3 Gaps and Challenges
1.2 Research Contributions
1.2.1 Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks (Chapter 2)
1.2.2 Prevention of Backdoor Attacks through Differential Privacy in Practice (Chapter 3)
1.2.3 Interpretation Attacks on Interpretable Models with Electronic Health Records (Chapter 4)
2 Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks
2.1 Problem Definition
2.2 Preliminaries and Backgrounds
2.2.1 Untargeted Poisoning Attacks
2.2.2 Auto-encoders in Anomaly Detection
2.3 Proposed Approach
2.3.1 Classification Auto-Encoder (CAE)
2.3.2 Enhanced Classification Auto-encoder (CAE+)
2.4 Evaluation Results
2.4.1 Experimental Setup
2.4.2 Results
3 Prevention of Backdoor Attacks through Differential Privacy in Practice
3.1 Problem Definition
3.2 Preliminaries and Backgrounds
3.2.1 Backdoor Attacks
3.2.2 Differential Privacy and Label Differential Privacy
3.2.3 DP and Label-DP in Deep Learning
3.2.4 Robustness of DP
3.3 Proposed Approach
3.4 Experiments and Results
3.4.1 Experimental Setup
3.4.2 Experimental Roadmap
3.4.3 DP against Backdoors
3.4.4 Label DP against Backdoors
3.4.5 Comparison of DP and Label-DP Methods
4 Interpretation Attacks on Interpretable Models with Electronic Health Records
4.1 Problem Definition
4.2 Preliminaries and Related Work
4.2.1 Attacks on Images via Model’s Gradients
4.2.2 Attack on EHRs via Medical Attention-based Models
4.3 Proposed Approach
4.3.1 Problem Setting
4.3.2 Interpretation Attack Formulation
4.3.3 Optimization with Dynamic Penalty
4.3.4 Minimizing Detectability
4.3.5 Metrics for Evaluation
4.3.6 Robustness
4.4 Experiments
4.4.1 Experimental Setup
4.4.2 Attack Performance
4.4.3 Attack Detectability
4.4.4 Robustness
5 Conclusion
5.1 Summary
5.2 Future Work
Bibliography