Towards Responsible Data Science with Behavior Change Interventions
Dong, Ziwei (Spring 2025)
Abstract
Data science holds immense potential for societal progress; yet, if unchecked, it can perpetuate harm and inequity. We have witnessed instances where data-driven systems mislabel individuals, leading to dehumanization and unequal access to essential resources. While the academic community has made strides in addressing these issues, the predominant focus has been on technical solutions around algorithmic fairness, often overlooking the people and systems involved. This thesis presents a novel approach to bridging this gap. It introduces three key elements: (1) the translational application of behavior change theories for promoting responsible data science practices, (2) a design space to scaffold the development of behavior change interventions in the data science context, and (3) the implementation and empirical assessment of behavior change interventions designed to meet the specific demands of responsible data science. This work extends beyond technical solutions to address the systemic issues at the core of responsible data science, presenting a series of works that ensure data science serves society responsibly.
Table of Contents
1 Introduction
1.1 Motivation
1.2 Summary of Thesis Research
1.3 Thesis Statement
1.4 Research Questions
2 Related Work
2.1 Data Science Process and Framework
2.1.1 Pre-processing
2.1.2 In-processing
2.1.3 Post-processing
2.2 Responsible Data Science
2.3 Theories of Behavior Change
3 Developing Theories for Responsible Data Science through Behavioral Change Interventions
3.1 Motivation
3.2 Identifying Relevant Theories of Behavior Change for Data Science
3.2.1 Factors Affecting Behavior Change (FBC)
3.2.2 Behavior Change Techniques (BCT)
3.2.3 Mechanisms of Action (MoA)
3.3 Responsible Data Science
3.3.1 Characterizing Agents and Outcomes of Responsible Data Science
3.3.2 Technically Satisfactory Practices for Responsible Data Science
3.3.3 Behaviorally Responsible Practices in Data Science
3.4 Operationalizing Behavior Change Theories for Responsible Data Science
3.5 Interventions
3.5.1 Interventions Designed for the Machine Learning Example
3.5.2 Interventions Designed for the Visual Data Analytics Example
3.5.3 Internal Reflection
3.6 Discussion
3.6.1 Challenge 1: Intervening at the Right Time
3.6.2 Challenge 2: Facilitating Lasting Behavior Change Through In-The-Moment Interventions
3.6.3 Challenge 3: Measuring Efficacy & Boosting Adoption
3.6.4 Challenge 4: Incentives Versus Consequences to Induce Behavior Change
3.6.5 Challenge 5: Automated Versus Behaviorally Responsible Data Science
3.6.6 Challenge 6: Enhancing Education and Training for Data Science Practitioners
3.7 Limitations
3.8 Summary
4 Synthesizing a Design Space of Behavior Change Interventions for Responsible Data Science
4.1 Motivation
4.2 Design Space Rationale
4.3 Behavioral Considerations
4.3.1 Why: Why do you as a designer want to intervene?
4.3.2 Who: Who is the target of the behavior change intervention?
4.3.3 What: What key objectives does the intervention seek to influence?
4.3.4 Usage Scenario: A State Government's COVID-19 Support Model
4.4 Implementation Considerations
4.4.1 When: When is the suitable time to intervene?
4.4.2 Where: Where do the interventions take place?
4.4.3 How: How can we design effective interventions?
4.4.4 Usage Scenario: A Professor's Intro to Responsible Data Science Course
4.5 Characterizing Existing Intervention Tools
4.5.1 Method
4.5.2 Results
4.6 Discussion
4.7 Limitations
4.8 Summary
5 Developing a Behavior Change Intervention for Technical Responsibility in Data Science Pre-Processing
5.1 Motivation
5.2 Quantifying the Impact of Pre-Processing on Model Fairness
5.3 Design Approach
5.3.1 Design Process
5.3.2 Design Goals
5.4 Visual Analytic Interface
5.4.1 Overview of Strategies
5.4.2 Narrow Down the Search Space and Explain Options
5.4.3 Strategy Exploration and Comparison
5.5 Usage Scenarios
5.5.1 Searching with Prioritized Metrics
5.5.2 Strategy Brainstorming
5.6 Preliminary User Feedback
5.6.1 Participants
5.6.2 Method
5.6.3 Qualitative Findings
5.6.4 System Improvements
5.7 Discussion
5.8 Limitations
5.9 Summary
6 Evaluating Behavior Change Interventions for Responsible Data Science
6.1 Motivation
6.2 Methods
6.2.1 Tasks & Interventions
6.2.2 Participants
6.2.3 Procedure
6.2.4 Responsible Data Science Practices and Data Collection
6.2.5 Hypotheses
6.2.6 Measures
6.3 Results
6.3.1 H1: Responsible Behaviors
6.3.2 H2: COM-B Factors
6.3.3 H3: Model Fairness
6.3.4 H4: Model Performance
6.3.5 H5: Cognitive Load
6.3.6 Summary of Results
6.4 Discussion
6.5 Future Work
6.6 Limitations
6.7 Summary
7 Discussion
7.1 The Complementary Nature of Technical and Behavioral Approaches
7.2 Theoretical Translation Across Disciplines as a Methodological Innovation
7.3 Bridging Theory and Practice in Responsible Data Science
7.4 Balancing Individual and Systemic Approaches to Change
7.5 The Role of Visualization in Promoting Responsible Practices
7.6 Intervention Design Should Balance Technical Capability with Workflow Integration
7.7 The Tension Between Intervention Efficacy and Cognitive Load
7.8 Bridging the Gap Between Behavioral Change and Outcome Improvement
8 Conclusions
Bibliography
Primary PDF
Title | Date Uploaded |
---|---|
Towards Responsible Data Science with Behavior Change Interventions | 2025-04-28 11:05:21 -0400 |