Towards Responsible Data Science with Behavior Change Interventions

Dong, Ziwei (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/n583xw717?locale=fr

Abstract

Data science holds immense potential for societal progress; yet, if unchecked, it can perpetuate harm and inequity. We have witnessed instances where data-driven systems mislabel individuals, leading to dehumanization and unequal access to essential resources. While the academic community has made strides in addressing these issues, the predominant focus has been on technical solutions around algorithmic fairness, often overlooking the people and systems involved. This thesis presents a novel approach to bridging this gap. It introduces three key elements: (1) the translational application of behavior change theories to promote responsible data science practices, (2) a design space that scaffolds the development of behavior change interventions in the data science context, and (3) the implementation and empirical assessment of behavior change interventions designed to meet the specific demands of responsible data science. This work extends beyond technical solutions to address the systemic issues at the core of responsible data science, through a series of studies aimed at ensuring that data science serves society responsibly.

Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Summary of Thesis Research
  1.3 Thesis Statement
  1.4 Research Questions

2 Related Work
  2.1 Data Science Process and Framework
   2.1.1 Pre-processing
   2.1.2 In-processing
   2.1.3 Post-processing
  2.2 Responsible Data Science
  2.3 Theories of Behavior Change

3 Developing Theories for Responsible Data Science through Behavioral Change Interventions
  3.1 Motivation
  3.2 Identifying Relevant Theories of Behavior Change for Data Science
   3.2.1 Factors Affecting Behavior Change (FBC)
   3.2.2 Behavior Change Techniques (BCT)
   3.2.3 Mechanisms of Action (MoA)
  3.3 Responsible Data Science
   3.3.1 Characterizing Agents and Outcomes of Responsible Data Science
   3.3.2 Technically Satisfactory Practices for Responsible Data Science
   3.3.3 Behaviorally Responsible Practices in Data Science
  3.4 Operationalizing Behavior Change Theories for Responsible Data Science
  3.5 Interventions
   3.5.1 Interventions Designed for the Machine Learning Example
   3.5.2 Interventions Designed for the Visual Data Analytics Example
   3.5.3 Internal Reflection
  3.6 Discussion
   3.6.1 Challenge 1: Intervening at the Right Time
   3.6.2 Challenge 2: Facilitating Lasting Behavior Change Through In-The-Moment Interventions
   3.6.3 Challenge 3: Measuring Efficacy & Boosting Adoption
   3.6.4 Challenge 4: Incentives Versus Consequences to Induce Behavior Change
   3.6.5 Challenge 5: Automated Versus Behaviorally Responsible Data Science
   3.6.6 Challenge 6: Enhancing Education and Training for Data Science Practitioners
  3.7 Limitations
  3.8 Summary

4 Synthesizing a Design Space of Behavior Change Interventions for Responsible Data Science
  4.1 Motivation
  4.2 Design Space Rationale
  4.3 Behavioral Considerations
   4.3.1 Why: Why do you as a designer want to intervene?
   4.3.2 Who: Who is the target of the behavior change intervention?
   4.3.3 What: What key objectives does the intervention seek to influence?
   4.3.4 Usage Scenario: A State Government's COVID-19 Support Model
  4.4 Implementation Considerations
   4.4.1 When: When is the suitable time to intervene?
   4.4.2 Where: Where do the interventions take place?
   4.4.3 How: How can we design effective interventions?
   4.4.4 Usage Scenario: A Professor's Intro to Responsible Data Science Course
  4.5 Characterizing Existing Intervention Tools
   4.5.1 Method
   4.5.2 Results
  4.6 Discussion
  4.7 Limitations
  4.8 Summary

5 Developing a Behavior Change Intervention for Technical Responsibility in Data Science Pre-Processing
  5.1 Motivation
  5.2 Quantifying the Impact of Pre-Processing on Model Fairness
  5.3 Design Approach
   5.3.1 Design Process
   5.3.2 Design Goals
  5.4 Visual Analytic Interface
   5.4.1 Overview of Strategies
   5.4.2 Narrow Down the Search Space and Explain Options
   5.4.3 Strategy Exploration and Comparison
  5.5 Usage Scenarios
   5.5.1 Searching with Prioritized Metrics
   5.5.2 Strategy Brainstorming
  5.6 Preliminary User Feedback
   5.6.1 Participants
   5.6.2 Method
   5.6.3 Qualitative Findings
   5.6.4 System Improvements
  5.7 Discussion
  5.8 Limitations
  5.9 Summary

6 Evaluating Behavior Change Interventions for Responsible Data Science
  6.1 Motivation
  6.2 Methods
   6.2.1 Tasks & Interventions
   6.2.2 Participants
   6.2.3 Procedure
   6.2.4 Responsible Data Science Practices and Data Collection
   6.2.5 Hypotheses
   6.2.6 Measures
  6.3 Results
   6.3.1 H1: Responsible Behaviors
   6.3.2 H2: COM-B Factors
   6.3.3 H3: Model Fairness
   6.3.4 H4: Model Performance
   6.3.5 H5: Cognitive Load
   6.3.6 Summary of Results
  6.4 Discussion
  6.5 Future Work
  6.6 Limitations
  6.7 Summary

7 Discussion
  7.1 The Complementary Nature of Technical and Behavioral Approaches
  7.2 Theoretical Translation Across Disciplines as a Methodological Innovation
  7.3 Bridging Theory and Practice in Responsible Data Science
  7.4 Balancing Individual and Systemic Approaches to Change
  7.5 The Role of Visualization in Promoting Responsible Practices
  7.6 Intervention Design Should Balance Technical Capability with Workflow Integration
  7.7 The Tension Between Intervention Efficacy and Cognitive Load
  7.8 Bridging the Gap Between Behavioral Change and Outcome Improvement

8 Conclusions

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
