Biomarker Discovery from Sparse-Labeled Electrophysiological Datasets

Access: Restricted (files only)

Zeydabadinezhad, Mahmoud (Fall 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/2f75r969z?locale=pt-BR

Published

Abstract

The widespread accessibility of diverse neural recording and neuroimaging modalities, ranging from local field potential (LFP) and electroencephalography (EEG) signals to functional MRI (fMRI), together with advances in artificial intelligence (AI) and machine learning (ML), has opened unprecedented avenues for investigating the neural patterns underlying sensorimotor and cognitive processes. These patterns of neural activity, called physiomarkers or biomarkers, are crucial for understanding the neural mechanisms of disease, for developing novel therapeutic interventions such as closed-loop neuromodulation, and for studying how those treatments act. Despite these technological advances, contemporary AI/ML approaches are of limited use here because labeled datasets often contain too few examples. This sample-size limitation arises from factors such as ethical considerations in data collection, financial constraints, and small patient populations. At the same time, there is an urgent need for models that not only perform well but are also explainable, particularly for identifying neural biomarkers.

While machine learning methods for identifying biomarkers in neural activity have been studied extensively, the focus has predominantly been on large labeled datasets, and explainability has often been treated as a secondary concern. Research in other fields, such as computer vision, has explored methods tailored to small sample sizes, but these approaches seldom balance performance with explainability. Moreover, their applicability to neural activity data remains largely unexplored.

Addressing this gap is important for two reasons. First, the challenge is endemic in neuroscience: many studies operate under the constraints of limited sample sizes.

Second, these limitations hinder the application of advanced automated representation learning methods to neuroscience, with far-reaching implications for clinical applications.

Developing an explainable, automated data representation framework tailored to limited sample sizes would therefore make a significant contribution to neuroscience. Such a framework would not only facilitate biomarker identification but also deepen our understanding of neural activity. Building on this premise, our research targets EEG data specifically, with the goal of developing an automated and explainable data representation method for the critical task of quantifying the physiological effects of electrical neuromodulation.

We hypothesize that an explainable foundation model tailored to EEG data analysis will significantly improve the quantification of physiological effects from small and heterogeneous EEG datasets. Such a model would overcome the limitations of current machine learning and deep learning methods by providing a robust, generalizable solution capable of interpreting complex biological signals without requiring extensive labeled data. To test this hypothesis, we first develop analysis pipelines for biomarker identification from small and heterogeneous data using manual feature extraction and traditional machine learning models, in the contexts of electrogastrography (EGG) and electroencephalography (EEG). We then adapt a foundation model to automatically generate EEG data representations, tailored to a memory classification task.
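The abstract describes, but does not detail, pipelines that pair manual feature extraction with traditional machine learning on small labeled datasets. The following is a minimal sketch of that general pattern under stated assumptions; it is not the dissertation's pipeline. Every specific choice here is hypothetical: the frequency bands in `BANDS`, relative band-power features computed from an FFT periodogram, an L2-regularized (ridge) linear classifier (one of the regularization strategies named in the contents), leave-one-out cross-validation, and the synthetic two-class data produced by `synthetic_demo`. It depends only on NumPy.

```python
import numpy as np

# Hypothetical frequency bands (Hz), chosen for illustration only.
BANDS = {"delta": (1.0, 4.0), "theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}


def band_power_features(signals, fs):
    """Relative band power per channel from an unscaled FFT periodogram.

    signals: array of shape (n_trials, n_channels, n_samples)
    returns: array of shape (n_trials, n_channels * len(BANDS))
    """
    freqs = np.fft.rfftfreq(signals.shape[-1], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signals, axis=-1)) ** 2
    # Normalize by total power in 1-30 Hz, which is the union of the bands above.
    total = psd[..., (freqs >= 1.0) & (freqs < 30.0)].sum(axis=-1)
    feats = [psd[..., (freqs >= lo) & (freqs < hi)].sum(axis=-1) / total
             for lo, hi in BANDS.values()]
    # (n_trials, n_channels, n_bands) -> one flat feature vector per trial
    return np.stack(feats, axis=-1).reshape(signals.shape[0], -1)


def fit_ridge(X, y, alpha):
    """Closed-form L2-regularized least squares on labels y in {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w, X):
    """Sign of the linear score, mapped to labels {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


def loo_accuracy(X, y, alpha=1.0):
    """Leave-one-out accuracy; standardization statistics come from the training fold only."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        mu = X[train].mean(axis=0)
        sd = X[train].std(axis=0) + 1e-12
        w = fit_ridge((X[train] - mu) / sd, y[train], alpha)
        correct += int(predict(w, (X[i:i + 1] - mu) / sd)[0] == y[i])
    return correct / n


def synthetic_demo(seed=0):
    """Toy data, not from the dissertation: class +1 carries a 10 Hz rhythm, class -1 a 6 Hz rhythm."""
    rng = np.random.default_rng(seed)
    fs, n_samples, n_trials, n_channels = 128, 256, 20, 2
    t = np.arange(n_samples) / fs
    y = np.array([1, -1] * (n_trials // 2))
    signals = rng.normal(0.0, 1.0, size=(n_trials, n_channels, n_samples))
    for i, label in enumerate(y):
        f0 = 10.0 if label == 1 else 6.0
        signals[i] += 2.0 * np.sin(2 * np.pi * f0 * t + rng.uniform(0, 2 * np.pi))
    return band_power_features(signals, fs), y


X, y = synthetic_demo()
print(X.shape, loo_accuracy(X, y))
```

Leave-one-out evaluation is a common choice when only tens of labeled trials are available, because every trial serves as a test case exactly once. Fitting the standardization inside each fold keeps the held-out trial from leaking into the scaling statistics, which matters more as the sample size shrinks.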

Contents

1 Introduction
1.1 Regularization/Model Complexity
1.1.1 L1 Regularization (LASSO)
1.1.2 L2 Regularization (Ridge Regression)
1.1.3 Dropout Regularization
1.2 Lower Complexity Models
1.3 Data Augmentation
1.3.1 Data Manipulation
1.3.2 Generative Models
1.3.3 Biophysical Modeling
1.3.4 Federated Learning
1.4 Transfer Learning
1.5 Self-Supervised Learning
1.5.1 Predictive Methods
1.5.2 Contrastive Methods
1.5.3 Reconstructive Methods
2 Discovery of Electrogastrography Biomarkers Under Label Constraints
2.1 Background
2.2 Methods
2.2.1 Data Collection
2.2.2 Pre-processing
2.2.3 Feature Engineering
2.2.4 Feature Selection
2.2.5 Model Selection/Training
2.2.6 Model Evaluation
2.3 Results
2.4 Conclusion
3 Seizure Detection from In-the-Ear EEG Recordings
3.1 Background
3.2 Methods
3.2.1 Earbud
3.2.2 Hardware
3.2.3 Electrodes
3.2.4 Subjects
3.2.5 Data Acquisition
3.2.6 Manual Data Annotation
3.2.7 Data Analysis
3.2.8 Pre-processing
3.2.9 Feature Extraction
3.2.10 Model Training and Evaluation
3.2.11 Results
3.2.12 Conclusion
4 Neural Biomarkers of Memory: A Sparse-Label SEEG Analysis
4.1 Background
4.2 Methods
4.2.1 Study Participants
4.2.2 Intracranial Electrophysiology
4.2.3 Experimental Design and Stimulation
4.2.4 Signal Processing
4.2.5 Feature Engineering
4.2.6 Predictive Models
4.2.7 Model Evaluation
4.3 Results
4.4 Conclusion
5 Adapting Foundation Models for EEG Data Representation
5.1 Background
5.2 The Importance of Foundation Models for EEG Data
5.3 Bridging the Gap: From Manual Extraction to Automated Representation
5.4 BERT-inspired Neural Data Representations
5.4.1 Application to EEG
5.4.2 Pre-training Dataset
5.4.3 Pre-processing
5.4.4 Pre-training Procedure
5.4.5 Fine-tuning Procedure
5.4.6 Evaluation Procedure
5.5 Adapting a Foundation Model for SEEG Data Representation and Analysis
5.5.1 Pre-training Datasets
5.5.2 Pre-processing
5.5.3 EEG Patching
5.5.4 Temporal Encoding
5.5.5 Neural Tokenizer
5.5.6 Pre-training Procedure
5.5.7 Fine-tuning Procedure
5.5.8 Linear Probing vs. Latent Space Classification
5.5.9 Enhancing Linear Probing
6 Conclusion and Future Work
A Appendix
A.1 Chapter 2
A.2 Chapter 5
A.2.1 Inference Without Fine-Tuning
Bibliography

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Computer Science and Informatics
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Biology, Bioinformatics; Computer Science
Keywords	Machine Learning; Foundation Model; Neuromodulation; Biomarker
Committee Chair / Thesis Advisor	Babak Mahmoudi, PhD, Emory University
Committee Members	Kamran Paynabar, PhD, Georgia Institute of Technology; Reza Sameni, PhD, Emory University; Matthew Reyna, PhD, Emory University; Ali Bahrami Rad, PhD, Emory University


Primary PDF

File download under embargo until 09 January 2026 (uploaded 2024-12-05 13:31:33 -0500).
