Biomarker Discovery from Sparse-Labeled Electrophysiological Datasets
Zeydabadinezhad, Mahmoud (Fall 2024)
Abstract
The widespread availability of diverse neural recording and imaging modalities, ranging from local field potential (LFP) and electroencephalography (EEG) signals to functional MRI (fMRI), together with advances in artificial intelligence (AI) and machine learning (ML) methods, has opened unprecedented avenues for investigating the neural patterns underlying sensorimotor and cognitive processes. These patterns of neural activity, called physiomarkers or biomarkers, are crucial for understanding the neural mechanisms of disease, for developing novel therapeutic interventions such as closed-loop neuromodulation, and for studying how those treatments act. Despite these technological advances, the applicability of contemporary AI/ML approaches is limited because labeled datasets often contain too few examples. This sample-size limitation arises from factors such as ethical considerations in data collection, financial constraints, and the limited size of patient populations. At the same time, there is an urgent need for models that not only perform well but are also explainable, particularly for identifying neural biomarkers.
While machine learning methods for identifying biomarkers in neural activity have been studied extensively, the focus has been predominantly on large labeled datasets, and explainability is often treated as a secondary concern. Research in other fields, such as computer vision, has explored methods tailored to small sample sizes, but these approaches seldom balance performance with explainability. Moreover, their applicability to neural activity data remains largely unexplored.
Addressing this gap is important for two reasons. First, the challenge is endemic in neuroscience, affecting many studies that operate under limited sample sizes. Second, these limitations hinder the application of advanced automated representation-learning methods to neuroscience, with far-reaching implications for clinical applications.
An explainable automated data representation framework tailored to limited sample sizes would therefore make a significant contribution to neuroscience. Such a framework would not only facilitate biomarker identification but also deepen our understanding of neural activity. Building on this premise, our research targets EEG data, with the goal of developing an automated and explainable data representation method for the critical task of quantifying the physiological effects of electrical neuromodulation.
We hypothesize that an explainable foundation model tailored for EEG analysis will significantly improve the quantification of physiological effects from small and heterogeneous EEG datasets. Such a model would overcome limitations of current machine learning and deep learning methods by providing a robust, generalizable solution capable of interpreting complex biological signals without extensive labeled data. To test this hypothesis, we first develop analysis pipelines for biomarker identification from small and heterogeneous data using manual feature extraction and traditional machine learning models in the context of electrogastrography (EGG) and electroencephalography (EEG). We then adapt a foundation model to automatically generate EEG data representations tailored to a memory classification task.
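The first stage described above, manual feature extraction followed by a traditional, regularized classifier evaluated on a small labeled sample, can be illustrated with a minimal sketch. It is hypothetical and is not the dissertation's pipeline: the 128 Hz sampling rate, the theta/alpha/beta band edges, the synthetic one-second epochs, L1-penalized logistic regression fitted by proximal gradient descent (L1 regularization is one of the techniques listed in the outline), and leave-one-out cross-validation are all assumptions chosen to keep the example self-contained and standard-library only.

```python
import cmath
import math
import random

FS = 128  # assumed sampling rate (Hz); with one-second epochs, DFT bin k is k Hz
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}  # conventional EEG bands (Hz)

def band_powers(signal, fs=FS, bands=BANDS):
    """Manual feature extraction: log power per band from a naive DFT periodogram."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC offset
    feats = []
    for lo, hi in bands.values():
        power = 0.0
        for k in range(1, n // 2):
            if lo <= k * fs / n < hi:
                coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                power += abs(coef) ** 2 / n
        feats.append(math.log(power + 1e-12))
    return feats

def synthetic_epoch(label, rng, n=FS):
    """Hypothetical data: class 1 is alpha-dominant (10 Hz), class 0 beta-dominant (20 Hz)."""
    f = 10.0 if label == 1 else 20.0
    phase = rng.uniform(0.0, 2.0 * math.pi)
    return [math.sin(2.0 * math.pi * f * t / FS + phase) + rng.gauss(0.0, 0.5) for t in range(n)]

def make_dataset(n_per_class=15, seed=0):
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append(band_powers(synthetic_epoch(label, rng)))
            y.append(label)
    return X, y

def fit_scaler(X):
    d = len(X[0])
    mu = [sum(row[j] for row in X) / len(X) for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / len(X)) or 1.0 for j in range(d)]
    return mu, sd

def scale(X, mu, sd):
    return [[(v - m) / s for v, m, s in zip(row, mu, sd)] for row in X]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrinks v toward zero and zeroes small values."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam=0.05, lr=0.1, epochs=200):
    """L1-penalized logistic regression fitted by proximal (ISTA-style) gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j] / n
            gb += err / n
        # Gradient step on the mean log-loss, then soft-thresholding for the L1 penalty
        # (the intercept b is not penalized).
        w = [soft_threshold(wj - lr * g, lr * lam) for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def predict(w, b, x):
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)

def leave_one_out_accuracy(X, y, **fit_kwargs):
    """Leave-one-out CV; the scaler is refit on each training fold to avoid leakage."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        mu, sd = fit_scaler(train_X)
        w, b = fit_l1_logistic(scale(train_X, mu, sd), train_y, **fit_kwargs)
        correct += int(predict(w, b, scale([X[i]], mu, sd)[0]) == y[i])
    return correct / len(X)

if __name__ == "__main__":
    X, y = make_dataset()
    print(f"Leave-one-out accuracy: {leave_one_out_accuracy(X, y):.2f}")
```

Running the script prints the leave-one-out accuracy on the synthetic data. Leave-one-out is a common choice when only a few dozen labeled epochs exist, and refitting the scaler inside each fold keeps the held-out epoch from leaking into preprocessing. In practice, NumPy/SciPy (for example, Welch's method for spectra) and scikit-learn would more likely replace the hand-written DFT and optimizer.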
Table of Contents
1 Introduction
1.1 Regularization/Model Complexity
1.1.1 L1 Regularization (LASSO)
1.1.2 L2 Regularization (Ridge Regression)
1.1.3 Dropout Regularization
1.2 Lower-Complexity Models
1.3 Data Augmentation
1.3.1 Data Manipulation
1.3.2 Generative Models
1.3.3 Biophysical Modeling
1.3.4 Federated Learning
1.4 Transfer Learning
1.5 Self-Supervised Learning
1.5.1 Predictive Methods
1.5.2 Contrastive Methods
1.5.3 Reconstructive Methods
2 Discovery of Electrogastrography Biomarkers Under Label Constraints
2.1 Background
2.2 Methods
2.2.1 Data Collection
2.2.2 Pre-processing
2.2.3 Feature Engineering
2.2.4 Feature Selection
2.2.5 Model Selection/Training
2.2.6 Model Evaluation
2.3 Results
2.4 Conclusion
3 Seizure Detection from In-the-Ear EEG Recordings
3.1 Background
3.2 Methods
3.2.1 Earbud
3.2.2 Hardware
3.2.3 Electrodes
3.2.4 Subjects
3.2.5 Data Acquisition
3.2.6 Manual Data Annotation
3.2.7 Data Analysis
3.2.8 Pre-processing
3.2.9 Feature Extraction
3.2.10 Model Training and Evaluation
3.2.11 Results
3.2.12 Conclusion
4 Neural Biomarkers of Memory: A Sparse-Label SEEG Analysis
4.1 Background
4.2 Methods
4.2.1 Study Participants
4.2.2 Intracranial Electrophysiology
4.2.3 Experimental Design and Stimulation
4.2.4 Signal Processing
4.2.5 Feature Engineering
4.2.6 Predictive Models
4.2.7 Model Evaluation
4.3 Results
4.4 Conclusion
5 Adapting Foundation Models for EEG Data Representation
5.1 Background
5.2 The Importance of Foundation Models for EEG Data
5.3 Bridging the Gap: From Manual Extraction to Automated Representation
5.4 BERT-inspired Neural Data Representations
5.4.1 Application to EEG
5.4.2 Pre-training Dataset
5.4.3 Pre-processing
5.4.4 Pre-training Procedure
5.4.5 Fine-tuning Procedure
5.4.6 Evaluation Procedure
5.5 Adapting a Foundation Model for SEEG Data Representation and Analysis
5.5.1 Pre-training Datasets
5.5.2 Pre-processing
5.5.3 EEG Patching
5.5.4 Temporal Encoding
5.5.5 Neural Tokenizer
5.5.6 Pre-training Procedure
5.5.7 Fine-tuning Procedure
5.5.8 Linear Probing vs. Latent Space Classification
5.5.9 Enhancing Linear Probing
6 Conclusion and Future Work
A Appendix
A.1 Chapter 2
A.2 Chapter 5
A.2.1 Inference Without Fine-Tuning
Bibliography
Primary PDF: file download under embargo until 09 January 2026 (uploaded 2024-12-05 13:31:33 -0500).