Simultaneous Dimensionality Reduction for Extracting Useful Representations of Large Empirical Multimodal Datasets

Abdelaleem, Eslam (Summer 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/hd76s156x?locale=fr

Abstract

The quest for simplification in physics drives the exploration of concise mathematical representations for complex systems. This dissertation focuses on dimensionality reduction as a means to obtain low-dimensional descriptions from high-dimensional data, facilitating comprehension and analysis. We address the challenges posed by real-world data that defy conventional assumptions, such as the complex interactions within neural systems or high-dimensional dynamical systems. Leveraging insights from both theoretical physics and machine learning, this work unifies diverse dimensionality reduction methods under a comprehensive framework, the Deep Variational Multivariate Information Bottleneck. This framework enables the design of tailored reduction algorithms based on specific research questions and data characteristics. We demonstrate the efficacy of simultaneous dimensionality reduction approaches over their independent-reduction counterparts, showing that they better capture covariation between multiple modalities while requiring less data. We also introduce novel techniques, such as the Deep Variational Symmetric Information Bottleneck, for general nonlinear simultaneous dimensionality reduction. We show that the same principle of simultaneous reduction is the key to efficient and precise estimation of mutual information, a fundamental measure of statistical dependencies. We further show that our new method can discover the underlying coordinates of high-dimensional observations of dynamical systems. Through analytical investigations and empirical validations, we shed light on the intricacies of dimensionality reduction methods, paving the way for enhanced data analysis across various domains. We underscore the potential of these methodologies to extract meaningful insights from complex datasets, driving advancements in fundamental research and applied sciences. As these methods evolve and find broader applications, they promise to deepen our understanding of complex systems and inform more effective experimental design and data analysis strategies.
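
For orientation, the symmetric information bottleneck named in the abstract can be summarized as a trade-off: compress each modality into its own latent variable while preserving the information the two latents share. The objective below is a generic schematic in standard IB notation, not necessarily the exact variational loss derived in Chapter 3:

    % Schematic symmetric IB objective (generic notation, assumed here):
    % compress X into Z_X and Y into Z_Y, while a multiplier beta rewards
    % the information retained between the two compressed variables.
    \min_{p(z_X \mid x),\; p(z_Y \mid y)} \; I(X; Z_X) + I(Y; Z_Y) - \beta\, I(Z_X; Z_Y)

Larger values of beta favor retaining shared structure over compression; the deep variational version replaces the mutual information terms with tractable neural-network bounds.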
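The contrast between independent and simultaneous reduction studied in Chapter 2 can be illustrated with a minimal linear toy example: PCA reduces each modality on its own and tends to keep whichever signals have the largest variance within that modality, while CCA reduces the two modalities jointly and targets the signal they share. The setup below (dimensions, noise levels, and the use of PCA/CCA as linear stand-ins) is an illustrative assumption, not the exact model of Chapter 2:

    # Toy comparison of independent vs. simultaneous dimensionality
    # reduction on two views that share one latent signal. Illustrative
    # sketch only; not the dissertation's exact linear model.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    n, dx, dy = 500, 50, 50

    shared = rng.normal(size=(n, 1))   # signal shared by both views
    self_x = rng.normal(size=(n, 1))   # self signal private to X
    self_y = rng.normal(size=(n, 1))   # self signal private to Y

    # Random mixing into high-dimensional observations; the self signals
    # get a larger coefficient so they dominate each view's variance.
    X = shared @ rng.normal(size=(1, dx)) + 3 * self_x @ rng.normal(size=(1, dx))
    X += 0.1 * rng.normal(size=(n, dx))
    Y = shared @ rng.normal(size=(1, dy)) + 3 * self_y @ rng.normal(size=(1, dy))
    Y += 0.1 * rng.normal(size=(n, dy))

    # Independent reduction: PCA on each view separately keeps the
    # high-variance self signals and can discard the shared one.
    zx_pca = PCA(n_components=1).fit_transform(X)

    # Simultaneous reduction: CCA looks for directions that covary
    # across the two views, targeting the shared signal directly.
    cca = CCA(n_components=1).fit(X, Y)
    zx_cca, zy_cca = cca.transform(X, Y)

    def corr(a, b):
        return abs(np.corrcoef(a.ravel(), b.ravel())[0, 1])

    print("corr(shared, PCA of X):", corr(shared, zx_pca))  # typically low
    print("corr(shared, CCA of X):", corr(shared, zx_cca))  # typically high

Because the private signals carry most of each view's variance, the per-view principal component is nearly orthogonal to the shared signal, while the canonical direction recovers it; this is the variance-versus-covariation distinction the dissertation quantifies.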
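On the mutual information side (Chapter 4, and the MINE decoder bound listed in the table of contents), neural estimators commonly rely on the Donsker-Varadhan representation, which turns MI estimation into an optimization over a critic function T_theta:

    % Donsker-Varadhan lower bound used by MINE-style neural estimators:
    I(X; Y) \;\ge\; \sup_{\theta} \; \mathbb{E}_{p(x,y)}\!\left[ T_\theta(x, y) \right]
    - \log \mathbb{E}_{p(x)\,p(y)}\!\left[ e^{T_\theta(x, y)} \right]

Such estimators become unreliable as the dimensionality of X and Y grows, which motivates the dissertation's strategy of compressing both variables simultaneously while estimating the information between them.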

Table of Contents

  1 Introduction: From Complexity to Sufficient Simplicity


  2 On the Difference Between Independent and Simultaneous Dimensionality Reduction

    2.1 Summary

    2.2 Introduction

    2.3 Model

      2.3.1 Relations to Previous Work

      2.3.2 Linear Model with Self and Shared Signals

    2.4 Methods

      2.4.1 Linear Dimensionality Reduction Methods

      2.4.2 Assessing Success and Sampling Noise Treatment

      2.4.3 Implementation

    2.5 Results

      2.5.1 Results of the Linear Model

2.5.2 One Self-Signal in X and Y in Addition to the Shared Signal (m_self = 1)

2.5.3 Many Self-Signals in X and Y in Addition to the Shared Signal (m_self = 30)

      2.5.4 Key Parameters and Testing Technique for Dimensionality of Self and Shared Signals

      2.5.5 Beyond Linear Models - Noisy MNIST

    2.6 Discussions

      2.6.1 Extensions and Generalizations

      2.6.2 Explaining Observations in the Literature

      2.6.3 Is SDR Strictly Effective in Low Sampling Situations?

      2.6.4 Diagnostic Test for Number of Latent Signals

      2.6.5 Limitations and Future Work

      2.6.6 Conclusion

    2.7 Limitations and Future Work


  3 Deep Variational Multivariate Information Bottleneck Framework

    3.1 Summary

    3.2 Introduction

    3.3 Multivariate Information Bottleneck Framework

      3.3.1 Deep Variational Symmetric Information Bottleneck

      3.3.2 Variational Bounds on DVSIB Encoder Terms

      3.3.3 Variational Bounds on DVSIB Decoder Terms

      3.3.4 Variational Bounds on Decoder Terms Not on a Leaf - MINE

      3.3.5 Parameterizing the Distributions and the Reparameterization Trick

    3.4 Deriving Other DR Methods

    3.5 Results

    3.6 Conclusion

    3.7 Supplementary Information

      3.7.1 Deriving and Designing Variational Losses

      3.7.2 Multi-Variable Losses (More than 2 Views/Variables)

      3.7.3 Multi-View Information Bottleneck

      3.7.4 Additional MNIST Results

      3.7.5 DVSIB-Private Reconstructions at 2 Latent Dimensions


  4 Efficient Estimation of Mutual Information in Very Large Dimensional Data

    4.1 Summary

    4.2 Introduction

    4.3 Background and Previous Work

      4.3.1 Estimation of Mutual Information

      4.3.2 Overview of NN Information Estimators

      4.3.3 Problems of NN Mutual Information Estimators

      4.3.4 MI Estimation as a DR Problem

    4.4 Results

      4.4.1 Infinite Data

      4.4.2 Finite Data

    4.5 Discussion and MI Estimation Guidelines

      4.5.1 Guidelines for MI Estimation from High-Dimensional Data

      4.5.2 Discussion

    4.6 Supplemental Information

      4.6.1 Neural Networks Architecture and Technical Details

      4.6.2 Supplemental Figures


  5 Discussions

    5.1 On Linear DR Methods

    5.2 On DVMIB

    5.3 On Efficient Estimation of Mutual Information

    5.4 On Discovering Coordinates of Dynamics

      5.4.1 The Setup

      5.4.2 DVSIB for Dynamics

      5.4.3 Preliminary Results

      5.4.4 Additional Questions and Remarks

      5.4.5 Supplementary Information

    5.5 On Neural Activity and Behavior

    5.6 Final Thoughts


  Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English