Finetuned DNA Language Model Based- Classifiers Captures Significant Enzymatic Activity from Metagenomic Datasets

Open Access

Zheng, Weiyang (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/bn999799g?locale=it

Masters thesis
Published

Abstract

The surge of metagenomic sequencing data demands functional annotation methods that move beyond traditional homology-based approaches. In this study, we used REBEAN (Read Embedding-Based Enzyme Annotator), a fine-tuned DNA language model designed to predict enzymatic activity directly from raw nucleotide sequences, and developed two classifiers: REBEAN-Halo, targeting halogenase function, and REBEAN-Nitro, targeting nitrogenase function. REBEAN-Halo identified functionally important regions within known halogenases and detected 92 candidate novel halogenases in marine metagenomes. REBEAN-Nitro, though undertrained, detected higher nitrogenase activity in unfertilized agricultural soils than in fertilized ones, consistent with ecological expectations. Both models highlight REBEAN's potential to uncover functionally relevant but sequence-divergent enzymes in complex metagenomic datasets, offering a powerful tool for advancing enzyme discovery and microbiome functional profiling.

Table of Contents

Introduction

Results and Discussion

Methods

Bibliography

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English