Finetuned DNA Language Model Based- Classifiers Captures Significant Enzymatic Activity from Metagenomic Datasets
Zheng, Weiyang (Spring 2025)
Abstract
The surge of metagenomic sequencing data demands functional annotation methods that move beyond traditional homology-based approaches. In this study, we used REBEAN (Read Embedding-Based Enzyme Annotator), a fine-tuned DNA language model designed to predict enzymatic activity directly from raw nucleotide sequences, and developed two classifiers, REBEAN-Halo and REBEAN-Nitro, targeting halogenase and nitrogenase functions, respectively. REBEAN-Halo identified functionally important regions within known halogenases and detected 92 candidate novel halogenases in marine metagenomes. REBEAN-Nitro, though undertrained, detected higher nitrogenase activity in unfertilized agricultural soils than in fertilized ones, consistent with ecological expectations. Both models highlight REBEAN's potential to uncover functionally relevant but sequence-divergent enzymes in complex metagenomic datasets, offering a powerful tool for advancing enzyme discovery and microbiome functional profiling.
Table of Contents
Introduction
Results and Discussion
Methods
Bibliography
About this Honors Thesis
| School | |
|---|---|
| Department | |
| Degree | |
| Submission | |
| Language | |
| Research Field | |
| Keyword | |
| Committee Chair / Thesis Advisor | |
| Committee Members | |
Primary PDF
| Thumbnail | Title | Date Uploaded | Actions |
|---|---|---|---|
| | Finetuned DNA Language Model Based- Classifiers Captures Significant Enzymatic Activity from Metagenomic Datasets | 2025-04-21 15:42:53 -0400 | |
Supplemental Files
| Thumbnail | Title | Date Uploaded | Actions |
|---|---|---|---|