eccDNAMamba

A Pre-Trained Model for Ultra-Long eccDNA Sequence Analysis


Model Overview

eccDNAMamba is a bidirectional state-space model (SSM) designed for efficient and topology-aware modeling of extrachromosomal circular DNA (eccDNA).
By combining forward and reverse Mamba-2 encoders, motif-level Byte Pair Encoding (BPE), and a lightweight head–tail circular augmentation, it captures wrap-around dependencies in ultra-long (10–200 kbp) genomic sequences while maintaining linear-time scalability.
The model achieves strong performance on three downstream tasks: cancer-associated eccDNA prediction, copy-number level estimation, and real-versus-pseudo eccDNA discrimination.
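
To make the head–tail augmentation concrete: because eccDNA is circular, copying a short prefix of the token sequence onto its end lets the encoder see context that spans the circular junction. The sketch below is illustrative only; the function name and the prefix length k are assumptions, not the released implementation:

def head_tail_augment(token_ids, k=16):
    # Append the first k tokens to the end so wrap-around dependencies
    # across the circular junction become visible to the encoder.
    # Illustrative sketch; k is an assumed hyperparameter.
    k = min(k, len(token_ids))
    return token_ids + token_ids[:k]

# Example: the tail now carries the head's context
print(head_tail_augment([5, 9, 2, 7, 3], k=2))  # [5, 9, 2, 7, 3, 5, 9]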


Quick Start

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the pre-trained tokenizer and masked-LM checkpoint
tokenizer = AutoTokenizer.from_pretrained("eccdna/eccDNAMamba-1M")
model = AutoModelForMaskedLM.from_pretrained("eccdna/eccDNAMamba-1M")

# Tokenize an eccDNA sequence and run a forward pass
sequence = "ATGCGTACGTTAGCGTACGT"
inputs = tokenizer(sequence, return_tensors="pt")
outputs = model(**inputs)

# Per-position vocabulary logits; use these to score or reconstruct masked spans
logits = outputs.logits
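
Continuing from the snippet above, masked-span reconstruction can look like the following minimal sketch. It assumes the tokenizer defines a standard mask token (check tokenizer.mask_token first), and the masked position is chosen arbitrarily for illustration:

import torch

# Mask one position and ask the model to recover it
masked_ids = inputs["input_ids"].clone()
pos = 3  # arbitrary position chosen for illustration
masked_ids[0, pos] = tokenizer.mask_token_id

with torch.no_grad():
    masked_logits = model(input_ids=masked_ids).logits

predicted_id = int(masked_logits[0, pos].argmax())
print(tokenizer.decode([predicted_id]))

For the downstream tasks listed in the overview (e.g., cancer-associated eccDNA prediction), one common pattern is to mean-pool the final hidden states into a fixed-size sequence embedding. This uses the standard output_hidden_states flag and is a sketch of the general approach, not the released fine-tuning recipe:

with torch.no_grad():
    hidden_states = model(**inputs, output_hidden_states=True).hidden_states

# Mean-pool the last layer over sequence length -> one vector per sequence
embedding = hidden_states[-1].mean(dim=1)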

Citation

@inproceedings{liu2025eccdnamamba,
  title={ecc{DNAM}amba: A Pre-Trained Model for Ultra-Long ecc{DNA} Sequence Analysis},
  author={Zhenke Liu and Jien Li and Ziqi Zhang},
  booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
  year={2025},
  url={https://openreview.net/forum?id=56xKN7KJjy}
}

Model Details

Model size: 0.5B parameters (F32, Safetensors)
