Genos

Genos is a foundation model for human genomics. Trained on hundreds of high-quality human reference genomes, it models human genome sequences in context at up to the million-base-pair scale. By learning at single-base resolution, the model can identify deep sequence patterns and functional features hidden within genomes, giving scientists a new way to connect genetic information with biological processes.
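
One common way to realize single-base resolution is to treat every nucleotide as its own token, so that a sequence of N bases occupies N positions of context. The sketch below illustrates that idea only. The five-symbol vocabulary `BASE_TO_ID` and the helper `encode` are hypothetical names chosen for this example; they are not the Genos tokenizer, whose actual vocabulary and special tokens may differ.

```python
# Hypothetical single-nucleotide vocabulary, for illustration only.
# This is NOT the Genos tokenizer.
BASE_TO_ID = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(sequence: str) -> list:
    """Map each base to one token id; unknown symbols fall back to N."""
    unknown = BASE_TO_ID["N"]
    return [BASE_TO_ID.get(base, unknown) for base in sequence.upper()]


tokens = encode("ACGTNacgt")
print(tokens)       # one id per base
print(len(tokens))  # token count equals the sequence length in bases
```

Under this assumption, the token count grows one-to-one with sequence length, which is why a long context window matters for genome-scale inputs.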

For instructions, details, and examples, please refer to the Genos GitHub.

The table below lists the training data volume and key parameters of each model.

| Model Specification | Genos 1.2B | Genos 10B |
| --- | --- | --- |
| Model Scale | | |
| Total Parameters | 1.2B | 10B |
| Activated Parameters | 0.33B | 2.87B |
| Trained Tokens | 1600B | 2200B |
| Architecture | | |
| Architecture Type | MoE | MoE |
| Number of Experts | 8 | 8 |
| Selected Experts per Token | 2 | 2 |
| Number of Layers | 12 | 12 |
| Attention Hidden Dimension | 1024 | 4096 |
| Number of Attention Heads | 16 | 16 |
| MoE Hidden Dimension (per Expert) | 4096 | 8192 |
| Vocabulary Size | 128 (padded) | 256 (padded) |
| Context Length | up to 1M | up to 1M |
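
Two quantities follow directly from the figures above and are useful as sanity checks: the share of parameters active per token, and the number of training tokens per parameter. The sketch below derives them from the reported numbers only. `GenosSpec` and its field names are illustrative and not part of any Genos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenosSpec:
    """Reported figures for one model size (illustrative container)."""
    name: str
    total_params_b: float      # total parameters, in billions
    activated_params_b: float  # parameters active per token, in billions
    trained_tokens_b: float    # training tokens, in billions
    num_experts: int
    experts_per_token: int

    @property
    def activated_fraction(self) -> float:
        # Share of all parameters that are used for a single token.
        return self.activated_params_b / self.total_params_b

    @property
    def tokens_per_param(self) -> float:
        # Training tokens seen per model parameter.
        return self.trained_tokens_b / self.total_params_b


SPECS = [
    GenosSpec("Genos 1.2B", 1.2, 0.33, 1600, 8, 2),
    GenosSpec("Genos 10B", 10.0, 2.87, 2200, 8, 2),
]

for spec in SPECS:
    print(
        f"{spec.name}: {spec.activated_fraction:.1%} of parameters active per token, "
        f"{spec.tokens_per_param:.0f} training tokens per parameter"
    )
```

Both sizes activate slightly more than the 2/8 = 25% expert ratio, which is consistent with non-expert weights (such as attention) being used for every token.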

Genos 1.2B and 10B checkpoints are available here:

We also provide checkpoints trained under the Megatron-LM framework:
