---
license: apache-2.0
pipeline_tag: text-classification
tags:
- pytorch
- genomics
- dna
- promoter-prediction
---

# GenomeOcean-100M-finetuned-prom_300_tata

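## Model Description

This repository contains the `GenomeOcean-100M-finetuned-prom_300_tata` model. It is a transformer model fine-tuned from GenomeOcean-100M for promoter prediction on DNA sequences, as indicated by the `prom_300_tata` task in its name and the `promoter-prediction` tag.
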
You can use this model with the following Python code. Because the model's `pipeline_tag` is `text-classification`, load it with `AutoModelForSequenceClassification`, which returns one vector of class logits per input sequence.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "magicslabnu/GenomeOcean-100M-finetuned-prom_300_tata"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model: text-classification maps to the sequence-classification head
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# --- Inference Example ---
# Prepare your DNA sequence(s).
# Check that the format matches what the tokenizer expects
# (e.g., raw bases vs. space-separated bases) before running on real data.
dna_sequence = "ACGTACGTACGTACGTACGT"  # placeholder; replace with your own sequence

# Tokenize the input ("pt" returns PyTorch tensors)
inputs = tokenizer(dna_sequence, return_tensors="pt")

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Softmax turns the per-sequence logits into class probabilities
probabilities = outputs.logits.softmax(dim=-1)
predicted_id = probabilities.argmax(dim=-1).item()
print("Sequence probabilities:", probabilities)

# Map the predicted class ID to its label name from the model config.
# If the config does not name its labels, these will be generic (LABEL_0, LABEL_1, ...);
# interpret them according to how the fine-tuning labels were defined
# (e.g., promoter vs. non-promoter).
print("Predicted label:", model.config.id2label[predicted_id])
```
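The tokenizer's expected input format is not documented here, so it helps to prepare sequences consistently before inference. Below is a minimal sketch of a hypothetical `prepare_dna` helper (not part of this model or of `transformers`) that strips whitespace, upper-cases bases, rejects characters outside an assumed `ACGTN` alphabet, and can optionally space-separate bases. Whether spacing is needed is an assumption to verify, for example by comparing `tokenizer.tokenize(...)` output for both forms.

```python
import re

# Alphabet accepted by this sketch: the four bases plus N for ambiguous positions.
# This is an assumption; adjust it to match your data and the tokenizer's vocabulary.
_VALID_DNA = re.compile(r"[ACGTN]*")


def prepare_dna(sequence: str, space_separated: bool = False) -> str:
    """Normalize a DNA string before tokenization (hypothetical helper).

    - Removes all whitespace, including newlines from FASTA-style line wrapping.
    - Upper-cases the bases.
    - Raises ValueError on characters outside A, C, G, T, N.
    - Optionally inserts a single space between bases.
    """
    cleaned = "".join(sequence.split()).upper()
    if not _VALID_DNA.fullmatch(cleaned):
        bad = sorted(set(cleaned) - set("ACGTN"))
        raise ValueError(f"unexpected characters in DNA sequence: {bad}")
    return " ".join(cleaned) if space_separated else cleaned


if __name__ == "__main__":
    raw = "acgt ACGT\nnacg"
    print(prepare_dna(raw))                        # ACGTACGTNACG
    print(prepare_dna(raw, space_separated=True))  # A C G T A C G T N A C G
```

The cleaned string can then be passed to the tokenizer in place of `dna_sequence`, e.g. `tokenizer(prepare_dna(raw), return_tensors="pt")`.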