---
license: apache-2.0
pipeline_tag: text-classification
tags:
- pytorch
- genomics
- dna
- promoter-prediction
---
# GenomeOcean-100M-finetuned-prom_300_tata
## Model Description
This repository contains the `GenomeOcean-100M-finetuned-prom_300_tata` model.
It is a transformer model fine-tuned from GenomeOcean-100M on the prom_300_tata dataset for promoter prediction, i.e., classifying whole DNA sequences as TATA-box promoters or non-promoters.
## How to Use
You can use this model with the following Python code. Because the model performs sequence-level classification (`text-classification`), load it with `AutoModelForSequenceClassification`.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "magicslabnu/GenomeOcean-100M-finetuned-prom_300_tata"

# Load tokenizer and model. AutoModelForSequenceClassification matches the
# text-classification pipeline tag (one label per sequence).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# --- Inference Example ---
# Replace with your own DNA sequence; make sure the format matches what the
# tokenizer expects (e.g., raw bases vs. space-separated bases or k-mers).
dna_sequence = "ACGTTATAAAAGGCCGGTACGTACGATCGATCG"

# Tokenize the input ("pt" returns PyTorch tensors)
inputs = tokenizer(dna_sequence, return_tensors="pt")

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to class probabilities and map the top class ID to its label
# (e.g., 'Promoter' / 'Non-Promoter', as defined in model.config.id2label)
probabilities = outputs.logits.softmax(dim=-1)
predicted_id = probabilities.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_id]

print("Class probabilities:", probabilities)
print("Predicted label:", predicted_label)
```
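For quick experiments, the same checkpoint can also be driven through the high-level `pipeline` API. This is a minimal sketch, assuming the checkpoint loads as a standard Hugging Face sequence classifier with its `id2label` mapping set; the example DNA strings are arbitrary placeholders, not real benchmark inputs.
```python
from transformers import pipeline

# Minimal sketch: batch classification via the text-classification pipeline.
classifier = pipeline(
    "text-classification",
    model="magicslabnu/GenomeOcean-100M-finetuned-prom_300_tata",
)

# Placeholder sequences; replace with your own inputs
sequences = [
    "ACGTTATAAAAGGCCGGTACGTACGATCGATCG",
    "GGCCGGTACGTACGATCGATCGACGTTCGGCTA",
]

for seq, result in zip(sequences, classifier(sequences)):
    # Each result is a dict with the predicted label and its score
    print(f"{seq[:16]}... -> {result['label']} (score={result['score']:.3f})")
```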