---
# IMPORTANT: Choose the correct license identifier from https://hf.co/docs/hub/repositories-licenses
license: apache-2.0  # Or cc-by-sa-4.0, mit, etc. - CHOOSE THE CORRECT ONE
# IMPORTANT: Choose the most accurate pipeline tag for your model's task.
# See: https://huggingface.co/docs/hub/models-widgets#pipeline-types
# Examples for genomics:
#   token-classification: If predicting labels for each base/token (e.g., is this base part of a TATA box?)
#   text-classification: If classifying the whole sequence (e.g., promoter vs. non-promoter)
pipeline_tag: text-classification  # <-- EDIT THIS BASED ON YOUR MODEL'S TASK
tags:
- pytorch
- genomics
- dna
- promoter-prediction
---

# GenomeOcean-100M-finetuned-prom_300_tata

## Model Description

This repository contains the `GenomeOcean-100M-finetuned-prom_300_tata` model. It is a fine-tuned transformer model.

You can use this model with the following Python code. Make sure to use the `AutoModelFor...` class that matches your `pipeline_tag` (e.g., `AutoModelForTokenClassification`, `AutoModelForSequenceClassification`).
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# <-- CHANGE the AutoModel class if your pipeline_tag is different

model_id = "magicslabnu/GenomeOcean-100M-finetuned-prom_300_tata"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load model (ensure the AutoModel class matches your task)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# --- Inference Example ---
# Prepare your DNA sequence(s).
# Ensure the sequence format matches what the tokenizer expects
# (e.g., spaces between bases if needed).
dna_sequence = "A C G T A C G T"  # placeholder; replace with your own sequence

# Tokenize the input
inputs = tokenizer(dna_sequence, return_tensors="pt")  # "pt" for PyTorch

# Perform inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# For Sequence Classification (matches pipeline_tag: text-classification):
predictions = outputs.logits.softmax(dim=-1)
print("Sequence probabilities:", predictions)

# For Token Classification (load AutoModelForTokenClassification instead):
# predictions = outputs.logits.argmax(dim=-1)
# You might need to map prediction IDs back to labels
# print("Token predictions:", predictions)
# -------------------------

# [Add code here to interpret the predictions based on your specific task,
# e.g., mapping token IDs to labels like 'Promoter', 'Non-Promoter', 'TATA-box']
```
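The interpretation step left open at the end of the example could look roughly like the sketch below for the sequence-classification case. It is a minimal illustration in plain Python, not this model's documented post-processing: the `ID2LABEL` mapping, the helper names `softmax` and `interpret_logits`, and the sample logits are all hypothetical. The real label names, if the checkpoint defines any, would come from `model.config.id2label`.

```python
import math

# Hypothetical label mapping for illustration only; check
# model.config.id2label for the labels the checkpoint actually uses.
ID2LABEL = {0: "Non-Promoter", 1: "Promoter"}


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret_logits(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits standing in for one sequence's model output.
label, prob = interpret_logits([-1.2, 2.3])
print(f"{label} (p={prob:.3f})")
```

With a loaded model, a call such as `interpret_logits(outputs.logits[0].tolist(), model.config.id2label)` would apply the same logic to real output, assuming the checkpoint's config carries meaningful label names.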