---
library_name: transformers
tags: []
---
# Hymba2-2.7B-Instruct
Hymba2 is a new family of hybrid small language models (SLMs) that outperform Qwen models in accuracy (math, coding, and commonsense), batch-size-1 latency, and throughput. More details are in our NeurIPS 2025 [paper](https://drive.google.com/drive/folders/17vOGktwUfUpRAJPGJUV6oX8XwLSczZtv?usp=sharing).
Docker path: `/lustre/fsw/portfolios/nvr/users/yongganf/docker/megatron_py25_fla.sqsh` on ORD/NRT or `/lustre/fsw/nvr_lpr_llm/yongganf/docker/megatron_py25_fla.sqsh` on EOS.
## Chat with Hymba2-2.7B-Instruct
We wrap the model in a CUDA Graph for fast generation:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo_name = "nvidia/Hymba2-2.7B-Instruct"

# trust_remote_code=True is required: the model class and its CUDA Graph
# generation helpers are defined in the repository's custom code.
tokenizer = AutoTokenizer.from_pretrained(repo_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_name, trust_remote_code=True)
model = model.cuda().to(torch.bfloat16)

max_new_tokens = 256

# Set up the generation state once and reuse it for every prompt below.
print('Initializing generation state...')
generation_state = model.init_cuda_graph_generation(
    max_new_tokens=max_new_tokens,
    batch_size=1,
    device='cuda',
)

# Simple interactive loop; type "exit" to quit.
while True:
    prompt = input("User: ")
    if prompt.lower() == "exit":
        break
    inputs = tokenizer(prompt, return_tensors="pt").to('cuda')
    print("Generating with CUDA graph acceleration...")
    outputs = model.generate_with_cuda_graph(
        input_ids=inputs["input_ids"],
        generation_state=generation_state,
        max_new_tokens=max_new_tokens,
        temperature=0,
        top_k=50,
        eos_token_id=tokenizer.eos_token_id,
        profiling=False,
    )
    # Decode only the newly generated tokens (drop the prompt prefix).
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
    print(f"Response: {response}")
```