transformers-implementation
This PR adds transformers library integration to this repo. With this, we can integrate this model and its derivatives into the wider ecosystem, for example JavaScript and C++ inference.
Usage
To test out the model on this branch, follow these steps:
- Install transformers from the PR branch for nanochat integration:
```bash
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
```
- You can then run this snippet by referencing the branch revision (`refs/pr/1`):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "karpathy/nanochat-d32"
revision = "refs/pr/1"
max_new_tokens = 64

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision
).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # Unpack the dictionary (input_ids, attention_mask)
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
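As a small follow-up (not part of the PR snippet itself): `generate` also accepts a `streamer`, so you can print tokens as they are produced. This is a minimal sketch reusing `model`, `tokenizer`, `inputs`, and `max_new_tokens` from the snippet above:

```python
from transformers import TextStreamer

# Prints decoded text to stdout as tokens are generated; the prompt itself is skipped
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
with torch.no_grad():
    model.generate(**inputs, streamer=streamer, max_new_tokens=max_new_tokens)
```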
Or in vLLM, like so:
```bash
vllm serve karpathy/nanochat-d32 --enforce-eager --revision refs/pr/1
```
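`vllm serve` exposes an OpenAI-compatible server (by default at http://localhost:8000/v1), so you can query it with the `openai` client. A minimal sketch, assuming the default host/port and that the served model name is the repo id:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the API key is a placeholder since no auth is configured
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="karpathy/nanochat-d32",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```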
Next Steps
- update the repo README with snippets for `nanochat` and `transformers`.
- add `transformers.js` integration @Xenova
I was also able to deploy a ZeroGPU demo Space based on these weights and the transformers implementation: https://huggingface.co/spaces/nanochat-students/chat-d32-demo
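For reference, such a Space is essentially a small Gradio chat app. A rough sketch of what a ZeroGPU demo can look like (this is an illustration under assumptions, not the actual Space code; the `respond` function and its decoding details are hypothetical):

```python
import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "karpathy/nanochat-d32"
revision = "refs/pr/1"
tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision, dtype=torch.bfloat16).to("cuda")

@spaces.GPU  # ZeroGPU attaches a GPU only for the duration of this call
def respond(message, history):
    # history is ignored in this single-turn sketch
    conversation = [{"role": "user", "content": message}]
    inputs = tokenizer.apply_chat_template(
        conversation, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

gr.ChatInterface(respond).launch()
```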
update snippet to match change in transformers branch
Amazing, thanks so much @burtenshaw!
One question: how did you convert the tiktoken tokenizer to a tokenizer.json/HF fast tokenizer? I have already trained a German nanochat base model, and would really like to try out my model with the Transformers PR :)
Hi @stefan-it 👋 I did the tokenizer conversion, and while the script is in the HF PR, you can do it separately with something like:
```python
# In a notebook/Colab: get nanochat and a trained tokenizer.pkl (here the d20 one as an example)
!git clone https://github.com/karpathy/nanochat.git
%cd nanochat
!pip install -e .
!wget https://huggingface.co/burtenshaw/nanochat-d20/resolve/main/tokenizer.pkl

from pathlib import Path

from nanochat.tokenizer import RustBPETokenizer
from transformers.integrations.tiktoken import convert_tiktoken_to_fast

# Load the nanochat tokenizer (reads tokenizer.pkl from the given directory)
tok = RustBPETokenizer.from_directory(".")

# Convert the underlying tiktoken encoding into a HF fast tokenizer (tokenizer.json)
output_dir = Path("hf-tokenizer")
output_dir.mkdir(exist_ok=True)
convert_tiktoken_to_fast(tok.enc, output_dir)
```
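To sanity-check the conversion, you can load the result back with transformers and compare token ids against the original tiktoken encoding (assuming the conversion wrote a `tokenizer.json` into `hf-tokenizer/`; the sample string is arbitrary):

```python
from transformers import PreTrainedTokenizerFast

hf_tok = PreTrainedTokenizerFast(tokenizer_file=str(output_dir / "tokenizer.json"))

sample = "Hallo, wie geht es dir?"
# Ids from the converted fast tokenizer vs. the original tiktoken encoding;
# for plain text the two should match
print(hf_tok.encode(sample, add_special_tokens=False))
print(tok.enc.encode(sample))
```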