transformers-implementation

#1
by burtenshaw HF Staff - opened

This PR adds transformers library integration for this repo. With it, we can bring this model and its derivatives into the wider ecosystem, for example JavaScript and C++ inference.

Usage

To test out the model on this branch, follow these steps:

  1. Install transformers from the PR branch for nanochat integration:
pip install git+https://github.com/huggingface/transformers.git@nanochat-implementation
  2. You can then run this snippet by referencing the branch revision ("refs/pr/1"):
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


model_id = "karpathy/nanochat-d32"
revision = "refs/pr/1"
max_new_tokens = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=False, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=False, dtype=torch.bfloat16, revision=revision).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
]

inputs = tokenizer.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt"
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,  # unpack the tokenized chat template (input_ids, attention_mask)
        max_new_tokens=max_new_tokens,
    )

# Decode only the generated tokens (excluding the input prompt)
generated_tokens = outputs[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))

Or in vLLM, like so:

vllm serve karpathy/nanochat-d32 --enforce-eager --revision refs/pr/1
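
Once the server is up, you can query it through vLLM's OpenAI-compatible API. Here is a minimal sketch using the openai Python client, assuming the default endpoint at http://localhost:8000/v1:

from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; the api_key value is not checked locally
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="karpathy/nanochat-d32",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)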

Next Steps

  • Update the repo README with snippets for nanochat and transformers.
  • Add transformers.js integration @Xenova
burtenshaw changed pull request status to open

I was also able to deploy a ZeroGPU demo Space based on these weights and the transformers implementation: https://huggingface.co/spaces/nanochat-students/chat-d32-demo

Updated the snippet to match the change in the transformers branch.

Amazing, thanks so much @burtenshaw!

One question: how did you convert the tiktoken tokenizer to a tokenizer.json / HF fast tokenizer? I have already trained a German nanochat base model and would really like to try it out with the Transformers PR :)

Hi @stefan-it 👋 I did the tokenizer conversion, and while the script is in the HF PR, you can do it separately with something like:

# Clone nanochat and install it so the original tokenizer can be loaded
!git clone https://github.com/karpathy/nanochat.git
%cd nanochat
!pip install -e .

# Download the trained tokenizer pickle (swap in your own model's tokenizer.pkl)
!wget https://huggingface.co/burtenshaw/nanochat-d20/resolve/main/tokenizer.pkl

from nanochat.tokenizer import RustBPETokenizer

# Load the nanochat tokenizer from the current directory (expects tokenizer.pkl)
tok = RustBPETokenizer.from_directory(".")

from transformers.integrations.tiktoken import convert_tiktoken_to_fast
from pathlib import Path

# Convert the underlying tiktoken encoding to a HF fast tokenizer (tokenizer.json)
output_dir = Path("hf-tokenizer")
output_dir.mkdir(exist_ok=True)
convert_tiktoken_to_fast(tok.enc, output_dir)
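
As a quick sanity check (a sketch, assuming convert_tiktoken_to_fast wrote a tokenizer.json into hf-tokenizer/), you can load the converted tokenizer and compare its ids against the original tiktoken encoding:

from transformers import PreTrainedTokenizerFast

# Load the converted fast tokenizer from the output directory
hf_tok = PreTrainedTokenizerFast(tokenizer_file=str(output_dir / "tokenizer.json"))

text = "Hello nanochat!"
# If the conversion preserved the vocab and merges, the ids should match
assert tok.enc.encode(text) == hf_tok.encode(text, add_special_tokens=False)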

Many thanks @Xenova! It's working perfectly with my own tokenizer; now I can fully test the HF PR ❤️

