NanoChat RL

This is the RL-trained checkpoint from nanochat, Andrej Karpathy's full-stack project for building an LLM.

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "nanochat-students/rl-d20"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# trust_remote_code is required because nanochat ships its own model and tokenizer code
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True).to(device)
model.eval()

conversation = [
    {"role": "user", "content": "Hello, who are you?"},
]
# Render the conversation with the model's chat template and append the
# assistant header so the model continues as the assistant
rendered = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([rendered], return_tensors="pt").to(model.device)

generated = model.generate(**model_inputs, max_new_tokens=256)
# Strip the prompt tokens and decode only the newly generated completion
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))
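
The call above uses greedy-style defaults. To sample with the same decoding parameters used for the RL rollouts (temperature 1.0, top_k 50, max_new_tokens 256; see the training metrics below), a minimal variant:

# Sampling variant matching the RL rollout decoding settings listed in the
# training metrics; omit do_sample=True to fall back to greedy decoding,
# which matches the evaluation setting (temperature 0.0)
generated = model.generate(
    **model_inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=1.0,
    top_k=50,
)
output_ids = generated[0, model_inputs.input_ids.shape[1]:]
print(tokenizer.decode(output_ids, skip_special_tokens=True))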

Chat RL Training Metrics

timestamp: 2025-10-15 12:59:52

  • run: burtenshaw-20251015111354
  • source: sft
  • dtype: bfloat16
  • device_batch_size: 8
  • examples_per_step: 16
  • num_samples: 16
  • max_new_tokens: 256
  • temperature: 1.0000
  • top_k: 50
  • unembedding_lr: 0.0040
  • embedding_lr: 0.2000
  • matrix_lr: 0.0200
  • weight_decay: 0.0000
  • init_lr_frac: 0.0500
  • num_epochs: 1
  • save_every: 60
  • eval_every: 60
  • eval_examples: 400
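
The hyperparameters above configure the rollout-and-update loop of the RL stage. As a rough illustration of how such a loop fits together, here is a minimal sketch of REINFORCE with a per-group mean baseline (a simplified GRPO-style objective). The names `rl_step`, `prompt`, and `reward_fn` are hypothetical, and this is not the actual nanochat training code:

# Illustrative sketch only, NOT the nanochat implementation: draw num_samples
# completions per prompt, score them, and update with advantage-weighted
# log-probabilities (group mean as baseline).
import torch
import torch.nn.functional as F

NUM_SAMPLES = 16      # num_samples: completions drawn per prompt
MAX_NEW_TOKENS = 256  # max_new_tokens
TEMPERATURE = 1.0     # temperature
TOP_K = 50            # top_k

def rl_step(model, tokenizer, prompt, reward_fn, optimizer, device):
    # Roll out NUM_SAMPLES completions for the same prompt
    inputs = tokenizer([prompt] * NUM_SAMPLES, return_tensors="pt").to(device)
    with torch.no_grad():
        rollouts = model.generate(
            **inputs,
            do_sample=True,
            temperature=TEMPERATURE,
            top_k=TOP_K,
            max_new_tokens=MAX_NEW_TOKENS,
        )
    prompt_len = inputs.input_ids.shape[1]
    completions = rollouts[:, prompt_len:]

    # Score each completion; advantage = reward minus the group mean
    rewards = torch.tensor(
        [reward_fn(tokenizer.decode(c, skip_special_tokens=True)) for c in completions],
        dtype=torch.float32, device=device,
    )
    advantages = rewards - rewards.mean()

    # Log-probabilities of the sampled completion tokens under the current policy
    logits = model(input_ids=rollouts).logits[:, prompt_len - 1:-1, :]
    logprobs = F.log_softmax(logits, dim=-1)
    token_logprobs = logprobs.gather(-1, completions.unsqueeze(-1)).squeeze(-1)
    # (Masking of tokens past an early EOS is omitted here for brevity)
    loss = -(advantages.unsqueeze(1) * token_logprobs).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()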

Chat RL Evaluation

timestamp: 2025-10-15 13:04:39

  • source: rl
  • task_name: GSM8K
  • dtype: bfloat16
  • temperature: 0.0000
  • max_new_tokens: 512
  • num_samples: 1
  • top_k: 50
  • batch_size: 8
  • model_tag: None
  • step: None
  • max_problems: None
  • GSM8K: 0.0970
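
Since temperature is 0.0, the evaluation uses greedy decoding with up to 512 new tokens per problem. A minimal sketch of scoring one GSM8K-style problem with the model and tokenizer loaded as in the usage section; the "#### <answer>" extraction is an assumption based on GSM8K's answer convention, not the exact nanochat eval harness:

# Greedy decoding matching the eval settings above (temperature 0.0,
# max_new_tokens 512); answer extraction follows GSM8K's "#### <answer>"
# convention and is an assumption, not the nanochat harness
import re

question = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)
rendered = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer([rendered], return_tensors="pt").to(model.device)
out = model.generate(**inputs, do_sample=False, max_new_tokens=512)
completion = tokenizer.decode(out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True)

match = re.search(r"####\s*(-?[\d,\.]+)", completion)
predicted = match.group(1).replace(",", "") if match else None
print(completion)
print("predicted answer:", predicted)  # the gold answer for this problem is 72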

Logs from training can be found here: https://huggingface.co/spaces/nanochat-students/trackio
