nanochat-1.8B-pretrain

Base model pretrained on the FineWeb-EDU dataset. It has learned basic language patterns but has not been trained for conversation.

Model Details

  • Model Type: GPT-style transformer trained from scratch
  • Parameters: ~1.9 billion
  • Training Phase: pretrain
  • Architecture: 20 layers, 1280 embedding dimension
  • Hardware: NVIDIA DGX Spark (Grace Blackwell GB10)
  • Framework: NanoChat
  • Training Precision: BFloat16
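
To sanity-check the parameter count locally, you can sum the tensor sizes in the checkpoint. This is a minimal sketch, assuming the model_*.pt file is either a plain state dict or a dict with a "model" entry; adjust the unwrapping if the checkpoint layout differs.

import glob
import os
import torch
from huggingface_hub import snapshot_download

model_path = snapshot_download(repo_id="jasonacox/nanochat-1.8B-pretrain")
ckpt_file = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))[-1]

# Load on CPU; unwrap a {"model": state_dict} container if present (assumption)
obj = torch.load(ckpt_file, map_location="cpu")
state_dict = obj.get("model", obj) if isinstance(obj, dict) else obj

n_params = sum(t.numel() for t in state_dict.values() if isinstance(t, torch.Tensor))
print(f"Total parameters: {n_params / 1e9:.2f}B")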

Training Details

  • GPU: NVIDIA Grace Blackwell GB10
  • Memory: 128GB unified memory
  • CUDA: 13.0
  • Optimization: Muon optimizer for matrix parameters, AdamW for others (see the sketch after this list)
  • Checkpoint Step: 021400
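
The split is by tensor dimensionality: 2D weight matrices go to Muon, while 1D tensors (biases, norm gains) go to AdamW; nanochat additionally keeps the embedding and output-head matrices on AdamW. Below is a minimal sketch of that grouping; the stand-in module and the AdamW placeholder for Muon are assumptions, since Muon ships inside nanochat rather than stock PyTorch.

import torch.nn as nn
from torch.optim import AdamW

# Stand-in module; in nanochat this would be the GPT model itself
model = nn.TransformerEncoderLayer(d_model=128, nhead=4)

# 2D+ tensors are weight matrices -> Muon; 1D tensors are biases/norms -> AdamW
matrix_params = [p for p in model.parameters() if p.ndim >= 2]
other_params = [p for p in model.parameters() if p.ndim < 2]

# Placeholder: swap the first AdamW below for nanochat's Muon implementation
muon_like = AdamW(matrix_params, lr=0.02)
adamw = AdamW(other_params, lr=3e-4)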

Usage

Prerequisites

# Clone the NanoChat repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# Install dependencies (requires CUDA)
uv venv
uv sync --extra gpu

# Activate the virtual environment
source .venv/bin/activate
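
Because the gpu extra requires CUDA, it is worth confirming PyTorch can see the device before continuing:

# Quick sanity check that PyTorch sees the GPU
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"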

Optional: DGX Spark Setup

# Prepare environment and clone NanoChat
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/prepare.sh
chmod +x prepare.sh
./prepare.sh --setup-only

Quick Test

Download and test this model from HuggingFace:

# Download the test script
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/hf_test.py

# Activate the NanoChat virtual environment
source nanochat/.venv/bin/activate

# Install dependencies
pip install huggingface_hub

# Run with this model
python hf_test.py --model jasonacox/nanochat-1.8B-pretrain

Example Code

import sys
import os
import glob
from huggingface_hub import snapshot_download
import torch
from contextlib import nullcontext

# Download model from HuggingFace
print("Downloading model...")
model_path = snapshot_download(
    repo_id="jasonacox/nanochat-1.8B-pretrain",
    cache_dir=os.path.expanduser("~/.cache/nanochat/hf_downloads")
)

# Setup NanoChat (clone if needed)
nanochat_path = "nanochat"
if not os.path.exists(nanochat_path):
    os.system("git clone https://github.com/karpathy/nanochat.git")
    os.system("cd nanochat && uv sync --extra gpu")

sys.path.insert(0, nanochat_path)

from nanochat.checkpoint_manager import build_model
from nanochat.common import compute_init, autodetect_device_type
from nanochat.engine import Engine

# Initialize
device_type = autodetect_device_type()
_, _, _, _, device = compute_init(device_type)
ptdtype = torch.bfloat16
autocast_ctx = torch.amp.autocast(device_type=device_type, dtype=ptdtype) if device_type == "cuda" else nullcontext()

# Load model
# Find the latest checkpoint (files are named model_<step>.pt; zero-padded steps sort correctly)
checkpoint_files = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))
step = int(os.path.basename(checkpoint_files[-1]).split("_")[-1].split(".")[0])
model, tokenizer, _ = build_model(model_path, step, device, phase="eval")
engine = Engine(model, tokenizer)

# Generate (a base model continues the prompt text; it has not been tuned to chat)
prompt = "Hello, how are you?"
tokens = tokenizer.encode(prompt)
print(f"Prompt: {prompt}\nResponse: ", end="", flush=True)

with autocast_ctx:
    for token_column, _ in engine.generate(tokens, num_samples=1, max_tokens=100, temperature=0.8, top_k=50):
        print(tokenizer.decode([token_column[0]]), end="", flush=True)
print()
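
The loop above streams one token per step for a single sample. Assuming token_column holds one token per sample, as the [0] indexing suggests, the same loop extends to several parallel completions; this variation is a sketch under that assumption:

# Sketch: collect three parallel samples from one prompt
num_samples = 3
results = [[] for _ in range(num_samples)]
with autocast_ctx:
    for token_column, _ in engine.generate(tokens, num_samples=num_samples, max_tokens=50, temperature=0.8, top_k=50):
        for i, tok in enumerate(token_column):
            results[i].append(tok)
for i, toks in enumerate(results):
    print(f"--- Sample {i} ---\n{tokenizer.decode(toks)}")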

Training Pipeline

This checkpoint is the output of stage 1 (Pretraining) of the DGX Spark optimized training pipeline; a sketch of the stage invocations follows the list:

  1. Pretraining: Base language model on FineWeb-EDU dataset
  2. Midtraining: Fine-tuned on conversational data (SmolTalk)
  3. SFT: Supervised fine-tuning on curated conversations
  4. RL: Reinforcement learning with GRPO
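
For reference, here is a hedged sketch of how these stages are typically driven; the script names mirror the scripts/ directory in the nanochat repository, but consult its speedrun.sh for the authoritative commands and flags.

# Sketch only; verify script names and flags against nanochat's speedrun.sh
torchrun --standalone --nproc_per_node=1 -m scripts.base_train   # 1. Pretraining
torchrun --standalone --nproc_per_node=1 -m scripts.mid_train    # 2. Midtraining
torchrun --standalone --nproc_per_node=1 -m scripts.chat_sft     # 3. SFT
torchrun --standalone --nproc_per_node=1 -m scripts.chat_rl      # 4. RL (GRPO)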

Limitations

  • This is a micro-model (~1.9B parameters), far smaller than commercial LLMs
  • May make factual errors or hallucinate
  • Knowledge is limited by the training data's cutoff
  • Best suited for educational purposes and experimentation

Citation

@misc{nanochat-1.8B,
  author = {jasonacox},
  title = {nanochat-1.8B-pretrain},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jasonacox/nanochat-1.8B-pretrain}}
}

Acknowledgments

  • Andrej Karpathy for NanoChat
  • NVIDIA DGX Spark platform
  • FineWeb-EDU and SmolTalk datasets

License

MIT License - Free to use for research and educational purposes
