# nanochat-1.8B-pretrain
Base model pretrained on the FineWeb-EDU dataset. It has learned basic language patterns but has not been trained for conversation, so it completes text rather than following chat-style instructions.
## Model Details
- Model Type: GPT-style transformer trained from scratch
- Parameters: ~1.9 billion
- Training Phase: pretrain
- Architecture: 20 layers, 1280 embedding dimension
- Hardware: NVIDIA DGX Spark (Grace Blackwell GB10)
- Framework: NanoChat
- Training Precision: BFloat16
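
For orientation, the listed architecture corresponds roughly to the configuration below. This is a minimal sketch: the field names, head count, vocabulary size, and context length are assumptions for illustration, not values read from the checkpoint or from NanoChat's actual config class.

```python
from dataclasses import dataclass

@dataclass
class GPTConfigSketch:
    """Illustrative only; NanoChat's real config class may differ."""
    n_layer: int = 20         # transformer blocks (from the card above)
    n_embd: int = 1280        # embedding dimension (from the card above)
    n_head: int = 10          # assumption: 128-dim heads (1280 / 128)
    vocab_size: int = 65536   # assumption: NanoChat's default BPE vocab size
    sequence_len: int = 2048  # assumption: context length
```
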
## Training Details
- GPU: NVIDIA Grace Blackwell GB10
- Memory: 128GB unified memory
- CUDA: 13.0
- Optimization: Muon optimizer for matrix parameters, AdamW for others
- Checkpoint Step: 021400
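
The optimizer split follows the usual Muon pattern: two-dimensional weight matrices in the transformer blocks go to Muon, while embeddings, the output head, and scalar/vector parameters go to AdamW. Below is a minimal sketch of that grouping, assuming a `Muon` class like the one bundled with NanoChat (the import path, grouping heuristics, and hyperparameters are assumptions, not NanoChat's exact code):

```python
import torch
from nanochat.muon import Muon  # assumption: import path may differ

def build_optimizers(model, muon_lr=0.02, adamw_lr=3e-4):
    # Route 2-D hidden weight matrices to Muon; embeddings, the LM head,
    # and 1-D params (norms, biases) to AdamW. The name-based filtering
    # here is illustrative; NanoChat's actual grouping may differ.
    matrix_params, other_params = [], []
    for name, p in model.named_parameters():
        if p.ndim == 2 and "emb" not in name and "lm_head" not in name:
            matrix_params.append(p)
        else:
            other_params.append(p)
    muon = Muon(matrix_params, lr=muon_lr, momentum=0.95)
    adamw = torch.optim.AdamW(other_params, lr=adamw_lr, betas=(0.9, 0.95))
    return muon, adamw
```

Muon's Newton-Schulz orthogonalization step is only defined for matrix-shaped parameters, which is why vectors and embeddings fall back to AdamW.
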
## Usage

### Prerequisites
```bash
# Clone the NanoChat repository
git clone https://github.com/karpathy/nanochat.git
cd nanochat

# Install dependencies (requires CUDA)
uv venv
uv sync --extra gpu

# Activate the virtual environment
source .venv/bin/activate
```
### Optional: DGX Spark Setup
```bash
# Prepare environment and clone NanoChat
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/prepare.sh
chmod +x prepare.sh
./prepare.sh --setup-only
```
### Quick Test
Download and test this model from HuggingFace:
```bash
# Download the test script
wget https://raw.githubusercontent.com/jasonacox/dgx-spark/main/nanochat/hf_test.py

# Activate the Python environment
source nanochat/.venv/bin/activate

# Install dependencies
pip install huggingface_hub

# Run with this model
python hf_test.py --model jasonacox/nanochat-1.8B-pretrain
```
### Example Code
```python
import sys
import os
import glob
from contextlib import nullcontext

import torch
from huggingface_hub import snapshot_download

# Download model from HuggingFace
print("Downloading model...")
model_path = snapshot_download(
    repo_id="jasonacox/nanochat-1.8B-pretrain",
    cache_dir=os.path.expanduser("~/.cache/nanochat/hf_downloads"),
)

# Set up NanoChat (clone if needed)
nanochat_path = "nanochat"
if not os.path.exists(nanochat_path):
    os.system("git clone https://github.com/karpathy/nanochat.git")
    os.system("cd nanochat && uv sync --extra gpu")
sys.path.insert(0, nanochat_path)

from nanochat.checkpoint_manager import build_model
from nanochat.common import compute_init, autodetect_device_type
from nanochat.engine import Engine

# Initialize device and autocast context (bfloat16 on CUDA)
device_type = autodetect_device_type()
_, _, _, _, device = compute_init(device_type)
ptdtype = torch.bfloat16
autocast_ctx = torch.amp.autocast(device_type=device_type, dtype=ptdtype) if device_type == "cuda" else nullcontext()

# Load the model from the downloaded checkpoint; sort so the parsed
# step is deterministic if several checkpoint files exist
checkpoint_files = sorted(glob.glob(os.path.join(model_path, "model_*.pt")))
step = int(os.path.basename(checkpoint_files[0]).split("_")[-1].split(".")[0])
model, tokenizer, _ = build_model(model_path, step, device, phase="eval")
engine = Engine(model, tokenizer)

# Generate a completion, streaming tokens as they are sampled
prompt = "Hello, how are you?"
tokens = tokenizer.encode(prompt)
print(f"Prompt: {prompt}\nResponse: ", end="", flush=True)
with autocast_ctx:
    for token_column, _ in engine.generate(tokens, num_samples=1, max_tokens=100, temperature=0.8, top_k=50):
        print(tokenizer.decode([token_column[0]]), end="", flush=True)
print()
```
## Training Pipeline

This model was trained using the DGX Spark optimized training pipeline (the approximate commands are sketched after this list):
- Pretraining: Base language model on FineWeb-EDU dataset
- Midtraining: Fine-tuned on conversational data (SmolTalk)
- SFT: Supervised fine-tuning on curated conversations
- RL: Reinforcement learning with GRPO
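
For reference, the four phases map onto NanoChat's training scripts roughly as follows. This is a hedged sketch, assuming NanoChat's speedrun-style script layout; the exact module names and flags may differ between versions.

```bash
# Sketch only, run from inside the nanochat repo; script names are
# assumptions based on NanoChat's speedrun layout and may vary by version.
torchrun --standalone --nproc_per_node=1 -m scripts.base_train  # pretraining (FineWeb-EDU)
torchrun --standalone --nproc_per_node=1 -m scripts.mid_train   # midtraining (SmolTalk)
torchrun --standalone --nproc_per_node=1 -m scripts.chat_sft    # supervised fine-tuning
torchrun --standalone --nproc_per_node=1 -m scripts.chat_rl     # reinforcement learning (GRPO)
```

This card describes the pretrain-phase checkpoint, i.e. the output of the first step above.
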
## Limitations

- This is a micro-model (~1.9B parameters), far smaller than commercial LLMs
- May make factual errors or hallucinate
- Knowledge is limited by the cutoff of its training data
- Best suited for educational purposes and experimentation
## Citation

```bibtex
@misc{nanochat-1.8B,
  author       = {jasonacox},
  title        = {nanochat-1.8B-pretrain},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/jasonacox/nanochat-1.8B-pretrain}}
}
```
## Acknowledgments
- Andrej Karpathy for NanoChat
- NVIDIA DGX Spark platform
- FineWeb-EDU and SmolTalk datasets
## License

MIT License. Free to use for research and educational purposes.