nano-start_64_26m_f32

3 Mamba2 + 1 MLA + MoE (2 experts, top-1) model with 26.73M parameters

Trained with oxidizr, a Rust-based LLM training framework.

Overview

This model uses a hybrid architecture with:

  • 3 Mamba2 layers - State Space Model (SSM) for efficient sequence modeling
  • 1 MLA (Multi-Head Latent Attention) layer - Compressed KV cache attention
  • MoE (Mixture of Experts) - 2 experts + shared expert, top-1 routing (see the layer-plan sketch below)
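
For orientation, here is a minimal, illustrative layer plan in Python. The ordering (three Mamba2 layers followed by the single MLA layer) and all names are assumptions made for illustration only; the oxidizr config shipped with the model is authoritative.

from dataclasses import dataclass

@dataclass
class NanoStartConfig:
    # Values taken from the model card; field names are illustrative.
    hidden_size: int = 128
    n_layers: int = 4
    vocab_size: int = 100315
    max_seq_len: int = 64
    n_experts: int = 2          # routed experts per MoE block
    top_k: int = 1              # top-1 routing
    shared_expert: bool = True  # always-active shared expert

def layer_plan(cfg: NanoStartConfig) -> list[str]:
    # Assumed interleaving: Mamba2 mixers first, the single MLA layer last,
    # with every layer followed by a MoE feed-forward block.
    mixers = ["mamba2"] * (cfg.n_layers - 1) + ["mla"]
    return [
        f"layer {i}: {mixer} -> moe({cfg.n_experts} experts, "
        f"top-{cfg.top_k}{', +shared' if cfg.shared_expert else ''})"
        for i, mixer in enumerate(mixers)
    ]

if __name__ == "__main__":
    for line in layer_plan(NanoStartConfig()):
        print(line)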

Key Specifications:

  • Parameters: 26.73M
  • Context Length: 64 tokens
  • Vocabulary: 100315 tokens (splintr tokenizer)
  • Final Loss: 0.0738
  • Training Steps: 241

Quick Start

# Install blazr (recommended inference server)
cargo install blazr

# Generate text
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"

# Start API server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

Usage

Command Line

# Basic generation
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Your prompt here" --max-tokens 100

# With sampling parameters
blazr generate --model fs90/nano-start_64_26m_f32 \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.8 \
  --top-p 0.9

API Server

# Start the server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

# The server provides OpenAI-compatible endpoints:
# - POST /v1/completions
# - POST /v1/chat/completions
# - GET  /v1/models
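
The completions endpoint can also be exercised with a plain HTTP client. The request fields below follow the standard OpenAI completions schema and are an assumption about what the blazr server accepts:

import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "fs90/nano-start_64_26m_f32",
        "prompt": "Once upon a time",
        "max_tokens": 50,
        "temperature": 0.8,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])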

Python Client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Chat completion
response = client.chat.completions.create(
    model="fs90/nano-start_64_26m_f32",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(response.choices[0].message.content)

Manual Download

# Using huggingface-cli
huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model

# Then run locally
blazr generate --model ./model --prompt "Hello!"
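
The same snapshot can also be fetched from Python with the huggingface_hub library (pip install huggingface_hub), as a standard alternative to the CLI:

from huggingface_hub import snapshot_download

# Download every file in the repo into ./model, matching the CLI command above
snapshot_download(repo_id="fs90/nano-start_64_26m_f32", local_dir="./model")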

Important Notes

This model requires blazr for inference.

Standard inference tools (llama.cpp, vLLM, Transformers, etc.) do not support this architecture. The model uses:

  • Custom architecture: Hybrid Mamba2/MLA/MoE layers trained with oxidizr
  • Custom tokenizer: splintr BPE tokenizer with specialized tokens

Model Card

Property              Value
Architecture          3 Mamba2 + 1 MLA + MoE (2 experts, top-1)
Parameters            26.73M
Hidden Size           128
Layers                4
Vocab Size            100315
Max Sequence Length   64
Precision             FP32
License               MIT
