# nano-start_64_26m_f32

A 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) model with 26.73M parameters, trained with oxidizr, a Rust-based LLM training framework.

## Overview

This model uses a hybrid architecture with:

- 3 Mamba2 layers - State Space Model (SSM) for efficient sequence modeling
- 1 MLA (Multi-Head Latent Attention) layer - Compressed KV cache attention
- MoE (Mixture of Experts) - 2 experts + shared expert, top-1 routing

**Key Specifications:**

- Parameters: 26.73M
- Context Length: 64 tokens
- Vocabulary: 100315 tokens (splintr tokenizer)
- Final Loss: 0.0738
- Training Steps: 241
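
For quick reference, these specifications can be collected into a single structure. The sketch below is purely illustrative: the field names and layer ordering are hypothetical, not the actual oxidizr/blazr configuration schema; the values are taken from this card.

```python
# Illustrative configuration summary. Field names and layer ordering are
# hypothetical, NOT the real oxidizr/blazr config schema; values are from this card.
nano_start_64_26m_f32 = {
    "hidden_size": 128,
    "num_layers": 4,                                        # 3 Mamba2 + 1 MLA
    "layer_types": ["mamba2", "mamba2", "mamba2", "mla"],   # order is illustrative
    "moe": {"num_experts": 2, "shared_expert": True, "top_k": 1},
    "vocab_size": 100315,                                   # splintr BPE tokenizer
    "max_seq_len": 64,
    "dtype": "float32",
}
```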

## Quick Start

```bash
# Install blazr (recommended inference server)
cargo install blazr

# Generate text
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"

# Start API server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080
```

## Usage

### Command Line

```bash
# Basic generation
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Your prompt here" --max-tokens 100

# With sampling parameters
blazr generate --model fs90/nano-start_64_26m_f32 \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.8 \
  --top-p 0.9
```

### API Server

```bash
# Start the server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

# The server provides OpenAI-compatible endpoints:
# - POST /v1/completions
# - POST /v1/chat/completions
# - GET /v1/models
```
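
For a quick smoke test without the OpenAI client library, the endpoints can also be called over plain HTTP. This is a minimal sketch (using the `requests` package) that assumes the server follows the standard OpenAI request/response schema and is listening on the port used above.

```python
# Quick smoke test against the OpenAI-compatible endpoints
# (assumes blazr is serving on localhost:8080 as started above).
import requests

# GET /v1/models - list the models the server exposes
print(requests.get("http://localhost:8080/v1/models").json())

# POST /v1/completions - plain text completion
resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "fs90/nano-start_64_26m_f32",
        "prompt": "Once upon a time",
        "max_tokens": 50,
    },
)
print(resp.json())
```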

### Python Client

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Chat completion
response = client.chat.completions.create(
    model="fs90/nano-start_64_26m_f32",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```
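
The plain `/v1/completions` endpoint works the same way through the client. A minimal sketch, reusing the `client` object created above and assuming the server accepts the standard completions parameters:

```python
# Plain text completion via POST /v1/completions (reuses `client` from above)
response = client.completions.create(
    model="fs90/nano-start_64_26m_f32",
    prompt="Once upon a time",
    max_tokens=100,
    temperature=0.8,
)
print(response.choices[0].text)
```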

### Manual Download

```bash
# Using huggingface-cli
huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model

# Then run locally
blazr generate --model ./model --prompt "Hello!"
```
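
The same files can also be fetched from Python with `huggingface_hub` (a sketch equivalent to the CLI command above):

```python
# Download the model repository to ./model (equivalent to the CLI call above)
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="fs90/nano-start_64_26m_f32",
    local_dir="./model",
)
print(f"Model files downloaded to {local_path}")
```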

## Important Notes

**This model requires blazr for inference.** Standard inference tools (llama.cpp, vLLM, Transformers, etc.) do not support this architecture. The model uses:

- Custom architecture: Hybrid Mamba2/MLA/MoE layers trained with oxidizr
- Custom tokenizer: splintr BPE tokenizer with specialized tokens

## Model Card

| Property | Value |
|---|---|
| Architecture | 3 Mamba2 + 1 MLA + MoE (2 experts, top-1) |
| Parameters | 26.73M |
| Hidden Size | 128 |
| Layers | 4 |
| Vocab Size | 100315 |
| Max Sequence Length | 64 |
| Precision | FP32 |
| License | MIT |