nano-start_64_26m_f32

3 Mamba2 + 1 MLA + MoE (2 experts, top-1) model with 26.73M parameters

Trained with oxidizr, a Rust-based LLM training framework.

Overview

This model uses a hybrid architecture with:

  • 3 Mamba2 layers - State Space Model (SSM) for efficient sequence modeling
  • 1 MLA (Multi-Head Latent Attention) layer - Compressed KV cache attention
  • MoE (Mixture of Experts) - 2 experts + shared expert, top-1 routing (see the layer-plan sketch below)
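
For orientation, here is a minimal, illustrative layer plan in Python. The ordering (three Mamba2 layers followed by the single MLA layer) and all names are assumptions made for illustration only; the oxidizr config shipped with the model is authoritative.

from dataclasses import dataclass

@dataclass
class NanoStartConfig:
    # Values taken from the model card; field names are illustrative.
    hidden_size: int = 128
    n_layers: int = 4
    vocab_size: int = 100315
    max_seq_len: int = 64
    n_experts: int = 2          # routed experts per MoE block
    top_k: int = 1              # top-1 routing
    shared_expert: bool = True  # always-active shared expert

def layer_plan(cfg: NanoStartConfig) -> list[str]:
    # Assumed interleaving: Mamba2 mixers first, the single MLA layer last,
    # with every layer followed by a MoE feed-forward block.
    mixers = ["mamba2"] * (cfg.n_layers - 1) + ["mla"]
    return [
        f"layer {i}: {mixer} -> moe({cfg.n_experts} experts, "
        f"top-{cfg.top_k}{', +shared' if cfg.shared_expert else ''})"
        for i, mixer in enumerate(mixers)
    ]

if __name__ == "__main__":
    for line in layer_plan(NanoStartConfig()):
        print(line)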

Key Specifications:

  • Parameters: 26.73M
  • Context Length: 64 tokens
  • Vocabulary: 100315 tokens (splintr tokenizer)
  • Final Loss: 0.0738
  • Training Steps: 241

Quick Start

# Install blazr (recommended inference server)
cargo install blazr

# Generate text
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Hello, world!"

# Start API server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

Usage

Command Line

# Basic generation
blazr generate --model fs90/nano-start_64_26m_f32 --prompt "Your prompt here" --max-tokens 100

# With sampling parameters
blazr generate --model fs90/nano-start_64_26m_f32 \
  --prompt "Once upon a time" \
  --max-tokens 200 \
  --temperature 0.8 \
  --top-p 0.9

API Server

# Start the server
blazr serve --model fs90/nano-start_64_26m_f32 --port 8080

# The server provides OpenAI-compatible endpoints:
# - POST /v1/completions
# - POST /v1/chat/completions
# - GET  /v1/models
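
The completions endpoint can also be exercised with a plain HTTP client. The request fields below follow the standard OpenAI completions schema and are an assumption about what the blazr server accepts:

import requests

resp = requests.post(
    "http://localhost:8080/v1/completions",
    json={
        "model": "fs90/nano-start_64_26m_f32",
        "prompt": "Once upon a time",
        "max_tokens": 50,
        "temperature": 0.8,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])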

Python Client

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

# Chat completion
response = client.chat.completions.create(
    model="fs90/nano-start_64_26m_f32",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100
)
print(response.choices[0].message.content)

Manual Download

# Using huggingface-cli
huggingface-cli download fs90/nano-start_64_26m_f32 --local-dir ./model

# Then run locally
blazr generate --model ./model --prompt "Hello!"
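
The same snapshot can also be fetched from Python with the huggingface_hub library (pip install huggingface_hub), as a standard alternative to the CLI:

from huggingface_hub import snapshot_download

# Download every file in the repo into ./model, matching the CLI command above
snapshot_download(repo_id="fs90/nano-start_64_26m_f32", local_dir="./model")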

Important Notes

This model requires blazr for inference.

Standard inference tools (llama.cpp, vLLM, Transformers, etc.) do not support this architecture. The model uses:

  • Custom architecture: Hybrid Mamba2/MLA/MoE layers trained with oxidizr
  • Custom tokenizer: splintr BPE tokenizer with specialized tokens

Model Card

Property              Value
Architecture          3 Mamba2 + 1 MLA + MoE (2 experts, top-1)
Parameters            26.73M
Hidden Size           128
Layers                4
Vocab Size            100315
Max Sequence Length   64
Precision             FP32
License               MIT
