🖤 Dirty-Calla-4B: MLX builds for Apple Silicon

Dirty-Calla-4B-mlx provides Apple Silicon–optimized versions of Daizee/Dirty-Calla-4B, a fine-tuned Gemma 3 (4B) model developed by Daizee for expressive, humanlike, and emotionally textured responses.

This conversion uses Apple’s MLX framework for local inference on M1, M2, and M3 Macs.
Each variant trades precision for size and speed, so you can choose what fits your workflow.

🧩 Note on vocab padding:
The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
Added tokens are labeled `<pad_ex_*>`; they will not appear in normal generations.
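
The padding itself is plain round-up arithmetic; a minimal sketch (`pad_vocab` is an illustrative helper, not part of this repo):

```python
def pad_vocab(vocab_size: int, multiple: int = 64) -> int:
    """Round a vocabulary size up to the next multiple of `multiple`,
    so the embedding matrix has hardware-friendly dimensions."""
    return ((vocab_size + multiple - 1) // multiple) * multiple

# Any base vocabulary from 262,145 to 262,208 pads to the 262,208 noted above.
assert pad_vocab(262_145) == 262_208
```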


βš™οΈ Variants

| Folder | Bits | Group size | Description |
|---|---|---|---|
| `mlx/g128/` | int4 | 128 | Smallest and fastest (lightest memory use) |
| `mlx/g64/` | int4 | 64 | Balanced: slightly slower, more stable |
| `mlx/int8/` | int8 | n/a | Closest to fp16 precision, best coherence |
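
For reference, quantized variants like these can be produced with mlx_lm's converter, which takes the bit width and group size directly. A minimal sketch, not the exact invocation used for this repo; the output path mirrors the folder layout above:

```python
from mlx_lm import convert

# Hypothetical reproduction of the g64 variant:
# 4-bit weights with a quantization group size of 64.
convert(
    "Daizee/Dirty-Calla-4B",
    mlx_path="mlx/g64",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```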

🚀 Quickstart

Run directly from Hugging Face

```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
  --prompt "Describe a rainy city from the perspective of a poet." \
  --max-tokens 150 --temp 0.4
```
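
The same generation from Python, assuming mlx-lm is installed (`pip install mlx-lm`). This is a sketch, not an official snippet: since the variants live in subfolders of the repo, it first fetches just the g64 folder with `huggingface_hub`, then loads it locally (variant folders produced by mlx_lm conversions are normally self-contained, tokenizer included):

```python
from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# Fetch only the g64 variant from the repo (folder layout per the table above).
local_dir = snapshot_download(
    repo_id="Daizee/Dirty-Calla-4B-mlx",
    allow_patterns=["mlx/g64/*"],
)

# Load the model and tokenizer from the downloaded subfolder.
model, tokenizer = load(f"{local_dir}/mlx/g64")

text = generate(
    model,
    tokenizer,
    prompt="Describe a rainy city from the perspective of a poet.",
    max_tokens=150,
)
print(text)
```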