# 🤗 Dirty-Calla-4B – MLX builds for Apple Silicon
Dirty-Calla-4B-mlx provides Apple Silicon–optimized versions of Daizee/Dirty-Calla-4B, a fine-tuned Gemma 3 (4B) model developed by Daizee for expressive, humanlike, and emotionally textured responses.
This conversion uses Apple's MLX framework for local inference on M1, M2, and M3 Macs.
Each variant trades size for speed or precision, so you can choose what fits your workflow.
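If you don't have MLX tooling set up yet, the `mlx-lm` package (installable with `pip install mlx-lm`) provides both the CLI used in the Quickstart below and a Python API.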
**🧩 Note on vocab padding:**
The tokenizer and embedding matrix were padded to the next multiple of 64 tokens (262,208 total).
Added tokens are labeled `<pad_ex_*>` and will not appear in normal generations.
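As a quick sanity check, you can confirm the padded vocabulary from Python. This is a minimal sketch assuming the Hugging Face `transformers` tokenizer loads from the repo root (adjust the path if the tokenizer files live inside a variant folder):

```python
from transformers import AutoTokenizer

# Path assumption: tokenizer files sit at the repo root.
tok = AutoTokenizer.from_pretrained("Daizee/Dirty-Calla-4B-mlx")

# The padded vocabulary should total 262,208 entries.
print(len(tok))

# Padding placeholders follow the <pad_ex_*> naming scheme and are never
# produced during normal generation.
print(sorted(t for t in tok.get_vocab() if t.startswith("<pad_ex_"))[:5])
```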
## ⚙️ Variants
| Folder | Bits | Group Size | Description |
|---|---|---|---|
| `mlx/g128/` | int4 | 128 | Smallest & fastest (lightest memory use) |
| `mlx/g64/` | int4 | 64 | Balanced: slightly slower, more stable |
| `mlx/int8/` | int8 | – | Closest to fp16 precision, best coherence |
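As a rough guide to what each variant costs in memory, here is a back-of-the-envelope calculation of weight sizes only, ignoring quantization scale/bias overhead, the KV cache, and runtime buffers:

```python
# Approximate weight footprint for a ~4B-parameter model at each bit width.
params = 4e9

for name, bits in [("int4 (g128 / g64)", 4), ("int8", 8)]:
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")

# int4 (g128 / g64): ~2.0 GB of weights
# int8: ~4.0 GB of weights
```

In practice, group-size-64 quantization stores more scale parameters than group-size-128, so `g64` sits slightly above `g128` in memory.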
## 🚀 Quickstart

Run directly from Hugging Face:
```bash
python -m mlx_lm.generate \
  --model hf://Daizee/Dirty-Calla-4B-mlx/mlx/g64 \
  --prompt "Describe a rainy city from the perspective of a poet." \
  --max-tokens 150 --temp 0.4
```
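The same prompt can also be run from Python via the `mlx_lm` API. This is a minimal sketch assuming `load` resolves the same repo-plus-subfolder path as the CLI; if it does not in your `mlx_lm` version, download the variant folder and pass its local path instead:

```python
from mlx_lm import load, generate

# Path assumption: the same g64 variant used in the CLI example above.
model, tokenizer = load("Daizee/Dirty-Calla-4B-mlx/mlx/g64")

text = generate(
    model,
    tokenizer,
    prompt="Describe a rainy city from the perspective of a poet.",
    max_tokens=150,
    verbose=True,  # stream tokens and timing stats to stdout
)
```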