nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-mxfp4-mlx


	---
	license: apache-2.0
	library_name: mlx
	datasets:
	- DavidAU/ST-TheNextGeneration
	language:
	- en
	- fr
	- zh
	- de
	tags:
	- programming
	- code generation
	- code
	- codeqwen
	- moe
	- coding
	- coder
	- qwen2
	- chat
	- qwen
	- qwen-coder
	- Qwen3-Coder-30B-A3B-Instruct
	- Qwen3-30B-A3B
	- mixture of experts
	- 128 experts
	- 8 active experts
	- 1 million context
	- qwen3
	- finetune
	- brainstorm 20x
	- brainstorm
	- optional thinking
	- qwen3_moe
	- unsloth
	- mlx
	base_model: DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III
	pipeline_tag: text-generation
	---

	# Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-mxfp4-mlx

	This quant is scheduled to be deleted due to the limits on accounts imposed by HuggingFace.

	There is nothing I can do but delete old models to make room.

	Please archive this model locally as I will not be able to upload a new one.

	You can still use the slightly larger [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III qx86-hi-mlx](https://huggingface.co/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-qx86-hi-mlx) instead. It performs much better, and I am the only one able to create the qx quants.

	If you want to create your own mxfp4 model from a source, you can use the mlx tools as follows:

	```bash
	mlx_lm.convert --hf-path My/Model --mlx-path My-Model-mxfp4-mlx -q --q-bits 4 --q-group-size 32 --q-mode mxfp4
	```
	And then you wait. You need a lot of RAM and patience.
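
As a rough guide to how much RAM "a lot" means, the sketch below estimates the weight sizes involved. It rests on assumptions not stated in this card: that the source checkpoint stores 16-bit weights, that the parameter count is about 42 billion (read off the "42B" in the model name), and that mxfp4 stores one 8-bit shared scale per group of 32 four-bit values. Layers that stay unquantized and runtime overhead are ignored, so treat the numbers as a lower bound rather than a measurement.

```python
# Back-of-envelope weight-size estimate. All inputs are assumptions.

def mxfp4_bits_per_weight(group_size: int = 32,
                          element_bits: int = 4,
                          scale_bits: int = 8) -> float:
    """Effective bits per weight: 4-bit elements plus one shared
    scale (assumed 8 bits) amortized over each group."""
    return element_bits + scale_bits / group_size


def estimate_gb(num_params: float, bits_per_weight: float) -> float:
    """Size in decimal gigabytes for the given parameter count."""
    return num_params * bits_per_weight / 8 / 1e9


NUM_PARAMS = 42e9  # assumed from "42B" in the model name

source_gb = estimate_gb(NUM_PARAMS, 16)  # assumed 16-bit source weights
quant_gb = estimate_gb(NUM_PARAMS, mxfp4_bits_per_weight())

print(f"source weights (16-bit): ~{source_gb:.0f} GB")
print(f"mxfp4 weights:           ~{quant_gb:.1f} GB")
```

The group size of 32 matches the `--q-group-size 32` flag in the command above; changing it in the function shows how the scale overhead moves.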

	Sorry for the inconvenience.

	-G

	This model [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-mxfp4-mlx](https://huggingface.co/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-mxfp4-mlx) was
	converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III)
	using mlx-lm version 0.28.2.

	## Use with mlx

	```bash
	pip install mlx-lm
	```

	```python
	from mlx_lm import load, generate

	model, tokenizer = load("Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-mxfp4-mlx")

	prompt = "hello"

	# Wrap the raw prompt in the model's chat template when one is defined
	if tokenizer.chat_template is not None:
	    messages = [{"role": "user", "content": prompt}]
	    prompt = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True
	    )

	response = generate(model, tokenizer, prompt=prompt, verbose=True)
	```
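
Because this is a thinking model, the generated text may contain a reasoning section before the final answer. The helper below is a minimal sketch for separating the two. It assumes the output uses Qwen3-style `<think>` and `</think>` tags, which this card does not confirm, and `split_thinking` is a hypothetical name rather than part of mlx-lm. The opening tag is treated as optional, since some chat templates place it in the prompt instead of the output.

```python
def split_thinking(text: str,
                   open_tag: str = "<think>",
                   close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes Qwen3-style tags. If no closing tag is found, the whole
    text is returned as the answer with empty reasoning.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    head = head.strip()
    # The opening tag may be missing from the generated text.
    if head.startswith(open_tag):
        head = head[len(open_tag):]
    return head.strip(), tail.strip()


# Example with a hand-written string standing in for model output.
reasoning, answer = split_thinking("<think>Greet the user.</think>\n\nHello!")
print(answer)
```

With the snippet above, passing `response` to `split_thinking` would give the answer without the reasoning text, provided the tag assumption holds.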