Includes Unsloth chat template fixes!
For llama.cpp, use --jinja
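A typical llama.cpp invocation might look like the following (the GGUF filename and context size are placeholders; adjust to the quant you downloaded). The `--jinja` flag tells llama.cpp to use the Jinja chat template embedded in the GGUF, which carries the Unsloth template fixes:

```shell
# Placeholder filename; substitute the quant you downloaded.
./llama-cli \
  -m aquif-3.5-Max-42B-A3B-Q4_K_M.gguf \
  --jinja \
  --ctx-size 32768 \
  -p "Hello"
```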

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

aquif-3.5-Plus & aquif-3.5-Max

The pinnacle of the aquif-3.5 series, released November 3rd, 2025. These models bring advanced reasoning capabilities and unprecedented context windows to achieve state-of-the-art performance for their respective categories.

aquif-3.5-Plus combines hybrid reasoning with interchangeable thinking modes, offering flexibility for both speed-optimized and reasoning-intensive applications.

aquif-3.5-Max represents frontier model capabilities with reasoning-only architecture, delivering exceptional performance across all benchmark categories.

Model Repository Links

| Model | HuggingFace Repository |
|---|---|
| aquif-3.5-Plus | aquiffoo/aquif-3.5-Plus |
| aquif-3.5-Max | aquiffoo/aquif-3.5-Max |

Model Overview

| Model | Total (B) | Active Params (B) | Reasoning | Context Window | Thinking Modes |
|---|---|---|---|---|---|
| aquif-3.5-Plus | 30.5 | 3.3 | ✅ Hybrid | 1M | ✅ Interchangeable |
| aquif-3.5-Max | 42.4 | 3.3 | ✅ Reasoning-Only | 1M | Reasoning-Only |

Model Details

aquif-3.5-Plus (Hybrid Reasoning with Interchangeable Modes)

A hybrid reasoning model offering unusual flexibility: toggle between thinking and non-thinking modes to suit your use case, keeping reasoning capabilities when needed or prioritizing speed for time-sensitive applications.

Artificial Analysis Intelligence Index (AAII) Benchmarks

Core Performance Metrics

| Benchmark | aquif-3.5-Plus (Non-Reasoning) | aquif-3.5-Plus (Reasoning) | aquif-3.5-Max |
|---|---|---|---|
| MMLU-Pro | 80.2 | 82.8 | 85.4 |
| GPQA Diamond | 72.1 | 79.7 | 83.2 |
| AIME 2025 | 64.7 | 90.3 | 94.6 |
| LiveCodeBench | 50.5 | 76.4 | 81.6 |
| Humanity's Last Exam | 4.3 | 12.1 | 15.6 |
| TAU2-Telecom | 34.2 | 41.5 | 51.3 |
| IFBench | 39.3 | 54.3 | 65.4 |
| TerminalBench-Hard | 10.1 | 15.2 | 23.9 |
| AA-LCR | 30.4 | 59.9 | 61.2 |
| SciCode | 29.5 | 35.7 | 40.9 |
| AAII Composite Score | 42 (41.53) | 55 (54.79) | 60 (60.31) |
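The composite score in parentheses works out to the unweighted mean of the ten benchmark rows, which can be checked directly:

```python
# The AAII composite (in parentheses) equals the unweighted mean
# of the ten benchmark scores listed above.
scores = {
    "Plus (Non-Reasoning)": [80.2, 72.1, 64.7, 50.5, 4.3, 34.2, 39.3, 10.1, 30.4, 29.5],
    "Plus (Reasoning)":     [82.8, 79.7, 90.3, 76.4, 12.1, 41.5, 54.3, 15.2, 59.9, 35.7],
    "Max":                  [85.4, 83.2, 94.6, 81.6, 15.6, 51.3, 65.4, 23.9, 61.2, 40.9],
}
for name, vals in scores.items():
    print(f"{name}: {sum(vals) / len(vals):.2f}")
# Plus (Non-Reasoning): 41.53
# Plus (Reasoning): 54.79
# Max: 60.31
```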

Comparable Models by Configuration

aquif-3.5-Plus (Non-Reasoning): AAII 42

| Model | AAII Score |
|---|---|
| GPT-5 mini | 42 |
| Claude Haiku 4.5 | 42 |
| Gemini 2.5 Flash Lite 2509 | 42 |
| aquif-3.5-Plus (Non-Reasoning) | 42 |
| Qwen3 Coder 480B A35B | 42 |
| DeepSeek V3 0324 | 41 |
| Qwen3 VL 32B Instruct | 41 |

aquif-3.5-Plus (Reasoning): AAII 55

| Model | AAII Score |
|---|---|
| GLM-4.6 | 56 |
| Claude Haiku 4.5 | 55 |
| aquif-3.5-Plus (Reasoning) | 55 |
| Gemini 2.5 Flash 2509 | 54 |
| Qwen3 Next 80B A3B | 54 |

aquif-3.5-Max: AAII 60

| Model | AAII Score |
|---|---|
| MiniMax-M2 | 61 |
| gpt-oss-120B high | 61 |
| GPT-5 mini | 61 |
| Gemini 2.5 Pro | 60 |
| Grok 4 Fast | 60 |
| aquif-3.5-Max | 60 |
| Claude Opus 4.1 | 59 |
| DeepSeek-V3.1-Terminus | 58 |

Key Features

Massive Context Windows: Both models support up to 1M tokens, enabling analysis of entire codebases, research papers, and extensive conversation histories without truncation.

Efficient Architecture: Despite offering frontier-level performance, both models maintain exceptional efficiency through optimized mixture-of-experts design and active parameter count of just 3.3B.
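To put the efficiency claim in numbers, the fraction of parameters active per forward pass follows directly from the overview table:

```python
# Fraction of parameters activated per token (MoE routing),
# using the totals from the model overview table.
models = {"aquif-3.5-Plus": 30.5, "aquif-3.5-Max": 42.4}
active = 3.3  # billions, both models

for name, total in models.items():
    print(f"{name}: {active / total:.1%} of parameters active")
# aquif-3.5-Plus: 10.8% of parameters active
# aquif-3.5-Max: 7.8% of parameters active
```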

Flexible Reasoning (Plus Only): aquif-3.5-Plus provides interchangeable thinking modes: enable reasoning for complex problems, disable for faster inference on straightforward tasks.

Multilingual Support: Native support across English, German, Italian, Portuguese, French, Hindi, Spanish, Thai, Chinese, and Japanese.

Usage Recommendations

aquif-3.5-Plus:

  • Complex reasoning requiring flexibility between speed and depth
  • Scientific analysis and mathematical problem-solving with thinking enabled
  • Rapid-response applications with thinking disabled
  • Code generation and review
  • Multilingual applications up to 1M token contexts

aquif-3.5-Max:

  • Frontier-level problem-solving without compromise
  • Advanced research and scientific computing
  • Competition mathematics and algorithmic challenges
  • Comprehensive code analysis and generation
  • Complex multilingual tasks requiring maximum reasoning capability

Setting Thinking Mode (aquif-3.5-Plus)

Toggle between thinking and non-thinking modes by modifying the chat template:

```
set thinking = true    # Enable reasoning mode
set thinking = false   # Disable thinking mode (faster inference)
```

Simply set the variable in your chat template before inference to switch modes. No model reloading required.
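As a rough illustration of how such a template-level flag behaves (a simplified stand-in, not the model's actual chat template), a Jinja template can gate the thinking block on a `thinking` variable:

```python
# Simplified stand-in for a chat template that gates a reasoning
# block on a `thinking` flag; the real template shipped with the
# model is more elaborate, but the toggle works the same way.
from jinja2 import Template

template = Template("{% if thinking %}<think>{% endif %}{{ user_message }}")

print(template.render(thinking=True, user_message="Hi"))   # includes <think>
print(template.render(thinking=False, user_message="Hi"))  # plain prompt only
```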

Technical Specifications

Both models support:

  • BF16 and FP16 precision
  • Mixture of Experts architecture optimizations
  • Efficient attention mechanisms with optimized KV caching
  • Up to 1M token context window
  • Multi-head attention with sparse routing

Performance Highlights

aquif-3.5-Plus achieves 82.3% average benchmark performance in thinking mode, surpassing models with 2-4x more total parameters. Non-thinking mode maintains competitive 66.9% performance for latency-sensitive applications.

aquif-3.5-Max reaches 86.2% average performance, matching or exceeding frontier models while maintaining 42.4B total parametersโ€”an extraordinary efficiency breakthrough.

Acknowledgements

  • Qwen Team: Base architecture contributions
  • Meta Llama Team: Core model foundations
  • Hugging Face: Model hosting and training infrastructure

License

This project is released under the Apache 2.0 License. See LICENSE file for details.


Made in 🇧🇷

© 2025 aquif AI. All rights reserved.
