Qwen3-30B-A3B-Thinking-2507-GEO

Fine-tuned version of Qwen/Qwen3-30B-A3B-Thinking-2507 specialized in GEO (Generative Engine Optimization) and Ryan Fortin's Conversation Optimization Framework.

This model carries knowledge distilled from DeepSeek V3 and was fine-tuned for 3 full epochs (91 hours) on proprietary training data.

โš ๏ธ IMPORTANT: Required System Prompt

This model REQUIRES the following system prompt to activate fine-tuned knowledge:

"You are an expert in GEO (Generative Engine Optimization) and the Conversation Optimization Framework developed by Ryan Fortin..."

Without this prompt, the model will behave like the base Qwen3 model.
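
One way to avoid forgetting the prompt is to prepend it in client code. The following is a minimal sketch, not part of the released tooling: the helper name `with_geo_system_prompt` is hypothetical, and the prompt string is kept truncated exactly as it appears above, so the full text has to be substituted before real use.

```python
# The prompt below is truncated exactly as in this model card ("...");
# replace it with the full system prompt before real use.
GEO_SYSTEM_PROMPT = (
    "You are an expert in GEO (Generative Engine Optimization) and the "
    "Conversation Optimization Framework developed by Ryan Fortin..."
)


def with_geo_system_prompt(messages, system_prompt=GEO_SYSTEM_PROMPT):
    """Return a copy of `messages` that starts with the required system prompt.

    Hypothetical helper: if the conversation already begins with a system
    message it is returned unchanged, otherwise one is prepended.
    """
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)


chat = with_geo_system_prompt(
    [{"role": "user", "content": "Explain GEO optimization strategies"}]
)
print([m["role"] for m in chat])
```

The returned list can be passed to any chat-style API that accepts role/content messages.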

Model Details

  • Base Model: Qwen3-30B-A3B-Thinking-2507 (MoE, 128 experts with 8 active per token, ~30B total and ~3B active parameters)
  • Fine-tuning: LoRA with Unsloth
  • Training Duration: 91 hours (3 epochs)
  • Training Examples: 5,043 high-quality examples
  • Quantization: Q4_K_M (optimized for 24GB VRAM)
  • Context Length: Trained on up to 2,048 tokens, supports up to 32k
  • Distillation Source: DeepSeek V3 (671B) → Qwen3-30B

Capabilities

This model excels at:

✅ GEO (Generative Engine Optimization)

  • Technical SEO concepts and strategies
  • Content optimization for AI search engines
  • Semantic search and ranking factors

✅ Conversation Optimization Framework

  • Expert-level understanding of CO Framework principles
  • Multi-turn conversation optimization
  • Framework application and analysis

✅ Advanced Reasoning

  • Maintains thinking/reasoning capabilities from base model
  • Step-by-step problem solving with <think> tags
  • DeepSeek V3-distilled reasoning patterns

Usage

LM Studio (Easiest)

  1. Download the GGUF file
  2. Import into LM Studio
  3. Load the model, set the required system prompt, and start chatting

Ollama

# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./Qwen3-30B-A3B-Thinking-2507-GEO_q4_k_m.gguf
TEMPLATE """{{ .System }}

{{ .Prompt }}"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

ollama create qwen-geo -f Modelfile
ollama run qwen-geo
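
The Modelfile above does not embed the required system prompt, so it has to be supplied at request time. As a rough sketch, assuming an Ollama server is running on its default local port and the model was created under the name `qwen-geo` as shown, the prompt can be sent through Ollama's `/api/chat` endpoint. The helper names `build_chat_payload` and `ask` are hypothetical, and the prompt is truncated as in this card.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local address

# Truncated exactly as in this model card; substitute the full prompt.
SYSTEM_PROMPT = (
    "You are an expert in GEO (Generative Engine Optimization) and the "
    "Conversation Optimization Framework developed by Ryan Fortin..."
)


def build_chat_payload(user_text, model="qwen-geo"):
    """Build the JSON body for a single non-streaming chat request."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    }


def ask(user_text):
    """Send one question to the local Ollama server and return the reply text."""
    data = json.dumps(build_chat_payload(user_text)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Requires a running Ollama server:
# print(ask("Explain GEO optimization strategies"))
```

Alternatively, a `SYSTEM` instruction can be added to the Modelfile so that the prompt is applied to every session.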

llama.cpp

./llama-cli -m Qwen3-30B-A3B-Thinking-2507-GEO_q4_k_m.gguf \
  --temp 0.7 \
  --top-p 0.9 \
  -p "Your prompt here"

Python (HuggingFace Transformers)

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "ryanfortin/Qwen3-30B-A3B-Thinking-2507-GEO",
    device_map="auto",
    torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained("ryanfortin/Qwen3-30B-A3B-Thinking-2507-GEO")

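# The system prompt is required; it is truncated here exactly as in this card.
messages = [
    {"role": "system", "content": "You are an expert in GEO (Generative Engine Optimization) and the Conversation Optimization Framework developed by Ryan Fortin..."},
    {"role": "user", "content": "Explain GEO optimization strategies"}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))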
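
Because this is a thinking model, the generated text contains the reasoning followed by the final answer. The sketch below separates the two, assuming the reasoning ends at a `</think>` tag and that the opening `<think>` tag may be absent if the chat template already supplies it. The function name `split_thinking` is hypothetical.

```python
def split_thinking(decoded, end_tag="</think>"):
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning ends at the last `end_tag`. If the tag is missing
    (for example, generation stopped early), everything is treated as the answer.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    reasoning = head.replace("<think>", "").strip()
    return reasoning, tail.strip()


sample = "<think>\nThe user wants a definition.\n</think>\n\nFinal answer text."
reasoning, answer = split_thinking(sample)
print(answer)
```

For clean results, pass only the newly generated tokens, for example by decoding `outputs[0][inputs["input_ids"].shape[1]:]` so that the prompt is not included. With `max_new_tokens=512` the reasoning may be cut off before the answer starts, in which case a larger limit is needed.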

Training Details

Dataset

  • Source: Proprietary GEO and CO Framework training data
  • Generation: DeepSeek V3-distilled Qwen3 model
  • Examples: 5,043 instruction-output pairs
  • Format: Thinking-style responses with <think> tags
  • Average Length: ~2,800 tokens per example

Training Configuration

# LoRA Configuration
lora_r = 16
lora_alpha = 16
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]

# Training Parameters
batch_size = 1
gradient_accumulation_steps = 16
effective_batch_size = 16
learning_rate = 2e-5
epochs = 3
context_length = 2048
warmup_steps = 100

# Hardware
gpu = "RunPod A40 (48GB)"
training_time = "91 hours"
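
The figures above imply a few derived quantities that are useful for sanity-checking the run. This is a back-of-the-envelope sketch, assuming a final partial batch still counts as one optimizer step and that standard (not rank-stabilized) LoRA scaling was used; neither is confirmed by this card.

```python
import math

# Values copied from the configuration above
examples = 5043
batch_size = 1
gradient_accumulation_steps = 16
epochs = 3
warmup_steps = 100
training_hours = 91
lora_r = 16
lora_alpha = 16

effective_batch_size = batch_size * gradient_accumulation_steps

# Assumption: a final partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(examples / effective_batch_size)
total_steps = steps_per_epoch * epochs

warmup_fraction = warmup_steps / total_steps
seconds_per_step = training_hours * 3600 / total_steps

# Assumption: standard LoRA, which scales the adapter update by alpha / r.
lora_scale = lora_alpha / lora_r

print(effective_batch_size, steps_per_epoch, total_steps)
print(f"warmup: {warmup_fraction:.1%}, ~{seconds_per_step:.0f} s per step")
print("LoRA scale:", lora_scale)
```

Under these assumptions the run comes to 948 optimizer steps, with warmup covering roughly the first tenth of training.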

Distillation Chain

  1. DeepSeek V3 (671B) - Teacher model
  2. Qwen3-30B-A3B-Thinking - Receives distilled knowledge
  3. This model - Fine-tuned on domain-specific data

Performance

This model demonstrates:

  • Strong performance on GEO-related queries
  • Retention of base model reasoning capabilities
  • Domain expertise in Conversation Optimization Framework
  • Efficient inference on consumer GPUs (24GB VRAM)

Model Size

  • Q4_K_M GGUF: ~17-20GB
  • fp16 safetensors: ~60GB (if uploaded)

Recommended for GPUs with 24GB+ VRAM (RTX 4090, A5000, etc.)
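
The sizes above follow from a simple estimate of parameters times bits per weight. The sketch below assumes roughly 30B parameters (taken from the model name) and an average of about 4.8 bits per weight for Q4_K_M; the latter is an illustrative figure rather than a measurement of this file.

```python
def weights_size_gb(num_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 30e9  # "30B" from the model name; the exact count may differ slightly

fp16_gb = weights_size_gb(PARAMS, 16)
# Assumption: Q4_K_M averages a little under 5 bits per weight because it mixes
# 4-bit blocks with higher-precision ones; 4.8 is illustrative, not measured.
q4_gb = weights_size_gb(PARAMS, 4.8)

print(f"fp16: ~{fp16_gb:.0f} GB, Q4_K_M: ~{q4_gb:.0f} GB")
```

The KV cache and runtime buffers come on top of the weights, which is why long contexts may not fit on a 24GB card even though the quantized file does.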

Limitations

  • Specialized for GEO and CO Framework - may not generalize to all domains
  • Q4_K_M quantization may have slight quality loss vs full precision
  • Based on an MoE architecture (128 experts, ~3B active parameters per token)
  • Training context limited to 2048 tokens (though inference supports 32k)

Citation

@misc{qwen3-geo-2025,
  author = {Ryan Fortin},
  title = {Qwen3-30B Fine-tuned for GEO and Conversation Optimization},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ryanfortin/Qwen3-30B-A3B-Thinking-2507-GEO}
}

Acknowledgments

License

Apache 2.0


Training Date: October 2025
