Qwen3-VL-30B-A3B-Thinking-abliteration-v1.1-Hybrid
Qwen3-VL-30B-A3B-Thinking-abliteration-v1.1-Hybrid is the hybrid abliterated (v1.1) variant of Qwen3-VL-30B-A3B-Thinking, developed for science, technology, and modality-balanced reasoning. This version introduces selective abliterated reasoning with controlled interpretive depth, optimized for scientific, technical, and analytically complex visual and textual contexts while maintaining adaptive restraint toward photo-sensitive or sensual imagery (such content is not its intended use).
Key Highlights
- **Hybrid Abliterated Reasoning**: Balances descriptive openness with contextual restraint, delivering precise, factual, and context-rich reasoning while regulating sensitive visual interpretations.
- **Domain-Optimized for Science & Technology**: Fine-tuned on high-density scientific and technical corpora for enhanced comprehension of visual data, formulas, charts, and scientific diagrams.
- **Photo-Sensitivity Awareness**: Calibrated to downscale explicit or sensual content interpretation while maintaining analytical clarity and aesthetic neutrality in reasoning outputs.
- **Mixture of Experts (MoE) Hybrid Efficiency**: Built on the Qwen3-VL-MoE backbone, dynamically distributing computation across reasoning, visual understanding, and linguistic synthesis experts.
- **Analytical Reasoning Depth**: Produces multi-layered reasoning chains that bridge visual cues, data interpretation, and theoretical logic.
- **Multilingual Science Mode**: Retains full multilingual support with optimized token handling for scientific notation, technical abbreviations, and multilingual data visualization tasks.
Quick Start with Transformers
```python
from transformers import Qwen3VLMoeForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch

# Load the model with automatic dtype selection and device placement
model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
    "prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliteration-v1.1-Hybrid",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(
    "prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliteration-v1.1-Hybrid"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
            {"type": "text", "text": "Generate a detailed technical caption and reasoning for this image."},
        ],
    }
]

# Build the chat prompt and collect the image/video inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

# Generate, then strip the echoed prompt tokens from each output sequence
generated_ids = model.generate(**inputs, max_new_tokens=160)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False,
)
print(output_text)
```
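The prompt-trimming step above relies on a general property of `generate`: each returned sequence begins with the prompt tokens, followed by the newly generated tokens, so slicing each output at the prompt length leaves only the continuation. A minimal sketch of that slicing with plain token-ID lists (the IDs are made up for illustration; the real code operates on tensors from the processor and model):

```python
# Illustrative prompt and full model outputs as plain token-ID lists.
# The IDs are arbitrary placeholders, not real vocabulary entries.
input_ids = [
    [101, 7592, 2088],        # prompt for sample 1 (3 tokens)
    [101, 2054, 2003, 1996],  # prompt for sample 2 (4 tokens)
]
generated_ids = [
    [101, 7592, 2088, 555, 666],   # prompt echoed back + 2 new tokens
    [101, 2054, 2003, 1996, 777],  # prompt echoed back + 1 new token
]

# Same slicing pattern as the Quick Start: drop the echoed prompt tokens
trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(input_ids, generated_ids)
]
print(trimmed)  # [[555, 666], [777]]
```

Decoding the trimmed sequences (rather than the full outputs) is what keeps the prompt out of the printed caption.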
Intended Use
- Analytical captioning and reasoning for scientific, industrial, and technological imagery.
- Hybrid multimodal reasoning research, combining logical depth with content safety filters.
- Descriptive captioning for complex technical diagrams, experiments, and visual datasets.
- Studies in multimodal moderation, content alignment, and interpretive balance.
- Creative scientific visualization, educational explanations, and knowledge-grounded storytelling.
Limitations
- Despite hybrid moderation, output tone and level of detail may still vary with input sensitivity.
- Performance may degrade on highly abstract, non-visual mathematical inputs.
- Not suited for high-risk content pipelines that require strict zero-tolerance filtering.
- Caption and reasoning verbosity may fluctuate with temperature and other sampling parameters.
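Because verbosity varies with sampling settings, pinning the generation parameters makes output length and tone more reproducible. A sketch of keyword arguments one might pass to `model.generate` (the values here are illustrative assumptions, not tuned defaults shipped with the model):

```python
# Illustrative sampling configuration; all values are assumptions, not tuned defaults.
gen_kwargs = {
    "max_new_tokens": 512,       # cap the length of the reasoning chain
    "do_sample": True,           # enable temperature / nucleus sampling
    "temperature": 0.6,          # lower values give terser, more deterministic captions
    "top_p": 0.95,               # nucleus sampling cutoff
    "repetition_penalty": 1.05,  # mildly discourage repeated phrasing
}

# Usage (with `model` and `inputs` prepared as in the Quick Start):
# generated_ids = model.generate(**inputs, **gen_kwargs)
```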
Model tree for prithivMLmods/Qwen3-VL-30B-A3B-Thinking-abliteration-v1.1-Hybrid
- Base model: Qwen/Qwen3-VL-30B-A3B-Thinking