# Qwen2.5-VL-7B-Cognition-Full-SFT

This repo contains the fine-tuned Qwen2.5-VL 7B Instruct weights (SFT), trained on the CogIP-Bench dataset. The model demonstrates the effectiveness of cognition alignment when used as the MLLM backbone for Qwen-Image. For more details, see the GitHub repo.


Figure: Qualitative comparison of images generated by the Qwen-Image pipeline using different LLM backbones with the same prompt, showing the effect of pretraining versus supervised fine-tuning (SFT) on image cognition properties. For each image pair, left: base model; right: SFT model. Generation prompts are shown under each pair. Images generated with our SFT MLLM backbone better reflect the cognitive cues embedded in the prompts.

## Quick load example

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "foolen/qwen2.5-vl-7b-cognition-full-sft"

# The processor bundles the tokenizer and image preprocessor for Qwen2.5-VL.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",   # weights are stored in BF16
    device_map="auto",    # or load_in_4bit=True for 4-bit quantized loading
)
```
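
A minimal inference sketch (not part of the original card), assuming the model and processor loaded above and the standard Qwen2.5-VL chat-template workflow; the image path and prompt text are placeholders:

```python
from PIL import Image

# Placeholder image and prompt -- replace with your own inputs.
image = Image.open("example.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the chat-formatted prompt, then encode text and image together.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate and decode only the newly produced tokens.
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```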
Model size: 8B params · Tensor type: BF16 · Format: Safetensors
