Qwen2.5-VL-7B-Cognition-Full-SFT
This repo contains the Qwen2.5-VL 7B Instruct weights after supervised fine-tuning (SFT) on the CogIP-Bench dataset. It is the model used to demonstrate the effectiveness of cognition alignment with Qwen-Image. For more details, see the GitHub repo.
Figure: Qualitative comparison of images generated by the Qwen-Image pipeline using different LLM backbones with the same prompt, showing the effect of pretraining versus supervised fine-tuning (SFT) on image cognition properties. For each image pair, left: base model; right: SFT model. Generation prompts are shown under each pair. Images generated with our SFT MLLM backbone better reflect the cognitive cues embedded in the prompts.
Quick load example
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

MODEL_ID = "foolen/qwen2.5-vl-7b-cognition-full-sft"

# Qwen2.5-VL is a vision-language model, so it needs the multimodal
# processor (tokenizer + image processor) and the conditional-generation
# class rather than AutoTokenizer/AutoModelForCausalLM.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype="auto",
    device_map="auto",  # or pass load_in_4bit=True with bitsandbytes installed
)
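After loading, inference follows the standard Qwen2.5-VL chat flow: build a multimodal message list, render it with the processor's chat template, and call `generate`. A minimal sketch is below; the image URL and prompt are placeholder assumptions (not from this repo), the official `qwen_vl_utils.process_vision_info` helper requires the separate `qwen-vl-utils` package, and the `RUN_INFERENCE` guard keeps the snippet importable without downloading the 7B weights.

```python
MODEL_ID = "foolen/qwen2.5-vl-7b-cognition-full-sft"

# Standard Qwen2.5-VL chat format: a list of role/content turns whose
# content mixes typed items ("image", "text"). URL and prompt here are
# placeholders for illustration only.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe the cognitive cues in this scene."},
        ],
    }
]

RUN_INFERENCE = False  # flip to True once the weights are available

if RUN_INFERENCE:
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
    from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Render the chat template to a prompt string, extract the images,
    # then tokenize everything together.
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=128)
    print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The guard also makes the message-building step reusable on its own, e.g. for batching prompts before a single model load.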
Model tree for foolen/qwen2.5-vl-7b-cognition-full-sft
Base model: Qwen/Qwen2.5-VL-7B-Instruct