A newer version of this model is available: sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2

🕌 Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1

Multimodal Model for Arabic Text Extraction from Images and Historical Documents

🎯 Overview

Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1 is a multimodal (vision-language) model built on Qwen2.5-VL-3B-Instruct and fine-tuned on specialized Arabic datasets to extract Arabic text from images and documents.

✨ Features

🖼️ Arabic Text Extraction: Reads Arabic text from images and documents
💬 Bilingual Processing: Supports both Arabic and English
🎯 Instruction Tuning: Trained on a variety of instructions for text extraction tasks
🚀 High Efficiency: 4-bit quantization reduces memory use

📊 Technical Specifications

Feature	Value
Base Model	Qwen/Qwen2.5-VL-3B-Instruct
Parameters	3 Billion
Quantization	4-bit
Supported Languages	Arabic, English
Model Type	Multimodal (Image + Text)
License	Apache-2.0
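
As a rough illustration of what 4-bit quantization means for memory, the sketch below estimates the space taken by the weights alone, using the figures in the table above. It assumes exactly 3 billion parameters and ignores quantization metadata, any layers kept in higher precision, activations, and the generation cache, so real usage will be higher. `weight_memory_gib` is a hypothetical helper written for this example, not part of any library.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


params = 3e9  # "3 Billion" from the specifications table (assumed exact)

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(params, bits):.2f} GiB")
# float16: ~5.59 GiB
# 4-bit: ~1.40 GiB
```

Under these assumptions, the 4-bit weights take about a quarter of the space of the float16 weights.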

🔧 Installation & Usage

from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image
from typing import List

def process_vision_info(messages: List[dict]):
    """Collect image and video inputs from chat messages (local paths or PIL images only)."""
    image_inputs = []
    video_inputs = []
    for message in messages:
        if isinstance(message["content"], list):
            for item in message["content"]:
                if item["type"] == "image":
                    image = item["image"]
                    if isinstance(image, str):
                        image = Image.open(image).convert("RGB")
                    elif isinstance(image, Image.Image):
                        pass
                    else:
                        raise ValueError(f"Unsupported image type: {type(image)}")
                    image_inputs.append(image)
                elif item["type"] == "video":
                    video_inputs.append(item["video"])
    return image_inputs if image_inputs else None, video_inputs if video_inputs else None

# Load model and processor
model_name = "sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

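# Set up the processor. The pixel bounds limit how far each image is resized
# before encoding; they are expressed in multiples of 28 * 28 pixels.
min_pixels = 512 * 28 * 28
max_pixels = 2048 * 28 * 28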
processor = AutoProcessor.from_pretrained(
    model_name, 
    min_pixels=min_pixels, 
    max_pixels=max_pixels
)

def extract_text_from_image(image_path):
    try:
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": "Read all texts in this image and extract them as they are. Don't miss any word."},
                ],
            }
        ]

        text = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        image_inputs, video_inputs = process_vision_info(messages)
        
        inputs = processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            padding=True,
            return_tensors="pt",
        ).to(model.device)

        generated_ids = model.generate(
            **inputs,
            max_new_tokens=1000,
            do_sample=False,
            pad_token_id=processor.tokenizer.pad_token_id
        )

        input_len = inputs.input_ids.shape[1]
        output_text = processor.batch_decode(
            generated_ids[:, input_len:],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False
        )[0]

        return output_text.strip()

    except Exception as e:
        return f"Error processing image: {e}"

# Usage example
if __name__ == "__main__":
    image_path = "path/to/your/image.jpg"  # Replace with your image path
    extracted_text = extract_text_from_image(image_path)
    print("Extracted Text:")
    print(extracted_text)
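
The `min_pixels` and `max_pixels` values in the script above are written as multiples of `28 * 28`. The sketch below shows one way to reason about them, assuming each 28 x 28 pixel block becomes roughly one visual token (an assumption based on the Qwen2.5-VL design, not something this model card states). The function names are hypothetical and the resizing logic is a simplified approximation; the processor performs the actual resizing.

```python
import math

PATCH = 28  # assumed pixels per side covered by one visual token


def estimate_resized_size(height, width,
                          min_pixels=512 * 28 * 28,
                          max_pixels=2048 * 28 * 28):
    """Approximate the (height, width) an image is resized to under the bounds."""
    # Snap each side to the nearest multiple of PATCH, never below one patch.
    h = max(PATCH, round(height / PATCH) * PATCH)
    w = max(PATCH, round(width / PATCH) * PATCH)
    if h * w > max_pixels:
        # Too large: shrink both sides by the same factor, rounding down.
        scale = math.sqrt(height * width / max_pixels)
        h = math.floor(height / scale / PATCH) * PATCH
        w = math.floor(width / scale / PATCH) * PATCH
    elif h * w < min_pixels:
        # Too small: enlarge both sides by the same factor, rounding up.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / PATCH) * PATCH
        w = math.ceil(width * scale / PATCH) * PATCH
    return h, w


def estimate_visual_tokens(height, width, **bounds):
    """Estimate the number of visual tokens for an image of the given size."""
    h, w = estimate_resized_size(height, width, **bounds)
    return (h // PATCH) * (w // PATCH)


print(estimate_visual_tokens(1000, 800))   # image already within the bounds
print(estimate_visual_tokens(3000, 2000))  # large scan, scaled down
print(estimate_visual_tokens(100, 300))    # small crop, scaled up
```

With the bounds used in the script, every image should end up between roughly 512 and 2048 visual tokens under this estimate. Raising `max_pixels` keeps more detail in dense manuscript pages at the cost of memory and speed.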

🏋️ Training Data

Data Sources:

    Muharaf Public Dataset     https://huggingface.co/datasets/aamijar/muharaf-public

    Arabic OCR Images          https://huggingface.co/datasets/saleh-c4/arabic-ocr-images

    KHATT Arabic Dataset       https://gts.ai/dataset-download/khatt-arabic-dataset/

    Additional historical manuscripts and documents

Training Statistics:
Item	Value
Training Samples	60,880+
Epochs	3
Learning Rate	2e-5
Batch Size	40
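
To put the training statistics in perspective, the short calculation below derives the approximate number of optimizer steps from the values in the table. It assumes that the batch size of 40 is the effective batch size per optimizer step and that the sample count is exactly 60,880; neither detail is stated in this card, so treat the result as an estimate.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


samples = 60_880  # lower bound from "60,880+"
batch_size = 40   # assumed to be the effective batch size
epochs = 3

per_epoch = steps_per_epoch(samples, batch_size)
print(per_epoch)           # 1522
print(per_epoch * epochs)  # 4566
```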
📊 Performance

Test Results on Various Documents:
Task	Accuracy	Description
Text Extraction from Documents	77.63%	Arabic texts from historical documents
Best Performance	96.88%	On clear and simple texts
Worst Performance	56.00%	On complex texts or difficult fonts
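
The table above does not say how accuracy was measured. One common choice for OCR is character-level accuracy, defined as 1 minus the character error rate (edit distance divided by reference length). The sketch below shows that metric so you can score the model on your own documents; it is an assumption about a reasonable metric, not a description of how the reported numbers were produced. `levenshtein` and `char_accuracy` are hypothetical helper names.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, prediction: str) -> float:
    """Character-level accuracy: 1 - CER, clipped at 0."""
    if not reference:
        return 1.0 if not prediction else 0.0
    cer = levenshtein(reference, prediction) / len(reference)
    return max(0.0, 1.0 - cer)


reference = "السلام عليكم"
prediction = "السلام عليكن"  # last letter misread
print(f"{char_accuracy(reference, prediction):.2%}")  # 91.67%
```

Averaging `char_accuracy` over a set of transcribed pages gives a single figure that can be compared across model versions, provided the same metric and text normalization are used each time.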
🚀 We Launched the Beta Version!

What We Offer:

    First open-source Arabic OCR model for historical documents

    Average accuracy 77.63% - suitable for archiving and exploration

    Completely free to use

    4-bit quantized model for high efficiency

How You Can Help:

    Use the model and give us feedback

    Send us challenging examples you encounter

    Help improve training data

Coming Soon:

    Version 2.0 with target accuracy 85%+

    Support for more Arabic fonts

    User-friendly web interface

⚠️ Limitations & Warnings

    📷 Image Quality: Performance depends on input image quality and clarity

    🖋️ Handwriting: May struggle with irregular handwriting

    🔞 Content: Must be used for legal and ethical purposes only

    🌐 Dialects: Primarily trained on Standard Arabic

🛡️ Ethical Responsibility

This model should be used responsibly considering:

    Respect for privacy and copyright

    Avoidance of fraudulent purposes

    Compliance with local and international laws

    Verification of results in sensitive applications

📄 Citation

If you use this model in your research, please cite as follows:



@misc{qwen25vlarabicocr,
  title={Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1: Arabic Text Extraction Model},
  author={Sherif1313},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1}}
}
