A newer version of this model is available:
sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v2
Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1
Multimodal Model for Arabic Text Extraction from Images and Historical Documents
Overview
Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1 is a multimodal (vision + language) model built on Qwen2.5-VL-3B-Instruct and fine-tuned on specialized Arabic datasets to extract Arabic text from images and documents.
Features
- Arabic Text Extraction: Reads Arabic text from images and documents
- Bilingual Processing: Supports both Arabic and English
- Instruction Tuning: Trained on diverse instructions for text extraction tasks
- High Efficiency: 4-bit quantized model for reduced memory use
Technical Specifications
| Feature | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-VL-3B-Instruct |
| Parameters | 3 Billion |
| Quantization | 4-bit |
| Supported Languages | Arabic, English |
| Model Type | Multimodal (Image + Text) |
| License | Apache-2.0 |
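As a rough illustration of what 4-bit quantization saves, the sketch below estimates weight storage for a nominal 3 billion parameters at 16-bit and 4-bit precision. It is back-of-the-envelope arithmetic only: the helper `weight_memory_gb` is hypothetical, and the estimate ignores quantization metadata, any layers kept at higher precision, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 3e9  # nominal parameter count from the specifications above

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```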
Installation & Usage
from typing import List

import torch
from PIL import Image
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor


def process_vision_info(messages: List[dict]):
    """Collect image and video inputs from chat messages.

    Images given as file paths are opened and converted to RGB.
    Returns (images, videos); each is None when nothing was found.
    """
    image_inputs = []
    video_inputs = []
    for message in messages:
        if not isinstance(message["content"], list):
            continue
        for item in message["content"]:
            if item["type"] == "image":
                image = item["image"]
                if isinstance(image, str):
                    image = Image.open(image).convert("RGB")
                elif not isinstance(image, Image.Image):
                    raise ValueError(f"Unsupported image type: {type(image)}")
                image_inputs.append(image)
            elif item["type"] == "video":
                video_inputs.append(item["video"])
    return image_inputs or None, video_inputs or None


# Load model and processor
model_name = "sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Set up the processor; min_pixels and max_pixels bound the resized image area
min_pixels = 512 * 28 * 28
max_pixels = 2048 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_name,
    min_pixels=min_pixels,
    max_pixels=max_pixels,
)


def extract_text_from_image(image_path):
    """Run OCR on one image; returns the text, or an error message on failure."""
    try:
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": image_path},
                    {"type": "text", "text": "Read all texts in this image and extract them as they are. Don't miss any word."},
                ],
            }
        ]
        # Build the chat prompt text, then tokenize it together with the image
        text = processor.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
        image_inputs, video_inputs = process_vision_info(messages)
        inputs = processor(
            text=[text],
            images=image_inputs,
            videos=video_inputs,
            padding=True,
            return_tensors="pt",
        ).to(model.device)
        # Greedy decoding keeps the transcription deterministic
        generated_ids = model.generate(
            **inputs,
            max_new_tokens=1000,
            do_sample=False,
            pad_token_id=processor.tokenizer.pad_token_id,
        )
        # Drop the prompt tokens so only the newly generated text is decoded
        input_len = inputs.input_ids.shape[1]
        output_text = processor.batch_decode(
            generated_ids[:, input_len:],
            skip_special_tokens=True,
            clean_up_tokenization_spaces=False,
        )[0]
        return output_text.strip()
    except Exception as e:
        return f"Error processing image: {e}"


# Usage example
if __name__ == "__main__":
    image_path = "path/to/your/image.jpg"  # Replace with your image path
    extracted_text = extract_text_from_image(image_path)
    print("Extracted Text:")
    print(extracted_text)
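The `min_pixels` and `max_pixels` values in the usage code bound the image area after resizing, which in turn bounds how many visual tokens an image consumes. The sketch below estimates that token count, assuming each 28x28 pixel block maps to roughly one visual token in Qwen2.5-VL. The function `estimate_visual_tokens` is hypothetical and only approximates the processor's resizing logic; it is not the library's own implementation.

```python
import math

FACTOR = 28  # assumed side length, in pixels, of the block behind one visual token
MIN_PIXELS = 512 * FACTOR * FACTOR   # same bound as in the usage code
MAX_PIXELS = 2048 * FACTOR * FACTOR  # same bound as in the usage code


def estimate_visual_tokens(height: int, width: int,
                           min_pixels: int = MIN_PIXELS,
                           max_pixels: int = MAX_PIXELS) -> int:
    """Approximate the number of visual tokens for an image of the given size."""
    # Snap each side to the nearest multiple of FACTOR.
    h = max(FACTOR, round(height / FACTOR) * FACTOR)
    w = max(FACTOR, round(width / FACTOR) * FACTOR)
    if h * w > max_pixels:
        # Too large: shrink, preserving aspect ratio, until the area fits.
        scale = math.sqrt(height * width / max_pixels)
        h = math.floor(height / scale / FACTOR) * FACTOR
        w = math.floor(width / scale / FACTOR) * FACTOR
    elif h * w < min_pixels:
        # Too small: enlarge, preserving aspect ratio, until the area is reached.
        scale = math.sqrt(min_pixels / (height * width))
        h = math.ceil(height * scale / FACTOR) * FACTOR
        w = math.ceil(width * scale / FACTOR) * FACTOR
    return (h // FACTOR) * (w // FACTOR)


for size in [(100, 100), (1000, 1000), (4000, 3000)]:
    print(size, "->", estimate_visual_tokens(*size), "tokens (estimate)")
```

Under these assumptions, small scans are enlarged to at least about 512 tokens and large ones are reduced to at most about 2048, so raising `max_pixels` trades memory and latency for finer detail on dense pages.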
Training Data
Data Sources:
- Muharaf Public Dataset: https://huggingface.co/datasets/aamijar/muharaf-public
- Arabic OCR Images: https://huggingface.co/datasets/saleh-c4/arabic-ocr-images
- KHATT Arabic Dataset: https://gts.ai/dataset-download/khatt-arabic-dataset/
- Additional historical manuscripts and documents
Training Statistics:
| Item | Value |
|---|---|
| Training Samples | 60,880+ |
| Epochs | 3 |
| Learning Rate | 2e-5 |
| Batch Size | 40 |
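To put the training statistics above in perspective, the following sketch works out the approximate number of optimizer steps. It assumes the reported batch size is the effective batch size per optimizer step and uses 60,880 as a lower bound on the sample count, so the actual figures may differ slightly.

```python
import math

train_samples = 60_880  # lower bound; the card reports "60,880+"
batch_size = 40         # assumed to be the effective batch size per optimizer step
epochs = 3

steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"Steps per epoch: {steps_per_epoch}")
print(f"Total steps over {epochs} epochs: {total_steps}")
```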
Performance
Test Results on Various Documents:
| Task | Accuracy | Description |
|---|---|---|
| Text Extraction from Documents | 77.63% | Arabic texts from historical documents |
| Best Performance | 96.88% | On clear and simple texts |
| Worst Performance | 56.00% | On complex texts or difficult fonts |
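The card does not state how these accuracy figures were computed. One common convention for OCR, assumed here purely for illustration, is character-level accuracy: 1 minus the character error rate (CER), where CER is the Levenshtein edit distance between the prediction and the reference divided by the reference length. The function names below are hypothetical, and this sketch may not match the evaluation actually used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Character-level accuracy, 1 - CER, clipped to the range [0, 1]."""
    if not ref:
        return 1.0 if not hyp else 0.0
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)


reference = "السلام عليكم"
prediction = "السلام عليك"  # illustrative output with the final letter missing
print(f"Character accuracy: {char_accuracy(reference, prediction):.2%}")
```

Running a check like this on a small hand-transcribed sample of your own documents gives a more reliable picture than the averages above, since results vary widely with script style and scan quality.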
We Launched the Beta Version!
What We Offer:
- First open-source Arabic OCR model for historical documents
- Average accuracy of 77.63%, suitable for archiving and exploration
- Completely free to use
- 4-bit quantized model for high efficiency

How You Can Help:
- Use the model and give us feedback
- Send us challenging examples you encounter
- Help improve the training data

Coming Soon:
- Version 2.0 with a target accuracy of 85%+
- Support for more Arabic fonts
- User-friendly web interface
Limitations & Warnings
- Image Quality: Performance depends on input image quality and clarity
- Handwriting: May struggle with irregular handwriting
- Content: Must be used for legal and ethical purposes only
- Dialects: Primarily trained on Standard Arabic
Ethical Responsibility
This model should be used responsibly, with attention to:
- Respect for privacy and copyright
- Avoidance of fraudulent purposes
- Compliance with local and international laws
- Verification of results in sensitive applications
Citation
If you use this model in your research, please cite as follows:
@misc{qwen25vlarabicocr,
title={Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1: Arabic Text Extraction Model},
author={Sherif1313},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/sherif1313/Arabic-handwritten-OCR-4bit-Qwen2.5-VL-3B-v1}}
}