DeepSeek OCR

Note: currently, only NexaSDK supports the GGUF version of this model.

Quickstart

  1. Install NexaSDK

  2. Run the model locally with one line of code:

    nexa infer NexaAI/DeepSeek-OCR-GGUF
    
  3. Then drag your image into the terminal, or type the image path, followed by a prompt:

Case 1: extract text

<your-image-path> Free OCR.

Case 2: extract bounding boxes

<your-image-path> <|grounding|>Convert the document to markdown. 
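
The two input lines above share one shape: the image path, a space, then the prompt. As a minimal sketch of that convention, the hypothetical helper below (`build_prompt_line` is an illustrative name, not part of NexaSDK) builds the line to type or paste at the `nexa infer` prompt. It only formats a string and does not call the model.

```python
# Prompts copied verbatim from the two cases above.
FREE_OCR_PROMPT = "Free OCR."
GROUNDING_PROMPT = "<|grounding|>Convert the document to markdown."


def build_prompt_line(image_path: str, grounding: bool = False) -> str:
    """Return '<image-path> <prompt>' for text or bounding-box extraction.

    Hypothetical helper for illustration; it is not a NexaSDK function.
    """
    prompt = GROUNDING_PROMPT if grounding else FREE_OCR_PROMPT
    return f"{image_path.strip()} {prompt}"


if __name__ == "__main__":
    print(build_prompt_line("receipt.png"))
    print(build_prompt_line("page.png", grounding=True))
```

Keeping the prompts as constants avoids typos in the `<|grounding|>` token when switching between the two cases.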

Note: If the model fails to run, install the latest Vulkan driver for Windows.

Model Description

DeepSeek OCR is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes.
It combines vision-language modeling with efficient visual encoders to recognize multilingual, multi-layout text while remaining lightweight enough for edge or on-device deployment.

Features

  • Multilingual OCR: recognizes printed and handwritten text across major global languages.
  • Document Layout Understanding: preserves structure such as tables, paragraphs, and titles.
  • Scene Text Recognition: robust against lighting, distortion, and low-quality captures.
  • Lightweight & Fast: optimized for CPU and GPU acceleration.
  • End-to-End Pipeline: supports image-to-text and structured JSON output.

Use Cases

  • Digitizing scanned documents or PDFs
  • Extracting text from mobile camera inputs or screenshots
  • Invoice and receipt parsing
  • OCR-based search and indexing systems
  • Visual question answering or document agents

Inputs and Outputs

Input:

  • Image (JPEG or PNG file, or a tensor array)
  • Optional parameters for language hints or layout detection

Output:

  • Extracted text (plain text or structured format with bounding boxes)
  • Confidence scores per word or region
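
To make the structured output concrete, here is a minimal sketch of how a caller might hold regions with bounding boxes and confidence scores, then keep only the confident text. The `OcrRegion` type, its field names, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are all assumptions for illustration. The model's actual output schema is not specified here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrRegion:
    """One recognized region (hypothetical schema, not the model's own)."""
    text: str
    bbox: Tuple[int, int, int, int]  # assumed layout: (x1, y1, x2, y2)
    confidence: float                # assumed range: 0.0 to 1.0


def confident_text(regions: List[OcrRegion], min_confidence: float = 0.5) -> str:
    """Drop low-confidence regions and join the rest in reading order.

    Reading order is approximated by sorting on the top edge, then the left edge.
    """
    kept = [r for r in regions if r.confidence >= min_confidence]
    kept.sort(key=lambda r: (r.bbox[1], r.bbox[0]))
    return "\n".join(r.text for r in kept)


if __name__ == "__main__":
    sample = [
        OcrRegion("Total: 12.50", (10, 80, 200, 100), 0.93),
        OcrRegion("smudge", (10, 40, 60, 60), 0.20),
        OcrRegion("ACME Store", (10, 10, 200, 30), 0.98),
    ]
    print(confident_text(sample))
```

Sorting by top edge and then left edge is only a rough reading order. Multi-column layouts would need a layout-aware ordering.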

Integration

DeepSeek OCR can be integrated through:

  • Python API (pip install deepseek-ocr)
  • REST or gRPC endpoints for server deployment

License

This model is released under the Apache 2.0 License, allowing commercial use, modification, and redistribution with attribution.

Model Details

  • Format: GGUF
  • Model size: 3B params
  • Architecture: deepseek_vl_v2
  • Quantizations: 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
