deepseek-ai/DeepSeek-OCR is out! š„ my take ā¤µļø > pretty insane it can parse and re-render charts in HTML > it uses CLIP and SAM features concatenated, so better grounding > very efficient per vision tokens/performance ratio > covers 100 languages
The POINTS-Reader, a vision-language model for end-to-end document conversion, is a powerful, distillation-free Vision-Language Model that sets new SoTA benchmarks. The demo is now available on HF (Extraction, Preview, Documentation). The input consists of a fixed prompt and a document image, while the output contains only a string (the text extracted from the document image). š„š¤