Improve model card: Add pipeline tag, OCR/Qwen tags, HF paper link, and sample usage
This PR enhances the model card for `olmOCR-7B-0725-FP8` by:
- Adding `pipeline_tag: image-to-text` to improve discoverability for OCR tasks.
- Adding relevant `tags: ocr, qwen` to further categorize the model based on its functionality and base architecture.
- Updating the "Paper" link in the Quick links section to point to the official Hugging Face Papers page ([https://huggingface.co/papers/2510.19817](https://huggingface.co/papers/2510.19817)) for better integration.
- Updating the introductory description to reflect the context of the `olmOCR 2` paper.
- Introducing a "Sample Usage" section with a bash code snippet directly from the GitHub README to help users quickly get started with local inference.
These changes will make the model card more informative and user-friendly on the Hugging Face Hub.
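The metadata fields this PR adds live in the model card's YAML front matter. As a rough sanity check of the result, here is a minimal stdlib-only sketch — the `parse_front_matter` helper is hypothetical, not part of this PR or the Hub API — that reads simple `key: value` and `- item` lines from a card and confirms the new keys:

```python
# Minimal front-matter parser sketch (hypothetical helper, stdlib only).
# Handles only the flat "key: value" and "- item" shapes used in this card.

def parse_front_matter(card_text):
    lines = card_text.strip().splitlines()
    assert lines[0] == "---", "card must start with a front-matter block"
    end = lines[1:].index("---") + 1
    meta, current_key = {}, None
    for line in lines[1:end]:
        if line.startswith("- ") and current_key:
            # List item under the most recent key.
            if not isinstance(meta.get(current_key), list):
                meta[current_key] = []
            meta[current_key].append(line[2:].strip())
        elif ":" in line:
            key, _, value = line.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip() or None
    return meta

card = """---
base_model:
- Qwen/Qwen2.5-VL-7B-Instruct
pipeline_tag: image-to-text
tags:
- ocr
- qwen
---
# olmOCR-7B-0725-FP8
"""

meta = parse_front_matter(card)
print(meta["pipeline_tag"])  # image-to-text
print(meta["tags"])          # ['ocr', 'qwen']
```

On the Hub itself, these keys drive search filters: `pipeline_tag: image-to-text` places the model under that task, and each entry in `tags` becomes a filterable label.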
````diff
@@ -1,26 +1,29 @@
 ---
-language:
-- en
-license: apache-2.0
-datasets:
-- allenai/olmOCR-mix-0225
 base_model:
 - Qwen/Qwen2.5-VL-7B-Instruct
+datasets:
+- allenai/olmOCR-mix-0225
+language:
+- en
 library_name: transformers
+license: apache-2.0
 new_version: allenai/olmOCR-7B-0825-FP8
+pipeline_tag: image-to-text
+tags:
+- ocr
+- qwen
 ---
 
 <img alt="olmOCR Logo" src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/olmocr/olmocr.png" width="242px" style="margin-left:'auto' margin-right:'auto' display:'block'">
 
 # olmOCR-7B-0725-FP8
 
-
+This is a quantized (FP8) version of [olmOCR-7B-0725](https://huggingface.co/allenai/olmOCR-7B-0725), using llmcompressor. This model is part of the olmOCR family of powerful OCR systems for converting digitized print documents, like PDFs, into clean, naturally ordered plain text. The latest advancements in this family are presented in the paper [olmOCR 2: Unit Test Rewards for Document OCR](https://huggingface.co/papers/2510.19817).
 
-This
-[olmOCR-mix-0225](https://huggingface.co/datasets/allenai/olmOCR-mix-0225) dataset.
+This olmOCR model is fine-tuned from Qwen2.5-VL-7B-Instruct using the [olmOCR-mix-0225](https://huggingface.co/datasets/allenai/olmOCR-mix-0225) dataset.
 
 Quick links:
-- 📄 [Paper](https://
+- 📄 [Paper](https://huggingface.co/papers/2510.19817)
 - 🤗 [Dataset](https://huggingface.co/datasets/allenai/olmOCR-mix-0225)
 - 🛠️ [Code](https://github.com/allenai/olmocr)
 - 🎮 [Demo](https://olmocr.allenai.org/)
@@ -36,6 +39,19 @@ This model expects as input a single document image, rendered such that the long
 The prompt must then contain the additional metadata from the document, and the easiest way to generate this
 is to use the methods provided by the [olmOCR toolkit](https://github.com/allenai/olmocr).
 
+## Sample Usage
+
+For quick testing, try the [web demo](https://olmocr.allen.ai/). To run locally, a GPU is required. Here's an example of converting a single PDF using the `olmocr` pipeline, as provided in the GitHub repository:
+
+```bash
+# Download a sample PDF
+curl -o olmocr-sample.pdf https://olmocr.allenai.org/papers/olmocr_3pg_sample.pdf
+
+# Convert it to markdown
+python -m olmocr.pipeline ./localworkspace --markdown --pdfs olmocr-sample.pdf
+```
+
+Results will be stored as markdown files inside of `./localworkspace/markdown/`.
 
 ## License and use
 
````