andreagemelli committed · Commit 8c49cda · verified · 1 Parent(s): 645c2e1

update licence

Files changed (1)
  1. README.md +1 -7
README.md CHANGED
@@ -16,21 +16,15 @@ license: apache-2.0
 
 <h1>DocExplainer: Document VQA with Bounding Box Localization</h1>
 
-[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
-<!-- [![arXiv](https://img.shields.io/badge/arXiv-2501.03403-b31b1b.svg)]() -->
-[![HuggingFace](https://img.shields.io/badge/🤗%20Hugging%20Face-Datasets-yellow)](https://huggingface.co/letxbe/DocExplainer)
-
 </div>
 
-## Model description
-
 DocExplainer is an approach to Document Visual Question Answering (Document VQA) with bounding box localization.
 Unlike standard VLMs that only provide text-based answers, DocExplainer adds **visual evidence through bounding boxes**, making model predictions more interpretable.
 It is designed as a **plug-and-play module** to be combined with existing Vision-Language Models (VLMs), decoupling answer generation from spatial grounding.
 
 - **Authors:** Alessio Chen, Simone Giovannini, Andrea Gemelli, Fabio Coppini, Simone Marinai
 - **Affiliations:** [Letxbe AI](https://letxbe.ai/), [University of Florence](https://www.unifi.it/it)
-- **License:** CC-BY-4.0
+- **License:** apache-2.0
 - **Paper:** ["Towards Reliable and Interpretable Document Question Answering via VLMs"](https://arxiv.org/abs/2509.10129) by Alessio Chen et al.
 
 <div align="center">
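The README paragraph in the diff describes answer generation and spatial grounding as decoupled steps. Below is a minimal sketch of what that plug-and-play contract could look like; the `vlm.generate` and `localizer.localize` interfaces and the `GroundedAnswer` type are assumptions for illustration, not the model's published API.

```python
# Hypothetical sketch of the decoupled Document VQA pipeline the README
# describes: a VLM produces the textual answer, and a separate localizer
# (DocExplainer's role) grounds that answer with a bounding box.
# All names below are illustrative assumptions, not the published API.

from dataclasses import dataclass
from typing import Tuple


@dataclass
class GroundedAnswer:
    text: str                               # answer from the VLM
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized


def answer_with_evidence(vlm, localizer, image, question) -> GroundedAnswer:
    """Answer generation and spatial grounding are decoupled:
    the localizer can be attached to any VLM unchanged."""
    # Step 1: any off-the-shelf VLM produces the textual answer.
    text = vlm.generate(image=image, question=question)
    # Step 2: the localizer maps that answer back onto the page
    # as visual evidence, without touching the VLM.
    box = localizer.localize(image=image, question=question, answer=text)
    return GroundedAnswer(text=text, box=box)
```

The design point this sketch captures is the one the README makes: because grounding consumes only the image, the question, and the generated answer, the localizer can be swapped onto a different VLM without retraining the answer generator.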