No Tokens Wasted: Leveraging Long Context in Biomedical Vision-Language Models Paper • 2510.03978 • Published about 1 month ago • 2
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
ModernVBERT: Towards Smaller Visual Document Retrievers Paper • 2510.01149 • Published Oct 1 • 30
Article NVIDIA Releases 3 Million Sample Dataset for OCR, Visual Question Answering, and Captioning Tasks By nvidia and 4 others • Aug 11 • 74
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports Paper • 2505.11733 • Published May 16 • 7
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published Apr 7 • 200
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published Mar 17 • 22
SmolVLM Collection State-of-the-art compact VLMs for on-device applications: Base, Synthetic, and Instruct. Check our blog: https://huggingface.co/blog/smolvlm • 5 items • Updated May 5 • 39
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature Paper • 2501.07171 • Published Jan 13 • 55
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16, 2024 • 41
HyenaDNA Models Collection HyenaDNA models usable directly with Hugging Face classes like AutoModel. • 8 items • Updated Nov 14, 2023 • 19
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records Paper • 2406.16341 • Published Jun 24, 2024 • 14