multimodal - a fnauman Collection

updated 6 days ago

vikhyatk/moondream2

Image-Text-to-Text • 2B • Updated Sep 23 • 1.54M • 1.34k
Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6 • 4.22M • 1.36k
google/gemma-3-27b-it-qat-q4_0-gguf

Image-Text-to-Text • 27B • Updated Apr 11 • 7.12k • 360
google/paligemma2-3b-mix-224

Image-Text-to-Text • 3B • Updated Feb 7 • 8.84k • 39
HuggingFaceTB/SmolVLM2-256M-Video-Instruct

Image-Text-to-Text • 0.3B • Updated Apr 8 • 19.6k • 83
unsloth/Qwen2.5-VL-3B-Instruct-GGUF

Image-Text-to-Text • 3B • Updated May 12 • 7.52k • 17
OpenGVLab/InternVL3-1B

Image-Text-to-Text • 0.9B • Updated Sep 11 • 92.8k • 75
BLIP3o/BLIP3o-Model-8B

14B • Updated Jun 4 • 1.04k • 102
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 72
jinaai/jina-clip-v2

Feature Extraction • 0.9B • Updated Apr 28 • 334k • 290
Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15 • 707k • 239
PaddlePaddle/PaddleOCR-VL

Image-Text-to-Text • 1.0B • Updated 12 days ago • 31.6k • 1.36k
PerceptronAI/Isaac-0.1

Text Generation • 3B • Updated Oct 9 • 3.27k • 108
moondream/refcoco-m

Viewer • Updated 8 days ago • 1.19k • 35.9k • 43