Collection - a Hell12345 Collection

updated 11 days ago

MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19 • 54
Qwen/Qwen3-Omni-30B-A3B-Instruct

Any-to-Any • 35B • Updated Sep 22 • 272k • 726
facebook/dinov3-vitb16-pretrain-lvd1689m

Image Feature Extraction • 85.7M • Updated Aug 19 • 363k • 82
nvidia/NV-Embed-v2

Feature Extraction • 8B • Updated Jul 21 • 207k • 482
zju-community/matchanything_eloftr

16.1M • Updated Aug 21 • 5.15k • 79
Running on Zero

MCP

19

DINOv3

🦖

Similarity, Classification
Running

17

DINOv3 Web/Sat Interactive Similarity

🦖

Visualize image patch similarity as in the DINOv3 presentation
Running

615

Sheets

🗂

Create and enrich datasets with AI
FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 72
Running

Featured

401

FastVLM WebGPU

🍎

Real-time video captioning powered by FastVLM
ByteDance/lynx

Image-to-Video • Updated Sep 27 • 134
Running on Zero

Featured

18

Talk2DINO

💻

Demo of Talk2DINO, a model presented at ICCV 2025.
facebook/vjepa2-vitl-fpc64-256

Video Classification • 0.3B • Updated Aug 11 • 94.1k • 162
Running on Zero

63

Yoloe

🚀

Detect and segment objects in images using text prompts, visual prompts, or no prompt at all
Running on Zero

Featured

172

Distill Any Depth

💻

Generate depth maps from images
Running

Featured

248

Jupyter Agent 2

🏃

Run code and analyze data in a Jupyter notebook
Running on Zero

Featured

338

Describe Anything

⚡

Describe masked parts of images
amd/Nitro-E

Text-to-Image • Updated 22 days ago • 5.02k • 86
dx8152/Qwen-Edit-2509-Multiple-angles

Image-to-Image • Updated 13 days ago • 86.3k • 737