VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 75
Replacing thinking with tool usage enables reasoning in small language models Paper • 2507.05065 • Published Jul 7 • 15
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling Paper • 2507.11061 • Published Jul 15 • 37
Coding Triangle: How Does Large Language Model Understand Code? Paper • 2507.06138 • Published Jul 8 • 21
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published Jul 7 • 16
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published May 27 • 38
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer Paper • 2505.04622 • Published May 7 • 27
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 468
Phi-4 (All Versions) Collection Microsoft's Phi-4 models including Reasoning + Reasoning Plus & mini. Includes Dynamic 2.0 GGUF, 4-bit & 16-bit versions. Includes Unsloth's bug fixes • 20 items • Updated 12 days ago • 74
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 173
DeepSeek R1 (All Versions) Collection DeepSeek-R1-0528 is here! The most powerful reasoning open LLM, available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 37 items • Updated 12 days ago • 259
Llama 3.2 Collection Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 12 days ago • 68
view article Article MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era By MiniMax-AI • Jan 15 • 48