- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 11
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 49
Collections
Discover the best community collections!
Collections including paper arxiv:2504.04842
- mistralai/Mistral-7B-Instruct-v0.1
  Text Generation • 7B • Updated • 520k • 1.81k
- FLUX.1 [dev]
  🖥 Featured • 9.25k • Generate images from text prompts
- ufldl-stanford/svhn
  Viewer • Updated • 879k • 39.4k • 15
- FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis
  Paper • 2504.04842 • Published • 35