SiyuanYin
AI & ML interests: None yet
Organizations: None yet
Multi-Modal
- POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion
  Paper • 2509.01215 • Published • 50
- Visual Representation Alignment for Multimodal Large Language Models
  Paper • 2509.07979 • Published • 83
- Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis
  Paper • 2509.09595 • Published • 48
- Reconstruction Alignment Improves Unified Multimodal Models
  Paper • 2509.07295 • Published • 40
Video
Models (0): None public yet
Datasets (0): None public yet