Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models Paper • 2303.06571 • Published Mar 12, 2023
Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters Paper • 1910.11831 • Published Oct 25, 2019
Continual Vision-Language Representation Learning with Off-Diagonal Information Paper • 2305.07437 • Published May 11, 2023
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data Paper • 2311.13614 • Published Nov 22, 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models Paper • 2306.08641 • Published Jun 14, 2023
Rectifying the Shortcut Learning of Background for Few-Shot Learning Paper • 2107.07746 • Published Jul 16, 2021
Incorporating Visual Experts to Resolve the Information Loss in Multimodal Large Language Models Paper • 2401.03105 • Published Jan 6, 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts Paper • 2410.15732 • Published Oct 21, 2024
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published 20 days ago
EMMA: Efficient Multimodal Understanding, Generation, and Editing with a Unified Architecture Paper • 2512.04810 • Published 10 days ago
Person Transfer GAN to Bridge Domain Gap for Person Re-Identification Paper • 1711.08565 • Published Nov 23, 2017