MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision Paper • 2308.16139 • Published Aug 30, 2023
PulseCheck457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models Paper • 2502.08636 • Published Feb 12, 2025
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9, 2025 • 45
Masked Autoencoders Enable Efficient Knowledge Distillers Paper • 2208.12256 • Published Aug 25, 2022
Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification Paper • 2210.12843 • Published Oct 23, 2022
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection Paper • 2301.00785 • Published Jan 2, 2023
What If We Recaption Billions of Web Images with LLaMA-3? Paper • 2406.08478 • Published Jun 12, 2024 • 41
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter Paper • 2402.10896 • Published Feb 16, 2024 • 16
Rejuvenating image-GPT as Strong Visual Representation Learners Paper • 2312.02147 • Published Dec 4, 2023 • 7