Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers Paper • 2405.05945 • Published May 9, 2024
InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models Paper • 2404.07940 • Published Mar 11, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models Paper • 2402.05935 • Published Feb 8, 2024
Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs Paper • 2309.15940 • Published Sep 27, 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model Paper • 2304.15010 • Published Apr 28, 2023
Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5) Paper • 2203.13366 • Published Mar 24, 2022
Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens Paper • 2303.14865 • Published Mar 27, 2023