-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
Collections
Discover the best community collections!
Collections including paper arxiv:2411.00027
-
Reinforcement Learning for Long-Horizon Interactive LLM Agents
Paper • 2502.01600 • Published • 1 -
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
Paper • 2311.06318 • Published • 3 -
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Paper • 2502.12130 • Published • 2 -
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper • 2502.02584 • Published • 17
-
We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation
Paper • 2406.10561 • Published • 1 -
AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design
Paper • 2405.03680 • Published • 1 -
ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data
Paper • 2209.08203 • Published • 1 -
SeaLLMs -- Large Language Models for Southeast Asia
Paper • 2312.00738 • Published • 25
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 65 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 190 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 72 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 9 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 26
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 29 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 14 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
-
Reinforcement Learning for Long-Horizon Interactive LLM Agents
Paper • 2502.01600 • Published • 1 -
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
Paper • 2311.06318 • Published • 3 -
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Paper • 2502.12130 • Published • 2 -
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Paper • 2502.02584 • Published • 17
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 9 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 48 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 72 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines
Paper • 2408.01050 • Published • 9 -
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance
Paper • 2409.04593 • Published • 26
-
We Care: Multimodal Depression Detection and Knowledge Infused Mental Health Therapeutic Response Generation
Paper • 2406.10561 • Published • 1 -
AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design
Paper • 2405.03680 • Published • 1 -
ChemNLP: A Natural Language Processing based Library for Materials Chemistry Text Data
Paper • 2209.08203 • Published • 1 -
SeaLLMs -- Large Language Models for Southeast Asia
Paper • 2312.00738 • Published • 25
-
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
Orion-14B: Open-source Multilingual Large Language Models
Paper • 2401.12246 • Published • 14 -
MambaByte: Token-free Selective State Space Model
Paper • 2401.13660 • Published • 60 -
MM-LLMs: Recent Advances in MultiModal Large Language Models
Paper • 2401.13601 • Published • 48
-
DocGraphLM: Documental Graph Language Model for Information Extraction
Paper • 2401.02823 • Published • 36 -
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper • 2401.02038 • Published • 65 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 190 -
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Paper • 2309.01131 • Published • 1