A Survey on Hypothesis Generation for Scientific Discovery in the Era of Large Language Models Paper • 2504.05496 • Published Apr 7, 2025
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models Paper • 2505.03821 • Published May 3, 2025 • 25
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient Paper • 2502.05172 • Published Feb 7, 2025 • 2
Mixture of Tokens: Efficient LLMs through Cross-Example Aggregation Paper • 2310.15961 • Published Oct 24, 2023 • 1
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27, 2024 • 144
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 73