Efficient Long-context Language Model Training by Core Attention Disaggregation Paper • 2510.18121 • Published Oct 2025 • 117
Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs Paper • 2510.11062 • Published Oct 2025 • 27
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing Paper • 2508.09192 • Published Aug 8, 2025 • 30
Scaling Speculative Decoding with Lookahead Reasoning Paper • 2506.19830 • Published Jun 24, 2025 • 12
Faster Video Diffusion with Trainable Sparse Attention Paper • 2505.13389 • Published May 19, 2025 • 37
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11, 2025 • 130
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published Feb 14, 2025 • 55
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile Paper • 2502.06155 • Published Feb 10, 2025 • 10
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers Paper • 2310.03294 • Published Oct 5, 2023 • 2
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena Paper • 2306.05685 • Published Jun 9, 2023 • 37
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks Paper • 2306.13103 • Published Jun 16, 2023 • 2
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving Paper • 2401.09670 • Published Jan 18, 2024 • 2
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding Paper • 2402.02057 • Published Feb 3, 2024