Collections including paper arxiv:2510.26692

Kimi-Linear-A3B

Moonshot's experimental MoE model with Kimi Delta Attention

moonshotai/Kimi-Linear-48B-A3B-Instruct

Text Generation • 49B • Updated 2 days ago • 15k • 319
moonshotai/Kimi-Linear-48B-A3B-Base

Text Generation • 49B • Updated 2 days ago • 182 • 40
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published 4 days ago • 78

LLM Architectures

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published 4 days ago • 78

Agentic AI Training and Tuning

Tongyi DeepResearch Technical Report

Paper • 2510.24701 • Published 6 days ago • 86
Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published 4 days ago • 78

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 70
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 113
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published Aug 22 • 52
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15 • 103

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 298
Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11 • 18
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31 • 6
Causal Attention with Lookahead Keys

Paper • 2509.07301 • Published Sep 9 • 21

about 6 hours ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Kimi Linear: An Expressive, Efficient Attention Architecture

Paper • 2510.26692 • Published 4 days ago • 78

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published 28 days ago • 462
Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Paper • 2510.03215 • Published about 1 month ago • 94
When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

Paper • 2510.07499 • Published 26 days ago • 48
StreamingVLM: Real-Time Understanding for Infinite Video Streams

Paper • 2510.09608 • Published 24 days ago • 49

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 61
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

Selective Attention Improves Transformer

Paper • 2410.02703 • Published Oct 3, 2024 • 24
Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 179
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Paper • 2410.05076 • Published Oct 7, 2024 • 8
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs

Paper • 2410.13276 • Published Oct 17, 2024 • 29
