Collections including paper arxiv:2401.12954

- Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks
  Paper • 2412.15605 • Published • 2
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 78
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
  Paper • 2403.14403 • Published • 7
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
  Paper • 2412.13171 • Published • 35

- Attention Is All You Need
  Paper • 1706.03762 • Published • 99
- Self-Attention with Relative Position Representations
  Paper • 1803.02155 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 33

- AtP*: An efficient and scalable method for localizing LLM behaviour to components
  Paper • 2403.00745 • Published • 14
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 627
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
  Paper • 2402.16840 • Published • 26
- LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
  Paper • 2402.13753 • Published • 116

- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 60
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 59
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 33

- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 33
- Learning Universal Predictors
  Paper • 2401.14953 • Published • 22
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 37
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 29

- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 260
- 3D-LFM: Lifting Foundation Model
  Paper • 2312.11894 • Published • 15
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 60
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
  Paper • 2312.16862 • Published • 31