- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77
Collections including paper arxiv:2312.04333
- Beyond Surface: Probing LLaMA Across Scales and Layers
  Paper • 2312.04333 • Published • 20
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers
  Paper • 2402.00854 • Published • 22
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 37
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 29
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12
- Pearl: A Production-ready Reinforcement Learning Agent
  Paper • 2312.03814 • Published • 15
- Beyond Surface: Probing LLaMA Across Scales and Layers
  Paper • 2312.04333 • Published • 20
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
  Paper • 2312.03849 • Published • 7
- wikimedia/wikipedia
  Viewer • Updated • 61.6M • 59k • 974
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 172
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Attention Is All You Need
  Paper • 1706.03762 • Published • 96