- Memory Augmented Language Models through Mixture of Word Experts
  Paper • 2311.10768 • Published • 19
- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 44
- Fine-tuning Language Models for Factuality
  Paper • 2311.08401 • Published • 30
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77
Collections including paper arxiv:2312.04333
- Beyond Surface: Probing LLaMA Across Scales and Layers
  Paper • 2312.04333 • Published • 20
- SymbolicAI: A framework for logic-based approaches combining generative models and solvers
  Paper • 2402.00854 • Published • 22
- TravelPlanner: A Benchmark for Real-World Planning with Language Agents
  Paper • 2402.01622 • Published • 37
- Do Large Language Models Latently Perform Multi-Hop Reasoning?
  Paper • 2402.16837 • Published • 29
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 24
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12
- Pearl: A Production-ready Reinforcement Learning Agent
  Paper • 2312.03814 • Published • 15
- Beyond Surface: Probing LLaMA Across Scales and Layers
  Paper • 2312.04333 • Published • 20
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
  Paper • 2312.03849 • Published • 7
- wikimedia/wikipedia
  Viewer • Updated • 61.6M • 59k • 974
- The Impact of Depth and Width on Transformer Language Model Generalization
  Paper • 2310.19956 • Published • 10
- Retentive Network: A Successor to Transformer for Large Language Models
  Paper • 2307.08621 • Published • 172
- RWKV: Reinventing RNNs for the Transformer Era
  Paper • 2305.13048 • Published • 19
- Attention Is All You Need
  Paper • 1706.03762 • Published • 96