Collections
Discover the best community collections!
Collections including paper arxiv:2206.07682

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 100
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  Paper • 2005.11401 • Published • 14
- Language Model Evaluation Beyond Perplexity
  Paper • 2106.00085 • Published

- Lost in the Middle: How Language Models Use Long Contexts
  Paper • 2307.03172 • Published • 43
- Efficient Estimation of Word Representations in Vector Space
  Paper • 1301.3781 • Published • 8
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- Attention Is All You Need
  Paper • 1706.03762 • Published • 100

- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 63
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Scaling Laws for Precision
  Paper • 2411.04330 • Published • 8
- Transcending Scaling Laws with 0.1% Extra Compute
  Paper • 2210.11399 • Published

- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
  Paper • 2304.13712 • Published
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 19
- Attention Is All You Need
  Paper • 1706.03762 • Published • 100
- A Comprehensive Overview of Large Language Models
  Paper • 2307.06435 • Published • 2

- Attention Is All You Need
  Paper • 1706.03762 • Published • 100
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 17
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14
- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 77