Collections
Collections including paper arxiv:2306.11644
-
Lost in the Middle: How Language Models Use Long Contexts
Paper • 2307.03172 • Published • 43 -
Efficient Estimation of Word Representations in Vector Space
Paper • 1301.3781 • Published • 8 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
Attention Is All You Need
Paper • 1706.03762 • Published • 96
-
Attention Is All You Need
Paper • 1706.03762 • Published • 96 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 23 -
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Paper • 1907.11692 • Published • 9 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 21
-
Visual In-Context Prompting
Paper • 2311.13601 • Published • 19 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 10 -
LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Paper • 2303.02927 • Published • 3 -
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
Paper • 2311.07361 • Published • 14
-
Detecting Pretraining Data from Large Language Models
Paper • 2310.16789 • Published • 11 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
AutoMix: Automatically Mixing Language Models
Paper • 2310.12963 • Published • 14 -
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Paper • 2310.12962 • Published • 13
-
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 247 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 241 -
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 260
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 48 -
Qwen Technical Report
Paper • 2309.16609 • Published • 37 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7 -
Gemini: A Family of Highly Capable Multimodal Models
Paper • 2312.11805 • Published • 47
-
HuggingFaceH4/zephyr-7b-beta
Text Generation • 7B • Updated • 268k • 1.81k -
Intel/neural-chat-7b-v3-1
Text Generation • 7B • Updated • 2.3k • 546 -
google-bert/bert-large-cased-whole-word-masking
Fill-Mask • 0.3B • Updated • 1.38k • 22 -
google-bert/bert-large-uncased-whole-word-masking
Fill-Mask • 0.3B • Updated • 9.65k • 21
-
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Paper • 2310.13961 • Published • 5 -
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Paper • 2202.07922 • Published • 1 -
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
Paper • 2310.13671 • Published • 19 -
Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs
Paper • 2309.09582 • Published • 4