- Understanding LLMs: A Comprehensive Overview from Training to Inference
  Paper • 2401.02038 • Published • 65
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
  Paper • 2402.00159 • Published • 65
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85

Collections including paper arxiv:2402.00838
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
  Paper • 2309.12307 • Published • 89
- LMDX: Language Model-based Document Information Extraction and Localization
  Paper • 2309.10952 • Published • 66
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
  Paper • 2310.09263 • Published • 41
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105

- Attention Is All You Need
  Paper • 1706.03762 • Published • 100
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- Holistic Evaluation of Text-To-Image Models
  Paper • 2311.04287 • Published • 16
- MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
  Paper • 2311.07463 • Published • 15
- Trusted Source Alignment in Large Language Models
  Paper • 2311.06697 • Published • 12
- DiLoCo: Distributed Low-Communication Training of Language Models
  Paper • 2311.08105 • Published • 16