Collections
Discover the best community collections!
Collections including paper arxiv:2305.13245

- Attention Is All You Need
  Paper • 1706.03762 • Published • 98
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
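
The LoRA paper in the list above (2106.09685) freezes the pretrained weight matrix W and trains only a low-rank correction BA added on top of it. A minimal NumPy sketch of that update, assuming a plain linear layer; the class and variable names here are hypothetical, not from the paper:

```python
# Minimal LoRA-style linear layer: y = x @ (W + (alpha/r) * B @ A).T
# with W frozen and only A, B trained. B is zero-initialized so the
# model's output is unchanged at the start of fine-tuning.
import numpy as np

class LoRALinear:
    def __init__(self, w_frozen, r=8, alpha=16, rng=np.random.default_rng(0)):
        out_dim, in_dim = w_frozen.shape
        self.w = w_frozen                           # pretrained weights, kept frozen
        self.a = rng.normal(0, 0.02, (r, in_dim))   # trainable down-projection
        self.b = np.zeros((out_dim, r))             # trainable up-projection, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path plus low-rank correction; the correction is zero at init.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T
```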

- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
  Paper • 2407.08608 • Published • 1
- Fast Transformer Decoding: One Write-Head is All You Need
  Paper • 1911.02150 • Published • 9
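
The FlashAttention line above (2205.14135 and successors) computes exact attention block by block with an online softmax, so the full seq_len x seq_len score matrix is never materialized. A toy single-head NumPy sketch of that recurrence, with an assumed block size; the real kernels tile over GPU SRAM and also handle masking and the backward pass:

```python
# Blockwise attention with a running row-max (m) and running softmax
# denominator (l): each KV block updates the partial output, and earlier
# partials are rescaled when the running max grows.
import numpy as np

def flash_attention_reference(q, k, v, block=64):
    seq_len, d = q.shape
    out = np.zeros_like(q)
    m = np.full(seq_len, -np.inf)   # running row-max of the scores
    l = np.zeros(seq_len)           # running softmax denominator
    for start in range(0, seq_len, block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)               # scores for this KV block
        m_new = np.maximum(m, s.max(axis=1))
        p = np.exp(s - m_new[:, None])
        scale = np.exp(m - m_new)               # rescale previous partials
        l = l * scale + p.sum(axis=1)
        out = out * scale[:, None] + p @ vb
        m = m_new
    return out / l[:, None]
```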

- Attention Is All You Need
  Paper • 1706.03762 • Published • 98
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 17
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 247
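
GQA (2305.13245), the paper these collections share, sits between multi-head and multi-query attention: query heads are split into groups that share key/value heads. A minimal NumPy sketch under that reading; num_kv_heads == 1 recovers the multi-query attention of "Fast Transformer Decoding" above, num_kv_heads == num_heads recovers standard multi-head attention, and pool_kv_heads mirrors the paper's mean-pooling conversion of multi-head checkpoints. Function names are hypothetical:

```python
# Grouped-query attention: num_heads query heads share num_kv_heads
# key/value heads, with consecutive query heads mapped to one KV head.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, num_kv_heads):
    # q: (num_heads, seq_len, head_dim); k, v: (num_kv_heads, seq_len, head_dim)
    num_heads, seq_len, head_dim = q.shape
    group = num_heads // num_kv_heads      # query heads per KV head
    out = np.empty_like(q)
    for h in range(num_heads):
        kv = h // group                    # KV head this query head attends with
        scores = q[h] @ k[kv].T / np.sqrt(head_dim)
        out[h] = softmax(scores) @ v[kv]
    return out

def pool_kv_heads(k_mha, num_kv_heads):
    # Mean-pool multi-head K (or V) projections into groups, as in the
    # paper's checkpoint-conversion step before uptraining.
    num_heads, seq_len, head_dim = k_mha.shape
    group = num_heads // num_kv_heads
    return k_mha.reshape(num_kv_heads, group, seq_len, head_dim).mean(axis=1)
```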

- Lumiere: A Space-Time Diffusion Model for Video Generation
  Paper • 2401.12945 • Published • 86
- Long-form factuality in large language models
  Paper • 2403.18802 • Published • 26
- ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
  Paper • 2403.18818 • Published • 28
- TC4D: Trajectory-Conditioned Text-to-4D Generation
  Paper • 2403.17920 • Published • 18

- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 39
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 126
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 167

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
  Paper • 2211.04325 • Published • 1
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- On the Opportunities and Risks of Foundation Models
  Paper • 2108.07258 • Published • 1
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
  Paper • 2204.07705 • Published • 2

- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 27
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- Attention Is All You Need
  Paper • 1706.03762 • Published • 98
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9

- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 48
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Training a T5 Using Lab-sized Resources
  Paper • 2208.12097 • Published • 1
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 6

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 58
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 15
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72