Collections
Discover the best community collections!
Collections including paper arxiv:2407.10671

- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 2
- ThunderKittens: Simple, Fast, and Adorable AI Kernels
  Paper • 2410.20399 • Published • 2

- CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
  Paper • 2404.03543 • Published • 18
- McEval: Massively Multilingual Code Evaluation
  Paper • 2406.07436 • Published • 41
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
  Paper • 2406.15877 • Published • 48
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168

- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 34
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 89
- OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
  Paper • 2404.12195 • Published • 12

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 58
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 15
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72

- A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
  Paper • 2304.08999 • Published • 3
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Robust Open-Vocabulary Translation from Visual Text Representations
  Paper • 2104.08211 • Published • 1
- Poro 34B and the Blessing of Multilinguality
  Paper • 2404.01856 • Published • 15

- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
  Paper • 2312.17080 • Published • 1
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 29
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 39
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 127
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168

- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Audio Dialogues: Dialogues dataset for audio and music understanding
  Paper • 2404.07616 • Published • 16

- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 3
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15