Collections
Discover the best community collections!
Collections including paper arxiv:2407.10671

- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168
- Round and Round We Go! What makes Rotary Positional Encodings useful?
  Paper • 2410.06205 • Published • 2
- ThunderKittens: Simple, Fast, and Adorable AI Kernels
  Paper • 2410.20399 • Published • 2

- CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
  Paper • 2404.03543 • Published • 18
- McEval: Massively Multilingual Code Evaluation
  Paper • 2406.07436 • Published • 41
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
  Paper • 2406.15877 • Published • 48
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168

- InternLM2 Technical Report
  Paper • 2403.17297 • Published • 34
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Learn Your Reference Model for Real Good Alignment
  Paper • 2404.09656 • Published • 89
- OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data
  Paper • 2404.12195 • Published • 12

- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- Evolutionary Optimization of Model Merging Recipes
  Paper • 2403.13187 • Published • 58
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
  Paper • 2402.03766 • Published • 15
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72

- A Biomedical Entity Extraction Pipeline for Oncology Health Records in Portuguese
  Paper • 2304.08999 • Published • 3
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
  Paper • 2309.09400 • Published • 85
- Robust Open-Vocabulary Translation from Visual Text Representations
  Paper • 2104.08211 • Published • 1
- Poro 34B and the Blessing of Multilinguality
  Paper • 2404.01856 • Published • 15

- Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
  Paper • 2312.17080 • Published • 1
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 29
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 6
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 39
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 127
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 168

- Qwen Technical Report
  Paper • 2309.16609 • Published • 37
- Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
  Paper • 2308.12966 • Published • 11
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 10
- Audio Dialogues: Dialogues dataset for audio and music understanding
  Paper • 2404.07616 • Published • 16

- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- GrowLength: Accelerating LLMs Pretraining by Progressively Growing Training Length
  Paper • 2310.00576 • Published • 2
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
  Paper • 2305.13169 • Published • 3
- Transformers Can Achieve Length Generalization But Not Robustly
  Paper • 2402.09371 • Published • 15