- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 28
- Rethinking Interpretability in the Era of Large Language Models
  Paper • 2402.01761 • Published • 23
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117
- Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
  Paper • 2402.07827 • Published • 48
Collections
Collections including paper arxiv:2404.14619
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
  Paper • 2403.09611 • Published • 129
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 190

- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 11
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 55
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
  Paper • 2401.01325 • Published • 27
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 82

- Attention Is All You Need
  Paper • 1706.03762 • Published • 99
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 23
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 9
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
  Paper • 1910.01108 • Published • 21

- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
  Paper • 2312.08578 • Published • 20
- ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
  Paper • 2312.08583 • Published • 11
- Vision-Language Models as a Source of Rewards
  Paper • 2312.09187 • Published • 14
- StemGen: A music generation model that listens
  Paper • 2312.08723 • Published • 49

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
  Paper • 2402.01739 • Published • 28
- LLM Agent Operating System
  Paper • 2403.16971 • Published • 72
- Poro 34B and the Blessing of Multilinguality
  Paper • 2404.01856 • Published • 15

- TinyLlama: An Open-Source Small Language Model
  Paper • 2401.02385 • Published • 95
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 48
- SliceGPT: Compress Large Language Models by Deleting Rows and Columns
  Paper • 2401.15024 • Published • 74
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
  Paper • 2401.16380 • Published • 50

- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 55
- Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
  Paper • 2401.01335 • Published • 68
- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 190
- Multilingual Instruction Tuning With Just a Pinch of Multilinguality
  Paper • 2401.01854 • Published • 11

- PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
  Paper • 2312.13964 • Published • 20
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 260
- StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
  Paper • 2312.12491 • Published • 74
- LLaVA-φ: Efficient Multi-Modal Assistant with Small Language Model
  Paper • 2401.02330 • Published • 18

- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 119
- stabilityai/stable-video-diffusion-img2vid-xt
  Image-to-Video • Updated • 203k • 3.19k
- LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
  Paper • 2311.13384 • Published • 53
- HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
  Paper • 2311.12454 • Published • 31