Collections including paper arXiv:2504.11651

- Efficient Agents: Building Effective Agents While Reducing Cost
  Paper • 2508.02694 • Published • 85
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 31
- leliuga/Phi-3-mini-4k-instruct-bnb-4bit
  Text Generation • 2B • Updated • 49 • 5
- solidrust/Phi-3-mini-4k-instruct-AWQ
  Text Generation • 0.7B • Updated • 228

- deepseek-ai/DeepSeek-V3-0324
  Text Generation • 685B • Updated • 190k • 3.08k
- OuteAI/Llama-OuteTTS-1.0-1B
  Text-to-Speech • 1B • Updated • 77.8k • 234
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 48
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 31

- Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
  Paper • 2501.16372 • Published • 12
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
  Paper • 2501.16937 • Published • 7
- Matryoshka Quantization
  Paper • 2502.06786 • Published • 32
- Identifying Sensitive Weights via Post-quantization Integral
  Paper • 2503.01901 • Published • 8

- M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
  Paper • 2411.04952 • Published • 30
- Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
  Paper • 2411.05005 • Published • 13
- M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models
  Paper • 2411.04075 • Published • 17
- Self-Consistency Preference Optimization
  Paper • 2411.04109 • Published • 19

- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
  Paper • 2402.17764 • Published • 625
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 105
- Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
  Paper • 2404.02258 • Published • 107
- TransformerFAM: Feedback attention is working memory
  Paper • 2404.09173 • Published • 43

- LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
  Paper • 2507.04404 • Published • 21
- 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
  Paper • 2504.11651 • Published • 31
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
  Paper • 2505.12781 • Published • 2
- A Survey of Context Engineering for Large Language Models
  Paper • 2507.13334 • Published • 258

- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
  Paper • 2411.11504 • Published • 23
- Top-nσ: Not All Logits Are You Need
  Paper • 2411.07641 • Published • 23
- Adaptive Decoding via Latent Preference Optimization
  Paper • 2411.09661 • Published • 10
- When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
  Paper • 2411.13476 • Published • 16

- LinFusion: 1 GPU, 1 Minute, 16K Image
  Paper • 2409.02097 • Published • 34
- Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
  Paper • 2409.11406 • Published • 27
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Segment Anything with Multiple Modalities
  Paper • 2408.09085 • Published • 22