Zhongzhi Yu
kevin1020
AI & ML interests: Efficient LLM Inference and Tuning
Prompting
LLM Agents
- Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
  Paper • 2401.12474 • Published • 36
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 57
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
  Paper • 2403.10517 • Published • 37
- Octopus v4: Graph of language models
  Paper • 2404.19296 • Published • 118
Efficient Tuning
Efficient VLM via Image Token Compression
- An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
  Paper • 2403.06764 • Published • 28
- TokenPacker: Efficient Visual Projector for Multimodal LLM
  Paper • 2407.02392 • Published • 24
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache
  Paper • 2407.18121 • Published • 17
- Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
  Paper • 2411.05222 • Published • 2
Long Context
- Extending Llama-3's Context Ten-Fold Overnight
  Paper • 2404.19553 • Published • 34
- Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
  Paper • 2407.08454 • Published
- VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
  Paper • 2409.01071 • Published • 27
- Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models
  Paper • 2409.02076 • Published • 12
Visualizations
- Not All Language Model Features Are Linear
  Paper • 2405.14860 • Published • 41
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
  Paper • 2410.02707 • Published • 48
- RepVideo: Rethinking Cross-Layer Representation for Video Generation
  Paper • 2501.08994 • Published • 15
PEFT
Modular
Efficient LLM
RAG
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 78
- REST: Retrieval-Based Speculative Decoding
  Paper • 2311.08252 • Published
- Active Retrieval Augmented Generation
  Paper • 2305.06983 • Published • 3
- Retrieval-Augmented Generation for Large Language Models: A Survey
  Paper • 2312.10997 • Published • 12
Inference Acceleration
- BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
  Paper • 2401.12522 • Published • 12
- Hydragen: High-Throughput LLM Inference with Shared Prefixes
  Paper • 2402.05099 • Published • 20
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
  Paper • 2402.04291 • Published • 50
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  Paper • 2402.02834 • Published • 17
Code Generation
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
  Paper • 2402.01391 • Published • 43
- Code Representation Learning At Scale
  Paper • 2402.01935 • Published • 13
- Long Code Arena: a Set of Benchmarks for Long-Context Code Models
  Paper • 2406.11612 • Published • 25
- Agentless: Demystifying LLM-based Software Engineering Agents
  Paper • 2407.01489 • Published • 64
Token Compression
VLM
- Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
  Paper • 2403.12596 • Published • 11
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
  Paper • 2404.13013 • Published • 31
- PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
  Paper • 2404.16994 • Published • 36
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
  Paper • 2405.14129 • Published • 14
Reasoning
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 41
- Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
  Paper • 2404.12253 • Published • 55
- Improve Mathematical Reasoning in Language Models by Automated Process Supervision
  Paper • 2406.06592 • Published • 29
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B
  Paper • 2406.07394 • Published • 29
Forward tuning
ViT
Benchmarks
Data