VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published May 29
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models Paper • 2503.07605 • Published Mar 10
NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices Paper • 2403.10425 • Published Mar 15, 2024
Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting Paper • 2403.09981 • Published Mar 15, 2024
FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model Paper • 2403.10242 • Published Mar 15, 2024
EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba Paper • 2403.09977 • Published Mar 15, 2024
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding Paper • 2403.10395 • Published Mar 15, 2024
Recurrent Drafter for Fast Speculative Decoding in Large Language Models Paper • 2403.09919 • Published Mar 14, 2024
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations Paper • 2403.09704 • Published Mar 8, 2024
RAFT: Adapting Language Model to Domain Specific RAG Paper • 2403.10131 • Published Mar 15, 2024
VideoAgent: Long-form Video Understanding with Large Language Model as Agent Paper • 2403.10517 • Published Mar 15, 2024
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer Paper • 2403.10301 • Published Mar 15, 2024
LocalMamba: Visual State Space Model with Windowed Selective Scan Paper • 2403.09338 • Published Mar 14, 2024
Veagle: Advancements in Multimodal Representation Learning Paper • 2403.08773 • Published Jan 18, 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding Paper • 2403.09626 • Published Mar 14, 2024
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding Paper • 2403.09530 • Published Mar 14, 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences Paper • 2403.09347 • Published Mar 14, 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering Paper • 2403.09622 • Published Mar 14, 2024