ChenYuWei

Yvonnnne

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 5 months ago

VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos

Paper • 2505.23693 • Published May 29, 2025 • 55

upvoted 19 papers 8 months ago

SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models

Paper • 2503.07605 • Published Mar 10, 2025 • 68

NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices

Paper • 2403.10425 • Published Mar 15, 2024 • 4

Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Paper • 2403.09981 • Published Mar 15, 2024 • 8

FDGaussian: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model

Paper • 2403.10242 • Published Mar 15, 2024 • 12

EfficientVMamba: Atrous Selective Scan for Light Weight Visual Mamba

Paper • 2403.09977 • Published Mar 15, 2024 • 11

Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding

Paper • 2403.10395 • Published Mar 15, 2024 • 9

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Paper • 2403.10493 • Published Mar 15, 2024 • 19

Recurrent Drafter for Fast Speculative Decoding in Large Language Models

Paper • 2403.09919 • Published Mar 14, 2024 • 22

Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations

Paper • 2403.09704 • Published Mar 8, 2024 • 33

RAFT: Adapting Language Model to Domain Specific RAG

Paper • 2403.10131 • Published Mar 15, 2024 • 72

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Paper • 2403.10517 • Published Mar 15, 2024 • 37

Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

Paper • 2403.10301 • Published Mar 15, 2024 • 54

LocalMamba: Visual State Space Model with Windowed Selective Scan

Paper • 2403.09338 • Published Mar 14, 2024 • 9

Veagle: Advancements in Multimodal Representation Learning

Paper • 2403.08773 • Published Jan 18, 2024 • 10

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Paper • 2403.09626 • Published Mar 14, 2024 • 16

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Paper • 2403.09530 • Published Mar 14, 2024 • 10

3D-VLA: A 3D Vision-Language-Action Generative World Model

Paper • 2403.09631 • Published Mar 14, 2024 • 11

BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences

Paper • 2403.09347 • Published Mar 14, 2024 • 22

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Paper • 2403.09622 • Published Mar 14, 2024 • 18