Collections
Collections including paper arxiv:2507.13344

- FlashWorld: High-quality 3D Scene Generation within Seconds
  Paper • 2510.13678 • Published • 70
- NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks
  Paper • 2510.15019 • Published • 63
- GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
  Paper • 2509.18090 • Published • 4
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
  Paper • 2509.19296 • Published • 23

- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
  Paper • 2507.21809 • Published • 133
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 86
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
  Paper • 2507.13344 • Published • 57
- ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
  Paper • 2506.05010 • Published • 79

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 16
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published • 1
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 158

- TextureDreamer: Image-guided Texture Synthesis through Geometry-aware Diffusion
  Paper • 2401.09416 • Published • 11
- SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
  Paper • 2401.10171 • Published • 14
- DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model
  Paper • 2311.09217 • Published • 22
- GALA: Generating Animatable Layered Assets from a Single Scan
  Paper • 2401.12979 • Published • 9

- Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations
  Paper • 2504.09953 • Published • 1
- Deformable 3D Gaussian Splatting for Animatable Human Avatars
  Paper • 2312.15059 • Published • 1
- SMPLest-X: Ultimate Scaling for Expressive Human Pose and Shape Estimation
  Paper • 2501.09782 • Published
- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
  Paper • 2507.13344 • Published • 57

- Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
  Paper • 2507.13344 • Published • 57
- π^3: Scalable Permutation-Equivariant Visual Geometry Learning
  Paper • 2507.13347 • Published • 64
- MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second
  Paper • 2507.10065 • Published • 24
- CLiFT: Compressive Light-Field Tokens for Compute-Efficient and Adaptive Neural Rendering
  Paper • 2507.08776 • Published • 54

- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
  Paper • 2411.02337 • Published • 36
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
  Paper • 2411.04996 • Published • 51
- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
  Paper • 2410.08815 • Published • 47

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23