Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2504.05741

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 77

Image Generation

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Paper • 2506.07977 • Published Jun 9 • 41
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published Jun 9 • 19
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6 • 23
Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5 • 27

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22 • 14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Paper • 2504.14032 • Published Apr 18 • 7
Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21 • 158
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 120

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 44
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 63

DDT: Decoupled Diffusion Transformer

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77
Sleeping

5

DDT

😻

5

Decoupled Diffusion Transformer
MCG-NJU/DDT-XL-22en6de-R256

Text-to-Image • Updated Apr 11 • 4
MCG-NJU/DDT-XL-22en6de-R512

Updated Apr 11 • 4 • 1

Diffusion transformer

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

Paper • 2502.04847 • Published Feb 7 • 1
DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

Paper • 2504.14509 • Published Apr 20 • 50
Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8 • 86
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Paper • 2508.04825 • Published Aug 6 • 58

Architectural Proposals

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Causal Diffusion Transformers for Generative Modeling

Paper • 2412.12095 • Published Dec 16, 2024 • 23
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89
TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11 • 58

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 39
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 117

Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis

Paper • 2401.09048 • Published Jan 17, 2024 • 10
Improving fine-grained understanding in image-text pre-training

Paper • 2401.09865 • Published Jan 18, 2024 • 18
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Paper • 2401.10891 • Published Jan 19, 2024 • 62
Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 77

DDT: Decoupled Diffusion Transformer

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77
Sleeping

5

DDT

😻

5

Decoupled Diffusion Transformer
MCG-NJU/DDT-XL-22en6de-R256

Text-to-Image • Updated Apr 11 • 4
MCG-NJU/DDT-XL-22en6de-R512

Updated Apr 11 • 4 • 1

Image Generation

OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation

Paper • 2506.07977 • Published Jun 9 • 41
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

Paper • 2506.07986 • Published Jun 9 • 19
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published Jun 6 • 23
Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5 • 27

Diffusion transformer

HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation

Paper • 2502.04847 • Published Feb 7 • 1
DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Paper • 2504.16064 • Published Apr 22 • 14
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models

Paper • 2504.14032 • Published Apr 18 • 7
Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21 • 158
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 120

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published Apr 8 • 77
DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning

Paper • 2504.14509 • Published Apr 20 • 50
Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8 • 86
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off

Paper • 2508.04825 • Published Aug 6 • 58

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 26
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 104

Architectural Proposals

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Causal Diffusion Transformers for Generative Modeling

Paper • 2412.12095 • Published Dec 16, 2024 • 23
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89
TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published Feb 11 • 58

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 44
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 63

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 103
An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51
Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion

Paper • 2406.04338 • Published Jun 6, 2024 • 39
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1, 2024 • 117

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs