---
license: mit
---

# Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

## 🚩 Overview

(a) Overview of Semantic-First Diffusion (SFD). Semantics (dashed curve) and textures (solid curve) follow asynchronous denoising trajectories. SFD operates in three stages:

- Stage I, semantic initialization: the semantic latents denoise first.
- Stage II, asynchronous generation: semantics and textures denoise jointly but asynchronously, with semantics ahead of textures.
- Stage III, texture completion: only the textures continue refining.

After denoising, the generated semantic latent s is discarded, and the final image is decoded solely from the texture latent z.

(b) Training convergence on ImageNet 256×256 without guidance. SFD converges approximately 100× faster than DiT-XL/2 and approximately 33.3× faster than LightningDiT-XL/1.
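
The three stages in panel (a) can be pictured as two progress curves driven by one shared clock, with the semantic curve shifted ahead of the texture curve. The sketch below is only an illustration of that idea, not the SFD implementation: the function name `async_schedule`, the fixed lead `offset`, the linear clock, and the progress convention (0.0 = pure noise, 1.0 = fully denoised) are all assumptions made for this example.

```python
from typing import List, Tuple


def async_schedule(num_steps: int, offset: float) -> List[Tuple[str, float, float]]:
    """Return (stage, semantic_progress, texture_progress) for each step.

    Progress runs from 0.0 (pure noise) to 1.0 (fully denoised).
    `offset` is the assumed, fixed lead of semantics over textures.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    if not 0.0 < offset < 1.0:
        raise ValueError("offset must lie strictly between 0 and 1")

    total = 1.0 + offset  # the shared clock spans [0, 1 + offset]
    schedule = []
    for i in range(num_steps + 1):
        clock = (i * total) / num_steps
        sem = min(clock, 1.0)                     # semantics start at once, finish early
        tex = min(max(clock - offset, 0.0), 1.0)  # textures start late, finish last
        if clock < offset:
            stage = "I"    # semantic initialization: only semantics advance
        elif clock < 1.0:
            stage = "II"   # asynchronous generation: both advance, semantics ahead
        else:
            stage = "III"  # texture completion: semantics are already finished
        schedule.append((stage, sem, tex))
    return schedule


if __name__ == "__main__":
    for stage, sem, tex in async_schedule(num_steps=6, offset=0.5):
        print(f"stage {stage:>3}  semantic={sem:.2f}  texture={tex:.2f}")
```

In an actual sampler, each progress value would be mapped to a noise level for the corresponding latent, and only the texture latent would be passed to the decoder at the end, as the caption describes.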