Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! It also outperforms FLUX-Kontext in image editing!
Paper: https://alphaxiv.org/abs/2509.07295
Code: https://github.com/HorizonWind2004/reconstruction-alignment
HF Models: sanaka87/reca-68ad2176380355a3dcedc068
Demo: sanaka87/BAGEL-RecA
Project Page: https://reconstruction-alignment.github.io
X: https://x.com/XDWang101/status/1965908302581420204
Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)
<10k images & 27 GPU hours (no architecture changes) → SOTA, surpassing much larger open-source & private models:
GenEval: 0.73 → 0.90 | DPGBench: 80.93 → 88.15
ImgEdit: 3.38 → 3.75 | GEdit: 6.94 → 7.25
RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation & editing.
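
To make the reconstruction idea concrete, here is a toy sketch in plain Python. It is not the RecA implementation. Every name in it (`understanding_encoder`, `generator`, `train_reca_toy`) is hypothetical, the "image" is a short list of pixel values, and freezing the encoder while training only the generator is an assumption made for illustration. It only shows the shape of the objective: embed an image with the understanding encoder, condition the generator on that embedding, and minimize the error against the original image.

```python
def understanding_encoder(image):
    """Hypothetical frozen encoder: average each pair of adjacent pixels."""
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def generator(embedding, weights):
    """Hypothetical trainable generator: expand each embedding value to two pixels."""
    pixels = []
    for e in embedding:
        pixels.append(weights[0] * e)
        pixels.append(weights[1] * e)
    return pixels


def reconstruction_loss(pred, target):
    """Mean squared error between the reconstruction and the original image."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def train_reca_toy(steps=200, lr=0.5):
    # Toy "image": pairs of equal pixels, so perfect reconstruction is possible.
    image = [0.2, 0.2, 0.8, 0.8, 0.5, 0.5]
    embedding = understanding_encoder(image)  # encoder is never updated
    weights = [0.0, 0.0]                      # generator parameters
    initial = reconstruction_loss(generator(embedding, weights), image)

    n = len(image)
    for _ in range(steps):
        pred = generator(embedding, weights)
        # Analytic gradient of the MSE with respect to each generator weight.
        grads = [0.0, 0.0]
        for k, e in enumerate(embedding):
            grads[0] += 2 * (pred[2 * k] - image[2 * k]) * e / n
            grads[1] += 2 * (pred[2 * k + 1] - image[2 * k + 1]) * e / n
        weights = [w - lr * g for w, g in zip(weights, grads)]

    final = reconstruction_loss(generator(embedding, weights), image)
    return initial, final, weights


if __name__ == "__main__":
    initial, final, weights = train_reca_toy()
    print(f"loss before: {initial:.4f}, after: {final:.2e}, weights: {weights}")
```

In this toy setup the supervision signal is the input image itself, so no captions or labels are involved; that self-supervised property is the only part meant to carry over from the description above.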
