@sanaka87 on Hugging Face: "Excited to share our Unified Multimodal Models new work Reconstruction…"

Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! 🚀 Just 6 × 80GB A100s × 4.5 hours to boost BAGEL's performance across all tasks! It outperforms FLUX-Kontext in image editing!

📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: sanaka87/reca-68ad2176380355a3dcedc068
✍️ DEMO: sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)

⚡ <10k images & 27 GPU hours (no architecture changes) → SOTA, surpassing much larger open-source & private models:

📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25

✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 & editing ✂️.
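
🧪 To make the idea above concrete, here is a toy sketch of a reconstruction objective, not the actual RecA training code (see the repo for that). Everything in it is an assumption for illustration: the "understanding encoder" and the "generator" are tiny linear maps, the encoder is kept frozen, the loss is plain MSE, and names such as `encode`, `generate`, and `run_demo` are hypothetical.

```python
import random

def encode(image, enc):
    # Stand-in for the understanding encoder: a fixed linear map (assumed frozen).
    return [sum(w * p for w, p in zip(row, image)) for row in enc]

def generate(embedding, gen):
    # Stand-in for the generation branch: maps an embedding back to pixel space.
    return [sum(w * e for w, e in zip(row, embedding)) for row in gen]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_step(images, enc, gen, lr):
    # One pass over the images: reconstruct each image from its own embedding
    # and update only the generator weights by gradient descent on the MSE.
    total = 0.0
    for image in images:
        emb = encode(image, enc)
        recon = generate(emb, gen)
        total += mse(recon, image)
        n = len(image)
        for i in range(n):
            grad_out = 2.0 * (recon[i] - image[i]) / n
            for j in range(len(emb)):
                gen[i][j] -= lr * grad_out * emb[j]
    return total / len(images)

def run_demo(steps=200, seed=0):
    rng = random.Random(seed)
    images = [[rng.random() for _ in range(4)] for _ in range(8)]    # toy 4-pixel "images"
    enc = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)] # frozen encoder weights
    gen = [[0.0] * 2 for _ in range(4)]                              # trainable generator weights
    return [train_step(images, enc, gen, lr=0.05) for _ in range(steps)]

if __name__ == "__main__":
    losses = run_demo()
    print(f"first-pass loss: {losses[0]:.4f}, last-pass loss: {losses[-1]:.4f}")
```

The only point of the sketch is the shape of the loop: the image's own embedding is the conditioning signal and the image itself is the target, so no captions or extra labels are needed.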