sanaka87
posted an update Sep 10
Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! 🚀 Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! It outperforms FLUX-Kontext in image editing!

📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: sanaka87/reca-68ad2176380355a3dcedc068
✍️ DEMO: sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)

⚡ <10k images & 27 GPU hours (no architecture changes) → SOTA, surpassing much larger open-source & private models:

📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25

✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 & editing ✂️.
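
To make the idea concrete, here is a deliberately tiny NumPy toy of "reconstruct the input from your own encoder's embedding". It is not the RecA implementation. It assumes a frozen random linear map as a stand-in for the understanding encoder, a trainable linear map as a stand-in for the generator, random vectors in place of images, and a plain mean-squared-error loss. Every name in it is hypothetical.

```python
import numpy as np


def reconstruction_alignment_toy(num_images=256, image_dim=16, embed_dim=8,
                                 steps=200, lr=1.0, seed=0):
    """Toy self-reconstruction loop; returns the loss at every step.

    Assumptions (not taken from the RecA code): linear encoder and
    generator, random data, MSE loss, full-batch gradient descent.
    """
    rng = np.random.default_rng(seed)

    # Stand-in "images": random vectors instead of real pixels.
    images = rng.normal(size=(num_images, image_dim))

    # Frozen stand-in for the visual understanding encoder.
    encoder = rng.normal(size=(image_dim, embed_dim)) / np.sqrt(image_dim)

    # Trainable stand-in for the generation branch.
    generator = np.zeros((embed_dim, image_dim))

    # The model's own embeddings serve as the conditioning signal.
    embeddings = images @ encoder

    losses = []
    for _ in range(steps):
        reconstruction = embeddings @ generator
        error = reconstruction - images
        losses.append(float(np.mean(error ** 2)))

        # Gradient of the mean squared error with respect to the generator.
        grad = 2.0 * embeddings.T @ error / error.size
        generator -= lr * grad

    return losses


if __name__ == "__main__":
    losses = reconstruction_alignment_toy()
    print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The only point of the toy is the training signal: no captions or extra labels are needed, because each image supervises its own reconstruction through the embedding.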