sanaka87 (Ji Xie)

reacted to their post with 🔥 2 months ago

Excited to share our new work on Unified Multimodal Models (UMMs): Reconstruction Alignment (RecA)! 🚀 Just 6 × 80GB A100s × 4.5 hours to boost BAGEL performance across all tasks! It outperforms FLUX-Kontext in image editing capabilities!

📄 Paper: https://alphaxiv.org/abs/2509.07295
💻 Code: https://github.com/HorizonWind2004/reconstruction-alignment
🤗 HF Models: sanaka87/reca-68ad2176380355a3dcedc068
✍️ DEMO: sanaka87/BAGEL-RecA
🌐 Project Page: https://reconstruction-alignment.github.io
🔥 X: https://x.com/XDWang101/status/1965908302581420204
📰 Zhihu: https://zhuanlan.zhihu.com/p/1947584568187159814
🤗 HF Daily Paper: Reconstruction Alignment Improves Unified Multimodal Models (2509.07295)

⚡ <10k images & 27 GPU hours (no architecture changes) → SOTA, surpassing much larger open-source & private models:

📊 GenEval: 0.73 → 0.90 | 📊 DPGBench: 80.93 → 88.15
🖼️ ImgEdit: 3.38 → 3.75 | 🖌️ GEdit: 6.94 → 7.25
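
For readers who want to sanity-check the numbers above, here is a small sketch that recomputes the GPU-hour budget and the relative gain on each benchmark. The figures are copied from the post; the variable and function names are illustrative only.

```python
# Figures quoted in the post; names here are illustrative, not from the RecA codebase.
num_gpus = 6          # 80GB A100s
hours_per_gpu = 4.5
gpu_hours = num_gpus * hours_per_gpu  # 27.0, matching the "27 GPU hours" claim

# (before, after) scores as reported in the post.
scores = {
    "GenEval": (0.73, 0.90),
    "DPGBench": (80.93, 88.15),
    "ImgEdit": (3.38, 3.75),
    "GEdit": (6.94, 7.25),
}


def relative_gain(before: float, after: float) -> float:
    """Relative improvement in percent."""
    return (after - before) / before * 100.0


gains = {name: relative_gain(b, a) for name, (b, a) in scores.items()}

print(f"GPU hours: {gpu_hours}")
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}% relative")
```

Note that the benchmarks use different scales, so the relative gains are only comparable within a benchmark, not across them.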

✅ RecA trains UMMs to reconstruct images from their own visual understanding encoder embeddings → big gains in image generation 🎨 & editing ✂️.
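
To make the idea concrete, here is a deliberately tiny toy of "reconstruct the input from a frozen encoder's embedding". It is only a sketch of the self-supervised objective described above, not the RecA training code: the scalar "images", the `encode` function, and the one-parameter generator are all assumptions made for illustration.

```python
def encode(x: float) -> float:
    """Frozen stand-in for the understanding encoder (hypothetical: just scales the input)."""
    return 2.0 * x


def train_reconstruction(images, steps: int = 200, lr: float = 0.01) -> float:
    """Fit a one-parameter generator recon = w * embedding with an MSE reconstruction loss.

    Only the generator weight w is updated; the encoder stays fixed.
    """
    w = 0.0
    for _ in range(steps):
        grad = 0.0
        for x in images:
            e = encode(x)            # condition on the model's own embedding
            recon = w * e            # toy generator output
            grad += 2.0 * (recon - x) * e  # d/dw of (recon - x)^2
        w -= lr * grad / len(images)
    return w


images = [0.5, 1.0, 1.5, 2.0]        # toy scalar "images"
w = train_reconstruction(images)
loss = sum((w * encode(x) - x) ** 2 for x in images) / len(images)
print(f"learned w = {w:.4f}, reconstruction loss = {loss:.2e}")
```

No captions or labels are needed in this loop: the supervision signal is the image itself, which is the point of the reconstruction objective.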

posted an update 2 months ago

posted an update 7 months ago

Our ICEdit demo video is below~
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arxiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

reacted to RiverZ's post with 🔥🤗 7 months ago

🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～

🎨 Daily Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)
🔓 Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arxiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔥 Why it’s cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods — extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger — think of it as the “DeepSeek of image editing” 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video — happy to send it your way!

replied to their post 7 months ago

reacted to their post with 👍🔥 7 months ago

🚀 Excited to Share Our Latest Work: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer～

🎨 Daily Paper:
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔓 Code is now open source!
🔥 Huggingface DEMO: RiverZ/ICEdit
🌐 Project Website: https://river-zhang.github.io/ICEdit-gh-pages/
🏠 GitHub Repository: https://github.com/River-Zhang/ICEdit/blob/main/scripts/gradio_demo.py
🤗 Huggingface: sanaka87/ICEdit-MoE-LoRA
📄 arxiv Paper: In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer (2504.20690)

🔥 Why it’s cool:
- Achieves high-quality, multi-task image editing.
- Uses only 1% of the training parameters and 0.1% of the training data compared to existing methods — extremely efficient
- Beats several commercial models on background preservation, ID control, and consistency
- Open-source, low-cost, faster, and stronger — think of it as the “DeepSeek of image editing” 👀

We also implemented a Gradio demo app, available directly in our GitHub repo! And we made a flashy demo video — happy to send it your way!

1 reply

posted an update 7 months ago

reacted to m-ric's post with 👀 10 months ago

𝗠𝗶𝗻𝗶𝗠𝗮𝘅'𝘀 𝗻𝗲𝘄 𝗠𝗼𝗘 𝗟𝗟𝗠 𝗿𝗲𝗮𝗰𝗵𝗲𝘀 𝗖𝗹𝗮𝘂𝗱𝗲-𝗦𝗼𝗻𝗻𝗲𝘁 𝗹𝗲𝘃𝗲𝗹 𝘄𝗶𝘁𝗵 𝟰𝗠 𝘁𝗼𝗸𝗲𝗻𝘀 𝗰𝗼𝗻𝘁𝗲𝘅𝘁 𝗹𝗲𝗻𝗴𝘁𝗵 💥

This work from Chinese startup @MiniMax-AI introduces a novel architecture that achieves state-of-the-art performance while handling context windows up to 4 million tokens - roughly 20x longer than current models. The key was combining lightning attention, mixture of experts (MoE), and a careful hybrid approach.

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀:

🏗️ MoE with novel hybrid attention:
‣ Mixture of Experts with 456B total parameters (45.9B activated per token)
‣ Combines Lightning attention (linear complexity) for most layers and traditional softmax attention every 8 layers
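
As a rough illustration of the two bullets above, the sketch below builds a per-layer attention schedule and computes the share of parameters active per token. The layer count and the exact placement of the softmax layers (every 8th position) are assumptions for the example; only the 456B / 45.9B figures and the "every 8 layers" ratio come from the post.

```python
def attention_schedule(num_layers: int, softmax_every: int = 8) -> list:
    """Return the attention type per layer: softmax on every `softmax_every`-th layer,
    lightning (linear) attention elsewhere. Placement is an assumption for illustration."""
    return [
        "softmax" if (i + 1) % softmax_every == 0 else "lightning"
        for i in range(num_layers)
    ]


# Hypothetical depth, chosen only to keep the printout short.
schedule = attention_schedule(num_layers=16)
print(schedule)

# Parameter figures quoted in the post.
total_params_b = 456.0
active_params_b = 45.9
active_fraction = active_params_b / total_params_b
print(f"active per token: {active_fraction:.1%} of total parameters")
```

With this layout, seven out of every eight layers scale linearly with sequence length, which is what makes very long contexts tractable in this kind of hybrid design.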

🏆 Outperforms leading models across benchmarks while offering vastly longer context:
‣ Competitive with GPT-4/Claude-3.5-Sonnet on most tasks
‣ Can efficiently handle 4M token contexts (vs 256K for most other LLMs)

🔬 Technical innovations enable efficient scaling:
‣ Novel expert parallel and tensor parallel strategies cut communication overhead in half
‣ Improved linear attention sequence parallelism, multi-level padding and other optimizations achieve 75% GPU utilization (that's really high; utilization is generally around 50%)

🎯 Thorough training strategy:
‣ Careful data curation and quality control by using a smaller preliminary version of their LLM as a judge!

Overall, not only is the model impressive, but the technical paper is also really interesting! 📝
It has lots of insights, including a great comparison showing how an MoE with 2B activated parameters (24B total) far outperforms a 7B dense model for the same amount of FLOPs.

Read it in full here 👉 MiniMax-01: Scaling Foundation Models with Lightning Attention (2501.08313)
Model here (license allows commercial use under 100M monthly users) 👉 MiniMaxAI/MiniMax-Text-01

posted an update 10 months ago

🚀 Excited to Share Our Latest Work: 3DIS & 3DIS-FLUX for Multi-Instance Layout-to-Image Generation! ❤️❤️❤️

🎨 Daily Paper: 3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering (2501.05131)
🔓 Code is now open source!
🌐 Project Website: https://limuloo.github.io/3DIS/
🏠 GitHub Repository: https://github.com/limuloo/3DIS
📄 3DIS Paper: https://arxiv.org/abs/2410.12669
📄 3DIS-FLUX Tech Report: https://arxiv.org/abs/2501.05131

🔥 Why 3DIS & 3DIS-FLUX?
Current SOTA multi-instance generation methods are typically adapter-based, requiring additional control modules trained on pre-trained models for layout and instance attribute control. However, with the emergence of more powerful models like FLUX and SD3.5, these methods demand constant retraining and extensive resources.

✨ Our Solution: 3DIS
We introduce a decoupled approach that only requires training a low-resolution Layout-to-Depth model to convert layouts into coarse-grained scene depth maps. Leveraging community and company pre-trained models like ControlNet + SAM2, we enable training-free controllable image generation on high-resolution models such as SDXL and FLUX.

🌟 Benefits of Our Decoupled Multi-Instance Generation:
1. Enhanced Control: By constructing scenes using depth maps in the first stage, the model focuses on coarse-grained scene layout, improving control over instance placement.
2. Flexibility & Preservation: The second stage employs training-free rendering methods, allowing seamless integration with various models (e.g., fine-tuned weights, LoRA) while maintaining the generative capabilities of pre-trained models.
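
The decoupling described above can be sketched as two independent stages with a depth map as the only interface between them. Everything in this block is a toy stand-in: the real first stage is a trained Layout-to-Depth model and the real second stage uses pre-trained generators, whereas here `layout_to_depth`, the box format, and the `fake_*` renderers are hypothetical names invented for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical layout format: (x0, y0, x1, y1, depth), upper bounds exclusive.
Box = Tuple[int, int, int, int, float]
DepthMap = List[List[float]]


def layout_to_depth(boxes: List[Box], size: int = 8, background: float = 1.0) -> DepthMap:
    """Toy stage 1: paint each box's depth into a coarse grid (nearer surface wins)."""
    depth = [[background] * size for _ in range(size)]
    for x0, y0, x1, y1, d in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                depth[y][x] = min(depth[y][x], d)
    return depth


def generate(boxes: List[Box], renderer: Callable[[DepthMap], str], size: int = 8) -> str:
    """Stage 2 is any callable that consumes a depth map, so it can be swapped freely."""
    return renderer(layout_to_depth(boxes, size=size))


def fake_sdxl_renderer(depth: DepthMap) -> str:
    return f"sdxl-image-from-{len(depth)}x{len(depth[0])}-depth"


def fake_flux_renderer(depth: DepthMap) -> str:
    return f"flux-image-from-{len(depth)}x{len(depth[0])}-depth"


layout = [(0, 0, 2, 2, 0.3), (1, 1, 3, 3, 0.6)]
depth = layout_to_depth(layout, size=4)
print(generate(layout, fake_sdxl_renderer, size=4))
print(generate(layout, fake_flux_renderer, size=4))
```

The design point is that swapping the renderer leaves stage 1 untouched, which is why a new base model would not require retraining the layout stage in this setup.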

Join us in advancing Layout-to-Image Generation! Follow and star our repository to stay updated! ⭐

replied to Flowerfan's post over 1 year ago

XD

reacted to Flowerfan's post with 🤝❤️🤗👍 over 1 year ago

Multi-Instance Generation Controller: Enjoy complete control over position generation, attribute determination, and count!

code link: https://github.com/limuloo/MIGC
project page: https://migcproject.github.io/

MIGC decouples multi-instance generation into individual single-instance generation subtasks within the cross-attention layer of Stable Diffusion.
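
A very simplified way to picture this "divide, then combine" idea is shown below. This is not MIGC's implementation: real cross-attention operates on feature tensors inside Stable Diffusion, while this toy uses scalar responses on a small grid, and the box format, `strength` values, and averaging rule are assumptions chosen only to show per-instance subtasks being merged.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: x0, y0, x1, y1 (upper bounds exclusive)
Grid = List[List[float]]


def box_mask(box: Box, size: int) -> Grid:
    """1.0 inside the instance's box, 0.0 elsewhere."""
    x0, y0, x1, y1 = box
    return [
        [1.0 if (x0 <= x < x1 and y0 <= y < y1) else 0.0 for x in range(size)]
        for y in range(size)
    ]


def single_instance_response(strength: float, mask: Grid) -> Grid:
    """Stand-in for one instance's attention output, confined to its own region."""
    return [[strength * m for m in row] for row in mask]


def combine(responses: List[Grid], masks: List[Grid]) -> Grid:
    """Merge the per-instance results; overlapping regions are averaged."""
    size = len(masks[0])
    out = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            cover = sum(m[y][x] for m in masks)
            if cover > 0:
                out[y][x] = sum(r[y][x] for r in responses) / cover
    return out


size = 4
instances = [((0, 0, 2, 2), 1.0), ((1, 1, 3, 3), 3.0)]  # (box, toy response strength)
masks = [box_mask(box, size) for box, _ in instances]
responses = [single_instance_response(s, m) for (_, s), m in zip(instances, masks)]
out = combine(responses, masks)
for row in out:
    print(row)
```

Each instance is handled in isolation first, so its attributes cannot leak into another instance's region before the merge step.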

Welcome to follow our project and use the code to create anything you imagine!

Please let us know if you have any suggestions!

6 replies
