# Neon: Negative Extrapolation From Self-Training Improves Image Generation
This repository contains the official implementation for the paper Neon: Negative Extrapolation From Self-Training Improves Image Generation.
## Abstract
Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for fine-tuning, in the hope of improving performance. Unfortunately, the resulting positive feedback loop leads to model autophagy disorder (MAD, also known as model collapse), a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256×256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute.
## Official Resources
- Paper: https://huggingface.co/papers/2510.03597
- GitHub Repository: https://github.com/VITA-Group/Neon
## Method
In one line: sample from the reference model $\theta_r$ with your usual inference procedure to form a synthetic set $S$; briefly fine-tune $\theta_r$ on $S$ to get $\theta_s$; then reverse that update with the merge $\theta_{\text{neon}} = (1+w)\,\theta_r - w\,\theta_s$ for a small $w > 0$, which cancels the mode-seeking drift and improves FID.
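The merge is a one-line operation on two checkpoints. Below is a minimal sketch assuming both models are saved as plain PyTorch state dicts; the file names, the `neon_merge` helper, and `w = 0.5` are illustrative placeholders rather than identifiers from the official scripts (the paper uses a small, per-model $w > 0$).

```python
import torch

def neon_merge(theta_r: dict, theta_s: dict, w: float = 0.5) -> dict:
    """Negative extrapolation: theta_neon = (1 + w) * theta_r - w * theta_s."""
    merged = {}
    for name, p_r in theta_r.items():
        p_s = theta_s[name]
        if torch.is_floating_point(p_r):
            # Extrapolate past the reference weights, away from the
            # self-trained (degraded) weights.
            merged[name] = (1.0 + w) * p_r - w * p_s
        else:
            # Leave integer buffers (e.g., step counters) untouched.
            merged[name] = p_r
    return merged

theta_r = torch.load("reference_ckpt.pt", map_location="cpu")    # base model
theta_s = torch.load("selftrained_ckpt.pt", map_location="cpu")  # fine-tuned on its own samples
torch.save(neon_merge(theta_r, theta_s, w=0.5), "neon_ckpt.pt")
```

For intuition, with $w = 0.5$ the merge yields $1.5\,\theta_r - 0.5\,\theta_s$: a linear step from the self-trained weights through the reference weights and beyond, in the direction opposite to the self-training update.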
## Benchmark Performance
| Model type | Dataset | Base model FID | Neon FID (paper) | Download model |
|---|---|---|---|---|
| xAR-L | ImageNet-256 | 1.28 | 1.02 | Download |
| xAR-B | ImageNet-256 | 1.72 | 1.31 | Download |
| VAR d16 | ImageNet-256 | 3.30 | 2.01 | Download |
| VAR d36 | ImageNet-512 | 2.63 | 1.70 | Download |
| EDM (cond.) | CIFAR-10 (32×32) | 1.78 | 1.38 | Download |
| EDM (uncond.) | CIFAR-10 (32×32) | 1.98 | 1.38 | Download |
| EDM | FFHQ 64×64 | 2.39 | 1.12 | Download |
| IMM | ImageNet-256 | 1.99 | 1.46 | Download |
## Quickstart & Evaluation
For environment setup, downloading pretrained models, and evaluation scripts (for FID/IS), please refer to the GitHub repository's Quickstart section.
## Repository Map
```
Neon/
├── VAR/                 # VAR baselines + eval scripts
├── xAR/                 # xAR baselines + eval scripts (uses MAR VAE)
├── edm/                 # EDM baselines + metrics/scripts
├── imm/                 # IMM baselines + eval scripts
├── toy_appendix.ipynb   # 2D Gaussian toy example (diffusion & AR)
├── download_models.sh   # grab all checkpoints + FID refs
├── environment.yml      # reproducible env
└── checkpoints/, fid_stats/   # created by download_models.sh
```
## Citation

```bibtex
@article{alemohammadneon2025,
  title   = {Neon: Negative Extrapolation From Self-Training Improves Image Generation},
  author  = {Alemohammad, Sina and Wang, Zhangyang and Baraniuk, Richard G.},
  journal = {arXiv preprint arXiv:2510.03597},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.03597}
}
```
## Contact
Questions? Reach out to Sina Alemohammad at [email protected].
## Acknowledgments
This repository builds upon the VAR, xAR, EDM, and IMM codebases; we thank their authors for open-sourcing their work.