Whatcha cookin?
Have you talked about this one at all? Sounds interesting.
Hi! This model is based on Flex.1-alpha for generating realistic images. It’s a pretty solid training foundation. I had to modify the sd-scripts to support EMA during fine-tuning, which should positively impact the model’s quality.
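For anyone curious what "EMA during fine-tuning" means here: each optimizer step blends the current weights into a "shadow" copy, and the shadow copy is what gets saved, which smooths out noisy updates over long runs. A rough pure-Python sketch of the idea (the function name and decay value are illustrative, not the actual sd-scripts patch, which operates on tensors):

```python
# Sketch of an exponential moving average (EMA) over model weights.
# Real training code would do this on GPU tensors under torch.no_grad().
def ema_update(shadow, weights, decay=0.999):
    """Blend current weights into the EMA shadow copy in place."""
    for name, w in weights.items():
        shadow[name] = decay * shadow[name] + (1 - decay) * w
    return shadow

# After training, the shadow dict (not the live weights) is saved as
# the checkpoint, giving a smoother, averaged version of the model.
```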
Right now, the fine-tuning speed is about 4.4 s/iteration with EMA at 768-pixel resolution on an RTX 4090. After numerous tests to find the optimal parameters, I've now begun the main fine-tuning run. I plan to share results at each training milestone on the model's page.
Fine-tuning SD3.5 Medium wasn’t successful, so I’ve switched to Flex.1-alpha.
How is it going? I see it is attempt #7 now...
Hi! The model is currently being improved; the process should take about 10 days. The improved version already looks far better than the base model. Future training runs will likely take less time. I'll update the model statistics a bit later.
Hi, just checking in, how is the progress going? (Honestly, I check the project page a few times a week, because I love Verus Vision and I wonder what will come out of this model.)
Hi! The model isn't abandoned. The last trained model, which was almost 75% complete, didn't perform much better than the baseline: "plastic skin" was still present. I've also corrected a few errors in my fine-tuning settings, which should make the model more stable and realistic. Today, I'll be testing the new settings with a higher LR, and then I'll start a full fine-tune on the complete dataset.
Hello everyone! It seems that the model degrades during long training sessions in Kohya-ss, especially when using resume_from_state. I ran many tests on small datasets of up to 100 images, and both LoRA and fine-tune yielded good results. However, the longer the training runs and the larger the dataset becomes, the more degradation occurs. Moreover, the model isn't officially supported in Kohya-ss.
Therefore, there are two options: Chroma or Qwen Image. Both models support fine-tuning in OneTrainer on a 16-24 GB graphics card. I think I'll start with Qwen Image.
I would have voted for Chroma because it's inherently uncensored, but it'll be interesting to see what you can do with Qwen!
Thanks for the update. Qwen Image is interesting, looking forward to what you can get out of it
Qwen Image is great ;)
Thanks for the update. Both would be good tbh, but Qwen is probably the stronger alternative of the two. Excited to see what you come up with!
I see Chroma, I follow, good luck!
Chroma will likely be better, but Qwen Image has more capacity.
Hello everyone! I have some news about the first version of SPARK. The model is almost ready, and tests show good improvements in some cases, but not everything I wanted. That means I need to change some training parameters to reach my goals. Training of the second version will start a few days after the release of the first.