Whatcha cookin?
Have you talked about this one at all? Sounds interesting.
Hi! This model is based on Flex.1-alpha for generating realistic images. It’s a pretty solid training foundation. I had to modify the sd-scripts to support EMA during fine-tuning, which should positively impact the model’s quality.
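For anyone curious what "EMA during fine-tuning" means here: each optimizer step blends the current weights into a "shadow" copy, and the shadow copy is what gets saved, which smooths out noisy updates over long runs. A rough pure-Python sketch of the idea (the function name and decay value are illustrative, not the actual sd-scripts patch, which operates on tensors):

```python
# Sketch of an exponential moving average (EMA) over model weights.
# Real training code would do this on GPU tensors under torch.no_grad().
def ema_update(shadow, weights, decay=0.999):
    """Blend current weights into the EMA shadow copy in place."""
    for name, w in weights.items():
        shadow[name] = decay * shadow[name] + (1 - decay) * w
    return shadow

# After training, the shadow dict (not the live weights) is saved as
# the checkpoint, giving a smoother, averaged version of the model.
```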
Right now, the fine-tuning speed is about 4.4 s/iteration with EMA at 768-pixel resolution on an RTX 4090. After numerous tests to find the optimal parameters, I've now begun the main fine-tuning run. I plan to share results at each training milestone on the model's page.
Fine-tuning SD3.5 Medium wasn’t successful, so I’ve switched to Flex.1-alpha.
How is it going? I see it is attempt #7 now...
Hi! The model is currently being improved; the process should take about 10 days. The improved version already looks far better than the base model. Future training runs will likely take less time. I'll update the model statistics a bit later.
Hi, just checking in, how is the progress going? (Honestly, I check the project page a few times a week, because I love Verus Vision and I wonder what will come out of this model.)
Hi! The model isn't abandoned. The last trained model, which was almost 75% complete, didn't perform much better than the baseline: "plastic skin" was still present. I've also corrected a few errors in my fine-tuning settings, which should make the model more stable and realistic. Today, I'll be testing the new settings with a higher LR, and then I'll start a full fine-tune on the complete dataset.
Hello everyone! It seems that the model degrades during long training sessions in Kohya-ss, especially when using resume_from_state. I ran many tests on small datasets of up to 100 images, and both LoRA and fine-tune yielded good results. However, the longer the training runs and the larger the dataset becomes, the more degradation occurs. Moreover, the model isn't officially supported in Kohya-ss.
Therefore, there are two options: Chroma or Qwen Image. Both models support fine-tuning in OneTrainer on a 16-24 GB graphics card. I think I'll start with Qwen Image.
I would have voted for Chroma because it's inherently uncensored, but it'll be interesting to see what you can do with Qwen!
Thanks for the update. Qwen Image is interesting, looking forward to what you can get out of it
Qwen Image is great ;)
Thanks for the update. Both would be good tbh, but Qwen is probably the stronger alternative of the two. Excited to see what you come up with!
I see Chroma, I follow, good luck!
Chroma will likely be better, but Qwen Image has more capacity.
Hello everyone! I have some news about the first version of SPARK. The model is almost ready, and tests show good improvements in some cases, but not everything I wanted. That means I need to change some training parameters to reach my goals. Training of the second version will start a few days after the release of the first.