Tested with SageAttention?

#101
by AlexTheSandGrinder - opened

I'm getting black outputs using sageattention (only with this model - nunchaku counterpart works fine). Have you tried it with sage or sage+triton?
Works absolutely fine otherwise. However, I would like the speed boost if possible, obviously!

Edit: I should mention I'm using GGUF Q6, so it might be that. Or does it also happen with your base version?
(https://huggingface.co/Phil2Sat/Qwen-Image-Edit-Rapid-AIO-GGUF/tree/main/v50)
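
For anyone trying to narrow this down, here is a minimal sketch of a check that tells apart a truly all-black image from one containing NaN/Inf values. Non-finite values are one common reason for black outputs in general, but that is an assumption here and has not been confirmed for this model or for sage. `diagnose_pixels` and `black_threshold` are hypothetical names for illustration, not part of ComfyUI or SageAttention.

```python
import math
from typing import Iterable


def diagnose_pixels(pixels: Iterable[float], black_threshold: float = 1e-3) -> str:
    """Classify a flat sequence of pixel values expected to lie in [0, 1].

    Hypothetical helper for debugging; not a ComfyUI or SageAttention API.
    """
    values = list(pixels)
    if not values:
        return "empty"
    # NaN or Inf anywhere suggests a numerical problem upstream.
    if any(math.isnan(v) or math.isinf(v) for v in values):
        return "non-finite"
    # Every value at or below the threshold means the image is effectively black.
    if max(values) <= black_threshold:
        return "black"
    return "ok"


if __name__ == "__main__":
    print(diagnose_pixels([0.0, 0.0, 0.0]))
    print(diagnose_pixels([0.1, float("nan"), 0.3]))
    print(diagnose_pixels([0.2, 0.5, 0.9]))
```

If the decoded image is available as a tensor, something like `image.flatten().tolist()` should produce a suitable input, assuming a tensor type that supports those methods. Getting "non-finite" rather than "black" would point to a numerical issue in the attention path rather than at the quant itself.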

More like not their quants

I have seen some posts mentioning that the new ComfyUI update broke some things. I can't use my tester Qwen Image Edit 2509 fp8 template; all my generations are nothing but noise now. I had expected Phr00t's checkpoint to fail as well, but it's generating fine, so I'm not sure what actually broke.

Not sure what that will do, since Phr00t's workflow never stopped working even after ComfyUI updated. It's my default 2509 workflow with fp8 that no longer generates correctly.

The lesson we have learned is, my stuff never breaks! /s

@phr00t don't put that out in the world, jinxing is a real thing :)

Darn it, others hijacked the thread! No comment on my predicament with sage, Fruit? Do you ever use sage with it? I haven't updated Comfy in a while, so I don't even know what these other fellas are talking about!

I use sage attention just fine, but I do not use GGUFs.

I only use sage with WAN generations. I have found the workflow here from Phr00t is really fast, seeing as he baked in the Lightning LoRA: even at 6-8 steps, I am generating in under a minute (I have an RTX 4070, so your mileage may vary). Also note, if no one else has mentioned it, that the FIRST time you kick off a generation it takes a little longer to load things into RAM (and offload to CPU/swap if necessary). After that, if you are just generating a bunch of renders from the same settings/prompt, it goes much faster. If you change the prompt at all, it will reload everything into RAM/swap, but if you just change LoRAs or play with their weights, it does not have to reload.
