Development Plan #166
Replies: 19 comments 18 replies
-
Thank you for your team's efforts. Will Fooocus continue to be updated? It seems like you have quite a few projects going at the moment.
-
By the way, there are some flawed tests on the internet that run the original webui and Forge at the same time and claim their speeds are similar. This is wrong: if you open the original webui and Forge simultaneously, the original will simply grab all available GPU memory, and Forge will detect that and automatically switch to a slower method that uses less VRAM, so that it can still generate images without OOM in both applications. (If Forge used its normal, faster method with more VRAM, the original webui would OOM.) To see the speedup, you need to at least close the original webui so that Forge can use your GPU normally.
-
Superb work!
-
Whilst the work done is amazing, it should be made really clear in the project's readme that the test devices for this project do not include AMD cards, and that DirectML support is currently non-existent. This repo has been suggested in multiple places as, at the very least, an option for AMD owners, but it is clear such a recommendation shouldn't have been made.
-
So either my vanilla AUTO1111 config is optimal, or my Forge one is misconfigured; I wonder which one it is :) I can get 25% higher resolution before OOM, so that's good, but I wish the speedup were more apparent.
-
This is so, SO good for people with 8GB of VRAM (a 3070 Ti in my case). On 1111 with medvram, even YouTube playback lags when generating. Forge allowed me to UP the resolution I'm generating at AND watch YouTube/local video normally. Thank you!
-
Currently testing ControlNet IP-Adapter FaceID with my ancient GPU, a GTX 970 4GB, and the results are amazing. With OG A1111 it takes 3–4 minutes to generate one image; with Forge it takes only 20–30 seconds.
-
@lllyasviel Any consideration for a branch, perhaps the default, pulling from main rather than dev? I have had few, if any, problems with dev, but it certainly seems unintended as a base for projects and could potentially lead to breakage. That said, it'd be nice if dev moved over to main a little more routinely. I can't say why it's been 2+ months since the last release when dev has great features and feels stable, except that making mainline releases certainly takes time and energy.
-
Is there a Discord or something for discussing Forge and troubleshooting? |
-
I have tested with these "--directml --skip-torch-cuda-test --always-normal-vram --skip-version-check" commandline args, and it's the same speed as A1111, if not slower. But I have successfully created 1920x1080 images (with or without upscale) in around 5 minutes. I couldn't use a higher resolution in A1111 due to out-of-memory issues. I have an AMD RX 6700 XT with 12GB. Also, I can't preview the image while it's processing, and I don't know whether that is intended or not. Thanks.
-
No info about the RTX 4080?
-
really amazing work |
-
Does Forge support Cascade? |
-
This has been huge for me! I'm shouting praises about forge whenever possible. Thank you for making my dinky lil' 2070 Super a whole lot more useful! Thank you also for all your hard work on everything else you do. Much appreciated. |
-
What exactly would Forge be in A1111-extension form?
Well, some extensions are already incompatible between Forge and A1111; isn't that enough to make the two clients compete with each other?
-
What about Stable Cascade and Würstchen, any plans for those? People around the internet are asking about it: since there are already some models for them, why isn't there a webui or a port for them? For example, merging Forge or something like it with Cascade/Würstchen and the upcoming SD3 is just something people ask about at times. There's also ELLA coming (https://ella-diffusion.github.io/); from what people have tested, it's 26 times better than CLIP and OpenCLIP combined, and it's already out for ComfyUI.
https://github.com/TencentQQGYLab/ELLA
https://github.com/Stability-AI/StableCascade
https://stability.ai/news/introducing-stable-cascade
https://huggingface.co/blog/wuerstchen
-
On my 3070 8G, the best thing about Forge is that it never runs out of VRAM. I really support this!
-
I don't think Forge will support those new models any time soon. ComfyUI and SDNext support most of the models; StableSwarm is still in beta but supports Stable Cascade, Playground 2.5, and SD3. Automatic1111 does not support any of those new models either.
-
Forge is a platform on top of Stable-Diffusion-WebUI that makes generation faster and development easier. We will get all updates from the dev branch of the original webui automatically with bots, and we do not have any motivation or plan to compete with the original webui.
Another reason is that we have several ongoing research projects planned, and we want to use this very friendly webui that everyone loves, but we really do not want users to be disappointed by the speed and performance, especially since several of our future works will be based on SDXL.
We promise that when the following (1) and (2) happen together, this repo will immediately change to an extension of the standard Stable-Diffusion-WebUI from Automatic1111; that is to say, it will immediately rejoin the original sd-webui ecosystem of Automatic1111 when the following two things happen together:
(1) On at least 4 of our 5 test devices (RTX 2060, RTX 3060 laptop, RTX 3090, RTX 4090, RTX 3070 Ti laptop), the original webui is equally fast or at most 10% slower (except the RTX 3090 and 4090, since the speedup for them is only around 5%). All CMD flags are accepted, but we exclude tech that sacrifices functionality, like TensorRT or torch compile. Note that this measures a full generation pass: CLIP time, diffusion time, model-moving time, and VAE time. We use SDXL at 1024x1024, with batch size 1 and 4, and with batch count 1 and 16.
(2) On at least 4 of our 5 test devices (RTX 2060, RTX 3060 laptop, RTX 3090, RTX 4090, RTX 3070 Ti laptop), the original webui is equally memory efficient or uses at most 512MB more VRAM, and its maximum diffusion image resolution without OOM is at least 90% of Forge's max resolution (on the RTX 3060 laptop and RTX 3070 Ti laptop). All CMD flags are accepted.
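For concreteness, the numeric thresholds in those two criteria can be sketched as a small check. This is purely illustrative (the helper names and the per-device result dicts are ours, not part of the repo), and it covers only the "10% slower", "512MB more VRAM", "90% resolution", and "4 of 5 devices" thresholds, not the full benchmark procedure:

```python
def speed_ok(orig_s: float, forge_s: float) -> bool:
    # (1): original webui is equally fast or at most 10% slower than Forge
    return orig_s <= forge_s * 1.10

def memory_ok(orig_mb: float, forge_mb: float) -> bool:
    # (2): original webui uses at most 512 MB more VRAM than Forge
    return orig_mb <= forge_mb + 512

def resolution_ok(orig_px: int, forge_px: int) -> bool:
    # (2), laptop devices only: max resolution without OOM is >= 90% of Forge's
    return orig_px >= 0.90 * forge_px

def criteria_met(results: list) -> bool:
    # results: one dict per test device, with measured seconds and MB for
    # the original webui ("orig_*") and Forge ("forge_*")
    passing = sum(
        1 for r in results
        if speed_ok(r["orig_s"], r["forge_s"])
        and memory_ok(r["orig_mb"], r["forge_mb"])
    )
    # the promise triggers only when at least 4 of the 5 devices pass
    return passing >= 4
```

Plugging in measurements from the five devices listed above would then answer whether the switch-back condition has been met.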
If the above two things happen together, we will immediately move back to the original sd-webui ecosystem, and this repo will be made into an extension. Do not worry about engineering complexity; we have an excellent team that can solve every problem.
We will test every 15 days against the original webui dev branch, or anyone can ask us to test at any time by replying to this post.
Until then, all major updates from us will happen here while we wait for upstream to resolve the memory/speed issues. The tool set of Forge after 0.0.10 is relatively complete now; users should not have big problems with image-processing tasks.
Note that the Forge backend API will not be modified even if we change to an extension.
lllyasviel