[Feat]: Long Clip support #624
Comments
As I have it working locally, though not in an upstreamable way, I'll write down what I figured out along the way. The LongClip files from this repo come by default as a whole ClipModel, while OneTrainer by default uses ClipTextModel (equivalent to ClipModel.text_model). If my commit https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14/commit/59dd3e4d98acf93ef5093091981fe447e947ae1c passes, it will be easier to differentiate between Clip and LongClip just from config.json and to set the proper max_length in modules/model for models using Clip-L. For now the pipeline won't run without changing the config or overriding the setting somewhere, depending on the implementation. I have no idea how to differentiate between LongClip and Clip when using a single-file safetensors instead of the diffusers format.
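A hypothetical illustration of that config.json-based differentiation, assuming the commit above lands so that LongClip checkpoints carry max_position_embeddings=248 (the function and file names here are mine, not OneTrainer's actual code):

```python
import json

def text_encoder_token_limit(config_path: str) -> int:
    """Return the usable caption length for a Clip-L-style text encoder.

    Plain Clip-L ships max_position_embeddings=77; LongClip ships 248.
    Two positions are reserved for the BOS/EOS tokens, leaving 75 or
    246 tokens for the caption itself. (Illustrative helper only.)
    """
    with open(config_path) as f:
        config = json.load(f)
    # Fall back to plain Clip's 77 positions when the key is absent.
    positions = config.get("max_position_embeddings", 77)
    return positions - 2
```

Note this only works for the diffusers layout, where each text encoder has its own config.json; as said above, a single-file safetensors carries no such field.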
You can download LongClip in a form that should work out of the box using this Python code:
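The script itself was not preserved in this copy of the thread. A minimal sketch of what it presumably does, assuming the `transformers` library and the zer0int/LongCLIP-GmP-ViT-L-14 repo; the output directory name and the save layout are my assumptions:

```python
from transformers import CLIPTextModel, CLIPTokenizer

REPO = "zer0int/LongCLIP-GmP-ViT-L-14"  # LongClip weights, published as a full ClipModel
POSITIONS = 248                   # LongClip's max_position_embeddings
EFFECTIVE_TOKENS = POSITIONS - 2  # minus BOS/EOS -> 246 usable caption tokens

def save_longclip(out_dir: str = "longclip") -> None:
    """Save just the LongClip text encoder and tokenizer in diffusers
    layout, ready to overwrite a model's text_encoder/ and tokenizer/
    directories. (Sketch, not the original script.)"""
    # CLIPTextModel.from_pretrained can load the text tower out of a
    # full CLIPModel checkpoint; the unused vision weights are dropped.
    text_encoder = CLIPTextModel.from_pretrained(REPO)
    tokenizer = CLIPTokenizer.from_pretrained(REPO, model_max_length=POSITIONS)
    text_encoder.save_pretrained(f"{out_dir}/text_encoder")
    tokenizer.save_pretrained(f"{out_dir}/tokenizer")

if __name__ == "__main__":
    save_longclip()
```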
Just download an SD 1.5 or Flux model in diffusers format and overwrite the model's text_encoder and tokenizer directories with the ones saved by the script. Then use the branch from the link below to get the whole 248-token limit supported: https://github.com/Heasterian/OneTrainer/blob/LongClip/ I do not have Flux downloaded and tested, so let me know if it works as it should, as the implementation is a little bit different due to the two different types of encoders.
I used Comfy to replace Clip with LongClip for one of my models. The combined checkpoint was successfully saved and then loaded, but I got an error trying to render images with it.
Well, you are loading the model from a single file, not the diffusers format I mentioned. With safetensors the code falls back to 77 tokens because the config does not include info about max position embeddings. Does Comfy save a .yaml file along with the safetensors? If yes, send it here.
Yeah, I know the word "diffusers", but I'm not sure I'm able to work with it.
You can convert the model to diffusers format using the tool from the Tools tab in OneTrainer.
Just overwrite text_encoder and tokenizer in the resulting directory as I said here: #624 (comment)
I just saw that this is about Comfy not loading the model, not OneTrainer. You should open an issue on the Comfy repo about this.
Describe your use-case.
I'm asking for support for training models with an integrated Long Clip_L (246 effective tokens vs 75):
https://arxiv.org/abs/2403.15378
I've asked and got the answer that integrating Long Clip_L is possible:
https://huggingface.co/zer0int/LongCLIP-GmP-ViT-L-14/discussions/6
What would you like to see as a solution?
As I see it, it could be a checkbox in "Text Encoder 1" with a description like "It's Long Clip_L". When it's checked, OT cuts captions after 246 tokens instead of 75.
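The checkbox's effect could be sketched as a single branch in the caption tokenization path (a toy illustration of the proposal, not OneTrainer code; the function name is made up):

```python
def truncate_caption(token_ids: list[int], long_clip: bool) -> list[int]:
    """Cut caption token IDs at the effective limit for the selected
    encoder: 246 tokens for LongClip, 75 for plain Clip-L. The BOS/EOS
    tokens are assumed to be added afterwards, filling the encoder's
    248 or 77 positions."""
    limit = 246 if long_clip else 75
    return token_ids[:limit]
```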
Have you considered alternatives? List them here.
No response