Fix colab pretraining notebook #22
Comments
CHECK THIS OUT https://www.youtube.com/watch?v=Kwhqj93wyXU |
@sokffa Yes. I just tried to fix it and got as far as the last cell, which fails on some incompatibility (outdated or mismatched versions) that I couldn't fix yet. I tried many things, including installing a proper version of basicsr, but there is a strange mismatch: one source file (basicsr/utils/options.py) differs between the cloned repository and the installation path, and the two use different names for the parse call; one is parse(...), the other is parse_options. I couldn't fix it so far, either by installing basicsr with ... or by cloning it from the repository and running "setup.py install", or even by a nasty copy into the Python installation location. Maybe importing from the local folder (not the system's installation) would help; that is nasty too, but I guess it may work if the paths and imports are adjusted properly. The other fixes:
for:
Import cv2 and skip tqdm:
import os, cv2
paths = os.listdir("data/gt")
# for img_path in tqdm(paths):
for img_path in paths:
    img = cv2.imread("data/gt/" + img_path)
    img = cv2.resize(img, (384, 384))
    cv2.imwrite("data/hq/" + img_path, img) |
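One way to investigate the basicsr mismatch described above is to ask Python which copy of the module it actually imports and which of the two function names that copy exposes. The following is only a sketch: `inspect_options_module` is a made-up helper name, while the module path `basicsr.utils.options` and the names `parse` and `parse_options` are the ones mentioned in the comment. It uses only the standard library and returns None when the module cannot be imported.

```python
import importlib


def inspect_options_module(module_name="basicsr.utils.options",
                           candidates=("parse", "parse_options")):
    """Hypothetical helper (not part of basicsr).

    Returns (file_path, available_names) for the module Python resolves,
    or None if the module cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # __file__ reveals whether the local clone or the installed package was picked up.
    file_path = getattr(module, "__file__", None)
    # Keep only the candidate function names that this copy actually defines.
    available = [name for name in candidates if hasattr(module, name)]
    return file_path, available
```

Calling `inspect_options_module()` in the notebook, once from the cloned folder and once from another working directory, should show whether the two locations resolve to different files.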
Hello,
Now I get the error "Input spatial size must be 128x128, but received torch.Size([4, 3, 384, 384])", and I know that I could change the image resize to 128x128, but then I get another error. Maybe @Markfryazino has an old working environment and can give us the details of the library versions and/or the proper pth files? Note: I'm trying both to train the model for proper lipsync AND to use deepfacelab as mentioned above. Thank you! |
Great work! I have identified the same issues, but I'm also stuck at the 128x128 error. |
Good job! What is the other error you get after resizing?
I think 4,3,384,384 means batch size 4, 3 channels, and 384x384 pixels. Shouldn't you
resize your input to 384x384 rather than to 128? What is the size of yours?
128x128 doesn't sound very HQ to me; the normal wav2lip is 96x96.
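To make the shape in the error message concrete: PyTorch image batches are normally laid out as (batch, channels, height, width). The sketch below uses plain tuples so it runs without torch. Both `describe_nchw` and `check_spatial_size` are hypothetical helpers that only mimic the kind of check behind the error text; they are not the library's actual code.

```python
def describe_nchw(shape):
    """Name the four axes of an NCHW shape such as (4, 3, 384, 384)."""
    batch, channels, height, width = shape
    return {"batch": batch, "channels": channels,
            "height": height, "width": width}


def check_spatial_size(shape, expected=128):
    """Raise if height/width differ from the expected square size (illustrative)."""
    info = describe_nchw(shape)
    if (info["height"], info["width"]) != (expected, expected):
        raise ValueError(
            f"Input spatial size must be {expected}x{expected}, "
            f"but received {tuple(shape)}"
        )
    return info


print(describe_nchw((4, 3, 384, 384)))
```

Read this way, the message complains about the last two numbers (384x384), not about the batch size or the channel count.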
On Wed, 17.08.2022 at 22:36, davidchateau ***@***.***>
wrote:
… Hello,
I tried to fix the notebook, here is what I did so far:
- duplicate the notebook ("file" -> "save a copy in drive")
- "runtime" -> "change runtime type" -> "GPU", or else I have an error about no GPU available
- add "!mkdir data" before the other "mkdir"s
- downgrade torchvision to avoid deprecation warnings:
  !pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
- install basicsr. I downloaded the code of every previous version of the library in order to find the ones where "parse" exists in "basicsr.utils.options" -> it is versions <= 1.3.3.10. I had errors installing versions < 1.3.3.4 so I went with 1.3.3.4.
  !pip3 install https://files.pythonhosted.org/packages/8c/ac/74f4e34fdbc7d3d9233a6f02a740ddb446d75551fbb6ed0c4243c4511a86/basicsr-1.3.3.4.tar.gz#sha256=b448cf9efa4ff2ca75109d3aac36ef50d6e08b0bcb310ebef57ed88c09a2d2ba
- create the log files directory structure, because I had errors about it:
  !mkdir /content/wav2lip-hq/experiments/
  !mkdir /content/wav2lip-hq/experiments/001_ESRGAN_x4_f64b23_custom16k_500k_B16G1_wandb/
- stop pretraining mode as mentioned in #17:
  !sed -i '/resume_state/d' /content/wav2lip-hq/train_basicsr.yml
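The directory and config steps above could also be run from a Python cell, which avoids errors when a folder already exists or a parent folder is missing. This is a sketch under assumptions: `prepare_dirs` and `drop_resume_state` are hypothetical helper names, and the line filter mirrors the `sed` command above by dropping every line that contains `resume_state`.

```python
import os


def prepare_dirs(root, names):
    """Create each directory under root, including parents; safe to re-run."""
    created = []
    for name in names:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created


def drop_resume_state(yml_path):
    """Remove every line containing 'resume_state'; return how many were removed."""
    with open(yml_path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    kept = [line for line in lines if "resume_state" not in line]
    with open(yml_path, "w", encoding="utf-8") as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

For example, `prepare_dirs("data", ["gt", "hq"])` creates `data` together with its subfolders, so no separate "mkdir data" step is needed, and `drop_resume_state("/content/wav2lip-hq/train_basicsr.yml")` would edit the config file named above in place.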
Now I get the error "Input spatial size must be 128x128, but received
torch.Size([4, 3, 384, 384])", and I know that I could change the image resize
to 128x128, but then I get another error.
Maybe @Markfryazino <https://github.com/Markfryazino> has an old working
environment and can give us the details of libraries versions and/or proper
pth files?
Note: I'm trying both to train the model for proper lipsync AND use
deepfacelab as mentioned above.
Thank you!
|
Yes, so my idea was to resize the LQ set to 128x128 and feed those images into the model instead of the GT, as that made more sense, and then also upsample the HQ images to 512x512, as that still keeps the required 4x difference. My problem is that the LQ images are still detected as having a size of 96x96 🙃 |
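The size relation in this comment is plain arithmetic: with a x4 model, the HQ side must be four times the LQ side. A small sketch, where the function name `hq_size` is illustrative and the default factor of 4 is the one stated in the comment:

```python
def hq_size(lq_size, scale=4):
    """Return the HQ side length that matches an LQ side length at the given scale."""
    if lq_size <= 0 or scale <= 0:
        raise ValueError("lq_size and scale must be positive")
    return lq_size * scale


for lq in (96, 128):
    print(f"{lq}x{lq} LQ -> {hq_size(lq)}x{hq_size(lq)} HQ")
```

This gives 384 for a 96x96 input and 512 for a 128x128 input. The 384x384 resize used earlier in the thread is therefore exactly 4 x 96, which would be consistent with the LQ images being treated as 96x96.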
@AIMads I'll check it out when I can, but that sounds as if the neural model has a fixed input size (architecture) of 96x96, so I guess your desired solution wouldn't fit so simply; it may require refactoring the NN architecture. BTW, one alternative to this library is to use the basic wav2lip and then Deepfacelab with a self-generating model. I used this method for my deepfakes with my custom DFL modification for grayscale training, and that way I repair and upscale the bad and broken mouths from wav2lip to smooth 192x192 faces. My videos: "Lena Schwarzenegger announces Arnold's return in Red Heat 2 ..." https://youtu.be/4F7PB7wBEXk |
Sorry for the dumb question, but what am I doing wrong? I did all the steps carefully, and at the end I get this error:
|
Hello, I fixed the training notebook. |
The colab pretraining notebook is out of date. There are a lot of bugs in the code (related to paths and nonexistent files).
https://colab.research.google.com/drive/1IUGYn-fMRbjH2IyYoAn5VKSzEkaXyP2s