
pos loss cannot be reduced #24

Open
FengheTan9 opened this issue Sep 20, 2024 · 5 comments
@FengheTan9

Hello author, I used your original voco_head file for pre-training, but found that the pos loss would not decrease normally. What could be the problem?

[image: training log screenshot]

@Luffy03
Owner

Luffy03 commented Sep 20, 2024

Hi, many thanks for your kind attention to our work!
Could you please provide more details about your pre-training datasets, settings, and environment? Could you also print the labels and logits for debugging?

print('labels and logits:', label[0].data, logits[0].data)

We also provide our training log here for comparison: https://www.dropbox.com/scl/fi/rmqy9n2gio5tptbhlt239/20240115_232208.txt?rlkey=0jmnpz3n77bb1b9r9wt9aqkrv&dl=0
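Beyond printing the raw values, it can help to summarize them per batch. Below is a minimal sketch of such a check; the helper name is hypothetical, and it assumes labels and logits have been converted to arrays (e.g. via `.detach().cpu().numpy()` from PyTorch tensors):

```python
import numpy as np

def inspect_pos_targets(labels, logits):
    """Print summary statistics for one batch of position labels/logits.

    Hypothetical debugging helper: `labels` and `logits` are assumed to be
    array-likes of matching shape. If the logits stay near a constant value
    while the labels vary, the pos loss will plateau instead of decreasing.
    """
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    stats = {
        "label_min": labels.min(), "label_max": labels.max(),
        "logit_min": logits.min(), "logit_max": logits.max(),
        "logit_std": logits.std(),               # near 0 => collapsed logits
        "mean_abs_err": np.abs(labels - logits).mean(),
    }
    print("labels and logits:", labels[0], logits[0])
    for key, value in stats.items():
        print(f"{key}: {value:.4f}")
    return stats
```

A near-zero `logit_std` across many iterations would suggest the prediction head has collapsed to a constant output, which is one common cause of a loss that refuses to move.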

@FengheTan9
Author

bs=4, lr=2e-4, 10k CT datasets:
[image: training settings]
loss function:
[image: loss curves]

@Luffy03
Owner

Luffy03 commented Sep 20, 2024

Weird, it seems the loss did not converge the way it does in our provided log. I cannot pinpoint the problem yet.
The batch_size is 4 and sw_batch_size is still 2, right? Also, what GPU are you using, and which versions of torch and CUDA? In some environments we cannot use amp. Could you also try lr=1e-4?
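The settings under discussion could be summarized as a sketch like the one below; the dictionary keys are illustrative, not the repo's actual argument names:

```python
# Illustrative summary of the hyperparameters discussed in this thread;
# key names are hypothetical, not the repo's actual CLI flags.
settings = {
    "batch_size": 4,      # per the reporter's setup
    "sw_batch_size": 2,   # the default the maintainer expects with bs=4
    "lr": 1e-4,           # suggested alternative to the reporter's 2e-4
    "use_amp": False,     # amp is reported unstable in some environments
}
print(settings)
```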

@FengheTan9
Author

Thanks for your reply. The batch_size is 4 and the sw_batch_size is 1, because I used voco_head_old, which does not include the student-teacher branch. I set it to 1 because I saw an earlier change that modified the random cropping to 1.

@Luffy03
Owner

Luffy03 commented Sep 20, 2024

Oh, I see, you are using the old version. The code at that link was modified by another researcher, likely due to GPU-memory limitations. For the old version, I recommend trying sw_batch_size=4 if you have enough GPU resources.
By the way, our current version with the student-teacher design is more stable, and we will release more powerful models soon.
