
Cannot reproduce your results; trained from your released pre-trained ViT-Base model #20

Open
leoozy opened this issue Apr 1, 2022 · 1 comment


leoozy commented Apr 1, 2022

Hello, I tried to reproduce your ViT-B results on ImageNet-1K. I ran the scripts following your README.md, except that I used a single node with 8 NVIDIA A100 GPUs instead of two nodes.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 main_finetune.py \
    --cfg /data/users/zhangjunlei/code/simmim/SimMIM-main/configs/vit_base__800ep/simmim_finetune__vit_base__img224__800ep.yaml \
    --batch-size 128 \
    --data-path /data/users/zhangjunlei/dataset/IMT \
    --pretrained /data/users/models/simmim_pretrain__vit_base__img224__800ep.pth \
    --output /data/users/zhangjunlei/output/mim \
    --tag finetuneIMN_downloadedPretrainedVitbbaseline \
    --accumulation-steps 2
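
For reference, the effective batch size with this launch should match the two-node setup. A quick sanity check (assuming the README's two-node run is 2 nodes x 8 GPUs at batch size 128, i.e. 2048; I have not verified this against the repo):

    # effective batch = per-GPU batch * num GPUs * accumulation steps
    echo $((128 * 8 * 2))   # prints 2048, same as 2 nodes x 8 GPUs x 128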

I ran the code without modifying anything, but the max accuracy is 83.66%, while your paper reports 83.8%. The log is listed below:

[2022-04-01 01:12:41 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.066 (3.066) Loss 0.3681 (0.3681) Acc@1 93.359 (93.359) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.636
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:12:53 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.2106559070000715e-06, 1.2106559070000715e-06, 1.3305729095853683e-06, 1.3305729095853683e-06, 1.5150606058704401e-06, 1.5150606058704401e-06, 1.798887830924397e-06, 1.798887830924397e-06, 2.235545100238177e-06, 2.235545100238177e-06, 2.9073255145670688e-06, 2.9073255145670688e-06, 3.940833844303826e-06, 3.940833844303826e-06, 5.53084665928345e-06, 5.53084665928345e-06, 7.977020220790567e-06, 7.977020220790567e-06, 1.1740364161570747e-05, 1.1740364161570747e-05, 1.7530124070463328e-05, 1.7530124070463328e-05, 2.6437447007221146e-05, 2.6437447007221146e-05, 4.0141020756079324e-05, 4.0141020756079324e-05, 6.122344190816884e-05, 6.122344190816884e-05]
[2022-04-01 01:12:57 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][0/1251] eta 1:10:34 lr 0.000061 time 3.3847 (3.3847) loss 1.5133 (1.5133) grad_norm 2.3348 (2.3348) mem 39691MB
[2022-04-01 01:14:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][100/1251] eta 0:13:56 lr 0.000060 time 0.6853 (0.7264) loss 1.4034 (1.3413) grad_norm 2.6896 (2.7750) mem 39691MB
[2022-04-01 01:15:17 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][200/1251] eta 0:12:29 lr 0.000059 time 0.6943 (0.7133) loss 1.4613 (1.3375) grad_norm 2.3938 (2.7978) mem 39691MB
[2022-04-01 01:16:27 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][300/1251] eta 0:11:14 lr 0.000057 time 0.6894 (0.7089) loss 0.7287 (1.3413) grad_norm 2.0634 (2.7904) mem 39691MB
[2022-04-01 01:17:37 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][400/1251] eta 0:10:01 lr 0.000056 time 0.6978 (0.7069) loss 1.0116 (1.3347) grad_norm 2.0660 (nan) mem 39691MB
[2022-04-01 01:18:47 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][500/1251] eta 0:08:49 lr 0.000055 time 0.6884 (0.7053) loss 1.4866 (1.3390) grad_norm 2.5877 (nan) mem 39691MB
[2022-04-01 01:19:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][600/1251] eta 0:07:38 lr 0.000053 time 0.6879 (0.7044) loss 1.3395 (1.3345) grad_norm 2.0532 (nan) mem 39691MB
[2022-04-01 01:21:07 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][700/1251] eta 0:06:27 lr 0.000052 time 0.6993 (0.7038) loss 1.5304 (1.3324) grad_norm 2.6916 (nan) mem 39691MB
[2022-04-01 01:22:16 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][800/1251] eta 0:05:17 lr 0.000051 time 0.6891 (0.7033) loss 1.0127 (1.3341) grad_norm 1.9580 (nan) mem 39691MB
[2022-04-01 01:23:26 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][900/1251] eta 0:04:06 lr 0.000050 time 0.6901 (0.7028) loss 0.8164 (1.3327) grad_norm 1.9068 (nan) mem 39691MB
[2022-04-01 01:24:36 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1000/1251] eta 0:02:56 lr 0.000048 time 0.6892 (0.7024) loss 1.5583 (1.3314) grad_norm 2.2595 (nan) mem 39691MB
[2022-04-01 01:25:46 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1100/1251] eta 0:01:46 lr 0.000047 time 0.6845 (0.7022) loss 1.3442 (1.3315) grad_norm 2.4191 (nan) mem 39691MB
[2022-04-01 01:26:56 simmim_finetune] (main_finetune.py 222): INFO Train: [93/100][1200/1251] eta 0:00:35 lr 0.000046 time 0.7005 (0.7019) loss 1.2828 (1.3303) grad_norm 2.3000 (nan) mem 39691MB
[2022-04-01 01:27:31 simmim_finetune] (main_finetune.py 230): INFO EPOCH 93 training takes 0:14:38
[2022-04-01 01:27:34 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.048 (3.048) Loss 0.3621 (0.3621) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.550 Acc@5 96.636
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.5%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.62%
[2022-04-01 01:27:47 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.1549451215637642e-06, 1.1549451215637642e-06, 1.2431484613501714e-06, 1.2431484613501714e-06, 1.378845907175413e-06, 1.378845907175413e-06, 1.5876112084450153e-06, 1.5876112084450153e-06, 1.9087885950136343e-06, 1.9087885950136343e-06, 2.402907651273048e-06, 2.402907651273048e-06, 3.1630908147490695e-06, 3.1630908147490695e-06, 4.332603373942948e-06, 4.332603373942948e-06, 6.131853465010453e-06, 6.131853465010453e-06, 8.899930528191233e-06, 8.899930528191233e-06, 1.3158510625392428e-05, 1.3158510625392428e-05, 1.971017231339427e-05, 1.971017231339427e-05, 2.9789651833397102e-05, 2.9789651833397102e-05, 4.5296543402632226e-05, 4.5296543402632226e-05]
[2022-04-01 01:27:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][0/1251] eta 1:12:29 lr 0.000045 time 3.4766 (3.4766) loss 1.2928 (1.2928) grad_norm 2.0603 (2.0603) mem 39691MB
[2022-04-01 01:29:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][100/1251] eta 0:13:58 lr 0.000044 time 0.6995 (0.7282) loss 1.5299 (1.3223) grad_norm 2.9111 (2.7888) mem 39691MB
[2022-04-01 01:30:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][200/1251] eta 0:12:31 lr 0.000043 time 0.6885 (0.7147) loss 0.9353 (1.3269) grad_norm 2.2225 (2.7882) mem 39691MB
[2022-04-01 01:31:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][300/1251] eta 0:11:14 lr 0.000042 time 0.6891 (0.7095) loss 1.2098 (1.3210) grad_norm 2.1065 (2.7789) mem 39691MB
[2022-04-01 01:32:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][400/1251] eta 0:10:01 lr 0.000041 time 0.6902 (0.7068) loss 1.3892 (1.3320) grad_norm 2.4860 (2.7760) mem 39691MB
[2022-04-01 01:33:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][500/1251] eta 0:08:50 lr 0.000040 time 0.6902 (0.7058) loss 1.4630 (1.3225) grad_norm 2.4686 (2.7736) mem 39691MB
[2022-04-01 01:34:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][600/1251] eta 0:07:38 lr 0.000039 time 0.6897 (0.7047) loss 1.0282 (1.3204) grad_norm 2.3859 (2.7731) mem 39691MB
[2022-04-01 01:36:00 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][700/1251] eta 0:06:27 lr 0.000037 time 0.6890 (0.7039) loss 1.2109 (1.3232) grad_norm 2.2731 (2.7694) mem 39691MB
[2022-04-01 01:37:10 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][800/1251] eta 0:05:17 lr 0.000036 time 0.6896 (0.7033) loss 1.3349 (1.3265) grad_norm 2.1328 (2.7723) mem 39691MB
[2022-04-01 01:38:20 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][900/1251] eta 0:04:06 lr 0.000035 time 0.7249 (0.7029) loss 1.5155 (1.3238) grad_norm 2.3807 (2.7779) mem 39691MB
[2022-04-01 01:39:30 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1000/1251] eta 0:02:56 lr 0.000034 time 0.6991 (0.7025) loss 1.5035 (1.3234) grad_norm 3.5433 (2.7713) mem 39691MB
[2022-04-01 01:40:40 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1100/1251] eta 0:01:46 lr 0.000033 time 0.6895 (0.7022) loss 1.3362 (1.3223) grad_norm 2.2755 (2.7748) mem 39691MB
[2022-04-01 01:41:50 simmim_finetune] (main_finetune.py 222): INFO Train: [94/100][1200/1251] eta 0:00:35 lr 0.000032 time 0.6905 (0.7020) loss 1.1312 (1.3200) grad_norm 2.2855 (2.7730) mem 39691MB
[2022-04-01 01:42:25 simmim_finetune] (main_finetune.py 230): INFO EPOCH 94 training takes 0:14:38
[2022-04-01 01:42:28 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.078 (3.078) Loss 0.3602 (0.3602) Acc@1 93.457 (93.457) Acc@5 98.730 (98.730) Mem 39691MB
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.656 Acc@5 96.634
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.7%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:42:41 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.107709723978354e-06, 1.107709723978354e-06, 1.1690240608640958e-06, 1.1690240608640958e-06, 1.2633538099190834e-06, 1.2633538099190834e-06, 1.4084765007729102e-06, 1.4084765007729102e-06, 1.6317421790095668e-06, 1.6317421790095668e-06, 1.9752278378351925e-06, 1.9752278378351925e-06, 2.5036673129515393e-06, 2.5036673129515393e-06, 3.316651120822842e-06, 3.316651120822842e-06, 4.567395440624847e-06, 4.567395440624847e-06, 6.491617471089471e-06, 6.491617471089471e-06, 9.45195905641966e-06, 9.45195905641966e-06, 1.4006330726158412e-05, 1.4006330726158412e-05, 2.1013056371910337e-05, 2.1013056371910337e-05, 3.1792634288451755e-05, 3.1792634288451755e-05]
[2022-04-01 01:42:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][0/1251] eta 1:14:20 lr 0.000032 time 3.5654 (3.5654) loss 1.4688 (1.4688) grad_norm 2.3501 (2.3501) mem 39691MB
[2022-04-01 01:43:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][100/1251] eta 0:13:59 lr 0.000031 time 0.6891 (0.7293) loss 1.0123 (1.3321) grad_norm 2.7074 (2.8018) mem 39691MB
[2022-04-01 01:45:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][200/1251] eta 0:12:30 lr 0.000030 time 0.6897 (0.7144) loss 1.0944 (1.3172) grad_norm 2.3161 (nan) mem 39691MB
[2022-04-01 01:46:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][300/1251] eta 0:11:14 lr 0.000029 time 0.6853 (0.7096) loss 1.5585 (1.3295) grad_norm 2.4806 (nan) mem 39691MB
[2022-04-01 01:47:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][400/1251] eta 0:10:01 lr 0.000028 time 0.6891 (0.7070) loss 1.5054 (1.3297) grad_norm 2.2357 (nan) mem 39691MB
[2022-04-01 01:48:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][500/1251] eta 0:08:49 lr 0.000027 time 0.6900 (0.7056) loss 0.9077 (1.3258) grad_norm 2.1560 (nan) mem 39691MB
[2022-04-01 01:49:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][600/1251] eta 0:07:38 lr 0.000026 time 0.6884 (0.7048) loss 0.8389 (1.3274) grad_norm 2.0718 (nan) mem 39691MB
[2022-04-01 01:50:54 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][700/1251] eta 0:06:27 lr 0.000025 time 0.6897 (0.7041) loss 1.1630 (1.3244) grad_norm 2.7615 (nan) mem 39691MB
[2022-04-01 01:52:04 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][800/1251] eta 0:05:17 lr 0.000024 time 0.6899 (0.7035) loss 1.3777 (1.3235) grad_norm 2.2742 (nan) mem 39691MB
[2022-04-01 01:53:14 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][900/1251] eta 0:04:06 lr 0.000024 time 0.6901 (0.7030) loss 1.3908 (1.3225) grad_norm 2.2728 (nan) mem 39691MB
[2022-04-01 01:54:24 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1000/1251] eta 0:02:56 lr 0.000023 time 0.6911 (0.7026) loss 1.1246 (1.3207) grad_norm 2.3794 (nan) mem 39691MB
[2022-04-01 01:55:34 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1100/1251] eta 0:01:46 lr 0.000022 time 0.6891 (0.7024) loss 1.4379 (1.3210) grad_norm 2.5842 (nan) mem 39691MB
[2022-04-01 01:56:44 simmim_finetune] (main_finetune.py 222): INFO Train: [95/100][1200/1251] eta 0:00:35 lr 0.000021 time 0.6986 (0.7021) loss 1.4860 (1.3207) grad_norm 2.2715 (nan) mem 39691MB
[2022-04-01 01:57:19 simmim_finetune] (main_finetune.py 230): INFO EPOCH 95 training takes 0:14:38
[2022-04-01 01:57:19 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saving......
[2022-04-01 01:57:20 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_95.pth saved !!!
[2022-04-01 01:57:23 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.916 (2.916) Loss 0.3640 (0.3640) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.602 Acc@5 96.632
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 01:57:35 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.068996329878458e-06, 1.068996329878458e-06, 1.1082728599612733e-06, 1.1082728599612733e-06, 1.168698290857912e-06, 1.168698290857912e-06, 1.2616604922373566e-06, 1.2616604922373566e-06, 1.4046792635903478e-06, 1.4046792635903478e-06, 1.62470814259495e-06, 1.62470814259495e-06, 1.963214110294338e-06, 1.963214110294338e-06, 2.4839925221395496e-06, 2.4839925221395496e-06, 3.285190078824491e-06, 3.285190078824491e-06, 4.517801704493633e-06, 4.517801704493633e-06, 6.414127282446157e-06, 6.414127282446157e-06, 9.331551248526965e-06, 9.331551248526965e-06, 1.3819895811728205e-05, 1.3819895811728205e-05, 2.0725041293576267e-05, 2.0725041293576267e-05]
[2022-04-01 01:57:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][0/1251] eta 1:12:07 lr 0.000021 time 3.4596 (3.4596) loss 1.4750 (1.4750) grad_norm 2.6895 (2.6895) mem 39691MB
[2022-04-01 01:58:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][100/1251] eta 0:13:55 lr 0.000020 time 0.6900 (0.7259) loss 1.5745 (1.2920) grad_norm 2.6246 (2.7933) mem 39691MB
[2022-04-01 01:59:58 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][200/1251] eta 0:12:28 lr 0.000019 time 0.6896 (0.7125) loss 0.8455 (1.3166) grad_norm 2.4287 (2.7917) mem 39691MB
[2022-04-01 02:01:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][300/1251] eta 0:11:13 lr 0.000018 time 0.6900 (0.7086) loss 0.9218 (1.3165) grad_norm 2.2249 (2.7938) mem 39691MB
[2022-04-01 02:02:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][400/1251] eta 0:10:01 lr 0.000018 time 0.6907 (0.7068) loss 0.9292 (1.3220) grad_norm 2.1898 (2.7949) mem 39691MB
[2022-04-01 02:03:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][500/1251] eta 0:08:49 lr 0.000017 time 0.6890 (0.7053) loss 1.1393 (1.3168) grad_norm 2.5243 (2.7906) mem 39691MB
[2022-04-01 02:04:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][600/1251] eta 0:07:38 lr 0.000016 time 0.6888 (0.7045) loss 1.3772 (1.3255) grad_norm 2.5142 (2.7838) mem 39691MB
[2022-04-01 02:05:49 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][700/1251] eta 0:06:27 lr 0.000016 time 0.6895 (0.7038) loss 1.5106 (1.3251) grad_norm 2.3905 (2.7886) mem 39691MB
[2022-04-01 02:06:59 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][800/1251] eta 0:05:17 lr 0.000015 time 0.6896 (0.7033) loss 1.4453 (1.3282) grad_norm 2.3186 (2.7831) mem 39691MB
[2022-04-01 02:08:09 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][900/1251] eta 0:04:06 lr 0.000014 time 0.6890 (0.7029) loss 1.6301 (1.3289) grad_norm 2.6045 (2.7818) mem 39691MB
[2022-04-01 02:09:19 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1000/1251] eta 0:02:56 lr 0.000014 time 0.6890 (0.7027) loss 1.3683 (1.3259) grad_norm 2.2704 (2.7809) mem 39691MB
[2022-04-01 02:10:29 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1100/1251] eta 0:01:46 lr 0.000013 time 0.6898 (0.7024) loss 1.1971 (1.3261) grad_norm 2.1725 (inf) mem 39691MB
[2022-04-01 02:11:39 simmim_finetune] (main_finetune.py 222): INFO Train: [96/100][1200/1251] eta 0:00:35 lr 0.000012 time 0.6898 (0.7022) loss 1.4186 (1.3273) grad_norm 2.1364 (inf) mem 39691MB
[2022-04-01 02:12:14 simmim_finetune] (main_finetune.py 230): INFO EPOCH 96 training takes 0:14:38
[2022-04-01 02:12:17 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.006 (3.006) Loss 0.3643 (0.3643) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.592 Acc@5 96.658
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:12:29 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0388431447101286e-06, 1.0388431447101286e-06, 1.060954812742414e-06, 1.060954812742414e-06, 1.0949727635613148e-06, 1.0949727635613148e-06, 1.1473080725134695e-06, 1.1473080725134695e-06, 1.2278239324398616e-06, 1.2278239324398616e-06, 1.3516944861727725e-06, 1.3516944861727725e-06, 1.542264568838789e-06, 1.542264568838789e-06, 1.8354493114018914e-06, 1.8354493114018914e-06, 2.286502761498972e-06, 2.286502761498972e-06, 2.9804311462637114e-06, 2.9804311462637114e-06, 4.048013276671003e-06, 4.048013276671003e-06, 5.69044732345145e-06, 5.69044732345145e-06, 8.21726893388291e-06, 8.21726893388291e-06, 1.2104686796085155e-05, 1.2104686796085155e-05]
[2022-04-01 02:12:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][0/1251] eta 1:10:45 lr 0.000012 time 3.3938 (3.3938) loss 1.1607 (1.1607) grad_norm 2.6309 (2.6309) mem 39691MB
[2022-04-01 02:13:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][100/1251] eta 0:13:56 lr 0.000012 time 0.6900 (0.7268) loss 1.4923 (1.3099) grad_norm 2.3810 (2.7979) mem 39691MB
[2022-04-01 02:14:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][200/1251] eta 0:12:30 lr 0.000011 time 0.6901 (0.7136) loss 1.3939 (1.3374) grad_norm 2.4213 (2.8035) mem 39691MB
[2022-04-01 02:16:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][300/1251] eta 0:11:14 lr 0.000010 time 0.6900 (0.7089) loss 1.4464 (1.3354) grad_norm 2.3146 (2.7829) mem 39691MB
[2022-04-01 02:17:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][400/1251] eta 0:10:01 lr 0.000010 time 0.7050 (0.7067) loss 1.4488 (1.3360) grad_norm 2.3009 (2.7833) mem 39691MB
[2022-04-01 02:18:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][500/1251] eta 0:08:49 lr 0.000009 time 0.6896 (0.7056) loss 1.5225 (1.3309) grad_norm 2.6082 (2.7907) mem 39691MB
[2022-04-01 02:19:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][600/1251] eta 0:07:38 lr 0.000009 time 0.6896 (0.7047) loss 1.5717 (1.3272) grad_norm 2.2648 (2.7870) mem 39691MB
[2022-04-01 02:20:43 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][700/1251] eta 0:06:28 lr 0.000008 time 0.6906 (0.7042) loss 1.2725 (1.3265) grad_norm 2.9951 (2.7761) mem 39691MB
[2022-04-01 02:21:53 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][800/1251] eta 0:05:17 lr 0.000008 time 0.6897 (0.7038) loss 1.5141 (1.3220) grad_norm 2.0174 (2.7768) mem 39691MB
[2022-04-01 02:23:03 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][900/1251] eta 0:04:06 lr 0.000007 time 0.6903 (0.7034) loss 1.5645 (1.3245) grad_norm 2.3992 (2.7796) mem 39691MB
[2022-04-01 02:24:13 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1000/1251] eta 0:02:56 lr 0.000007 time 0.6900 (0.7030) loss 0.8204 (1.3204) grad_norm 2.7277 (2.7826) mem 39691MB
[2022-04-01 02:25:23 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1100/1251] eta 0:01:46 lr 0.000007 time 0.6897 (0.7027) loss 1.3750 (1.3185) grad_norm 1.9739 (2.7843) mem 39691MB
[2022-04-01 02:26:33 simmim_finetune] (main_finetune.py 222): INFO Train: [97/100][1200/1251] eta 0:00:35 lr 0.000006 time 0.6888 (0.7025) loss 1.2978 (1.3201) grad_norm 2.1809 (2.7888) mem 39691MB
[2022-04-01 02:27:08 simmim_finetune] (main_finetune.py 230): INFO EPOCH 97 training takes 0:14:38
[2022-04-01 02:27:11 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.004 (3.004) Loss 0.3647 (0.3647) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.596 Acc@5 96.674
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:27:24 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0172799260266887e-06, 1.0172799260266887e-06, 1.0271166164073458e-06, 1.0271166164073458e-06, 1.0422499862237412e-06, 1.0422499862237412e-06, 1.0655320936335803e-06, 1.0655320936335803e-06, 1.101350720417948e-06, 1.101350720417948e-06, 1.156456300086206e-06, 1.156456300086206e-06, 1.2412341149604493e-06, 1.2412341149604493e-06, 1.371661522459285e-06, 1.371661522459285e-06, 1.5723190724574938e-06, 1.5723190724574938e-06, 1.8810229955316616e-06, 1.8810229955316616e-06, 2.3559521079534575e-06, 2.3559521079534575e-06, 3.0866122809100673e-06, 3.0866122809100673e-06, 4.210704854689466e-06, 4.210704854689466e-06, 5.9400780451193105e-06, 5.9400780451193105e-06]
[2022-04-01 02:27:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][0/1251] eta 1:07:01 lr 0.000006 time 3.2148 (3.2148) loss 1.6244 (1.6244) grad_norm 2.8371 (2.8371) mem 39691MB
[2022-04-01 02:28:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][100/1251] eta 0:13:52 lr 0.000006 time 0.6894 (0.7234) loss 1.4783 (1.3294) grad_norm 2.2639 (2.7801) mem 39691MB
[2022-04-01 02:29:47 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][200/1251] eta 0:12:27 lr 0.000005 time 0.6890 (0.7116) loss 1.5537 (1.3179) grad_norm 2.2326 (2.7782) mem 39691MB
[2022-04-01 02:30:57 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][300/1251] eta 0:11:12 lr 0.000005 time 0.6901 (0.7073) loss 1.4142 (1.3260) grad_norm 3.1747 (2.7874) mem 39691MB
[2022-04-01 02:32:07 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][400/1251] eta 0:10:00 lr 0.000004 time 0.6900 (0.7054) loss 1.1928 (1.3229) grad_norm 2.6400 (2.7897) mem 39691MB
[2022-04-01 02:33:17 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][500/1251] eta 0:08:48 lr 0.000004 time 0.6899 (0.7043) loss 1.3475 (1.3193) grad_norm 2.2912 (2.7904) mem 39691MB
[2022-04-01 02:34:27 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][600/1251] eta 0:07:38 lr 0.000004 time 0.6880 (0.7036) loss 1.4417 (1.3187) grad_norm 2.1597 (2.7875) mem 39691MB
[2022-04-01 02:35:37 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][700/1251] eta 0:06:27 lr 0.000004 time 0.6896 (0.7030) loss 0.9872 (1.3167) grad_norm 2.2810 (2.7852) mem 39691MB
[2022-04-01 02:36:46 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][800/1251] eta 0:05:16 lr 0.000003 time 0.6900 (0.7026) loss 1.2168 (1.3158) grad_norm 2.3407 (2.7873) mem 39691MB
[2022-04-01 02:37:56 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][900/1251] eta 0:04:06 lr 0.000003 time 0.6897 (0.7023) loss 1.1657 (1.3130) grad_norm 2.2128 (2.7891) mem 39691MB
[2022-04-01 02:39:06 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1000/1251] eta 0:02:56 lr 0.000003 time 0.6889 (0.7020) loss 1.4734 (1.3160) grad_norm 2.2695 (2.7867) mem 39691MB
[2022-04-01 02:40:16 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1100/1251] eta 0:01:45 lr 0.000003 time 0.6888 (0.7018) loss 1.4531 (1.3167) grad_norm 2.3891 (2.7864) mem 39691MB
[2022-04-01 02:41:26 simmim_finetune] (main_finetune.py 222): INFO Train: [98/100][1200/1251] eta 0:00:35 lr 0.000002 time 0.6893 (0.7015) loss 0.9294 (1.3169) grad_norm 2.5364 (inf) mem 39691MB
[2022-04-01 02:42:01 simmim_finetune] (main_finetune.py 230): INFO EPOCH 98 training takes 0:14:37
[2022-04-01 02:42:04 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 3.071 (3.071) Loss 0.3626 (0.3626) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.570 Acc@5 96.640
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:42:17 simmim_finetune] (main_finetune.py 155): INFO Current learning rate for different parameter groups: [1.0043279541216191e-06, 1.0043279541216191e-06, 1.0067916651705151e-06, 1.0067916651705151e-06, 1.010581989861124e-06, 1.010581989861124e-06, 1.0164132586159072e-06, 1.0164132586159072e-06, 1.0253844413155735e-06, 1.0253844413155735e-06, 1.0391862608535217e-06, 1.0391862608535217e-06, 1.060419829373442e-06, 1.060419829373442e-06, 1.0930868578656273e-06, 1.0930868578656273e-06, 1.1433438247766814e-06, 1.1433438247766814e-06, 1.2206622354090724e-06, 1.2206622354090724e-06, 1.3396136363819813e-06, 1.3396136363819813e-06, 1.5226157917249184e-06, 1.5226157917249184e-06, 1.8041575691755905e-06, 1.8041575691755905e-06, 2.237298765253548e-06, 2.237298765253548e-06]
[2022-04-01 02:42:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][0/1251] eta 1:09:49 lr 0.000002 time 3.3489 (3.3489) loss 1.0033 (1.0033) grad_norm 2.1860 (2.1860) mem 39691MB
[2022-04-01 02:43:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][100/1251] eta 0:13:54 lr 0.000002 time 0.6967 (0.7249) loss 0.8255 (1.2927) grad_norm 2.2087 (2.7138) mem 39691MB
[2022-04-01 02:44:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][200/1251] eta 0:12:28 lr 0.000002 time 0.6892 (0.7121) loss 1.0470 (1.2953) grad_norm 1.9035 (2.7317) mem 39691MB
[2022-04-01 02:45:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][300/1251] eta 0:11:13 lr 0.000002 time 0.6904 (0.7080) loss 0.9423 (1.2993) grad_norm 2.3368 (2.7545) mem 39691MB
[2022-04-01 02:47:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][400/1251] eta 0:10:00 lr 0.000002 time 0.6895 (0.7058) loss 1.1605 (1.3136) grad_norm 2.5136 (2.7636) mem 39691MB
[2022-04-01 02:48:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][500/1251] eta 0:08:49 lr 0.000001 time 0.6903 (0.7046) loss 0.9751 (1.3128) grad_norm 2.5944 (2.7818) mem 39691MB
[2022-04-01 02:49:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][600/1251] eta 0:07:38 lr 0.000001 time 0.7461 (0.7037) loss 1.6204 (1.3107) grad_norm 2.2264 (2.7801) mem 39691MB
[2022-04-01 02:50:30 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][700/1251] eta 0:06:27 lr 0.000001 time 0.6892 (0.7030) loss 1.2943 (1.3101) grad_norm 2.3324 (2.7893) mem 39691MB
[2022-04-01 02:51:40 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][800/1251] eta 0:05:16 lr 0.000001 time 0.6891 (0.7026) loss 1.3340 (1.3111) grad_norm 2.0425 (2.7911) mem 39691MB
[2022-04-01 02:52:50 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][900/1251] eta 0:04:06 lr 0.000001 time 0.7028 (0.7021) loss 1.1269 (1.3138) grad_norm 2.3656 (2.7906) mem 39691MB
[2022-04-01 02:54:00 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1000/1251] eta 0:02:56 lr 0.000001 time 0.6908 (0.7020) loss 1.0681 (1.3161) grad_norm 2.0883 (2.7906) mem 39691MB
[2022-04-01 02:55:10 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1100/1251] eta 0:01:45 lr 0.000001 time 0.6897 (0.7017) loss 0.9208 (1.3166) grad_norm 2.1637 (2.7937) mem 39691MB
[2022-04-01 02:56:20 simmim_finetune] (main_finetune.py 222): INFO Train: [99/100][1200/1251] eta 0:00:35 lr 0.000001 time 0.6889 (0.7016) loss 1.5211 (1.3184) grad_norm 2.0254 (2.7999) mem 39691MB
[2022-04-01 02:56:55 simmim_finetune] (main_finetune.py 230): INFO EPOCH 99 training takes 0:14:37
[2022-04-01 02:56:55 simmim_finetune] (utils.py 60): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saving......
[2022-04-01 02:56:56 simmim_finetune] (utils.py 62): INFO /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/ckpt_epoch_99.pth saved !!!
[2022-04-01 02:56:59 simmim_finetune] (main_finetune.py 269): INFO Test: [0/49] Time 2.885 (2.885) Loss 0.3653 (0.3653) Acc@1 93.555 (93.555) Acc@5 98.633 (98.633) Mem 39691MB
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 276): INFO * Acc@1 83.606 Acc@5 96.688
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 142): INFO Accuracy of the network on the 50000 test images: 83.6%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 144): INFO Max accuracy: 83.66%
[2022-04-01 02:57:11 simmim_finetune] (main_finetune.py 148): INFO Training time 1 day, 0:49:35
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 344): INFO Full config saved to /data/users/zhangjunlei/output/mim/simmim_finetune/finetune_downloadedPretrainedVitbbaseline/config.json
[2022-04-01 03:17:39 simmim_finetune] (main_finetune.py 347): INFO AMP_OPT_LEVEL: O1


kobiso commented Apr 14, 2022

I had the same problem when I fine-tuned ViT-B/32 with 1 node or 4 nodes.
But when I fine-tuned with 2 nodes, I got 83.784 top-1 accuracy.
Try fine-tuning with 2 nodes :)
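
For anyone trying this, a two-node launch would look roughly like the sketch below (my assumption of the usual torch.distributed.launch rendezvous flags; MASTER_ADDR is a placeholder for node 0's reachable IP, and the trailing arguments are the same as in the single-node command above). Run it once per node with the matching --node_rank:

    # on node 0 (set MASTER_ADDR to node 0's IP)
    python -m torch.distributed.launch --nnodes 2 --node_rank 0 --nproc_per_node 8 \
        --master_addr $MASTER_ADDR --master_port 29500 \
        main_finetune.py --cfg <config.yaml> --batch-size 128 [other args...]
    # on node 1: the same command with --node_rank 1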
