This repository has been archived by the owner on Nov 12, 2024. It is now read-only.
Could you please report the numerical results of the experiments? I ran standard finetuning on 8 × 3090 GPUs with:
python run.py --from_pretrained google/t5-v1_1-base --dataset cqa --model_type standard --label_type gt --batch_size 64 --grad_steps 2
I only got an accuracy of 60.2% on CQA at the last epoch, but the paper seems to report around 63%.
Here is my training log:
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Bootstrap : Using eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO NET/Socket : Using [0]eth0:10.243.152.6<0>
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Using network Socket
NCCL version 2.10.3+cuda11.3
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MAX_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO NCCL_MIN_NCHANNELS set by environment to 2.
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00/02 : 0 1 2 3 4 5 6 7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01/02 : 0 1 2 3 4 5 6 7
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 00 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 7[e0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Channel 01 : 0[70] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all rings
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 00 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Channel 01 : 7[e0] -> 6[d0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 00 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 00 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 00 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Channel 01 : 4[b0] -> 3[a0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Channel 01 : 5[c0] -> 4[b0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 00 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Channel 01 : 3[a0] -> 2[90] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 00 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 00 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Channel 01 : 6[d0] -> 5[c0] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Channel 01 : 1[80] -> 0[70] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Channel 01 : 2[90] -> 1[80] via direct shared memory
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO Connected all trees
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
dsw-27183-759b57b4d6-kz2vc:261724:365 [3] NCCL INFO comm 0x7f0000002fb0 rank 3 nranks 8 cudaDev 3 busId a0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:368 [5] NCCL INFO comm 0x7efff8002fb0 rank 5 nranks 8 cudaDev 5 busId c0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:363 [1] NCCL INFO comm 0x7f0008002fb0 rank 1 nranks 8 cudaDev 1 busId 80 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:366 [4] NCCL INFO comm 0x7efff4002fb0 rank 4 nranks 8 cudaDev 4 busId b0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:369 [6] NCCL INFO comm 0x7effec002fb0 rank 6 nranks 8 cudaDev 6 busId d0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:370 [7] NCCL INFO comm 0x7efff0002fb0 rank 7 nranks 8 cudaDev 7 busId e0 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:362 [0] NCCL INFO comm 0x7f0004002fb0 rank 0 nranks 8 cudaDev 0 busId 70 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:364 [2] NCCL INFO comm 0x7efffc002fb0 rank 2 nranks 8 cudaDev 2 busId 90 - Init COMPLETE
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Launch mode Parallel
{'loss': 7.6906, 'learning_rate': 4.875e-05, 'epoch': 27.78}
{'eval_test_loss': 0.8547041416168213, 'eval_test_accuracy': 0.2457002457002457, 'eval_test_runtime': 5.5955, 'eval_test_samples_per_second': 218.211, 'eval_test_steps_per_second': 0.536, 'epoch': 27.78}
{'loss': 1.098, 'learning_rate': 4.75e-05, 'epoch': 55.56}
{'eval_test_loss': 0.5050153136253357, 'eval_test_accuracy': 0.5552825552825553, 'eval_test_runtime': 5.5448, 'eval_test_samples_per_second': 220.205, 'eval_test_steps_per_second': 0.541, 'epoch': 55.56}
{'loss': 0.4848, 'learning_rate': 4.6250000000000006e-05, 'epoch': 83.33}
{'eval_test_loss': 0.5543485879898071, 'eval_test_accuracy': 0.5954135954135954, 'eval_test_runtime': 5.569, 'eval_test_samples_per_second': 219.248, 'eval_test_steps_per_second': 0.539, 'epoch': 83.33}
{'loss': 0.2971, 'learning_rate': 4.5e-05, 'epoch': 111.11}
{'eval_test_loss': 0.6299827098846436, 'eval_test_accuracy': 0.6036036036036037, 'eval_test_runtime': 5.5232, 'eval_test_samples_per_second': 221.068, 'eval_test_steps_per_second': 0.543, 'epoch': 111.11}
{'loss': 0.196, 'learning_rate': 4.375e-05, 'epoch': 138.89}
{'eval_test_loss': 0.7029837369918823, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.458, 'eval_test_steps_per_second': 0.539, 'epoch': 138.89}
{'loss': 0.1373, 'learning_rate': 4.25e-05, 'epoch': 166.67}
{'eval_test_loss': 0.7832159399986267, 'eval_test_accuracy': 0.6126126126126126, 'eval_test_runtime': 5.5722, 'eval_test_samples_per_second': 219.125, 'eval_test_steps_per_second': 0.538, 'epoch': 166.67}
{'loss': 0.1015, 'learning_rate': 4.125e-05, 'epoch': 194.44}
{'eval_test_loss': 0.8421533703804016, 'eval_test_accuracy': 0.6109746109746109, 'eval_test_runtime': 5.5786, 'eval_test_samples_per_second': 218.873, 'eval_test_steps_per_second': 0.538, 'epoch': 194.44}
{'loss': 0.0771, 'learning_rate': 4e-05, 'epoch': 222.22}
{'eval_test_loss': 0.9177669882774353, 'eval_test_accuracy': 0.6183456183456183, 'eval_test_runtime': 5.5181, 'eval_test_samples_per_second': 221.273, 'eval_test_steps_per_second': 0.544, 'epoch': 222.22}
{'loss': 0.0607, 'learning_rate': 3.875e-05, 'epoch': 250.0}
{'eval_test_loss': 0.9690037369728088, 'eval_test_accuracy': 0.6134316134316135, 'eval_test_runtime': 5.5698, 'eval_test_samples_per_second': 219.217, 'eval_test_steps_per_second': 0.539, 'epoch': 250.0}
{'loss': 0.0497, 'learning_rate': 3.7500000000000003e-05, 'epoch': 277.78}
{'eval_test_loss': 1.0180637836456299, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.973, 'eval_test_steps_per_second': 0.54, 'epoch': 277.78}
{'loss': 0.0408, 'learning_rate': 3.625e-05, 'epoch': 305.56}
{'eval_test_loss': 1.040199875831604, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.573, 'eval_test_samples_per_second': 219.091, 'eval_test_steps_per_second': 0.538, 'epoch': 305.56}
{'loss': 0.0348, 'learning_rate': 3.5e-05, 'epoch': 333.33}
{'eval_test_loss': 1.1167311668395996, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5648, 'eval_test_samples_per_second': 219.416, 'eval_test_steps_per_second': 0.539, 'epoch': 333.33}
{'loss': 0.0292, 'learning_rate': 3.375000000000001e-05, 'epoch': 361.11}
{'eval_test_loss': 1.1364021301269531, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5637, 'eval_test_samples_per_second': 219.459, 'eval_test_steps_per_second': 0.539, 'epoch': 361.11}
{'loss': 0.0258, 'learning_rate': 3.2500000000000004e-05, 'epoch': 388.89}
{'eval_test_loss': 1.1679093837738037, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5548, 'eval_test_samples_per_second': 219.81, 'eval_test_steps_per_second': 0.54, 'epoch': 388.89}
{'loss': 0.023, 'learning_rate': 3.125e-05, 'epoch': 416.67}
{'eval_test_loss': 1.205809473991394, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.575, 'eval_test_samples_per_second': 219.014, 'eval_test_steps_per_second': 0.538, 'epoch': 416.67}
{'loss': 0.0201, 'learning_rate': 3e-05, 'epoch': 444.44}
{'eval_test_loss': 1.2262288331985474, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5226, 'eval_test_samples_per_second': 221.091, 'eval_test_steps_per_second': 0.543, 'epoch': 444.44}
{'loss': 0.0182, 'learning_rate': 2.8749999999999997e-05, 'epoch': 472.22}
{'eval_test_loss': 1.2057785987854004, 'eval_test_accuracy': 0.6101556101556102, 'eval_test_runtime': 5.5463, 'eval_test_samples_per_second': 220.146, 'eval_test_steps_per_second': 0.541, 'epoch': 472.22}
{'loss': 0.0157, 'learning_rate': 2.7500000000000004e-05, 'epoch': 500.0}
{'eval_test_loss': 1.2767386436462402, 'eval_test_accuracy': 0.6093366093366094, 'eval_test_runtime': 5.5533, 'eval_test_samples_per_second': 219.867, 'eval_test_steps_per_second': 0.54, 'epoch': 500.0}
{'loss': 0.0149, 'learning_rate': 2.625e-05, 'epoch': 527.78}
{'eval_test_loss': 1.3246893882751465, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.6071, 'eval_test_samples_per_second': 217.759, 'eval_test_steps_per_second': 0.535, 'epoch': 527.78}
{'loss': 0.0133, 'learning_rate': 2.5e-05, 'epoch': 555.56}
{'eval_test_loss': 1.3044090270996094, 'eval_test_accuracy': 0.6117936117936118, 'eval_test_runtime': 5.5897, 'eval_test_samples_per_second': 218.437, 'eval_test_steps_per_second': 0.537, 'epoch': 555.56}
{'loss': 0.0124, 'learning_rate': 2.375e-05, 'epoch': 583.33}
{'eval_test_loss': 1.3567758798599243, 'eval_test_accuracy': 0.6085176085176085, 'eval_test_runtime': 5.5682, 'eval_test_samples_per_second': 219.28, 'eval_test_steps_per_second': 0.539, 'epoch': 583.33}
{'loss': 0.0116, 'learning_rate': 2.25e-05, 'epoch': 611.11}
{'eval_test_loss': 1.3604899644851685, 'eval_test_accuracy': 0.6060606060606061, 'eval_test_runtime': 5.5799, 'eval_test_samples_per_second': 218.819, 'eval_test_steps_per_second': 0.538, 'epoch': 611.11}
{'loss': 0.011, 'learning_rate': 2.125e-05, 'epoch': 638.89}
{'eval_test_loss': 1.3682199716567993, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5523, 'eval_test_samples_per_second': 219.907, 'eval_test_steps_per_second': 0.54, 'epoch': 638.89}
{'loss': 0.0099, 'learning_rate': 2e-05, 'epoch': 666.67}
{'eval_test_loss': 1.4006143808364868, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5158, 'eval_test_samples_per_second': 221.363, 'eval_test_steps_per_second': 0.544, 'epoch': 666.67}
{'loss': 0.0091, 'learning_rate': 1.8750000000000002e-05, 'epoch': 694.44}
{'eval_test_loss': 1.4297248125076294, 'eval_test_accuracy': 0.6027846027846028, 'eval_test_runtime': 5.5505, 'eval_test_samples_per_second': 219.981, 'eval_test_steps_per_second': 0.54, 'epoch': 694.44}
{'loss': 0.0091, 'learning_rate': 1.75e-05, 'epoch': 722.22}
{'eval_test_loss': 1.4137226343154907, 'eval_test_accuracy': 0.5945945945945946, 'eval_test_runtime': 5.5458, 'eval_test_samples_per_second': 220.168, 'eval_test_steps_per_second': 0.541, 'epoch': 722.22}
{'loss': 0.0083, 'learning_rate': 1.6250000000000002e-05, 'epoch': 750.0}
{'eval_test_loss': 1.4431531429290771, 'eval_test_accuracy': 0.597051597051597, 'eval_test_runtime': 5.5794, 'eval_test_samples_per_second': 218.841, 'eval_test_steps_per_second': 0.538, 'epoch': 750.0}
{'loss': 0.0083, 'learning_rate': 1.5e-05, 'epoch': 777.78}
{'eval_test_loss': 1.4453905820846558, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5507, 'eval_test_samples_per_second': 219.97, 'eval_test_steps_per_second': 0.54, 'epoch': 777.78}
{'loss': 0.0076, 'learning_rate': 1.3750000000000002e-05, 'epoch': 805.56}
{'eval_test_loss': 1.448009967803955, 'eval_test_accuracy': 0.6052416052416052, 'eval_test_runtime': 5.5691, 'eval_test_samples_per_second': 219.245, 'eval_test_steps_per_second': 0.539, 'epoch': 805.56}
{'loss': 0.0072, 'learning_rate': 1.25e-05, 'epoch': 833.33}
{'eval_test_loss': 1.4657503366470337, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5663, 'eval_test_samples_per_second': 219.357, 'eval_test_steps_per_second': 0.539, 'epoch': 833.33}
{'loss': 0.0073, 'learning_rate': 1.125e-05, 'epoch': 861.11}
{'eval_test_loss': 1.4750993251800537, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5357, 'eval_test_samples_per_second': 220.569, 'eval_test_steps_per_second': 0.542, 'epoch': 861.11}
{'loss': 0.0067, 'learning_rate': 1e-05, 'epoch': 888.89}
{'eval_test_loss': 1.4982140064239502, 'eval_test_accuracy': 0.6076986076986077, 'eval_test_runtime': 5.5706, 'eval_test_samples_per_second': 219.188, 'eval_test_steps_per_second': 0.539, 'epoch': 888.89}
{'loss': 0.007, 'learning_rate': 8.75e-06, 'epoch': 916.67}
{'eval_test_loss': 1.4590543508529663, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.546, 'eval_test_samples_per_second': 220.16, 'eval_test_steps_per_second': 0.541, 'epoch': 916.67}
{'loss': 0.0062, 'learning_rate': 7.5e-06, 'epoch': 944.44}
{'eval_test_loss': 1.4887970685958862, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.562, 'eval_test_samples_per_second': 219.525, 'eval_test_steps_per_second': 0.539, 'epoch': 944.44}
{'loss': 0.0061, 'learning_rate': 6.25e-06, 'epoch': 972.22}
{'eval_test_loss': 1.506649136543274, 'eval_test_accuracy': 0.6011466011466011, 'eval_test_runtime': 5.5727, 'eval_test_samples_per_second': 219.105, 'eval_test_steps_per_second': 0.538, 'epoch': 972.22}
{'loss': 0.0061, 'learning_rate': 5e-06, 'epoch': 1000.0}
{'eval_test_loss': 1.504927158355713, 'eval_test_accuracy': 0.6068796068796068, 'eval_test_runtime': 5.5313, 'eval_test_samples_per_second': 220.743, 'eval_test_steps_per_second': 0.542, 'epoch': 1000.0}
{'loss': 0.0059, 'learning_rate': 3.75e-06, 'epoch': 1027.78}
{'eval_test_loss': 1.4991811513900757, 'eval_test_accuracy': 0.6044226044226044, 'eval_test_runtime': 5.5587, 'eval_test_samples_per_second': 219.655, 'eval_test_steps_per_second': 0.54, 'epoch': 1027.78}
{'loss': 0.0058, 'learning_rate': 2.5e-06, 'epoch': 1055.56}
{'eval_test_loss': 1.5103862285614014, 'eval_test_accuracy': 0.5995085995085995, 'eval_test_runtime': 5.5427, 'eval_test_samples_per_second': 220.29, 'eval_test_steps_per_second': 0.541, 'epoch': 1055.56}
{'loss': 0.0058, 'learning_rate': 1.25e-06, 'epoch': 1083.33}
{'eval_test_loss': 1.5154839754104614, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5456, 'eval_test_samples_per_second': 220.175, 'eval_test_steps_per_second': 0.541, 'epoch': 1083.33}
{'loss': 0.0058, 'learning_rate': 0.0, 'epoch': 1111.11}
{'eval_test_loss': 1.5173912048339844, 'eval_test_accuracy': 0.601965601965602, 'eval_test_runtime': 5.5573, 'eval_test_samples_per_second': 219.711, 'eval_test_steps_per_second': 0.54, 'epoch': 1111.11}
{'train_runtime': 33468.0187, 'train_samples_per_second': 305.964, 'train_steps_per_second': 0.299, 'train_loss': 0.2646430722594261, 'epoch': 1111.11}
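In the log above, test accuracy peaks at 61.8% (epoch 222.22) and drifts down to 60.2% by the last evaluation, while the eval loss keeps rising as the training loss falls. So part of the gap to the paper's figure may come from reading the last checkpoint rather than the best one. A minimal sketch for comparing the two from a saved log follows. It assumes each metrics record is a Python-style dict on its own line, as in the log above. The helper names `parse_eval_records` and `best_and_last` are hypothetical and not part of the repository.

```python
import ast


def parse_eval_records(log_text):
    """Collect the dict-style log lines that carry an eval accuracy."""
    records = []
    for line in log_text.splitlines():
        line = line.strip()
        # Skip NCCL output and anything else that is not a dict literal.
        if not (line.startswith("{") and line.endswith("}")):
            continue
        try:
            record = ast.literal_eval(line)
        except (ValueError, SyntaxError):
            continue
        if isinstance(record, dict) and "eval_test_accuracy" in record:
            records.append(record)
    return records


def best_and_last(records):
    """Return the record with the highest accuracy and the final record."""
    best = max(records, key=lambda r: r["eval_test_accuracy"])
    return best, records[-1]


# Three evaluation lines copied from the log above, plus two lines to skip.
sample_log = """
dsw-27183-759b57b4d6-kz2vc:261724:353 [0] NCCL INFO Launch mode Parallel
{'loss': 0.2971, 'learning_rate': 4.5e-05, 'epoch': 111.11}
{'eval_test_loss': 0.6299827098846436, 'eval_test_accuracy': 0.6036036036036037, 'epoch': 111.11}
{'eval_test_loss': 0.9177669882774353, 'eval_test_accuracy': 0.6183456183456183, 'epoch': 222.22}
{'eval_test_loss': 1.5173912048339844, 'eval_test_accuracy': 0.601965601965602, 'epoch': 1111.11}
"""

records = parse_eval_records(sample_log)
best, last = best_and_last(records)
print(f"best: {best['eval_test_accuracy']:.1%} at epoch {best['epoch']}")
print(f"last: {last['eval_test_accuracy']:.1%} at epoch {last['epoch']}")
# best: 61.8% at epoch 222.22
# last: 60.2% at epoch 1111.11
```

Even the best evaluation here is still below 63%, so checkpoint selection alone would not explain the whole difference. Picking a checkpoint by test accuracy also leaks test information, so a held-out validation split would be the cleaner choice if one is available.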