Questions about the performance #1
Comments
I am also curious about the training performance. BTW, I need to run the training many times with different hyper-parameters, and running 300 epochs takes days even with four Titan X GPUs. Did you try using fewer epochs or a different learning rate schedule? Please let me know if you have any suggestions. Thank you.
@D-X-Y On CIFAR-10 it reaches 96.44%, and on CIFAR-100 81.62%. However, I am not fixing the random seed for each run, so it sometimes does better than the baseline and sometimes worse. As for what could be causing the performance difference, I talked with the author of the original paper, and he told me (correctly) that since I was using batch_size = 128 instead of 256, the lr should be divided by two. I have checked your code and I don't see much difference from mine, so could it be just a matter of finding the right random seed? Is the weight initialization exactly the same as in their code?
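For reference, the adjustment mentioned here is the usual linear scaling rule. A minimal sketch, assuming the reference setting of lr 0.1 at batch size 256 (the variable names are illustrative):

```python
# Linear scaling rule: scale the base lr in proportion to the batch size,
# relative to the reference batch size used in the paper.
reference_batch_size = 256   # batch size in the reference setup
reference_lr = 0.1           # lr used with that batch size

def scaled_lr(batch_size):
    """Return the learning rate scaled linearly with the batch size."""
    return reference_lr * batch_size / reference_batch_size

print(scaled_lr(128))  # 0.05, i.e. the lr divided by two for batch_size = 128
print(scaled_lr(64))   # 0.025
```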
@wangdelp Using a single Titan X it takes me roughly one day on CIFAR. What are your batch size and learning rate?
@prlz77 Thanks for your responses. The initialization is the same, and I only trained on CIFAR-10 once, so maybe the average performance will be better. There are two versions of the ResNeXt paper; they changed the batch size for CIFAR from 256 to 128 in version 2.
@D-X-Y Since the performance on CIFAR-10 is correct, it is difficult to guess what is happening on CIFAR-100; several things could be the cause.
By the way, take into account that the results I am providing are for the small net (cardinality 8, widen factor 4), so it is 0.1 better on CIFAR-10 and 0.6 worse on CIFAR-100. When I have some time, I will provide multi-run results to see whether it is always like this.
@prlz77 I was using batch size 64, since I want to reduce the memory consumption, distributed among 4 GPUs. I am using the default learning rate 0.1 with decay at [0.5, 0.75] * args.epochs, and run it for 300 epochs. It sounds like I need two days to complete training on CIFAR-100. Maybe it's because other lab members are also using the GPUs. Batch size 256 leads to out-of-memory errors on a 12 GB GPU; maybe I should try batch size 128 on two GPUs.
@wangdelp In my experience, bs=128 distributed over two 1080 Ti takes about one day; bs=128 on a single GPU takes a little longer, and bs=64 takes almost double the time for the same 300 epochs. I would suggest you use bs=128 (note that with ngpu=4 each GPU only loads 128/4 samples, which is a small amount of memory). If the GPUs are already in use, that could be causing a performance issue, as you say. Although it is improbable, also check that data loading is not the issue, for instance by increasing the number of prefetching threads.
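To illustrate the prefetching suggestion, a rough PyTorch DataLoader sketch; the worker count, batch size, and augmentation here are common CIFAR defaults, not values taken from this repo:

```python
import torch
from torchvision import datasets, transforms

# Standard CIFAR preprocessing; raise num_workers if data loading is the bottleneck.
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = datasets.CIFAR10('./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=128,     # 128/4 = 32 images per GPU when ngpu=4
    shuffle=True,
    num_workers=4,      # prefetching workers; increase if the GPUs sit idle waiting for data
    pin_memory=True,    # faster host-to-GPU transfers
)
```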
@prlz77 Thank you. Should I use an initial lr of 0.05 with batch size 128, and 0.025 with batch size 64?
@wangdelp Exactly!
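Putting that together with the decay schedule mentioned above ([0.5, 0.75] * args.epochs), a sketch of the setup; the 10x decay factor, momentum, and weight decay are assumptions on my part, and the model is a placeholder:

```python
import torch

epochs = 300
batch_size = 128
lr = 0.1 * batch_size / 256          # linear scaling from lr 0.1 at batch size 256

model = torch.nn.Linear(3 * 32 * 32, 100)   # placeholder for the actual ResNeXt model

# Momentum, weight decay, and nesterov are assumed typical CIFAR values, not confirmed here.
optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                            momentum=0.9, weight_decay=5e-4, nesterov=True)

# Drop the lr at 50% and 75% of training; the 10x factor (gamma=0.1) is an assumption.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[int(0.5 * epochs), int(0.75 * epochs)], gamma=0.1)

for epoch in range(epochs):
    # ... train one epoch, evaluate ...
    scheduler.step()
```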
@Queequeg92 I think it is the median of the best test error during training.
@prlz77 I agree with you, since models are likely to be overfitting at the end of the training process. I have sent emails to some of the authors to confirm.
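If the figure is indeed computed that way, the bookkeeping would look something like this (the error values below are made up purely for illustration):

```python
import statistics

# Best (lowest) test error observed during training, one value per run;
# these numbers are placeholders, not real results.
best_test_errors = [3.65, 3.58, 3.70, 3.61, 3.63]

print(statistics.median(best_test_errors))  # -> 3.63, the value that would be reported
```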
@prlz77 I think Part D of this paper gives the answer.
@D-X-Y @prlz77 I'm faced with the same problem when reproducing the performance of DenseNet-40 on CIFAR-100. With exactly the same configuration, the accuracy of the PyTorch version is often 1 point lower than the Torch version. I don't think it is caused by random seeds. However, after digging into the implementation details of the two frameworks, I find no differences. I am so confused...
In the past I've noticed up to a 1% difference just from using cuDNN's fastest options, due to the noise introduced by numerical imprecision.
@prlz77 I set cudnn.deterministic = True.
@wandering007 Maybe with cudnn.deterministic = False you get better results.
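For reference, the relevant PyTorch switches look roughly like this (a sketch; the seed value and function name are mine):

```python
import random
import numpy as np
import torch

def set_reproducibility(seed=0, deterministic=True):
    """Fix random seeds and toggle cuDNN's deterministic/benchmark modes."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # deterministic=True selects reproducible cuDNN kernels;
    # deterministic=False with benchmark=True lets cuDNN pick the fastest ones.
    torch.backends.cudnn.deterministic = deterministic
    torch.backends.cudnn.benchmark = not deterministic
```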
@prlz77 No improvement in my experiments. Thank you anyway.
@wandering007 I'm sorry to hear that. I found this behaviour some years ago; maybe the library has changed, or noise is not that important for this model.
@wandering007 I'm also confused about the differences between two CIFAR datasets. |
@boluoweifenda I haven't trained it with TensorFlow. There are a lot of ways to improve performance if you don't care about a fair comparison, like using dropout, a better lr schedule, or better data augmentation. Personally, I find a 1% performance difference between two frameworks acceptable. BTW, using the same settings across different frameworks is not entirely fair in itself :-)
@wandering007 Thanks for your reply~ But I do care about the fair comparison. Maybe I need to dig deeper to find the differences between the frameworks. However, I got the same accuracy on CIFAR-10 using TensorFlow; it's quite strange that the accuracy drops on CIFAR-100.
Hi,
May I ask about your final performance? The curves are a little confusing.
I also implemented a different version (https://github.com/D-X-Y/ResNeXt); my results are a little lower than the official code, about 0.2 for CIFAR-10 and 1.0 for CIFAR-100.
I really want to know what causes the differences.
I also tried training ResNet-20/32/44/56; I'm pretty sure the model architecture is the same as in the official code, but I still obtain much lower accuracy.
Would you mind giving me some suggestions?