
Computation time #30

Open
sungyoon-lee opened this issue Jan 28, 2020 · 7 comments

@sungyoon-lee

sungyoon-lee commented Jan 28, 2020

Hi,
I've run mnist.py on a single Titan X (Pascal) with the default settings.
However, training is much slower (about 3x) than the times reported in Table 1 of "Scaling provable adversarial defenses":
my run takes 0.19 s/iter x 1200 iters ≈ 230 s/epoch, versus the reported 74 s/epoch.

I think the only differences are that I'm using PyTorch 1.4.0 and that I changed dual_layers.py to use 'reshape' instead of 'view'.

@riceric22
Member

Hi Sungyoon,

I don't currently have access to a Titan X to verify this exactly, but are you running the script with exact bound computation? The numbers in the paper reflect the use of random Cauchy projections described in section 3.2 (I believe with 50 random projections). Running the exact bound computation will of course be slower.

~Eric
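
For reference, the idea behind the random Cauchy projections mentioned above is that projecting a vector a onto i.i.d. standard Cauchy noise gives a Cauchy random variable with scale ||a||_1, so the median of the absolute values of several such projections estimates the l1 norm. Below is a minimal sketch of that median estimator, assuming PyTorch; it is purely illustrative and is not the repository's actual projection code.

import torch

def l1_norm_median_estimate(A, k=50):
    # Estimate the l1 norm of each row of A using k standard Cauchy projections:
    # every entry of A @ R is Cauchy-distributed with scale equal to that row's
    # l1 norm, so the median of the absolute values over the k projections
    # estimates it.
    R = torch.distributions.Cauchy(0.0, 1.0).sample((A.shape[1], k))
    return torch.median((A @ R).abs(), dim=1).values

# Sanity check against the exact row-wise l1 norm
A = torch.randn(5, 784)
print(l1_norm_median_estimate(A, k=500))
print(A.abs().sum(dim=1))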

@sungyoon-lee
Author

sungyoon-lee commented Jan 29, 2020

@riceric22
Thank you for the quick response. I ran it as follows:
server:~/convex_adversarial$ python examples/mnist.py
I also tried it with the argument --proj 50:
server:~/convex_adversarial$ python examples/mnist.py --proj 50
But this runs at a similar speed (0.18 x 1200 = 216 s/epoch).
I think it is slow because I'm using a single GPU instead of 4. I then tried:
server:~/convex_adversarial$ python examples/mnist.py --proj 50 --cuda_ids 0,1,2,3
This has a speed similar to that reported in the paper (0.08 x 1200 = 96 s/epoch).
Moreover, I can't run cifar.py with the default settings because of an out-of-memory error, so I have to use the argument --cuda_ids 0,1,2,3. Even so, I couldn't run cifar.py with the 'large' network on 4 GPUs, or even on 8 GPUs.

@riceric22
Member

riceric22 commented Jan 29, 2020

Hi Sungyoon,

In addition to --proj 50, you also need to specify --norm_train l1_median and --norm_test l1_median to use the median estimator for the random projections during training and testing; otherwise it will still compute the exact bound (which is why you see the same speed). I realize this wasn't well documented in the code, thanks for bringing it up. MNIST definitely doesn't need more than one GPU, and note that for MNIST it's also possible to use even fewer random projections (e.g. 10) and still get comparable results.

Computing exact bounds on CIFAR10 does, however, run out of memory due to the increased input size; in my experience it is not possible to compute the exact bound on more than one example at a time. As a result, make sure you use the random projections during training to get the speeds reported in the paper, which are what make these approaches scale.

~Eric
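
For instance, combining the median estimator with a reduced number of projections, a training run might look like the following (the --proj value of 10 is just the example figure from the comment above; the other flags are the ones used later in this thread):

server:~/convex_adversarial$ python examples/mnist.py --proj 10 --norm_train l1_median --norm_test l1_median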

@sungyoon-lee
Author

@riceric22
Thank you! The code now runs fast, even faster than reported in the paper (0.03 x 1200 = 36 s/epoch).
server:~/convex_adversarial$ python examples/mnist.py --proj 50 --norm_train l1_median --norm_test l1_median
However, it produces a NaN loss error (in 3 trials), and I think it is only faster because of that error.
There is the same NaN loss error for CIFAR-10 as well.

@riceric22
Member

It seems that somewhere after PyTorch 1.0 an underlying change introduced NaNs into the projection code: I can run training normally without NaNs in my PyTorch 1.0 environment, but I can reproduce the NaNs in my PyTorch 1.2 environment.

I'll take a look and try to narrow down what happened here, but in the meantime you should be able to run this normally with PyTorch 1.0.
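
As a quick, hypothetical diagnostic (not code from the repository, and the tensor shape below is made up), one could check whether the sampled Cauchy projections themselves contain non-finite values under a given PyTorch version, and enable anomaly detection to locate the op that first produces a NaN:

import torch

# Sample standard Cauchy projections the way a median estimator would and check
# for inf/NaN entries that could propagate into the bound computation.
torch.manual_seed(0)
R = torch.zeros(784, 50).cauchy_()     # in-place standard Cauchy sampling
print(torch.isfinite(R).all().item())  # expect True on a healthy install

# Flag the operation that first produces a NaN during backward()
torch.autograd.set_detect_anomaly(True)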

@sungyoon-lee
Author

Thank you very much, Eric. I tried it in a PyTorch 1.0.0 environment, and it works with no errors!

@pdebartol

Did anyone manage to reproduce the CIFAR experiments in a more recent PyTorch environment (>=1.4.0) without getting NaNs with projections?
