AutoAttack Targeted Attack Issues #2206

ClarktheDarkShark · 2023-06-28T19:01:54Z

I noticed two issues with AutoAttack for targeted attacks. I believe I have corrected them in my local clone of the repository.

The first is that the "sample_is_robust" array seems to mark the image at a given index as 'False' as soon as that image is misclassified. This does not take targeted attacks into account, where the 'y' values (the target labels) differ from the model.predict() values from the start. As a result, the attack ends instantly. This can be fixed by adding a condition for self.targeted...
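A minimal numpy sketch of the problem described above. It assumes the robustness mask is initialized by comparing the model's predicted classes with argmax(y); initial_mask and the toy arrays are hypothetical illustrations, not ART code, and the targeted branch reflects this report's reading of y as target labels rather than AutoAttack's own definition.

```python
import numpy as np


def initial_mask(y_pred: np.ndarray, y: np.ndarray, targeted: bool) -> np.ndarray:
    """Return True for samples that still need to be attacked.

    Hypothetical helper: y_pred holds model scores, y holds one-hot labels.
    """
    pred_cls = np.argmax(y_pred, axis=1)
    y_cls = np.argmax(y, axis=1)
    if targeted:
        # Targeted reading of y: keep attacking until the prediction equals the target.
        return pred_cls != y_cls
    # Untargeted reading of y: a sample stays "robust" while it is correctly classified.
    return pred_cls == y_cls


# Toy scores for 3 samples over 4 classes (predicted classes 0, 1, 2)
y_pred = np.array([[0.9, 0.05, 0.03, 0.02],
                   [0.1, 0.8, 0.05, 0.05],
                   [0.2, 0.1, 0.6, 0.1]])
# One-hot *target* labels; none matches the current prediction
targets = np.eye(4)[[3, 2, 0]]

# Untargeted check: all False, so the attack would stop immediately
print(initial_mask(y_pred, targets, targeted=False))
# Targeted-aware check: all True, so every sample is still attacked
print(initial_mask(y_pred, targets, targeted=True))
```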

The second is with this call:

        target = check_and_transform_label_format(
            targeted_labels[:, i], nb_classes=self.estimator.nb_classes
        )

I do not fully understand what this is supposed to do, but it converted all of my one-hot encoded target labels so that every image targets the same class. Instead of targeting the classes I passed in, it targets the class at the first index (e.g. all of the one-hot arrays look like [1, 0, 0, 0, ...]).

It is certainly possible that I misunderstand some functionality, but I just wanted to share what I found.

beat-buesser · 2023-07-04T13:43:49Z

Maintainer

Hi @Christopher-d-clark5, I think you might have identified a bug. Did you implement a solution that you could share?


ClarktheDarkShark Jul 4, 2023
Author

I do have something that is working. I can provide it, but it may only work for my specific case.

beat-buesser Jul 4, 2023
Maintainer

That would be great if you could share the changes and a script to run the test. I did some testing with AutoAttack but couldn't reproduce the issue. After refreshing my memory of the AutoAttack code, I think it might be related to the different definition of the targeted argument for AutoAttack. Basically, AutoAttack runs each internal attack in untargeted mode and, if targeted=True, also runs each possible targeted attack on all still-robust (correctly classified) samples. This means that, unlike the other attack classes, AutoAttack does not try to achieve the provided labels if targeted=True. The difference in the definition of success between targeted and untargeted attacks is set in these lines:

        if attack.targeted:
            samples_misclassified = np.argmax(y_pred_robust_adv, axis=1) == np.argmax(y_robust, axis=1)
        elif not attack.targeted:
            samples_misclassified = np.argmax(y_pred_robust_adv, axis=1) != np.argmax(y_robust, axis=1)

and samples_misclassified is eventually used to update sample_is_robust.
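The success check and mask update can be sketched in isolation as follows. This is a hedged approximation, not ART's code: it assumes sample_is_robust is a boolean vector over all samples and that the attack was run only on the still-robust subset; update_robust_mask is a hypothetical name.

```python
import numpy as np


def update_robust_mask(sample_is_robust, y_pred_robust_adv, y_robust, targeted):
    """Mark samples as non-robust once an attack succeeds on them.

    y_pred_robust_adv: model scores on adversarial versions of the still-robust samples.
    y_robust: one-hot labels for this run (true labels if untargeted, targets if targeted).
    """
    pred_cls = np.argmax(y_pred_robust_adv, axis=1)
    label_cls = np.argmax(y_robust, axis=1)
    if targeted:
        # Success: the adversarial prediction hit the chosen target class.
        samples_misclassified = pred_cls == label_cls
    else:
        # Success: the adversarial prediction left the true class.
        samples_misclassified = pred_cls != label_cls
    updated = sample_is_robust.copy()
    # Map positions in the robust subset back to positions in the full batch
    robust_idx = np.where(sample_is_robust)[0]
    updated[robust_idx[samples_misclassified]] = False
    return updated


sample_is_robust = np.array([True, False, True, True])
y_pred_adv = np.eye(3)[[1, 1, 2]]      # adversarial predictions for samples 0, 2, 3
y_true_robust = np.eye(3)[[0, 1, 2]]   # true labels of samples 0, 2, 3
y_target_robust = np.eye(3)[[1, 0, 0]] # one target class per robust sample

print(update_robust_mask(sample_is_robust, y_pred_adv, y_true_robust, targeted=False))
print(update_robust_mask(sample_is_robust, y_pred_adv, y_target_robust, targeted=True))
```

In both runs only sample 0 flips to non-robust: untargeted because its prediction left class 0, targeted because its prediction reached target class 1.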

The variable targeted_labels holds, in each sample's row, all classes except that sample's true class. The line

        target = check_and_transform_label_format(
            targeted_labels[:, i], nb_classes=self.estimator.nb_classes
        )

takes the next non-true label for each sample separately and runs a targeted attack towards these non-true class labels in the following call to self._run_attack with the argument y=target.
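The loop over targeted_labels can be illustrated with a small standalone sketch. Assumptions: labels arrive one-hot, the non-true classes are listed in ascending order, and np.eye stands in for ART's check_and_transform_label_format to turn a column of class indices into one-hot rows; build_targeted_labels is a hypothetical name. Under these assumptions each column i gives a different target per sample rather than the same class for every image.

```python
import numpy as np


def build_targeted_labels(y_onehot: np.ndarray) -> np.ndarray:
    """For each sample, list every class index except the true one (hypothetical helper)."""
    nb_classes = y_onehot.shape[1]
    true_cls = np.argmax(y_onehot, axis=1)
    all_cls = np.arange(nb_classes)
    # Row j keeps the nb_classes - 1 classes that differ from true_cls[j]
    return np.array([all_cls[all_cls != c] for c in true_cls])


y = np.eye(4)[[0, 2, 3]]  # true classes 0, 2 and 3
targeted_labels = build_targeted_labels(y)
print(targeted_labels)

for i in range(targeted_labels.shape[1]):
    # Column i: the i-th non-true class of each sample, converted to one-hot rows
    target = np.eye(y.shape[1])[targeted_labels[:, i]]
    # A targeted run with y=target would happen here
    print(i, np.argmax(target, axis=1))
```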

ClarktheDarkShark Jul 4, 2023
Author

I don't fully understand that setup or how to use it for targeted attacks. It was not returning examples classified with the target label I was providing. It sounds like it just runs its own targeting process?

Attached is the code that I modified. I believe the only changes are at lines 165, 193, and 214-219. I do not have a test script that is easy to share.

auto_attack.py.zip

beat-buesser Jul 4, 2023
Maintainer

Hi @Christopher-d-clark5, yes, that's what I meant above: the argument targeted has a different effect for AutoAttack than for the other evasion attacks in ART. The goal of AutoAttack is to find a perturbation smaller than eps that results in a classification other than the provided true class label. It does this by running multiple attacks, first in untargeted mode and then in targeted mode, once against each class (target) label other than the provided true label. AutoAttack tries all attack modes to find at least one perturbation resulting in misclassification, but it cannot run a targeted attack to find a perturbation leading to a classification in a specific target class. For the latter, you do not need AutoAttack and can directly use an attack like ProjectedGradientDescent in targeted mode.
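For reference, here is a hedged, self-contained numpy sketch of what a targeted L-inf PGD attack does, using a hypothetical linear softmax model in place of a real estimator. In ART itself one would instead construct ProjectedGradientDescent with targeted=True and pass the desired target labels as y to generate(); targeted_pgd_linear, its defaults, and the toy model are illustrative assumptions, not ART's implementation. The sketch also skips clipping to a valid input range.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def targeted_pgd_linear(x, target, W, b, eps=0.5, eps_step=0.05, max_iter=100):
    """Targeted L-inf PGD against a linear softmax model (logits = x @ W + b).

    Descends the cross-entropy towards `target` (one class index per sample)
    and projects each step back into the eps-ball around the original x.
    """
    x_adv = x.copy()
    onehot = np.eye(W.shape[1])[target]
    for _ in range(max_iter):
        probs = softmax(x_adv @ W + b)
        # Input gradient of cross-entropy w.r.t. the target, for a linear model
        grad = (probs - onehot) @ W.T
        # Targeted: step against the gradient to raise the target probability
        x_adv = x_adv - eps_step * np.sign(grad)
        # Projection onto the L-inf ball of radius eps around x
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


W = np.eye(3)
b = np.zeros(3)
x = np.array([[1.0, 0.0, 0.0]])  # classified as class 0
x_adv = targeted_pgd_linear(x, np.array([2]), W, b, eps=0.8)
print(np.argmax(x @ W + b, axis=1), np.argmax(x_adv @ W + b, axis=1))
```

Unlike AutoAttack's definition, success here means the prediction equals the requested target, not merely that it differs from the true class.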

ClarktheDarkShark Jul 5, 2023
Author

That is interesting. I believe the changes I made allow using AutoAttack in the traditional targeted sense. It seems to iterate through each attack, keeping the best examples from each attack that match my 'y' target labels. I realized that I had a minor error that would prevent the previous version from working. It is corrected in the attached file, if you are interested in using it.
auto_attack.py.zip
