This is an experiment I did in order to understand and familiarize myself with adversarial attacks in machine learning. It is loosely inspired by the work done in "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images".
I decided to use evolutionary algorithms (Evolutionary Strategies and Simple Genetic Algorithms) to perform the experiment. I used Pygmo 2.7. While it is an easy-to-use framework, I really don't like its parallelization toolbox (to be addressed later).
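To give a concrete feel for how the pieces fit together, here is a minimal sketch of a Pygmo setup of this kind. The objective function `target_class_confidence` is a hypothetical stand-in for querying the trained classifier (not the actual model used in the experiment), and the problem size and algorithm parameters are illustrative only.

```python
import numpy as np
import pygmo as pg

def target_class_confidence(pixels):
    # Hypothetical placeholder: in the real experiment this would return the
    # classifier's confidence in the target class for the candidate image,
    # e.g. model.predict(pixels.reshape(1, 28, 28))[0, target_class].
    return np.exp(-np.sum((pixels - 0.5) ** 2))

class FoolingImageProblem:
    """User-defined Pygmo problem: evolve pixel values in [0, 1] so that the
    classifier's confidence in a target class is maximized. Pygmo minimizes,
    so the fitness is the negated confidence."""

    def __init__(self, n_pixels=784):
        self.n_pixels = n_pixels

    def fitness(self, x):
        return [-target_class_confidence(x)]

    def get_bounds(self):
        return ([0.0] * self.n_pixels, [1.0] * self.n_pixels)

prob = pg.problem(FoolingImageProblem())

# The two algorithm families mentioned above: a Simple Genetic Algorithm
# and Pygmo's (N+1)-ES simple evolutionary algorithm.
sga = pg.algorithm(pg.sga(gen=200))
es = pg.algorithm(pg.sea(gen=200))

pop = pg.population(prob, size=40)
pop = sga.evolve(pop)  # or es.evolve(pop)
print("best (negated) confidence:", pop.champion_f[0])
```

The same problem object can be handed to either algorithm, which makes it easy to compare the two approaches on identical fitness evaluations.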
I present here my experimental setup and the results I obtained.
DISCLAIMER: A number of the visualizations and explanations in this section are borrowed from the presentation "Security and Privacy in Machine Learning" by Nicolas Papernot, Google Brain.
I will define an adversarial example attack in machine learning as introducing data examples that exploit the model's limited knowledge about reality. Introducing such examples can compromise the integrity of the predictions with respect to the expected outcome, and/or compromise the ability to deploy the system in real life.
Machine learning is usually part of a larger system. A machine learning model tries to approximate the real distribution based on the data samples it is given, and this gap between the real and the approximated distribution is where adversarial examples come in.
(Figure: the real distribution, the approximated distribution learned by the model, the test data, and adversarial examples.)