Abstract:
In recent years, machine learning has become the de facto standard for a wide range of tasks, spanning pattern recognition, language understanding, the detection of cyber-threats, and many other disciplines. Although these models often provide the best results in their respective fields, it has been shown that inputs formed by applying small but deliberately worst-case perturbations can lead a model to output an incorrect answer. One way of crafting these minimally perturbed adversarial examples is to use gradient-based optimization algorithms coupled with distance metrics (e.g., lp norms) that enforce sparsity in the optimal solution. Among these, the l0 norm is the natural choice for enforcing sparsity; however, optimization under the l0 norm is NP-hard. In this work we bridge this gap and show that an approximation of the l0 norm can be exploited to craft powerful adversarial examples with minimal perturbations. We empirically demonstrate the effectiveness and suitability of the resulting attacks on two widely used deep neural networks (i.e., ResNet18 and VGG16) trained on two different vision datasets (i.e., CIFAR10 and GTSRB). Finally, we compare our attack with the state of the art, demonstrating that it offers a good trade-off between attack speed and effectiveness.
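To make the general idea concrete, the sketch below illustrates one possible gradient-based attack that penalizes a smooth surrogate of the l0 norm of the perturbation; it is not the paper's actual algorithm, and all names and hyper-parameters (sparse_adv_example, sigma, lam, the number of steps) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sparse_adv_example(model, x, y, steps=100, lr=0.05, sigma=0.1, lam=1.0):
    """Illustrative sketch (not the paper's method): craft an untargeted
    adversarial example by maximizing the classification loss while
    penalizing a smooth, differentiable surrogate of the l0 norm."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(torch.clamp(x + delta, 0.0, 1.0))
        # Untargeted objective: push the prediction away from the true label y.
        adv_loss = -F.cross_entropy(logits, y)
        # Smooth l0 surrogate: sum_i |d_i| / (|d_i| + sigma) approaches the
        # count of non-zero entries as sigma -> 0, yet remains differentiable.
        l0_approx = (delta.abs() / (delta.abs() + sigma)).sum()
        loss = adv_loss + lam * l0_approx
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Return the perturbed input clipped to the valid pixel range [0, 1].
    return torch.clamp(x + delta.detach(), 0.0, 1.0)
```

The trade-off controlled by lam mirrors the one discussed in the abstract: a larger sparsity penalty yields fewer perturbed pixels but may require more iterations to reach misclassification.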