Abstract:
Machine learning has become increasingly popular for its ability to learn from data, identify patterns, and make logical decisions with little or no human intervention, allowing humans to rapidly develop models that can analyze extraordinarily large and ever-increasing volumes of data. Machine learning models, for instance Convolutional Neural Networks (CNNs), have received attention due to their use in a wide variety of areas, such as self-driving cars and cyber security. However, recent studies have shed light on how such systems can be compromised by test-time evasion attacks, i.e., carefully engineered adversarial examples with imperceptible perturbations, raising security concerns about the use of such models in safety-critical systems. Furthermore, adversarial examples may exhibit the transferability property, i.e., adversarial examples crafted for one model may also evade other, potentially unknown models, which makes attacks practical even in the black-box setting. Machine learning models need to deliver satisfactory performance in adversarial settings as well, so it is crucial to faithfully evaluate their robustness against evasion attacks. Since in real-world scenarios (black-box settings) target models may not be directly accessible and verifying their robustness may be difficult, we propose a framework that allows the analyst to efficiently evaluate the robustness of target models by leveraging simple, well-known surrogate models and the transferability of adversarial attacks. Our proposal consists in combining, through different logical gates, the robustness information of surrogate models evaluated on a test set in order to approximate the robustness of the target model, under the expectation that this robustness information transfers to the target model. In addition, along with the transferability measure for each model, we explore the correlation between other information available to the analyst and the best gate, in order to suggest a strategy for identifying the best aggregation function in different settings. A preliminary experimental evaluation on the MNIST dataset with different machine learning models shows that the robustness of target models can be effectively approximated via surrogate models.
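To illustrate the aggregation idea described above, the sketch below shows how per-sample robustness flags of several surrogate models could be combined with different logical gates to estimate the robust accuracy of a target model. It is a minimal sketch in Python with NumPy; the function name, the particular gate set (AND, OR, majority), and the toy data are hypothetical and not taken from the paper.

```python
import numpy as np

def aggregate_robustness(surrogate_flags: np.ndarray, gate: str) -> np.ndarray:
    """Combine per-sample robustness flags of surrogate models with a logical gate.

    surrogate_flags: boolean array of shape (n_surrogates, n_samples), where
    entry [s, i] is True if surrogate s resists the adversarial example
    crafted from test sample i (hypothetical encoding).
    Returns a boolean array of per-sample robustness estimates for the target.
    """
    if gate == "and":          # robust only if all surrogates resist
        return surrogate_flags.all(axis=0)
    if gate == "or":           # robust if at least one surrogate resists
        return surrogate_flags.any(axis=0)
    if gate == "majority":     # robust if most surrogates resist
        return surrogate_flags.sum(axis=0) > surrogate_flags.shape[0] / 2
    raise ValueError(f"unknown gate: {gate}")

# Illustrative usage with made-up flags for 3 surrogates on 5 test samples.
flags = np.array([[True, False, True, True,  False],
                  [True, True,  True, False, False],
                  [True, False, True, True,  True ]])
for gate in ("and", "or", "majority"):
    estimate = aggregate_robustness(flags, gate)
    print(gate, "-> estimated robust accuracy:", estimate.mean())
```

The mean of the aggregated flags serves as the surrogate-based approximation of the target model's robust accuracy; which gate approximates it best is what the proposed strategy aims to identify from the information available to the analyst.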