Document Type
Article
Version Deposited
Published Version
Publication Date
1-24-2024
Publication Title
Entropy
DOI
10.3390/e26020103
Abstract
Despite their remarkable performance, deep learning models still lack robustness guarantees, particularly in the presence of adversarial examples. This significant vulnerability raises concerns about their trustworthiness and hinders their deployment in critical domains that require certified levels of robustness. In this paper, we introduce an information geometric framework to establish precise robustness criteria for ℓ2 white-box attacks in a multi-class classification setting. We endow the output space with the Fisher information metric and derive criteria on the input–output Jacobian to ensure robustness. We show that model robustness can be achieved by constraining the model to be partially isometric around the training points. We evaluate our approach using MNIST and CIFAR-10 datasets against adversarial attacks, revealing its substantial improvements over defensive distillation and Jacobian regularization for medium-sized perturbations and its superior robustness performance to adversarial training for large perturbations, all while maintaining the desired accuracy.
Recommended Citation
Shi-Garrier, Loïc, Nidhal Carla Bouaynaya, and Daniel Delahaye. 2024. "Adversarial Robustness with Partial Isometry" Entropy 26, no. 2: 103. https://doi.org/10.3390/e26020103
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Comments
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.