A Data-centric Approach to Class-specific Bias in Image Data Augmentation: Conclusion and Limitation

31 Aug 2024


(1) Athanasios Angelakis, Amsterdam University Medical Center, University of Amsterdam - Data Science Center, Amsterdam Public Health Research Institute, Amsterdam, Netherlands

(2) Andrey Rass, Den Haag, Netherlands.

3 Conclusion and Limitations

This study extends the analysis initiated by Balestriero, Bottou, and LeCun (2022) of how data augmentation (DA), particularly Random Crop, induces class-specific bias in image classification models. Our contributions address the need for a more nuanced understanding of DA's effects across different datasets and architectures.
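To make the augmentation under study concrete, the sketch below implements a simple random resized crop in pure Python on an image stored as a list of rows. It assumes, as is common, that crop strength is controlled by a minimum fraction of the image area the crop must keep (called `min_scale` here). The function name, that parameter, the square-crop simplification, and the nearest-neighbour resize are all illustrative choices, not the exact transform used in this study or by Balestriero, Bottou, and LeCun (2022). Libraries such as torchvision provide `RandomResizedCrop`, which also samples the crop's aspect ratio.

```python
import math
import random


def random_resized_crop(image, min_scale, rng=None):
    """Hypothetical sketch: crop a random square that keeps a fraction of the
    image area drawn from [min_scale, 1.0], then resize it back to the
    original H x W with nearest-neighbour sampling.

    `image` is a list of rows (H x W). A smaller `min_scale` allows more
    aggressive crops, i.e. a stronger augmentation.
    """
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    # Fraction of the image area the crop keeps.
    scale = rng.uniform(min_scale, 1.0)
    # Side of the square crop, clamped so the crop fits inside the image.
    side = max(1, min(h, w, round(math.sqrt(scale * h * w))))
    # Random top-left corner of the crop.
    top = rng.randint(0, h - side)
    left = rng.randint(0, w - side)
    crop = [row[left:left + side] for row in image[top:top + side]]
    # Nearest-neighbour resize of the side x side crop back to h x w.
    return [[crop[i * side // h][j * side // w] for j in range(w)]
            for i in range(h)]
```

With `min_scale=1.0` the crop always covers the whole image, so the function returns the input unchanged; lowering `min_scale` is one way to sweep augmentation strength when looking for class-specific effects.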

We empirically demonstrate that DA-induced class-specific biases are not exclusive to ImageNet but also affect datasets with distinct characteristics, such as Fashion-MNIST and CIFAR. These datasets contain far fewer and smaller images than ImageNet, and Fashion-MNIST is monochrome, so they provide a broader testbed for assessing DA's impact. This variation in dataset characteristics allowed us to examine how DA-induced biases manifest in settings markedly different from ImageNet, broadening our understanding of DA's implications.

By incorporating additional deep neural network architectures, EfficientNetV2S (a residual model) and the SWIN Vision Transformer (a non-residual, patch-based model), we test the proposition that class-specific DA-induced bias is model-agnostic. Our findings show that while the phenomenon extends to residual models, alternative architectures such as Vision Transformers exhibit varying degrees of robustness, or altered dynamics, in response to DA. This suggests that architecture selection is a potential strategy for mitigating class-specific biases.

We offer a detailed methodology for “data augmentation robustness scouting,” refining the initial concept proposed by Balestriero, Bottou, and LeCun (2022). This step-by-step approach aims to make the examination of DA's effects more efficient and resource-sensitive, facilitating the identification and mitigation of class-specific biases during the model design phase. By applying this methodology, we both validate previous findings and present a practical framework for future studies and model development efforts.
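This section does not spell out the individual scouting steps, so the following is only a hypothetical sketch of one comparison such a procedure might involve, not the authors' method. It assumes per-class test accuracies have already been measured for a model trained under each augmentation setting (for example, different Random Crop strengths). It then compares each setting against a reference setting and flags the signature of class-specific bias: mean accuracy improves while at least one class gets worse. The function name, argument names, setting labels, and the toy accuracy values are illustrative only and are not results from this study.

```python
def class_specific_bias(per_class_acc, baseline, tolerance=0.0):
    """Compare per-class test accuracies under each augmentation setting
    against a reference setting.

    per_class_acc: dict mapping a setting label to a list of per-class
    accuracies (same class order for every setting).
    Returns a dict: setting -> (mean accuracy change vs. baseline,
    indices of classes that lost more than `tolerance` accuracy).
    """
    ref = per_class_acc[baseline]
    report = {}
    for setting, accs in per_class_acc.items():
        deltas = [a - r for a, r in zip(accs, ref)]
        hurt = [cls for cls, d in enumerate(deltas) if d < -tolerance]
        report[setting] = (sum(deltas) / len(deltas), hurt)
    return report


# Toy numbers for illustration only (not results from this study).
toy_results = {
    "no_aug": [0.90, 0.80, 0.70],
    "crop_0.5": [0.93, 0.85, 0.66],
    "crop_0.1": [0.95, 0.88, 0.65],
}
report = class_specific_bias(toy_results, baseline="no_aug")

# Flag settings where the mean improves but some class degrades.
flagged = [s for s, (mean_delta, hurt) in report.items()
           if mean_delta > 0 and hurt]
for s in flagged:
    print(s, "improves mean accuracy but hurts classes", report[s][1])
```

Reporting the per-class changes alongside the mean is the point of the design: an aggregate accuracy gain alone would hide the degraded class in both crop settings above.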

While highlighting these contributions, we acknowledge that our study is limited in the variety of architectures and datasets examined. Future work is encouraged to explore a broader array of computer vision models and data characteristics, which may reveal further insights into DA's nuanced effects on model performance and bias. Such work would not only deepen our understanding of DA's impact across different settings but also contribute to the development of more equitable and effective computer vision systems.


This paper is available on arXiv under the CC BY 4.0 DEED license.