A Data-centric Approach to Class-specific Bias in Image Data Augmentation: Conclusion and Limitations

31 Aug 2024

Authors:

(1) Athanasios Angelakis, Amsterdam University Medical Center, University of Amsterdam - Data Science Center, Amsterdam Public Health Research Institute, Amsterdam, Netherlands

(2) Andrey Rass, Den Haag, Netherlands.

3 Conclusion and Limitations

This study extends the analysis initiated by Balestriero, Bottou, and LeCun (2022), focusing on the impact of data augmentation (DA), particularly Random Crop, on class-specific bias in image classification models. Our contributions address the need for a more nuanced understanding of DA’s effects across different contexts.
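For readers unfamiliar with the augmentation in question, the following is a minimal NumPy sketch of a Random Crop with a tunable intensity. It is an illustration only, not the implementation used in this study: the function name `random_crop`, the `min_keep` parameter (the smallest fraction of image area a crop may retain), and the nearest-neighbour resize are all assumptions made for the example.

```python
import numpy as np

def random_crop(image, min_keep, rng):
    """Crop a random window keeping between `min_keep` and 100% of the area.

    `min_keep` is a hypothetical intensity knob: lower values permit more
    aggressive crops. The crop keeps the input aspect ratio and is resized
    back to the input resolution with nearest-neighbour indexing.
    """
    height, width = image.shape[:2]
    keep = rng.uniform(min_keep, 1.0)      # fraction of the area to keep
    side = np.sqrt(keep)                   # per-axis scale for that area
    crop_h = max(1, int(round(height * side)))
    crop_w = max(1, int(round(width * side)))
    # Sample the top-left corner so the window stays inside the image.
    top = rng.integers(0, height - crop_h + 1)
    left = rng.integers(0, width - crop_w + 1)
    crop = image[top:top + crop_h, left:left + crop_w]
    # Nearest-neighbour resize back to (height, width).
    rows = np.arange(height) * crop_h // height
    cols = np.arange(width) * crop_w // width
    return crop[rows][:, cols]

rng = np.random.default_rng(0)
image = np.arange(28 * 28).reshape(28, 28)   # stand-in for a 28x28 grayscale image
augmented = random_crop(image, min_keep=0.25, rng=rng)
unchanged = random_crop(image, min_keep=1.0, rng=rng)
```

With `min_keep=1.0` the window covers the whole image, so the output equals the input; lowering `min_keep` discards more of each image, which is the regime in which an object that defines a class may be partly or wholly cropped out.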

We empirically demonstrate that DA-induced class-specific biases are not exclusive to ImageNet but also affect datasets with distinct characteristics, such as Fashion-MNIST and CIFAR. These datasets contain significantly fewer and smaller images, some of which are monochrome, which allowed us to examine how DA-induced biases manifest in settings markedly different from ImageNet and thus to broaden the understanding of DA’s implications.

By incorporating additional deep neural network architectures like EfficientNetV2S (a convolutional model with residual connections) and SWIN Vision Transformer (a patch-based, attention-driven model), we examine whether class-specific DA-induced bias is model-agnostic. Our findings reveal that while the phenomenon extends to residual models, alternative architectures such as Vision Transformers exhibit a varying degree of robustness or altered dynamics in response to DA, suggesting a potential strategy for mitigating class-specific biases through architectural selection.

We offer a detailed methodology for “data augmentation robustness scouting,” refining the initial concept proposed by Balestriero, Bottou, and LeCun (2022). This step-by-step approach aims at a more efficient, resource-sensitive examination of DA’s effects, facilitating the identification and mitigation of class-specific biases in the model design phase. By applying this methodology, we not only validate previous findings but also present a practical framework for future studies and model development efforts.
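The bookkeeping behind such a scouting pass can be sketched as follows. This is a hedged illustration, not the procedure as specified in the paper: it assumes a model has already been trained and evaluated at each augmentation intensity, and the function names `per_class_accuracy` and `scout`, as well as the toy labels and predictions, are hypothetical.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy restricted to the samples of each class (NaN if class absent)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc

def scout(intensities, y_true, preds_by_intensity, num_classes):
    """Tabulate per-class accuracy per intensity and locate the peaks.

    Returns the (intensity x class) accuracy table, the intensity with the
    highest mean per-class accuracy, and the best intensity for each class.
    """
    table = np.stack([
        per_class_accuracy(y_true, preds, num_classes)
        for preds in preds_by_intensity
    ])
    mean_acc = np.nanmean(table, axis=1)
    best_overall = intensities[int(np.argmax(mean_acc))]
    best_per_class = [intensities[int(i)] for i in np.argmax(table, axis=0)]
    return table, best_overall, best_per_class

# Toy data: 2 classes, 3 hypothetical crop intensities.
intensities = [0.0, 0.25, 0.5]
y_true = [0, 0, 1, 1]
preds_by_intensity = [
    [0, 0, 1, 0],   # predictions of the model trained at intensity 0.0
    [0, 0, 1, 1],   # ... at intensity 0.25
    [0, 1, 1, 1],   # ... at intensity 0.5
]
table, best_overall, best_per_class = scout(
    intensities, y_true, preds_by_intensity, num_classes=2
)
```

A gap between `best_overall` and an entry of `best_per_class` flags a class whose accuracy peaks at a different intensity than the aggregate metric, which is the kind of class-specific effect this study is concerned with.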

Our study is limited in the variety of architectures and datasets it examines. Future work is encouraged to explore a broader array of computer vision models and data characteristics, potentially unveiling novel insights into DA’s nuanced effects on model performance and bias. Such work would not only deepen our understanding of DA’s impact across different settings but also contribute to the development of more equitable and effective computer vision systems.

References

Balestriero, Randall, Leon Bottou, and Yann LeCun. 2022. The effects of regularization and data augmentation are class dependent. In Advances in Neural Information Processing Systems, volume 35, pages 37878–37891, Curran Associates, Inc.

Bishop, Christopher M and Nasser M Nasrabadi. 2006. Pattern recognition and machine learning, volume 4. Springer.

Cimpoi, M., S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. 2014. Describing textures in the wild. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).

Cui, Xiaodong, Vaibhava Goel, and Brian Kingsbury. 2015. Data augmentation for deep neural network acoustic modeling. IEEE/ACM Trans. Audio, Speech and Lang. Proc., 23(9):1469–1477.

Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255, IEEE.

Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2020a. An image is worth 16x16 words: Transformers for image recognition at scale. CoRR, abs/2010.11929.

Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. 2020b. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. 2019. Neural architecture search: A survey. The Journal of Machine Learning Research, 20(1):1997–2017.

Feng, Xin, Youni Jiang, Xuejiao Yang, Ming Du, and Xin Li. 2019. Computer vision algorithms and hardware implementations: A survey. Integration, 69:309–320.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016a. Deep Learning, chapter 5. MIT Press. http://www.deeplearningbook.org.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016b. Deep Learning, chapter 9. MIT Press. http://www.deeplearningbook.org.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016c. Deep Learning, chapter 6. MIT Press. http://www.deeplearningbook.org.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016d. Deep Learning, chapter 7. MIT Press. http://www.deeplearningbook.org.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning. Image Recognition, 7.

Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. 2016. Densely connected convolutional networks. CoRR, abs/1608.06993.

Kingma, Diederik P and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Ko, Tom, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Sixteenth annual conference of the international speech communication association.

Krizhevsky, Alex, Geoffrey Hinton, et al. 2009. Learning multiple layers of features from tiny images.

Kwabena Patrick, Mensah, Adebayo Felix Adekoya, Ayidzoe Abra Mighty, and Baagyire Y. Edward. 2022. Capsule networks – a survey. Journal of King Saud University - Computer and Information Sciences, 34(1):1295–1310.

LeCun, Yann, Yoshua Bengio, et al. 1995. Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, 3361(10):1995.

LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.

LeCun, Yann, Koray Kavukcuoglu, and Clément Farabet. 2010. Convolutional networks and applications in vision. In Proceedings of 2010 IEEE international symposium on circuits and systems, pages 253–256, IEEE.

Liu, Ze, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 10012–10022.

Loshchilov, Ilya and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101.

Mehta, Sachin and Mohammad Rastegari. 2021. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. CoRR, abs/2110.02178.

Sabour, Sara, Nicholas Frosst, and Geoffrey E Hinton. 2017. Dynamic routing between capsules. Advances in neural information processing systems, 30.

Shalev-Shwartz, Shai and Shai Ben-David. 2014. Understanding machine learning: From theory to algorithms. Cambridge university press.

Shorten, Connor and Taghi M Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of big data, 6(1):1–48.

Tan, Mingxing and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 6105–6114, PMLR.

Tan, Mingxing and Quoc V. Le. 2021. Efficientnetv2: Smaller models and faster training. CoRR, abs/2104.00298.

Taylor, Luke and Geoff Nitschke. 2018. Improving deep learning with generic data augmentation. In 2018 IEEE symposium series on computational intelligence (SSCI), pages 1542–1547, IEEE.

Tihonov, Andrei Nikolajevits. 1963. Solution of incorrectly formulated problems and the regularization method. Soviet Math., 4:1035–1038.

Tikhonov, Andrey Nikolayevich. 1943. On the stability of inverse problems. In Dokl. Akad. Nauk SSSR, volume 39, pages 195–198.

Voulodimos, Athanasios, Nikolaos Doulamis, Anastasios Doulamis, Eftychios Protopapadakis, et al. 2018. Deep learning for computer vision: A brief review. Computational intelligence and neuroscience, 2018.

Xiao, Han, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.

Zhai, Xiaohua, Alexander Kolesnikov, Neil Houlsby, and Lucas Beyer. 2022. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12104–12113.

This paper is available on arXiv under the CC BY 4.0 DEED license.