A ConvNeXt V2 Approach to Document Image Analysis: Enhancing High-Accuracy Classification
Published in | 2024 IEEE 3rd Conference on Information Technology and Data Science (CITDS), pp. 1-6 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 26.08.2024 |
DOI | 10.1109/CITDS62610.2024.10791343 |
Summary: | Proper classification of documents is of tremendous importance for organizations. As digital copies of documents are now widely available due to technological advances, it has become convenient to classify them automatically using machine learning or deep learning algorithms. Deep CNNs have been widely applied to document image classification, while recently introduced transformer-based models have also shown favorable results. In this paper, ConvNeXt V2, a very recently developed and high-performing deep CNN model that imitates the self-attention mechanism of transformers through the use of masked autoencoders, is adapted for the document image classification task. Prior studies have suggested that models pre-trained on ImageNet may not perform optimally when applied directly to document classification tasks. Nevertheless, our findings reveal substantial performance gains with ConvNeXt V2, indicating that further domain-specific pre-training (for instance, on the RVL-CDIP dataset) might not be essential for attaining high accuracy. Our results demonstrate that direct application of ImageNet pre-trained models can yield significant benefits. The model has been applied to one of the most standard document image classification datasets, Tobacco-3482. The results show a high overall accuracy of 92.25% with fast convergence, outperforming several other state-of-the-art methods. The source code for this work can be accessed at the following link: https://github.com/MdSaifuIIslamSajol/document-tobacco-convnextv2 |
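The authors' own implementation is available at the repository linked above. As a rough illustration of the pipeline the summary describes (an ImageNet pre-trained ConvNeXt V2 fine-tuned directly on Tobacco-3482, with no intermediate RVL-CDIP pre-training), here is a minimal sketch. It assumes the `timm` and `torchvision` libraries, a copy of Tobacco-3482 arranged in `ImageFolder` layout under `tobacco3482/train` and `tobacco3482/val`, the `convnextv2_tiny` variant, 224x224 inputs, and illustrative hyperparameters; none of these choices are taken from the paper.

```python
# Sketch: fine-tune an ImageNet-pretrained ConvNeXt V2 on Tobacco-3482.
# Assumptions (not from the paper): timm + torchvision, data in ImageFolder layout,
# convnextv2_tiny variant, 224x224 inputs, illustrative hyperparameters.
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tobacco-3482 has 10 document classes; reuse ImageNet weights, replace the classifier head.
model = timm.create_model("convnextv2_tiny", pretrained=True, num_classes=10).to(device)

# Resize grayscale document scans and replicate to 3 channels for the RGB-trained backbone.
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("tobacco3482/train", transform=tfm)
val_ds = datasets.ImageFolder("tobacco3482/val", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
val_dl = DataLoader(val_ds, batch_size=32, num_workers=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

for epoch in range(10):
    # Standard supervised fine-tuning pass over the training split.
    model.train()
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Report plain top-1 accuracy on the held-out split.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_dl:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    print(f"epoch {epoch + 1}: val accuracy {correct / total:.4f}")
```

Note that, in line with the summary's main point, the sketch inserts no document-domain pre-training stage: the ImageNet weights are fine-tuned end to end on the target dataset directly.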