A ConvNeXt V2 Approach to Document Image Analysis: Enhancing High-Accuracy Classification
Published in | 2024 IEEE 3rd Conference on Information Technology and Data Science (CITDS), pp. 1-6 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 26.08.2024 |
DOI | 10.1109/CITDS62610.2024.10791343 |
Summary: | Proper classification of documents is of tremendous importance for organizations. As digital copies of documents are now widely available due to technological advances, it has become convenient to classify them automatically using machine learning or deep learning algorithms. Deep CNNs have been widely applied to document image classification, while recently introduced transformer-based models have also shown favorable results. In this paper, ConvNeXt V2, a very recently developed and high-performing deep CNN model that imitates the self-attention mechanism of transformers through the use of masked autoencoders, is adapted for the document image classification task. Prior studies have suggested that models pre-trained on ImageNet may not perform optimally when applied directly to document classification tasks. Nevertheless, our findings reveal substantial performance gains with ConvNeXt V2, indicating that further domain-specific pre-training (for instance, on the RVL-CDIP dataset) might not be essential for attaining high accuracy. Our results demonstrate that direct application of ImageNet pre-trained models can yield significant benefits. The model has been applied to one of the most standard document image classification datasets, Tobacco-3482. The results show a high overall accuracy of 92.25% with fast convergence, outperforming several other state-of-the-art methods. The source code for this work can be accessed at the following link: https://github.com/MdSaifuIIslamSajol/document-tobacco-convnextv2 |
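The authors' own implementation is available at the repository linked above. As a rough illustration of the pipeline the summary describes (an ImageNet pre-trained ConvNeXt V2 fine-tuned directly on Tobacco-3482, with no intermediate RVL-CDIP pre-training), here is a minimal sketch. It assumes the `timm` and `torchvision` libraries, a copy of Tobacco-3482 arranged in `ImageFolder` layout under `tobacco3482/train` and `tobacco3482/val`, the `convnextv2_tiny` variant, 224x224 inputs, and illustrative hyperparameters; none of these choices are taken from the paper.

```python
# Sketch: fine-tune an ImageNet-pretrained ConvNeXt V2 on Tobacco-3482.
# Assumptions (not from the paper): timm + torchvision, data in ImageFolder layout,
# convnextv2_tiny variant, 224x224 inputs, illustrative hyperparameters.
import timm
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tobacco-3482 has 10 document classes; reuse ImageNet weights, replace the classifier head.
model = timm.create_model("convnextv2_tiny", pretrained=True, num_classes=10).to(device)

# Resize grayscale document scans and replicate to 3 channels for the RGB-trained backbone.
tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

train_ds = datasets.ImageFolder("tobacco3482/train", transform=tfm)
val_ds = datasets.ImageFolder("tobacco3482/val", transform=tfm)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True, num_workers=4)
val_dl = DataLoader(val_ds, batch_size=32, num_workers=4)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

for epoch in range(10):
    # Standard supervised fine-tuning pass over the training split.
    model.train()
    for images, labels in train_dl:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Report plain top-1 accuracy on the held-out split.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_dl:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    print(f"epoch {epoch + 1}: val accuracy {correct / total:.4f}")
```

Note that, in line with the summary's main point, the sketch inserts no document-domain pre-training stage: the ImageNet weights are fine-tuned end to end on the target dataset directly.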