A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions
| Published in | ACM computing surveys |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | 09.10.2025 |
| ISSN | 0360-0300 1557-7341 |
| DOI | 10.1145/3771273 |
| Summary: | Optical character recognition (OCR) is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While OCR systems encompass multiple stages, such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly advanced the state of the art. To contextualize these advances, we compare Transformer-based techniques in detail with earlier deep learning approaches, highlighting the limitations of the earlier approaches and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine benchmark datasets widely used in the literature and discuss in detail the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment. |
|---|---|