A Comprehensive Survey of Transformers in Text Recognition: Techniques, Challenges, and Future Directions

Bibliographic Details
Published in ACM Computing Surveys
Main Authors Afkari-Fahandari, Ali; Shabaninia, Elham; Asadi-Zeydabadi, Fatemeh; Nezamabadi-Pour, Hossein
Format Journal Article
Language English
Published 09.10.2025
ISSN 0360-0300, 1557-7341
DOI 10.1145/3771273

Summary: Optical character recognition is a rapidly evolving field within pattern recognition, enabling the automatic conversion of printed or handwritten text images into machine-readable formats. This technology plays a critical role across various sectors, including banking, healthcare, government, and education. While optical character recognition systems encompass multiple stages, such as text detection, segmentation, and post-processing, this paper focuses on text recognition as a core and technically challenging component. In particular, we provide an in-depth review of recent advances driven by Transformer-based models, which have significantly advanced the state of the art. To contextualize these advances, we present a detailed comparative analysis of Transformer-based techniques against earlier deep learning approaches, highlighting the limitations of those earlier approaches and the improvements introduced by Transformers, including parallel sequence processing, global context modeling, better handling of long-range dependencies, and enhanced robustness to irregular or noisy text layouts. We also examine benchmark datasets widely used in the literature and discuss in detail the performance achieved by recent state-of-the-art methods. Finally, this survey outlines open research challenges and potential future directions. It aims to serve as a comprehensive reference for both novice and experienced researchers by summarizing the latest developments in text recognition, including architectures, datasets, evaluation metrics, and practical considerations in model performance trade-offs and deployment.