Innovative Binarization Solutions for Historical Document Clarity

Bibliographic Details
Published in: 2024 4th International Conference on Pervasive Computing and Social Networking (ICPCSN), pp. 210-217
Main Authors: Kulkarni, Radhika V.; Mude, Vedant; Nagrale, Rutuj; Nirgude, Aarya; Nirmal, Tejashri
Format: Conference Proceeding
Language: English
Published: IEEE, 03.05.2024
DOI: 10.1109/ICPCSN62568.2024.00043

Summary: Images of historical documents often show degradation such as wrinkles, faint writing, stains, ink bleed-through, and other defects. These defects reduce text visibility and degrade binarization performance. Preserving these document images helps future generations learn about a variety of subjects. This article presents a new binarization approach for historical documents. The approach is a multi-step image enhancement process that uses bilateral and unsharp filtering, Otsu thresholding, histogram analysis, and k-means clustering for dark spot removal. Applying an intensity-based mask improves the quality of pixels above a set threshold. The method also includes two additional refinements: a concluding sharpening phase and an enhancement of color contrast. The performance of the proposed binarization approach is assessed using the Flesch Reading Ease formula. The algorithm achieved its highest readability scores on degraded documents, with an average readability score of 96.44, which suggests that it is effective at improving the readability of noisy, degraded document images.
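
For orientation, two of the building blocks named in the summary, Otsu thresholding and the Flesch Reading Ease formula, can be sketched in plain Python. This is a minimal illustration of the standard techniques, not the authors' implementation: the function names are hypothetical, the image is assumed to be a flat list of 8-bit gray levels, and the summary does not say how word, sentence, or syllable counts are obtained from the binarized images, so the readability function takes those counts as inputs.

```python
from collections import Counter


def otsu_threshold(pixels):
    """Return the gray level t (0..255) that maximizes between-class variance.

    Assumption: `pixels` is a flat iterable of integer gray levels in 0..255.
    Pixels <= t form the dark class; pixels > t form the bright class.
    If every pixel has the same value, 0 is returned.
    """
    hist = Counter(pixels)
    total = sum(hist.values())
    if total == 0:
        raise ValueError("pixels must not be empty")
    sum_all = sum(level * count for level, count in hist.items())

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        count = hist.get(t, 0)
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map gray levels above the threshold to 255 and all others to 0."""
    return [255 if p > threshold else 0 for p in pixels]


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

On the Flesch scale, scores between 90 and 100 are conventionally read as "very easy", which is the band the reported average of 96.44 falls into.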