Online writer identification using statistical modeling-based feature embedding




Bibliographic Details
Published in	Soft computing (Berlin, Germany) Vol. 25; no. 14; pp. 9639 - 9649
Main Author	BabaAli, Bagher
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.07.2021
Subjects	Application of Soft Computing; Artificial Intelligence; Computational Intelligence; Control; Engineering; Mathematical Logic and Foundations; Mechatronics; Robotics; Regularized linear discriminant analysis (LDA); Identity vector; Within-class covariance normalization (WCCN); Online writer identification; Statistical feature embedding
ISSN	1432-7643 1433-7479
DOI	10.1007/s00500-021-05729-x


More Information
Summary:	Writer identification is the task of determining the genuine writer of a handwriting sample from among a set of enrolled subjects, and it is a noteworthy research topic in the document analysis and recognition community. In this paper, a novel framework based on the identity vector is introduced for the online writer identification task. In the proposed framework, the sequence of feature vectors extracted from each handwriting sample is embedded into a fixed-length vector, referred to as an identity vector (i-vector), to capture long-term, sequence-level writer-related characteristics, and is then passed to the next stage for classification. Several feature normalization and intra-class variation reduction techniques in the i-vector domain, such as within-class covariance normalization (WCCN) and regularized linear discriminant analysis (LDA), are also investigated. We extensively evaluate the introduced framework on the popular CASIA database for English and Chinese in various scenarios, such as multi-language and cross-language. Experimental results show that, in the best cases, the proposed framework achieves 98.68% accuracy on the English dataset and 96.03% on the Chinese dataset of the CASIA database. These results indicate an improvement over the best reported results of current state-of-the-art approaches, with the exception of fully end-to-end approaches, which have their own serious limitations in real applications. In addition to the accuracy improvement, its low computational load gives it the potential to be implemented on handheld digital devices.
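The summary names the i-vector back-end steps (regularized LDA, WCCN, classification) without detail. The following is a minimal NumPy sketch of such a back-end, not the paper's implementation. It assumes that i-vectors have already been extracted (UBM and total-variability training are not shown), that LDA is applied before WCCN, and that classification uses cosine scoring against a per-writer mean model. These are common i-vector choices, but the summary does not confirm them. All function names, the regularization values, and the synthetic "writers" are hypothetical.

```python
import numpy as np


def class_stats(X, y):
    """Return class labels, per-class means, and the average within-class covariance."""
    classes = np.unique(y)
    dim = X.shape[1]
    means = np.zeros((len(classes), dim))
    cov = np.zeros((dim, dim))
    for k, c in enumerate(classes):
        Xc = X[y == c]
        means[k] = Xc.mean(axis=0)
        D = Xc - means[k]
        cov += D.T @ D / len(Xc)
    return classes, means, cov / len(classes)


def fit_regularized_lda(X, y, n_dims, reg=1e-2):
    """Solve S_b v = lam (S_w + reg*I) v and keep the n_dims leading directions."""
    _, means, Sw = class_stats(X, y)
    Dm = means - X.mean(axis=0)
    Sb = Dm.T @ Dm / len(means)
    # Cholesky factor of the regularized within-class scatter: Sw + reg*I = L L^T.
    L = np.linalg.cholesky(Sw + reg * np.eye(X.shape[1]))
    L_inv = np.linalg.inv(L)
    # Reduce to a symmetric eigenproblem in u = L^T v, then map back with v = L^-T u.
    evals, U = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    top = np.argsort(evals)[::-1][:n_dims]
    return L_inv.T @ U[:, top]


def fit_wccn(X, y, reg=1e-6):
    """Return B with B @ B.T = inverse within-class covariance; apply as X @ B."""
    _, _, W = class_stats(X, y)
    return np.linalg.cholesky(np.linalg.inv(W + reg * np.eye(X.shape[1])))


def cosine_scores(test, models):
    """Cosine similarity between every test vector and every writer model."""
    t = test / np.linalg.norm(test, axis=1, keepdims=True)
    m = models / np.linalg.norm(models, axis=1, keepdims=True)
    return t @ m.T


# Synthetic stand-in for i-vectors: 5 hypothetical writers, 50-dimensional vectors.
rng = np.random.default_rng(0)
n_writers, dim = 5, 50
centers = rng.normal(scale=3.0, size=(n_writers, dim))
y_train = np.repeat(np.arange(n_writers), 40)
y_test = np.repeat(np.arange(n_writers), 10)
X_train = centers[y_train] + rng.normal(size=(len(y_train), dim))
X_test = centers[y_test] + rng.normal(size=(len(y_test), dim))

mu = X_train.mean(axis=0)  # center everything with the training mean
A = fit_regularized_lda(X_train - mu, y_train, n_dims=n_writers - 1)
Z_train = (X_train - mu) @ A
B = fit_wccn(Z_train, y_train)
Z_train, Z_test = Z_train @ B, (X_test - mu) @ A @ B

# Hypothetical writer model: mean of that writer's projected training i-vectors.
classes, models, _ = class_stats(Z_train, y_train)
pred = classes[np.argmax(cosine_scores(Z_test, models), axis=1)]
accuracy = np.mean(pred == y_test)
print(f"identification accuracy: {accuracy:.2f}")
```

Cosine scoring is used here because it needs no further training beyond enrollment. Any other classifier could consume the projected vectors `Z_test` in its place.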