Similarity Matching of Pairs of Text using CACT Algorithm

Bibliographic Details
Published in: International journal of engineering and advanced technology, Vol. 8, no. 6, pp. 2296-2298
Main Authors: Kumar, Ch. N. Santhosh; Kumar, V Pavan; Reddy, Dr. K.S.
Format: Journal Article
Language: English
Published: 30.08.2019
ISSN: 2249-8958
DOI: 10.35940/ijeat.F8685.088619

Summary: In data mining, short text analysis is widely used across many applications. Given the syntax of the language, short text is very difficult to analyze with many traditional natural language processing tools, and those tools are often applied incorrectly. Short texts also offer sparse and insufficient data, and their heavy noise and ambiguity make semantic knowledge hard to identify. In this paper, the authors propose replacing the cosine similarity coefficient with the Jaro-Winkler similarity measure to score the match between pairs of text (source text and target text). Jaro-Winkler is better at determining string similarity because it takes order into account, using positional indices to estimate relevance. CACT driven by Jaro-Winkler is presumed to perform better on one-to-many data links than CACT driven by cosine. The ensemble algorithm CACTS and SAE is adopted with the Jaro-Winkler similarity approach. The new algorithm is applied to short text analysis to obtain better results. An evaluation of the proposed concept serves as validation.
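To make the contrast between the two measures concrete, the following is a minimal Python sketch of textbook Jaro-Winkler similarity next to a bag-of-words cosine similarity. It is not the authors' CACT implementation, which this record does not describe. The function names, the whitespace tokenization in `cosine_bow`, and the example strings are illustrative assumptions; `prefix_scale=0.1` and `max_prefix=4` are the commonly used Winkler defaults, not values taken from the paper.

```python
from collections import Counter
from math import sqrt


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this positional window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (assumed default parameters)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


def cosine_bow(t1: str, t2: str) -> float:
    """Cosine similarity over whitespace-token counts (word order is ignored)."""
    v1, v2 = Counter(t1.split()), Counter(t2.split())
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (sqrt(sum(c * c for c in v1.values()))
            * sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    source, target = "new york times", "times new york"  # hypothetical pair
    print("cosine:      ", round(cosine_bow(source, target), 4))
    print("jaro-winkler:", round(jaro_winkler(source, target), 4))
```

In this sketch the bag-of-words cosine treats the reordered pair as identical, while Jaro-Winkler scores it below 1 because character positions differ, which illustrates the order sensitivity the summary refers to.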