Similarity Matching of Pairs of Text using CACT Algorithm

Bibliographic Details
Published in: International Journal of Engineering and Advanced Technology, Vol. 8, No. 6, pp. 2296-2298
Main Authors: Kumar, Ch. N. Santhosh; Kumar, V Pavan; Reddy, Dr. K. S.
Format: Journal Article
Language: English
Published: 30.08.2019
ISSN: 2249-8958
DOI: 10.35940/ijeat.F8685.088619

Abstract: In data mining, short-text analysis is increasingly used in many applications. Traditional natural language processing tools, which rely on the syntax of the language, are difficult to apply to short texts and often cannot be applied correctly. Short texts are also known to provide sparse, insufficient data, and their high noise and ambiguity make it difficult to identify semantic knowledge. In this paper, the authors propose replacing the cosine similarity coefficient with the Jaro-Winkler similarity measure to compute the similarity match between pairs of texts (a source text and a target text). Jaro-Winkler determines string similarity more effectively because it takes character order into account, using positional indices to estimate relevance. The authors hypothesize that, for one-to-many data links, CACT driven by Jaro-Winkler performs better than CACT driven by cosine similarity. The paper adopts the ensemble of the CACTS and SAE algorithms together with the Jaro-Winkler similarity approach and applies the resulting algorithm to short-text analysis to obtain better results. The authors present an evaluation of the proposed approach as sufficient validation.
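The abstract's central argument is that Jaro-Winkler, unlike cosine similarity, is sensitive to the order of characters. The sketch below is a minimal, hypothetical illustration of that contrast only; it is not the authors' CACT, CACTS, or SAE implementation, which this record does not describe. It assumes the standard Jaro-Winkler definition (matching window of floor(max(|s1|, |s2|) / 2) - 1, prefix scale p = 0.1, common prefix capped at 4 characters) and, as an assumed stand-in for the paper's cosine measure, a bag-of-words cosine over lowercase whitespace tokens. The function names `jaro`, `jaro_winkler`, and `cosine_bow` are illustrative.

```python
import math
from collections import Counter


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters count as matching only if they lie within this window.
    window = max(0, max(len1, len2) // 2 - 1)
    s1_matched = [False] * len1
    s2_matched = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not s2_matched[j] and s2[j] == ch:
                s1_matched[i] = s2_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters of both strings in order; t is half the
    # number of matched positions whose characters disagree (transpositions).
    out_of_order = 0
    k = 0
    for i in range(len1):
        if s1_matched[i]:
            while not s2_matched[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the shared prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors; word order is ignored."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm_sq = sum(c * c for c in va.values()) * sum(c * c for c in vb.values())
    if norm_sq == 0:
        return 0.0
    return dot / math.sqrt(norm_sq)


if __name__ == "__main__":
    a, b = "dog bites man", "man bites dog"
    print(f"cosine (bag of words): {cosine_bow(a, b):.3f}")
    print(f"Jaro-Winkler:          {jaro_winkler(a, b):.3f}")
```

With these definitions, "dog bites man" and "man bites dog" score 1.0 under bag-of-words cosine because their word counts are identical, while Jaro-Winkler scores them at about 0.69 because only the shared middle segment " bites " is matched in position. Whether this order sensitivity yields the gains the authors report for CACT is a claim of the paper, not something this sketch demonstrates.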
Cited by (Crossref record IDs): 10_1051_e3sconf_202561603037; 10_22399_ijcesen_535
Affiliations: Researcher, Anurag Group of Institutions, Hyderabad, India; Dept. of CSE, Anurag Engineering College, Kodada, India
Discipline: Computer Science
License: CC BY-NC-ND
URI: https://doi.org/10.35940/ijeat.f8685.088619