Similarity Matching of Pairs of Text using CACT Algorithm
Short-text analysis is widely used in data-mining applications. Given the syntax of the language, short text is very difficult to analyze with many traditional natural language processing tools, which often cannot be applied to it correctly. In short text, it is known...
| Published in | International journal of engineering and advanced technology Vol. 8; no. 6; pp. 2296 - 2298 |
|---|---|
| Main Authors | Kumar, Ch. N. Santhosh; Kumar, V Pavan; Reddy, Dr. K.S. |
| Format | Journal Article |
| Language | English |
| Published | 30.08.2019 |
| ISSN | 2249-8958 |
| DOI | 10.35940/ijeat.F8685.088619 |

| Abstract | Short-text analysis is widely used in data-mining applications. Given the syntax of the language, short text is very difficult to analyze with many traditional natural language processing tools, which often cannot be applied to it correctly. Short texts are also known to provide sparse and insufficient data, and their heavy noise and ambiguity make semantic knowledge difficult to identify. In this paper, the authors propose replacing the cosine similarity coefficient with the Jaro-Winkler similarity measure to score the match between pairs of text (source text and target text). Jaro-Winkler does a better job of determining the similarity of strings because it takes order into account, using positional indices to estimate relevance. CACT driven by Jaro-Winkler is presumed to outperform CACT driven by cosine similarity on one-to-many data links. The ensemble algorithm CACTS and SAE is adopted with the Jaro-Winkler similarity approach, and the new algorithm is employed for short-text analysis to obtain better results. An evaluation of the proposed concept serves as validation. |
|---|---|
| Author | Kumar, Ch. N. Santhosh; Kumar, V Pavan; Reddy, Dr. K.S. |
| ContentType | Journal Article |
| CorporateAuthor | Researcher, Anurag Group of Institutions, Hyderabad, India; Dept. of CSE, Anurag Engineering College, Kodada, India |
| DOI | 10.35940/ijeat.F8685.088619 |
| Discipline | Computer Science |
| EISSN | 2249-8958 |
| EndPage | 2298 |
| ISSN | 2249-8958 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| License | cc-by-nc-nd |
| PageCount | 3 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-08-30 |
| PublicationTitle | International journal of engineering and advanced technology |
| PublicationYear | 2019 |
| StartPage | 2296 |
| Title | Similarity Matching of Pairs of Text using CACT Algorithm |
| URI | https://doi.org/10.35940/ijeat.f8685.088619 |
| Volume | 8 |
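
The abstract contrasts an order-aware measure (Jaro-Winkler) with an order-insensitive one (cosine). The following is a minimal Python sketch of that contrast only; it is not the authors' CACT implementation, whose internals this record does not describe. The function names, the prefix weight `p = 0.1`, the four-character prefix cap, and the use of character-count vectors for the cosine measure are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def jaro(s: str, t: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Two equal characters count as a match only within this distance.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that line up in a different order are transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by the shared prefix (assumed cap: 4 chars)."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * p * (1.0 - base)


def cosine_chars(s: str, t: str) -> float:
    """Cosine similarity of character-count vectors; ignores character order."""
    a, b = Counter(s), Counter(t)
    dot = sum(a[c] * b[c] for c in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    for source, target in [("MARTHA", "MARHTA"), ("listen", "silent")]:
        print(source, target,
              round(jaro_winkler(source, target), 4),
              round(cosine_chars(source, target), 4))
```

Under these assumptions, the anagram pair "listen" / "silent" is indistinguishable to the character-count cosine measure, while Jaro-Winkler scores it below 1 because the matched characters appear out of order. This is the positional sensitivity the abstract attributes to Jaro-Winkler.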