The influence of word normalization in English document clustering
Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. According to the experimental results, it shows that t...
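The two evaluation functions named in the abstract are entropy and purity. The record does not give the paper's exact formulas, so the following is only a minimal sketch using the common textbook definitions: purity is the fraction of documents that fall in the majority class of their cluster, and entropy is the size-weighted average of the per-cluster class entropy (base 2, not normalized, which is an assumption here). The function names `purity` and `entropy` and the helper `_group_labels` are illustrative and not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def _group_labels(cluster_ids, class_labels):
    """Map each cluster id to a Counter of the gold class labels it contains."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not class_labels:
        raise ValueError("at least one document is required")
    groups = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        groups[cluster][label] += 1
    return groups


def purity(cluster_ids, class_labels):
    """Fraction of documents belonging to the majority class of their cluster."""
    groups = _group_labels(cluster_ids, class_labels)
    total = len(class_labels)
    return sum(max(counts.values()) for counts in groups.values()) / total


def entropy(cluster_ids, class_labels):
    """Size-weighted mean of per-cluster class entropy (base 2, assumed unnormalized)."""
    groups = _group_labels(cluster_ids, class_labels)
    total = len(class_labels)
    result = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        # Shannon entropy of the class distribution inside this cluster.
        cluster_entropy = -sum((n / size) * log2(n / size) for n in counts.values())
        result += (size / total) * cluster_entropy
    return result


if __name__ == "__main__":
    # Hypothetical toy data: six documents, two clusters, two gold classes.
    clusters = [0, 0, 0, 1, 1, 1]
    labels = ["a", "a", "b", "b", "b", "b"]
    print(purity(clusters, labels), entropy(clusters, labels))
```

Under these definitions, higher purity and lower entropy both indicate clusters that agree more closely with the gold classes, which is the sense in which one normalization method can be said to perform better than another.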
| Published in | 2012 IEEE International Conference on Computer Science and Automation Engineering Vol. 2; pp. 116 - 120 |
|---|---|
| Main Authors | Pu Han, Si Shen, Dongbo Wang, Yanyun Liu |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.05.2012 |
| Subjects | Classification algorithms; Clustering algorithms; Dictionaries; document clustering; Educational institutions; Entropy; lemmatization; Partitioning algorithms; stemming |
| ISBN | 1467300888, 9781467300889 |
| DOI | 10.1109/CSAE.2012.6272740 |
| Abstract | Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. The experimental results show that the effect on performance is not remarkable; compared with the Snowball stemmer and Stanford lemmatization, the Porter stemmer performs better in entropy and purity. |
    
|---|---|
| Author | Dongbo Wang; Yanyun Liu; Si Shen; Pu Han |
    
| Author_xml | 1. Pu Han, hanpu0725@gamil.com, Sch. of Inf. Manage., Nanjing Univ., Nanjing, China; 2. Si Shen, sszcgfss@gmail.com, Sch. of Inf. Manage., Nanjing Univ., Nanjing, China; 3. Dongbo Wang, wangdongbo0102@gmail.com, Sch. of Inf. Manage., Nanjing Univ., Nanjing, China; 4. Yanyun Liu, liuyy208@163.com, Inst. of Command Autom., PLA Univ. of Technol. & Sci., Nanjing, China |
    
| EISBN | 9781467300896 1467300896 9781467300872 146730087X  | 
    
| Genre | orig-research | 
    
| PageCount | 5 | 
    
| PublicationTitleAbbrev | CSAE | 
    
| SubjectTerms | Classification algorithms; Clustering algorithms; Dictionaries; document clustering; Educational institutions; Entropy; lemmatization; Partitioning algorithms; stemming |
    
| URI | https://ieeexplore.ieee.org/document/6272740 | 
    