Spectral Clustering-Based Particle Swarm Optimization Algorithm for Document Clustering
| Published in | Journal of information systems engineering & management Vol. 10; no. 4s; pp. 134 - 146 |
|---|---|
| Main Author | T. Elavarasi |
| Format | Journal Article |
| Language | English |
| Published | 17.01.2025 |
| ISSN | 2468-4376 |
| DOI | 10.52783/jisem.v10i4s.487 |
| Abstract | Document clustering is the process of automatically grouping documents into clusters such that the documents in one cluster are highly similar to one another and dissimilar to the documents in the remaining clusters. Because of its broad application in fields such as search engines, web mining, and information retrieval, it has been the subject of much research. It involves grouping documents that are similar to one another and measuring how similar they are, and it simplifies navigation by offering effective document representation and visualization. This paper performs document clustering using a nature-inspired optimization technique. First, the dataset is gathered manually from different sources. Next, the data are prepared by extracting the text content from the published documents. The prepared data are pre-processed by removing punctuation and stop words and converting the text to lowercase. Keyword features are then extracted from the pre-processed data using the Term Frequency-Inverse Document Frequency (TF-IDF) approach. The extracted features pass to the final clustering phase, which uses the spectral clustering algorithm with its parameters tuned by the nature-inspired Particle Swarm Optimization (PSO) algorithm, taking maximization of the silhouette score as the objective function. The proposed spectral clustering-PSO groups the documents into six classes: data mining, deep learning, image, machine learning, network, and sports. The proposed document clustering model outperforms the compared techniques on several measures. In terms of silhouette score, the proposed spectral clustering-PSO is 51.92%, 70.81%, 45.93%, and 20.89% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively. Similarly, in terms of Davies-Bouldin score, it is 89.69%, 58.48%, 32.67%, and 13.99% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively. |
|---|---|
| Author | T. Elavarasi |
| Discipline | Engineering |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| PageCount | 13 |
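The abstract describes tuning the spectral clustering parameters with PSO so that the silhouette score is maximized. The record does not say which parameters are tuned or how the swarm is configured, so the following is only a minimal sketch of a basic PSO loop under stated assumptions: a single tunable parameter (called `gamma` here), commonly used inertia and acceleration coefficients, and a toy objective standing in for "run spectral clustering with this parameter and return the silhouette score". The names `pso_maximize` and `toy_objective` are hypothetical and do not come from the paper.

```python
import random


def pso_maximize(objective, low, high, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Maximize a one-parameter objective on [low, high] with a basic PSO."""
    rng = random.Random(seed)
    # Random starting positions inside the search range, zero starting velocity.
    pos = [rng.uniform(low, high) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                        # best position seen by each particle
    best_val = [objective(x) for x in pos]   # score at that position
    g = max(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity blends momentum, pull toward the particle's own best,
            # and pull toward the swarm-wide best.
            vel[i] = (inertia * vel[i]
                      + c_personal * r1 * (best_pos[i] - pos[i])
                      + c_global * r2 * (g_pos - pos[i]))
            # Clamp so the parameter stays inside its valid range.
            pos[i] = min(high, max(low, pos[i] + vel[i]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val > g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


def toy_objective(gamma):
    # Hypothetical stand-in for the real objective (silhouette score of a
    # spectral clustering run with this parameter); it peaks at gamma = 2.0.
    return -(gamma - 2.0) ** 2


best_gamma, best_score = pso_maximize(toy_objective, low=0.0, high=10.0)
print(best_gamma, best_score)
```

In a fuller pipeline, `toy_objective` would be replaced by a function that clusters the TF-IDF features with the candidate parameter value and returns the resulting silhouette score; the fixed seed only makes the sketch reproducible.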