Spectral Clustering-Based Particle Swarm Optimization Algorithm for Document Clustering

Bibliographic Details
Published in: Journal of Information Systems Engineering & Management, Vol. 10, No. 4s, pp. 134-146
Main Author: T. Elavarasi
Format: Journal Article
Language: English
Published: 17.01.2025
Discipline: Engineering
Online Access: https://doi.org/10.52783/jisem.v10i4s.487 (open access)
ISSN / EISSN: 2468-4376
DOI: 10.52783/jisem.v10i4s.487

Abstract
Document clustering is the process of automatically grouping documents into clusters such that documents in the same cluster are highly similar to one another and dissimilar to documents in other clusters. Because of its broad application in fields such as search engines, web mining, and information retrieval, it has been the subject of much research. It involves grouping similar documents and measuring how similar they are, and it supports easy navigation by providing effective document representation and visualization.

This paper performs document clustering using a nature-inspired optimization technique. First, the dataset is gathered manually from different sources, and the text content is extracted from the published documents. The extracted text is then pre-processed by removing punctuation and stop words and converting it to lowercase. Next, keywords are extracted as features from the pre-processed text using Term Frequency-Inverse Document Frequency (TF-IDF). Finally, the features are clustered with the spectral clustering algorithm, whose parameters are tuned by Particle Swarm Optimization (PSO), a nature-inspired optimization algorithm, with silhouette score maximization as the objective function. The proposed spectral clustering-PSO groups the documents into six classes: data mining, deep learning, image, machine learning, network, and sports.

The proposed model outperforms the compared techniques on several measures. In terms of the silhouette score, spectral clustering-PSO is 51.92%, 70.81%, 45.93%, and 20.89% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively.
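The pre-processing and TF-IDF steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the paper's code: the stop-word list, the helper names `preprocess` and `tf_idf`, and the exact weighting (term count divided by document length, multiplied by ln(N/df)) are assumptions, since the abstract does not specify them.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [preprocess(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


docs = [
    "Deep learning for image classification.",
    "Machine learning and data mining!",
    "The network of sports fans.",
]
vecs = tf_idf(docs)
print(sorted(vecs[2]))  # ['fans', 'network', 'sports']
```

A term that occurs in every document receives weight 0, which is what lets TF-IDF favour discriminative keywords. Library implementations differ in details; for example, scikit-learn's `TfidfVectorizer` by default uses a smoothed idf, ln((1 + N)/(1 + df)) + 1, and L2-normalizes each document vector.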
In terms of the Davies-Bouldin score, the proposed spectral clustering-PSO is 89.69%, 58.48%, 32.67%, and 13.99% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively (lower Davies-Bouldin values indicate better-separated clusters).
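The core of the method (spectral clustering whose parameters are chosen by PSO to maximize the silhouette score, then evaluated with the Davies-Bouldin index) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say which spectral-clustering parameters are tuned, so the sketch tunes only the RBF kernel width `gamma` (searched as log10 gamma) with the number of clusters fixed; the PSO coefficients, swarm size, and all helper names are hypothetical; and synthetic 2-D points stand in for TF-IDF document vectors to keep the example small. Spectral clustering follows the Ng-Jordan-Weiss recipe (normalized affinity, top-k eigenvectors, k-means on the row-normalized embedding).

```python
import numpy as np


def kmeans(U, k, n_iter=50, seed=0):
    """Plain k-means with a deterministic farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        # Pick the row farthest from all centres chosen so far.
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def spectral_clustering(X, k, gamma):
    """Ng-Jordan-Weiss spectral clustering, affinity exp(-gamma * ||x - y||^2)."""
    A = np.exp(-gamma * ((X[:, None, :] - X[None]) ** 2).sum(-1))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)


def silhouette(X, labels):
    """Mean silhouette coefficient; returns -1 for a single cluster (penalised)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0
    D = np.sqrt(((X[:, None, :] - X[None]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] is 0, so this averages the others
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better)."""
    clusters = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in clusters])
    scatter = np.array([np.linalg.norm(X[labels == c] - cents[j], axis=1).mean()
                        for j, c in enumerate(clusters)])
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return float(np.mean(worst))


def pso_tune_gamma(X, k, n_particles=10, n_iter=20, bounds=(-3.0, 2.0), seed=0):
    """Search log10(gamma) with a basic global-best PSO, maximising the silhouette."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds

    def fitness(log_gamma):
        return silhouette(X, spectral_clustering(X, k, 10.0 ** log_gamma))

    pos = rng.uniform(lo, hi, n_particles)
    vel = np.zeros(n_particles)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights (assumed)
    for _ in range(n_iter):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)]
    return 10.0 ** gbest, float(pbest_val.max())


# Three well-separated synthetic blobs stand in for TF-IDF document vectors.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in [(0, 0), (4, 0), (0, 4)]])
gamma, best_sil = pso_tune_gamma(X, k=3)
labels = spectral_clustering(X, 3, gamma)
print(f"gamma={gamma:.3g} silhouette={best_sil:.3f} DB={davies_bouldin(X, labels):.3f}")
```

Searching over log10 gamma rather than gamma itself lets a small swarm cover several orders of magnitude of kernel width. For real use, scikit-learn's `SpectralClustering` (with `affinity="rbf"` and a `gamma` parameter), `silhouette_score`, and `davies_bouldin_score` provide tested versions of these pieces; the hand-written versions here only make the objective explicit.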