Clustering of web search results based on the cuckoo search algorithm and Balanced Bayesian Information Criterion

The clustering of web search results – or web document clustering – has become a very interesting research area among academic and scientific communities involved in information retrieval. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of docu...

Bibliographic Details
Published in Information Sciences, Vol. 281, pp. 248–264
Main Authors Cobos, Carlos; Muñoz-Collazos, Henry; Urbano-Muñoz, Richar; Mendoza, Martha; León, Elizabeth; Herrera-Viedma, Enrique
Format Journal Article
Language English
Published Elsevier Inc, 10 October 2014
ISSN 0020-0255
EISSN 1872-6291
DOI 10.1016/j.ins.2014.05.047

Abstract The clustering of web search results – or web document clustering – has become a very interesting research area among academic and scientific communities involved in information retrieval. Web search result clustering systems, also called Web Clustering Engines, seek to increase the coverage of documents presented for the user to review, while reducing the time spent reviewing them. Several algorithms for clustering web results already exist, but results show room for more to be done. This paper introduces a new description-centric algorithm for the clustering of web results, called WDC-CSK, which is based on the cuckoo search meta-heuristic algorithm, k-means algorithm, Balanced Bayesian Information Criterion, split and merge methods on clusters, and frequent phrases approach for cluster labeling. The cuckoo search meta-heuristic provides a combined global and local search strategy in the solution space. Split and merge methods replace the original Lévy flights operation and try to improve existing solutions (nests), so they can be considered as local search methods. WDC-CSK includes an abandon operation that provides diversity and prevents the population nests from converging too quickly. Balanced Bayesian Information Criterion is used as a fitness function and allows defining the number of clusters automatically. WDC-CSK was tested with four data sets (DMOZ-50, AMBIENT, MORESQUE and ODP-239) over 447 queries. The algorithm was also compared against other established web document clustering algorithms, including Suffix Tree Clustering (STC), Lingo, and Bisecting k-means. The results show a considerable improvement upon the other algorithms as measured by recall, F-measure, fall-out, accuracy and SSLk.
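
The search loop outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' WDC-CSK implementation: all function names and parameters are hypothetical, the data are toy coordinate tuples rather than term-document vectors, cluster labeling is omitted, and the fitness is a plain BIC-style stand-in, since the Balanced Bayesian Information Criterion formula is not given in this record. The sketch only shows how nests (candidate clusterings), split and merge moves, k-means refinement, and an abandon step might fit together.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, cents):
    """Index of the nearest centroid for every point."""
    return [min(range(len(cents)), key=lambda j: dist2(p, cents[j])) for p in data]


def kmeans(data, cents, iters=5):
    """A few Lloyd iterations; centroids that attract no points are dropped."""
    for _ in range(iters):
        labels = assign(data, cents)
        new = []
        for j in range(len(cents)):
            members = [p for p, l in zip(data, labels) if l == j]
            if members:
                new.append(tuple(sum(c) / len(members) for c in zip(*members)))
        cents = new
    return cents


def fitness(data, cents):
    """ASSUMPTION: generic BIC-style score (lower is better), not the paper's BBIC."""
    labels = assign(data, cents)
    sse = sum(dist2(p, cents[l]) for p, l in zip(data, labels))
    n = len(data)
    return n * math.log(sse / n + 1e-9) + len(cents) * math.log(n)


def split(data, cents):
    """Add a centroid at the point lying farthest from its assigned centroid."""
    labels = assign(data, cents)
    far = max(zip(data, labels), key=lambda pl: dist2(pl[0], cents[pl[1]]))[0]
    return cents + [far]


def merge(cents):
    """Replace the two closest centroids by their midpoint (needs at least 3)."""
    if len(cents) < 3:
        return cents
    pairs = ((a, b) for a in range(len(cents)) for b in range(a + 1, len(cents)))
    i, j = min(pairs, key=lambda ab: dist2(cents[ab[0]], cents[ab[1]]))
    mid = tuple((x + y) / 2 for x, y in zip(cents[i], cents[j]))
    return [c for t, c in enumerate(cents) if t not in (i, j)] + [mid]


def random_nest(data, k_max, rng):
    """A fresh nest: between 2 and k_max data points used as initial centroids."""
    return rng.sample(data, rng.randint(2, min(k_max, len(data))))


def search(data, nests=6, k_max=6, gens=15, abandon=0.25, seed=0):
    rng = random.Random(seed)
    pop = [kmeans(data, random_nest(data, k_max, rng)) for _ in range(nests)]
    for _ in range(gens):
        # Local move: split or merge one random nest, refine, keep only if better.
        i = rng.randrange(nests)
        move = split(data, pop[i]) if rng.random() < 0.5 else merge(pop[i])
        cand = kmeans(data, move)
        if fitness(data, cand) < fitness(data, pop[i]):
            pop[i] = cand
        # Abandon: replace the worst nests with fresh random ones for diversity.
        pop.sort(key=lambda c: fitness(data, c))
        for w in range(nests - int(abandon * nests), nests):
            pop[w] = kmeans(data, random_nest(data, k_max, rng))
    return min(pop, key=lambda c: fitness(data, c))
```

Calling `search(points)` on a list of coordinate tuples returns the centroid list of the best nest found. The number of clusters is not passed in; it is whatever the fitness trade-off between fit and model size settles on, which is the role the abstract assigns to the fitness function.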
Author_xml – sequence: 1
  givenname: Carlos
  orcidid: 0000-0002-6263-1911
  surname: Cobos
  fullname: Cobos, Carlos
  email: ccobos@unicauca.edu.co
  organization: Information Technology Research Group (GTI) Members, Universidad del Cauca, Sector Tulcán Office 422 FIET, Popayán, Colombia
– sequence: 2
  givenname: Henry
  surname: Muñoz-Collazos
  fullname: Muñoz-Collazos, Henry
  organization: Information Technology Research Group (GTI) Members, Universidad del Cauca, Sector Tulcán Office 422 FIET, Popayán, Colombia
– sequence: 3
  givenname: Richar
  surname: Urbano-Muñoz
  fullname: Urbano-Muñoz, Richar
  organization: Information Technology Research Group (GTI) Members, Universidad del Cauca, Sector Tulcán Office 422 FIET, Popayán, Colombia
– sequence: 4
  givenname: Martha
  surname: Mendoza
  fullname: Mendoza, Martha
  organization: Information Technology Research Group (GTI) Members, Universidad del Cauca, Sector Tulcán Office 422 FIET, Popayán, Colombia
– sequence: 5
  givenname: Elizabeth
  surname: León
  fullname: León, Elizabeth
  organization: Systems and Industrial Engineering Department, Engineering Faculty, Universidad Nacional de Colombia, Colombia
– sequence: 6
  givenname: Enrique
  surname: Herrera-Viedma
  fullname: Herrera-Viedma, Enrique
  organization: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
ContentType Journal Article
Copyright 2014 Elsevier Inc.
Discipline Engineering
Library & Information Science
EISSN 1872-6291
EndPage 264
ExternalDocumentID 10_1016_j_ins_2014_05_047
S0020025514006100
IsPeerReviewed true
IsScholarly true
Keywords Balanced Bayesian Information Criterion
Clustering of web results
k-Means
Web document clustering
Cuckoo search algorithm
PublicationDate 2014-10-10
PublicationTitle Information sciences
PublicationYear 2014
Publisher Elsevier Inc
  publication-title: Inf. Process. Manage.
– start-page: 15
  year: 2012
  end-page: 36
  ident: b0120
  article-title: Evolutionary Algorithm Parameters and Methods to Tune Them
  publication-title: Autonomous Search
– volume: 4
  start-page: 395
  year: 2012
  end-page: 400
  ident: b0295
  article-title: Initialization for K-means clustering using voronoi diagram
  publication-title: Procedia Technol.
– reference: K. Hammouda, Web Mining: Clustering Web Documents A Preliminary Review, 2001, pp. 1–13.
– volume: 18
  start-page: 370
  year: 2009
  end-page: 391
  ident: b0225
  article-title: Harmony K-means algorithm for document clustering
  publication-title: Data Min. Knowl. Disc.
– volume: 181
  start-page: 1503
  year: 2011
  end-page: 1516
  ident: b0325
  article-title: A google wave-based fuzzy recommender system to disseminate information in University Digital Libraries 2.0
  publication-title: Inf. Sci.
– reference: S. Osiński, D. Weiss, Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework, in: Advances in Web Intelligence, 2005, pp. 439–444.
– year: 2008
  ident: b0235
  article-title: Introduction to Information Retrieval
– reference: S.K. Smit, A.E. Eiben, Comparing parameter tuning methods for evolutionary algorithms, in: IEEE Congress on Evolutionary Computation, 2009. CEC ‘09, 2009, pp. 399–406.
– volume: 20
  start-page: 48
  year: 2005
  end-page: 54
  ident: b0285
  article-title: A concept-driven algorithm for clustering search results
  publication-title: IEEE Intell. Syst.
– year: 2011
  ident: b0215
  article-title: Parallel Genetic Algorithms: Theory and Real World Applications
– volume: 13
  start-page: 1863
  year: 2001
  end-page: 1889
  ident: b0345
  article-title: Subspace information criterion for model selection
  publication-title: Neural Comput.
– volume: 21
  start-page: 768
  year: 1965
  end-page: 769
  ident: b0130
  article-title: Cluster analysis of multivariate data: efficiency versus interpretability of classifications
  publication-title: Biometrics
– reference: W. Xu, X. Liu, Y. Gong, Document clustering based on non-negative matrix factorization, in: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, Toronto, Canada, 2003, pp. 267–273.
– reference: L. Jing, Survey of Text Clustering, 2008.
– reference: Q.H. Nguyen, Y.S. Ong, N. Krasnogor, A study on the design issues of Memetic Algorithm, in: IEEE Congress on Evolutionary Computation, 2007. CEC 2007, 2007, pp. 2390–2397.
– reference: X.-S. Yang, S. Deb, Cuckoo search via Lévy flights, in: World Congress on Nature & Biologically Inspired Computing, 2009. NaBIC 2009, 2009, pp. 210–214.
– reference: Z. Zhong-Yuan, J. Zhang, Survey on the variations and applications of nonnegative matrix factorization, in: ISORA’10: The Ninth International Symposium on Operations Research and Its Applications, ORSC & APORC, Chengdu-Jiuzhaigou, China, 2010, pp. 317–323.
– reference: C. Carpineto, G. Romano, Optimal meta search results clustering, in: Proceedings of the 33rd international ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Geneva, Switzerland, 2010, pp. 170–177.
– reference: C. Cobos, M. Mendoza, E. Leon, A hyper-heuristic approach to design and tuning heuristic methods for web document clustering, in: 2011 IEEE Congress on Evolutionary Computation (CEC), IEEE, New Orleans, USA, 2011, pp. 1350–1358.
– volume: 1
  start-page: 67
  year: 1997
  end-page: 82
  ident: b0365
  article-title: No free lunch theorems for optimization
  publication-title: IEEE Trans. Evol. Comput.
– volume: 184
  start-page: 266
  year: 2012
  end-page: 281
  ident: b0010
  article-title: An architecture for a focused trend parallel Web crawler with the application of clickstream analysis
  publication-title: Inf. Sci.
– volume: 2
  start-page: 8
  year: 2011
  ident: b0350
  article-title: Improved cuckoo search algorithm for feedforward neural network training
  publication-title: Int. J. Artif. Intell. Appl. (IJAIA)
– volume: 42
  start-page: 1243
  year: 2012
  end-page: 1256
  ident: b0385
  article-title: A comparison study of validity indices on swarm-intelligence-based clustering
  publication-title: IEEE Trans. Syst. Man Cybern. B Cybern.
– reference: B. Fung, K. Wang, M. Ester, Hierarchical document clustering using frequent itemsets, in: Proceedings of the SIAM International Conference on Data Mining, 2003, pp. 59–70.
– reference: D. Zhang, Y. Dong, Semantic, hierarchical, online clustering of web search results, in: Advanced Web Technologies and Applications, 2004, pp. 69–78.
– volume: 76
  start-page: 60
  year: 2001
  end-page: 68
  ident: b0145
  article-title: A new heuristic optimization algorithm: harmony search
  publication-title: Simulation
– reference: U. Scaiella, P. Ferragina, A. Marino, M. Ciaramita, Topical clustering of search results, in: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, Seattle, Washington, USA, 2012, pp. 223–232.
– reference: X. He, J.-B. Wang, Z.-X. Zhang, Y.-R. Cai, Clustering web documents based on Multiclass spectral clustering, in: 2011 International conference on machine learning and cybernetics (ICMLC), 2011, pp. 1466–1471.
– volume: 200
  start-page: 123
  year: 2012
  end-page: 134
  ident: b0115
  article-title: Rational behavior in peer-to-peer profile obfuscation for anonymous keyword search: the multi-hop scenario
  publication-title: Inf. Sci.
– volume: 30
  start-page: 870
  year: 2009
  end-page: 876
  ident: b0075
  article-title: An online document clustering technique for short web contents
  publication-title: Pattern Recogn. Lett.
– volume: 41
  start-page: 578
  year: 1998
  end-page: 588
  ident: b0135
  article-title: How many clusters? Which clustering method? Answers via model-based cluster analysis
  publication-title: Comput. J.
– year: 2000
  ident: b0055
  article-title: Efficient and Accurate Parallel Genetic Algorithms
– reference: M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, in: KDD workshop on text mining, ACM, Boston, MA, USA, 2000, pp. 1–20.
– reference: C. Cobos, J. Andrade, W. Constain, M. Mendoza, E. León, Web document clustering based on Global-Best Harmony Search, K-means, Frequent Term Sets and Bayesian Information Criterion, in: 2010 IEEE Congress on Evolutionary Computation (CEC), IEEE, Barcelona, Spain, 2010, pp. 4637–4644.
– volume: 15
  start-page: 1294
  year: 2007
  end-page: 1312
  ident: b0210
  article-title: Semantic web content analysis: a study in proximity-based collaborative clustering
  publication-title: IEEE Trans. Fuzzy Syst.
– volume: 36
  start-page: 9095
  year: 2009
  end-page: 9104
  ident: b0335
  article-title: Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures
  publication-title: Expert Syst. Appl.
– volume: 46
  start-page: 117
  year: 2010
  end-page: 130
  ident: b0125
  article-title: A probabilistic relational approach for web document clustering
  publication-title: Inf. Process. Manage.
– volume: 201
  start-page: 441
  year: 2008
  end-page: 451
  ident: b0230
  article-title: Novel meta-heuristic algorithms for clustering web documents
  publication-title: Appl. Math. Comput.
– volume: 96
  start-page: 746
  year: 2001
  end-page: 774
  ident: b0165
  article-title: Model selection and the principle of minimum description length
  publication-title: J. Am. Stat. Assoc.
– volume: 14
  start-page: 1
  year: 2008
  end-page: 37
  ident: b0370
  article-title: Top 10 algorithms in data mining
  publication-title: Knowl. Inf. Syst.
– reference: T. Matsumoto, E. Hung, Fuzzy clustering and relevance ranking of web search results with differentiating cluster label generation, in: 2010 IEEE International Conference on Fuzzy Systems (FUZZ), 2010, pp. 1–8.
– reference: R. Navigli, G. Crisafulli, Inducing word senses to improve web search result clustering, in: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Cambridge, Massachusetts, 2010, pp. 116–126.
– volume: 49
  start-page: 607
  year: 2013
  end-page: 625
  ident: b0105
  article-title: A hybrid system of pedagogical pattern recommendations based on singular value decomposition and variable data attributes
  publication-title: Inf. Process. Manage.
– volume: 31
  start-page: 264
  year: 1999
  end-page: 323
  ident: b0185
  article-title: Data clustering: a review
  publication-title: ACM Comput. Surv.
– reference: A. Bernardini, C. Carpineto, M. D’Amico, Full-Subtopic Retrieval with Keyphrase-Based Search Results Clustering, in: WI-IAT ‘09: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, 2009, pp. 206–213.
– volume: 218
  start-page: 2558
  year: 2011
  end-page: 2578
  ident: b0085
  article-title: GHS+LEM: global-best harmony search using learnable evolution models
  publication-title: Appl. Math. Comput.
– reference: F. Beil, M. Ester, X. Xu, Frequent term-based text clustering, in: KDD ‘02: International conference on Knowledge discovery and data mining (ACM SIGKDD), ACM, Edmonton, Alberta, Canada, 2002, pp. 436–442.
– reference: M. Hemalatha, D. Sathyasrinivas, Hybrid neural network model for web document clustering, in: Second International Conference on the Applications of Digital Information and Web Technologies, 2009. ICADIWT ‘09, 2009, pp. 531–538.
– volume: 41
  start-page: 1
  year: 2009
  end-page: 38
  ident: b0065
  article-title: A survey of Web clustering engines
  publication-title: ACM Comput. Surv.
StartPage 248
SubjectTerms Algorithms
Balanced Bayesian Information Criterion
Balancing
Bayesian analysis
Clustering
Clustering of web result
Clusters
Criteria
Cuckoo search algorithm
Heuristic methods
k-Mean
Searching
Web document clustering
Title Clustering of web search results based on the cuckoo search algorithm and Balanced Bayesian Information Criterion
URI https://dx.doi.org/10.1016/j.ins.2014.05.047
https://www.proquest.com/docview/1642242025
Volume 281