Sentiment analysis and spam filtering using the YAC2 clustering algorithm with transferability

Bibliographic Details
Published in Computers & Industrial Engineering, Vol. 165, p. 107959
Main Authors Ghiassi, M.; Lee, Sean; Gaikwad, Swati Ramesh
Format Journal Article
Language English
Published Elsevier Ltd, 01.03.2022
Subjects
ISSN 0360-8352
EISSN 1879-0550
DOI 10.1016/j.cie.2022.107959

Abstract
Highlights:
• Application of the YAC2 clustering algorithm to textual data is presented.
• Efficacy of the approach is measured against KNN, DBSCAN, and Spectral clustering alternatives.
• A domain-transferable feature engineering approach is developed for diverse datasets.
• Intelligent feature engineering can improve performance regardless of the tools used.
Two notable applications of text classification are sentiment analysis and spam filtering. Traditional machine learning approaches to text classification are often complex, non-transferable, and require supervision. This paper introduces an unsupervised approach to text classification that is relatively simple and transfers between problem domains, while providing accuracy comparable to or better than established alternatives. We present an integrated solution that combines a new clustering algorithm, Yet Another Clustering Algorithm (YAC2), with a domain-transferable feature engineering approach for Twitter sentiment analysis and spam filtering of YouTube comments. We evaluate the effectiveness of this integrated solution for Twitter sentiment analysis using three datasets: Starbucks, Verizon, and Southwest Airlines. YouTube spam filtering is evaluated using four datasets: Psy, LMFAO, Shakira, and Katy Perry. We compare the results with established clustering solutions: KNN, Spectral, and DBSCAN. Our integrated solution performs better than all the alternatives for sentiment analysis. For spam filtering, YAC2 and KNN perform within 1% of each other and far better than Spectral and DBSCAN on all datasets. Additionally, our feature engineering approach improves accuracy compared to a traditional method while significantly reducing model dimensionality and matrix sparsity, and it provides transferability across the datasets tested.
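Two measurements underpin the abstract's claims. One is that the feature engineering lowers dimensionality and sparsity compared with a traditional representation. The other is that clustering methods are compared by accuracy. The small sketch below illustrates both. Everything in it is an assumption for illustration: the toy tweets, the two-category LEXICON, and the majority-label mapping used to score clusters as a classifier do not come from the paper, and YAC2 itself is not implemented. The sketch contrasts a bag-of-words matrix with a lexicon-count matrix and scores a given cluster assignment against known labels.

```python
from collections import Counter

# Hypothetical toy tweets; NOT data from the paper's Starbucks/Verizon/Southwest sets.
TWEETS = [
    "love the new latte great service",
    "worst coffee ever so slow",
    "great staff love it",
    "slow line bad coffee",
]

# Hypothetical two-category lexicon; the paper's actual lexicon is not reproduced here.
LEXICON = {
    "positive": {"love", "great"},
    "negative": {"worst", "slow", "bad"},
}


def bag_of_words(docs):
    """Traditional representation: one column per distinct token."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def lexicon_features(docs, lexicon):
    """Compact representation: one column per lexicon category (assumed design)."""
    cats = sorted(lexicon)
    rows = []
    for doc in docs:
        toks = doc.split()
        rows.append([sum(tok in lexicon[c] for tok in toks) for c in cats])
    return cats, rows


def sparsity(rows):
    """Fraction of zero entries in a dense row-major matrix."""
    cells = [v for row in rows for v in row]
    return sum(v == 0 for v in cells) / len(cells)


def cluster_accuracy(cluster_ids, labels):
    """Assumed scoring: map each cluster to its majority label, then count matches."""
    correct = 0
    for cid in set(cluster_ids):
        members = [lab for c, lab in zip(cluster_ids, labels) if c == cid]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(labels)


if __name__ == "__main__":
    vocab, bow = bag_of_words(TWEETS)
    cats, lex = lexicon_features(TWEETS, LEXICON)
    print(len(vocab), round(sparsity(bow), 3))
    print(len(cats), round(sparsity(lex), 3))
    print(cluster_accuracy([0, 1, 0, 1], ["pos", "neg", "pos", "neg"]))
```

On these toy tweets the bag-of-words matrix has 15 columns and about 68% zero entries. The lexicon matrix has 2 columns and 50% zeros. Only the direction of the change is meaningful here, not its size.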
ArticleNumber 107959
Author Ghiassi, M.
Gaikwad, Swati Ramesh
Lee, Sean
Author_xml – sequence: 1
  givenname: M.
  orcidid: 0000-0002-5748-7513
  surname: Ghiassi
  fullname: Ghiassi, M.
  email: mghiassi@scu.edu
  organization: Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, United States
– sequence: 2
  givenname: Sean
  orcidid: 0000-0002-1810-3468
  surname: Lee
  fullname: Lee, Sean
  email: sean@ciitizen.com
  organization: Ciitizen Corp., 3000 El Camino Real, 3 Palo Alto Square, Palo Alto, CA 94306, United States
– sequence: 3
  givenname: Swati Ramesh
  surname: Gaikwad
  fullname: Gaikwad, Swati Ramesh
  email: swat.gkd@gmail.com
  organization: Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, United States
CitedBy_id crossref_primary_10_1002_spy2_402
crossref_primary_10_1016_j_desal_2023_116482
crossref_primary_10_1155_2022_7183207
crossref_primary_10_3390_electronics13071346
crossref_primary_10_1016_j_dajour_2023_100390
crossref_primary_10_1016_j_cie_2024_110142
crossref_primary_10_3390_electronics13112034
crossref_primary_10_1016_j_cie_2023_109693
Cites_doi 10.1111/j.1540-6261.2007.01232.x
10.5772/6083
10.1109/MSP.2014.2377273
10.1162/COLI_a_00049
10.1109/TETC.2014.2330519
10.1016/j.knosys.2016.06.009
10.1145/2436256.2436274
10.1016/j.eswa.2013.01.001
10.1109/ICAwST.2019.8923218
10.1016/j.eswa.2013.05.057
10.5120/ijca2016912291
10.1016/j.eswa.2018.04.006
10.1111/j.1467-8640.2006.00277.x
ContentType Journal Article
Copyright 2022 Elsevier Ltd
DOI 10.1016/j.cie.2022.107959
DatabaseName CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Engineering
EISSN 1879-0550
ExternalDocumentID 10_1016_j_cie_2022_107959
S0360835222000298
ISSN 0360-8352
IsPeerReviewed true
IsScholarly true
Keywords Sentiment analysis
Spam filtering
Transferability
Clustering Analysis
YAC2
Machine learning
Language English
LinkModel DirectLink
ORCID 0000-0002-5748-7513
0000-0002-1810-3468
PublicationCentury 2000
PublicationDate March 2022
2022-03-00
PublicationDateYYYYMMDD 2022-03-01
PublicationDecade 2020
PublicationTitle Computers & Industrial Engineering
PublicationYear 2022
Publisher Elsevier Ltd
References Khanali, Vaziri (b0140) 2016; 155
Alberto, T., & Almeida, T. (2015). UCI Machine Learning Repository.
Taboada, Brook, Tofiloski, Voll, Stede (b0245) 2011; 37
Kennedy, Inkpen (b0135) 2006; 22
Ghiassi, Saidane, Oswal (b0105) 2020
Komodakis, Pesquet (b0150) 2015; 32
Patel, Butani, Patel, Sawant (b0195) 2017; 3
Kim, Hovy (b0145) 2004
Polanyi, Zaenen (b0200) 2006
Ghiassi, Skinner, Zimbra (b0110) 2013; 40
Nazari, Kang, Asharif, Sung, Ogawa (b0180) 2015; 148–152
Accessed: 2020.05.02.
Glorot, Bordes, Bengio (b0120) 2011
Oliveira, Cortez, Areal (b0185) 2014
Aue, Gamon (b9000) 2005
Feldman (b0090) 2013; 56
Bhowmick, Hazarika (b0040) 2018
Shi, Gao, Liu (b0240) 2018
Ghiassi, Lee (b0100) 2018
Sharma, Dey (b0235) 2012
Agarwal, Xie, Vovsha, Rambow, Passonneau (b0015) 2011
http://doi.org/10.1109/ICAwST.2019.8923218.
http://doi.org/10.5772/6083.
.
Drikvandi, Lawal (b0075) 2020
Tiruveedhula, Rani, Narayana (b0260) 2016; 9
Tetlock (b0255) 2007; 62
Donaldson, Martin, de Bruijn (b0065) 2003; 4
Manning, Raghavan, Schütze (b0170) 2008
Saif, He, Alani (b0225) 2012
Rifkin, Klautau (b0215) 2004; 5
Andreevskaia, Bergler (b0035) 2008
Abdulhamid, Latiff, Chiroma, Osho, Abdul-Salaam, Abubakar (b0005) 2017; 5
Kajanan, S., Shafeeq Bin Mohd Shariff, A., Datta, A., Dutta, K., & Paul, D. (2011). Twitter post filter for mobile applications.
Fahad, Alshatri, Tari, Alamri, Khalil, Zomaya (b0080) 2014; 2
Kontopoulos, Berberidis, Dergiades, Bassiliades (b0155) 2013; 40
Kyriakopoulou, A. (2008). Text Classification Aided by Clustering: a Literature Review.
Dredze, Crammer (b0070) 2008
Tan, Wu, Tang, Cheng (b0250) 2007
(pp. 1–6).
Blitzer, Dredze, Pereira (b0045) 2007
Dave, Lawrence, Pennock (b0050) 2003
Jiang, Yu, Zhou, Liu, Zhao (b0125) 2011
Turney (b0265) 2002
Porter, M. (2006). The Porter Stemming Algorithm.
Akadi, Ouarighi, Aboutajdine (b0020) 2008; 8
Mansour, R., Refaei, N., Gamon, M., Abdul-Hamid, A., & Sami, K. (2013). Revisiting the old kitchen sink: Do we need sentiment domain adaptation?
Samdani, Yih (b0230) 2011
Saif, He, Alani (b0220) 2012
Ghosh, K., Banerjee, A., Chatterjee, S., Sen, S. (2019). Imbalanced Twitter Sentiment Analysis using Minority Oversampling
Poria, Cambria, Gelbukh (b0210) 2016; 108
Liu (b0165) 2015
Ding, Liu, Yu (b0060) 2008
Alberto, Lochter, Almeida (b0025) 2015
Pang, Lee, Vaithyanathan (b0190) 2002
Denny, Spirling (b0055) 2017; 1
Gamon (b0095) 2004
Abbasi, A., Hassan, A., and Dhar, M. (2014). Benchmarking Twitter sentiment analysis tools.
(pp. 420–427).
Irvine
Foozy, Shamala, Suradi (b0085) 2018; 5
Saif (10.1016/j.cie.2022.107959_b0220) 2012
10.1016/j.cie.2022.107959_b0205
Fahad (10.1016/j.cie.2022.107959_b0080) 2014; 2
Khanali (10.1016/j.cie.2022.107959_b0140) 2016; 155
Alberto (10.1016/j.cie.2022.107959_b0025) 2015
Andreevskaia (10.1016/j.cie.2022.107959_b0035) 2008
Ghiassi (10.1016/j.cie.2022.107959_b0100) 2018
Feldman (10.1016/j.cie.2022.107959_b0090) 2013; 56
10.1016/j.cie.2022.107959_b0160
Glorot (10.1016/j.cie.2022.107959_b0120) 2011
Taboada (10.1016/j.cie.2022.107959_b0245) 2011; 37
Gamon (10.1016/j.cie.2022.107959_b0095) 2004
Tan (10.1016/j.cie.2022.107959_b0250) 2007
Polanyi (10.1016/j.cie.2022.107959_b0200) 2006
Ding (10.1016/j.cie.2022.107959_b0060) 2008
Shi (10.1016/j.cie.2022.107959_b0240) 2018
Agarwal (10.1016/j.cie.2022.107959_b0015) 2011
Manning (10.1016/j.cie.2022.107959_b0170) 2008
Tetlock (10.1016/j.cie.2022.107959_b0255) 2007; 62
Poria (10.1016/j.cie.2022.107959_b0210) 2016; 108
Patel (10.1016/j.cie.2022.107959_b0195) 2017; 3
Abdulhamid (10.1016/j.cie.2022.107959_b0005) 2017; 5
10.1016/j.cie.2022.107959_b0030
Tiruveedhula (10.1016/j.cie.2022.107959_b0260) 2016; 9
10.1016/j.cie.2022.107959_b0115
Denny (10.1016/j.cie.2022.107959_b0055) 2017; 1
Oliveira (10.1016/j.cie.2022.107959_b0185) 2014
Donaldson (10.1016/j.cie.2022.107959_b0065) 2003; 4
Pang (10.1016/j.cie.2022.107959_b0190) 2002
Ghiassi (10.1016/j.cie.2022.107959_b0110) 2013; 40
Komodakis (10.1016/j.cie.2022.107959_b0150) 2015; 32
Jiang (10.1016/j.cie.2022.107959_b0125) 2011
Liu (10.1016/j.cie.2022.107959_b0165) 2015
Turney (10.1016/j.cie.2022.107959_b0265) 2002
Dredze (10.1016/j.cie.2022.107959_b0070) 2008
Akadi (10.1016/j.cie.2022.107959_b0020) 2008; 8
Aue (10.1016/j.cie.2022.107959_b9000) 2005
Saif (10.1016/j.cie.2022.107959_b0225) 2012
Drikvandi (10.1016/j.cie.2022.107959_b0075) 2020
Rifkin (10.1016/j.cie.2022.107959_b0215) 2004; 5
Foozy (10.1016/j.cie.2022.107959_b0085) 2018; 5
Kim (10.1016/j.cie.2022.107959_b0145) 2004
Kennedy (10.1016/j.cie.2022.107959_b0135) 2006; 22
Bhowmick (10.1016/j.cie.2022.107959_b0040) 2018
Sharma (10.1016/j.cie.2022.107959_b0235) 2012
Samdani (10.1016/j.cie.2022.107959_b0230) 2011
Kontopoulos (10.1016/j.cie.2022.107959_b0155) 2013; 40
10.1016/j.cie.2022.107959_b0130
Ghiassi (10.1016/j.cie.2022.107959_b0105) 2020
Blitzer (10.1016/j.cie.2022.107959_b0045) 2007
Dave (10.1016/j.cie.2022.107959_b0050) 2003
Nazari (10.1016/j.cie.2022.107959_b0180) 2015; 148–152
10.1016/j.cie.2022.107959_b0010
10.1016/j.cie.2022.107959_b0175
References_xml – reference: (pp. 420–427).
– year: 2020
  ident: b0105
  article-title: YAC2: An α-proximity based clustering algorithm
  publication-title: Expert Systems with Applications
– start-page: 1
  year: 2006
  end-page: 10
  ident: b0200
  article-title: Contextual valence shifters
  publication-title: Computing attitude and affect in text: Theory and applications
– volume: 148–152
  year: 2015
  ident: b0180
  article-title: A New Hierarchical Clustering Algorithm
  publication-title: International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)
– year: 2018
  ident: b0240
  article-title: A Hybrid Sampling Method Based on Safe Screening for Imbalanced Datasets with Sparse Structure
  publication-title: IEEE
– volume: 3
  start-page: 2349
  year: 2017
  end-page: 6010
  ident: b0195
  article-title: Literature Survey on Sentiment Analysis of Twitter Data using Machine Learning Approaches
  publication-title: International Journal for Innovation Research in Science & Technology (IJIRST)
– reference: Ghosh, K., Banerjee, A., Chatterjee, S., Sen, S. (2019). Imbalanced Twitter Sentiment Analysis using Minority Oversampling,
– volume: 62
  start-page: 1139
  year: 2007
  end-page: 1168
  ident: b0255
  article-title: Giving content to investor sentiment: The role of media in the stock market
  publication-title: The Journal of Finance
– volume: 9
  start-page: 1
  year: 2016
  end-page: 12
  ident: b0260
  article-title: A Survey on Clustering Techniques for Big Data Mining
  publication-title: Indian Journal of Science and Technology
– year: 2002
  ident: b0265
  article-title: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
  publication-title: Proceedings of the 40th annual meeting of the
– reference: Alberto, T., & Almeida, T. (2015). UCI Machine Learning Repository.
– volume: 56
  start-page: 82
  year: 2013
  end-page: 89
  ident: b0090
  article-title: Techniques and applications for sentiment analysis
  publication-title: Communications of the ACM
– start-page: 151
  year: 2011
  end-page: 160
  ident: b0125
  article-title: Target-dependent twitter sentiment classification
  publication-title: Proceedings of the 49th annual meeting of the association for computational linguistics
– reference: Abbasi, A., Hassan, A., and Dhar, M. (2014). Benchmarking Twitter sentiment analysis tools.
– year: 2005
  ident: b9000
  article-title: Customizing sentiment classifiers to new domains: A case study
  publication-title: In Proceedings of recent advances in natural language processing
– volume: 108
  start-page: 42
  year: 2016
  end-page: 49
  ident: b0210
  article-title: Aspect extraction for opinion mining with a deep convolutional neural network
  publication-title: Knowledge Based Systems
– volume: 32
  start-page: 31
  year: 2015
  end-page: 54
  ident: b0150
  article-title: Playing with Duality: An overview of recent primal-dual approaches for solving large-scale optimization problems
  publication-title: IEEE Signal Processing Magazine
– year: 2020
  ident: b0075
  article-title: Sparse Principal Component Analysis for Natural Language Processing
  publication-title: Annals of Data Science
– volume: 37
  start-page: 267
  year: 2011
  end-page: 307
  ident: b0245
  article-title: Lexicon-based methods for sentiment analysis
  publication-title: Computational Linguistics
– volume: 5
  start-page: 15650
  year: 2017
  end-page: 15666
  ident: b0005
  article-title: A Review on Mobile SMS Spam Filtering Techniques
  publication-title: IEEE
– start-page: 583
  year: 2018
  end-page: 590
  ident: b0040
  article-title: E-Mail Spam Filtering: A Review of Techniques and Trends
  publication-title: book: Advances in Electronics, Communication and Computing, (443)
– year: 2008
  ident: b0170
  article-title: Introduction to Information Retrieval
– volume: 2
  start-page: 267
  year: 2014
  end-page: 279
  ident: b0080
  article-title: A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis
  publication-title: IEEE Transactions on Emerging Topics in Computing
– volume: 40
  start-page: 4065
  year: 2013
  end-page: 4074
  ident: b0155
  article-title: Ontology-based sentiment analysis of twitter posts
  publication-title: Expert Systems with Applications
– start-page: 440
  year: 2007
  end-page: 447
  ident: b0045
  article-title: Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification
  publication-title: Proceedings of the annual meetings of the Association of Computational Linguistics (ACL)
– reference: Porter, M. (2006). The Porter Stemming Algorithm.
– start-page: 1367
  year: 2004
  ident: b0145
  article-title: Determining the sentiment of opinions
  publication-title: Proceedings of the 20th international conference on computational linguistics
– reference: Mansour, R., Refaei, N., Gamon, M., Abdul-Hamid, A., & Sami, K. (2013). Revisiting the old kitchen sink: Do we need sentiment domain adaptation?
– volume: 40
  start-page: 6266
  year: 2013
  end-page: 6282
  ident: b0110
  article-title: Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network
  publication-title: Expert Systems with Applications
– start-page: 30
  year: 2011
  end-page: 38
  ident: b0015
  article-title: Sentiment analysis of twitter data
  publication-title: Proceedings of the workshop on languages in social media
– start-page: 231
  year: 2008
  end-page: 240
  ident: b0060
  article-title: A holistic lexicon-based approach to opinion mining
  publication-title: Proceedings of the 2008 international conference on web search and data mining
– start-page: 115
  year: 2014
  end-page: 123
  ident: b0185
  article-title: Automatic creation of stock market lexicons for sentiment analysis using StockTwits data
  publication-title: Proceeding of the 18th international database engineering & applications symposium
– year: 2015
  ident: b0165
  article-title: Sentiment analysis: Mining opinions, sentiments, and emotions
– reference: (pp. 1–6).
– volume: 1
  year: 2017
  ident: b0055
  article-title: Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads and What to Do about It
  publication-title: Harvard Dataverse
– volume: 4
  year: 2003
  ident: b0065
  article-title: PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine
  publication-title: BMC Bioinformatics
– start-page: 2
  year: 2012
  end-page: 9
  ident: b0220
  article-title: Alleviating data sparsity for twitter sentiment analysis
  publication-title: Proceedings of the 21st ACM international World Wide Web conference
– start-page: 79
  year: 2002
  end-page: 86
  ident: b0190
  article-title: Thumbs up? Sentiment classification using machine learning techniques
  publication-title: Proceedings of the conference on empirical methods in natural language processing
– start-page: 138
  year: 2015
  end-page: 143
  ident: b0025
  article-title: TubeSpam: Comment Spam Filtering on YouTube
  publication-title: Proceedings of the 14th IEEE International Conference on Machine Learning and Applications
– start-page: 841
  year: 2004
  ident: b0095
  article-title: Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis
  publication-title: Proceedings of the twentieth international conference on computational linguistics
– start-page: 1458
  year: 2011
  ident: b0230
  article-title: Domain adaptation with ensemble of feature groups
  publication-title: Proceedings of the 22nd international joint conference on artificial intelligence
– volume: 155
  start-page: 20
  year: 2016
  end-page: 25
  ident: b0140
  article-title: A Survey on Clustering Algorithms for Partitioning Method
  publication-title: International Journal of Computer Applications
– reference: Kyriakopoulou, A. (2008). Text Classification Aided by Clustering: a Literature Review.
– start-page: 689
  year: 2008
  end-page: 697
  ident: b0070
  article-title: Online methods for multi-domain learning and adaptation
  publication-title: Proceedings of the conference on empirical methods in natural language processing
– reference: http://doi.org/10.5772/6083.
– volume: 5
  start-page: 101
  year: 2004
  end-page: 141
  ident: b0215
  article-title: In Defense of One-Vs-All Classification
  publication-title: Journal of Machine Learning Research
– start-page: 979
  year: 2007
  end-page: 982
  ident: b0250
  article-title: A novel scheme for domain-transfer problem in the context of sentiment analysis
  publication-title: Proceedings of the 16th ACM conference on information and knowledge management
– start-page: 197
  year: 2018
  end-page: 216
  ident: b0100
  article-title: A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach
  publication-title: Expert Systems with Application
– reference: .
– reference: Accessed: 2020.05.02.
– volume: 22
  start-page: 110
  year: 2006
  end-page: 125
  ident: b0135
  article-title: Sentiment classification of movie reviews using contextual valence shifters
  publication-title: Computational Intelligence
– reference: Kajanan, S., Shafeeq Bin Mohd Shariff, A., Datta, A., Dutta, K., & Paul, D. (2011). Twitter post filter for mobile applications.
– volume: 5
  start-page: 401
  year: 2018
  end-page: 408
  ident: b0085
  article-title: Youtube spam comment detection using support vector machine and K–nearest neighbor
  publication-title: Indonesian Journal of Electrical Engineering and Computer Science
– start-page: 290
  year: 2008
  end-page: 298
  ident: b0035
  article-title: When specialists and generalists work together: Domain dependence in sentiment tagging
  publication-title: Proceedings of 46th annual meeting of the association for computational linguistics
– reference: , Irvine:
– start-page: 513
  year: 2011
  end-page: 520
  ident: b0120
  article-title: Domain adaptation for large-scale sentiment classification: A deep learning approach
  publication-title: Proceedings of the 28th international conference on machine learning
– start-page: 519
  year: 2003
  end-page: 528
  ident: b0050
  article-title: Mining the peanut gallery: Opinion extraction and semantic classification of product reviews
  publication-title: Proceedings of the 12th international conference on World Wide Web
– start-page: 1
  year: 2012
  end-page: 7
  ident: b0235
  article-title: A comparative study of feature selection and machine learning techniques for sentiment analysis
  publication-title: In Proceedings of the 2012 ACM research in applied computation symposium
– volume: 8
  year: 2008
  ident: b0020
  article-title: A Powerful Feature Selection approach based on Mutual Information
  publication-title: International Journal of Computer Science and Network Security (IJCSNS)
– reference: http://doi.org/10.1109/ICAwST.2019.8923218.
– start-page: 508
  year: 2012
  end-page: 524
  ident: b0225
  article-title: Semantic sentiment analysis of twitter
  publication-title: Proceedings of the 11th international semantic web conference
– volume: 1
  year: 2017
  ident: 10.1016/j.cie.2022.107959_b0055
  article-title: Text Preprocessing for Unsupervised Learning: Why It Matters, When It Misleads and What to Do about It
  publication-title: Harvard Dataverse
– year: 2020
  ident: 10.1016/j.cie.2022.107959_b0105
  article-title: YAC2: An α-proximity based clustering algorithm
  publication-title: Expert Systems with Applications
– year: 2008
  ident: 10.1016/j.cie.2022.107959_b0170
– start-page: 1367
  year: 2004
  ident: 10.1016/j.cie.2022.107959_b0145
  article-title: Determining the sentiment of opinions
– volume: 62
  start-page: 1139
  issue: 3
  year: 2007
  ident: 10.1016/j.cie.2022.107959_b0255
  article-title: Giving content to investor sentiment: The role of media in the stock market
  publication-title: The Journal of Finance
  doi: 10.1111/j.1540-6261.2007.01232.x
– start-page: 689
  year: 2008
  ident: 10.1016/j.cie.2022.107959_b0070
  article-title: Online methods for multi-domain learning and adaptation
– start-page: 115
  year: 2014
  ident: 10.1016/j.cie.2022.107959_b0185
  article-title: Automatic creation of stock market lexicons for sentiment analysis using StockTwits data
– start-page: 841
  year: 2004
  ident: 10.1016/j.cie.2022.107959_b0095
  article-title: Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of linguistic analysis
– ident: 10.1016/j.cie.2022.107959_b0010
– ident: 10.1016/j.cie.2022.107959_b0160
  doi: 10.5772/6083
– volume: 32
  start-page: 31
  issue: 6
  year: 2015
  ident: 10.1016/j.cie.2022.107959_b0150
  article-title: Playing with Duality: An overview of recent primal-dual approaches for solving large-scale optimization problems
  publication-title: IEEE Signal Processing Magazine
  doi: 10.1109/MSP.2014.2377273
– volume: 3
  start-page: 2349
  issue: 10
  year: 2017
  ident: 10.1016/j.cie.2022.107959_b0195
  article-title: Literature Survey on Sentiment Analysis of Twitter Data using Machine Learning Approaches
  publication-title: International Journal for Innovation Research in Science & Technology (IJIRST)
– year: 2020
  ident: 10.1016/j.cie.2022.107959_b0075
  article-title: Sparse Principal Component Analysis for Natural Language Processing
  publication-title: Annals of Data Science
– volume: 37
  start-page: 267
  issue: 2
  year: 2011
  ident: 10.1016/j.cie.2022.107959_b0245
  article-title: Lexicon-based methods for sentiment analysis
  publication-title: Computational Linguistics
  doi: 10.1162/COLI_a_00049
– volume: 2
  start-page: 267
  issue: 3
  year: 2014
  ident: 10.1016/j.cie.2022.107959_b0080
  article-title: A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis
  publication-title: IEEE Transactions on Emerging Topics in Computing
  doi: 10.1109/TETC.2014.2330519
– volume: 5
  start-page: 15650
  year: 2017
  ident: 10.1016/j.cie.2022.107959_b0005
  article-title: A Review on Mobile SMS Spam Filtering Techniques
  publication-title: IEEE
– start-page: 2
  year: 2012
  ident: 10.1016/j.cie.2022.107959_b0220
  article-title: Alleviating data sparsity for twitter sentiment analysis
– start-page: 79
  year: 2002
  ident: 10.1016/j.cie.2022.107959_b0190
  article-title: Thumbs up? Sentiment classification using machine learning techniques
– volume: 108
  start-page: 42
  year: 2016
  ident: 10.1016/j.cie.2022.107959_b0210
  article-title: Aspect extraction for opinion mining with a deep convolutional neural network
  publication-title: Knowledge Based Systems
  doi: 10.1016/j.knosys.2016.06.009
– start-page: 290
  year: 2008
  ident: 10.1016/j.cie.2022.107959_b0035
  article-title: When specialists and generalists work together: Domain dependence in sentiment tagging
– year: 2002
  ident: 10.1016/j.cie.2022.107959_b0265
  article-title: Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews
– start-page: 151
  year: 2011
  ident: 10.1016/j.cie.2022.107959_b0125
  article-title: Target-dependent twitter sentiment classification
– volume: 56
  start-page: 82
  issue: 4
  year: 2013
  ident: 10.1016/j.cie.2022.107959_b0090
  article-title: Techniques and applications for sentiment analysis
  publication-title: Communications of the ACM
  doi: 10.1145/2436256.2436274
– year: 2015
  ident: 10.1016/j.cie.2022.107959_b0165
– volume: 5
  start-page: 401
  issue: 3
  year: 2018
  ident: 10.1016/j.cie.2022.107959_b0085
  article-title: Youtube spam comment detection using support vector machine and K–nearest neighbor
  publication-title: Indonesian Journal of Electrical Engineering and Computer Science
– year: 2018
  ident: 10.1016/j.cie.2022.107959_b0240
  article-title: A Hybrid Sampling Method Based on Safe Screening for Imbalanced Datasets with Sparse Structure
  publication-title: IEEE
– start-page: 1
  year: 2006
  ident: 10.1016/j.cie.2022.107959_b0200
  article-title: Contextual valence shifters
– volume: 4
  issue: 11
  year: 2003
  ident: 10.1016/j.cie.2022.107959_b0065
  article-title: PreBIND and Textomy – mining the biomedical literature for protein-protein interactions using a support vector machine
  publication-title: BMC Bioinformatics
– volume: 40
  start-page: 4065
  issue: 10
  year: 2013
  ident: 10.1016/j.cie.2022.107959_b0155
  article-title: Ontology-based sentiment analysis of twitter posts
  publication-title: Expert Systems with Applications
  doi: 10.1016/j.eswa.2013.01.001
– start-page: 1458
  year: 2011
  ident: 10.1016/j.cie.2022.107959_b0230
  article-title: Domain adaptation with ensemble of feature groups
– start-page: 583
  year: 2018
  ident: 10.1016/j.cie.2022.107959_b0040
  article-title: E-Mail Spam Filtering: A Review of Techniques and Trends
– ident: 10.1016/j.cie.2022.107959_b0115
  doi: 10.1109/ICAwST.2019.8923218
– volume: 5
  start-page: 101
  year: 2004
  ident: 10.1016/j.cie.2022.107959_b0215
  article-title: In Defense of One-Vs-All Classification
  publication-title: Journal of Machine Learning Research
– start-page: 1
  year: 2012
  ident: 10.1016/j.cie.2022.107959_b0235
  article-title: A comparative study of feature selection and machine learning techniques for sentiment analysis
– start-page: 440
  year: 2007
  ident: 10.1016/j.cie.2022.107959_b0045
  article-title: Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification
– ident: 10.1016/j.cie.2022.107959_b0205
– year: 2005
  ident: 10.1016/j.cie.2022.107959_b9000
  article-title: Customizing sentiment classifiers to new domains: A case study
– start-page: 138
  year: 2015
  ident: 10.1016/j.cie.2022.107959_b0025
  article-title: TubeSpam: Comment Spam Filtering on YouTube
– start-page: 231
  year: 2008
  ident: 10.1016/j.cie.2022.107959_b0060
  article-title: A holistic lexicon-based approach to opinion mining
– ident: 10.1016/j.cie.2022.107959_b0130
– volume: 40
  start-page: 6266
  issue: 16
  year: 2013
  ident: 10.1016/j.cie.2022.107959_b0110
  article-title: Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network
  publication-title: Expert Systems with Applications
  doi: 10.1016/j.eswa.2013.05.057
– volume: 155
  start-page: 20
  issue: 4
  year: 2016
  ident: 10.1016/j.cie.2022.107959_b0140
  article-title: A Survey on Clustering Algorithms for Partitioning Method
  publication-title: International Journal of Computer Applications
  doi: 10.5120/ijca2016912291
– start-page: 30
  year: 2011
  ident: 10.1016/j.cie.2022.107959_b0015
  article-title: Sentiment analysis of Twitter data
– ident: 10.1016/j.cie.2022.107959_b0175
– start-page: 197
  year: 2018
  ident: 10.1016/j.cie.2022.107959_b0100
  article-title: A domain transferable lexicon set for Twitter sentiment analysis using a supervised machine learning approach
  publication-title: Expert Systems with Applications
  doi: 10.1016/j.eswa.2018.04.006
– start-page: 148
  year: 2015
  ident: 10.1016/j.cie.2022.107959_b0180
  article-title: A New Hierarchical Clustering Algorithm
  publication-title: International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS)
– start-page: 508
  year: 2012
  ident: 10.1016/j.cie.2022.107959_b0225
  article-title: Semantic sentiment analysis of Twitter
– start-page: 979
  year: 2007
  ident: 10.1016/j.cie.2022.107959_b0250
  article-title: A novel scheme for domain-transfer problem in the context of sentiment analysis
– start-page: 519
  year: 2003
  ident: 10.1016/j.cie.2022.107959_b0050
  article-title: Mining the peanut gallery: Opinion extraction and semantic classification of product reviews
– ident: 10.1016/j.cie.2022.107959_b0030
– start-page: 513
  year: 2011
  ident: 10.1016/j.cie.2022.107959_b0120
  article-title: Domain adaptation for large-scale sentiment classification: A deep learning approach
– volume: 22
  start-page: 110
  issue: 2
  year: 2006
  ident: 10.1016/j.cie.2022.107959_b0135
  article-title: Sentiment classification of movie reviews using contextual valence shifters
  publication-title: Computational Intelligence
  doi: 10.1111/j.1467-8640.2006.00277.x
– volume: 9
  start-page: 1
  issue: 3
  year: 2016
  ident: 10.1016/j.cie.2022.107959_b0260
  article-title: A Survey on Clustering Techniques for Big Data Mining
  publication-title: Indian Journal of Science and Technology
– volume: 8
  issue: 4
  year: 2008
  ident: 10.1016/j.cie.2022.107959_b0020
  article-title: A Powerful Feature Selection approach based on Mutual Information
  publication-title: International Journal of Computer Science and Network Security (IJCSNS)
StartPage 107959
SubjectTerms Clustering Analysis
Machine learning
Sentiment analysis
Spam filtering
Transferability
YAC2
Title Sentiment analysis and spam filtering using the YAC2 clustering algorithm with transferability
URI https://dx.doi.org/10.1016/j.cie.2022.107959
Volume 165