Sentiment analysis and spam filtering using the YAC2 clustering algorithm with transferability

Bibliographic Details
Published in	Computers & industrial engineering Vol. 165; p. 107959
Main Authors	Ghiassi, M., Lee, Sean, Gaikwad, Swati Ramesh
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.03.2022
Subjects	Clustering Analysis; Machine learning; Sentiment analysis; Spam filtering; Transferability; YAC2
ISSN	0360-8352 1879-0550
DOI	10.1016/j.cie.2022.107959

Abstract	• Application of the YAC2 clustering algorithm to textual data is presented. • Efficacy of the approach is measured against KNN, DBSCAN, and Spectral clustering alternatives. • A domain-transferable feature engineering approach is developed for diverse datasets. • Intelligent feature engineering can improve performance regardless of the tools used. Two notable applications of text classification are sentiment analysis and spam filtering. Traditional machine learning approaches to text classification are often complex, non-transferable, and require supervision. This paper introduces an unsupervised approach to text classification that is relatively simple and transfers between problem domains, while providing accuracy comparable to or better than that of established alternatives. We present an integrated solution that combines a new clustering algorithm, Yet Another Clustering Algorithm (YAC2), with a domain-transferable feature engineering approach for Twitter sentiment analysis and spam filtering of YouTube comments. We evaluate the effectiveness of this integrated solution for Twitter sentiment analysis using three datasets: Starbucks, Verizon, and Southwest Airlines. YouTube spam filtering is evaluated using four datasets: Psy, LMFAO, Shakira, and Katy Perry. We compare the results with established clustering solutions: KNN, Spectral, and DBSCAN. Our integrated solution performs better than all the alternatives for sentiment analysis. For spam filtering, YAC2 and KNN perform within 1% of each other and far better than Spectral and DBSCAN for all datasets. Additionally, our feature engineering approach improves accuracy compared to a traditional method, while significantly reducing model dimensionality and matrix sparsity and providing transferability across the datasets tested.
ArticleNumber	107959
Author	Ghiassi, M. Gaikwad, Swati Ramesh Lee, Sean
Author_xml	1. Ghiassi, M. (ORCID 0000-0002-5748-7513; mghiassi@scu.edu), Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, United States; 2. Lee, Sean (ORCID 0000-0002-1810-3468; sean@ciitizen.com), Ciitizen Corp., 3000 El Camino Real, 3 Palo Alto Square, Palo Alto, CA 94306, United States; 3. Gaikwad, Swati Ramesh (swat.gkd@gmail.com), Santa Clara University, 500 El Camino Real, Santa Clara, CA 95053, United States
ContentType	Journal Article
Copyright	2022 Elsevier Ltd
Discipline	Applied Sciences Engineering
EISSN	1879-0550
ExternalDocumentID	10_1016_j_cie_2022_107959 S0360835222000298
IsPeerReviewed	true
IsScholarly	true
ORCID	0000-0002-5748-7513 0000-0002-1810-3468
PublicationCentury	2000
PublicationDate	March 2022 (2022-03-01)
PublicationDecade	2020
PublicationTitle	Computers & industrial engineering
PublicationYear	2022
Publisher	Elsevier Ltd
Publisher_xml	– name: Elsevier Ltd
StartPage	107959
SubjectTerms	Clustering Analysis Machine learning Sentiment analysis Spam filtering Transferability YAC2
Title	Sentiment analysis and spam filtering using the YAC2 clustering algorithm with transferability
URI	https://dx.doi.org/10.1016/j.cie.2022.107959
Volume	165