Designing Punjabi poetry classifiers using machine learning and different textual features
| Published in | International Arab Journal of Information Technology, Vol. 17, no. 1, pp. 38-44 |
|---|---|
| Main Authors | Kaur, Jasleen; Saini, Jatinderkumar |
| Format | Journal Article |
| Language | English |
| Published | Zarqa, Jordan: Zarqa University, Deanship of Scientific Research, 2020 |
| Subjects | Classification; Algorithms |
| ISSN | 1683-3198; 2309-4524 |
| DOI | 10.34028/iajit/17/1/5 |
| Abstract | Analysis of poetic text is very challenging from a computational linguistics perspective, and computational analysis of literary arts, especially poetry, is a very difficult classification task. For a library recommendation system, poems can be classified on various metrics such as poet, time period, sentiment and subject matter. In this work, a content-based Punjabi poetry classifier was developed using the Weka toolset. Four categories were manually populated with 2,034 poems: Nature and Festival (NAFE), Linguistic and Patriotic (LIPA), Relation and Romantic (RORE), and Philosophy and Spiritual (PHSP), containing 505, 399, 529 and 601 poems, respectively. The poems were passed through several pre-processing sub-phases: tokenization, noise removal, stop-word removal and special-symbol removal. The 31,938 extracted tokens were weighted using the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting schemes. Based on poetry elements, three textual features (lexical, syntactic and semantic) were evaluated for building classifiers with different machine learning algorithms: Naive Bayes (NB), Support Vector Machine (SVM), HyperPipes and K-Nearest Neighbour (KNN). The results revealed that the semantic feature performed better than the lexical and syntactic features. SVM was the best-performing algorithm, and the highest accuracy (76.02%) was achieved by incorporating the semantic information associated with words. |
|---|---|
| Author | Kaur, Jasleen; Saini, Jatinderkumar |
| CitedBy_id | crossref_primary_10_1145_3589249 crossref_primary_10_3389_fbioe_2020_00267 crossref_primary_10_1016_j_matpr_2022_05_297 |
| ContentType | Journal Article |
| DBID | ADJCN AHFXO AAYXX CITATION ADTOC UNPAY |
| DatabaseName | e-Marefa Academic and Statistical Periodicals; e-Marefa Academic Complete (Marefa integrated Arabic academic content); CrossRef; Unpaywall for CDI: Periodical Content; Unpaywall |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering |
| EISSN | 1683-3198 |
| EndPage | 44 |
| ExternalDocumentID | 10.34028/iajit/17/1/5 10_34028_iajit_17_1_5 955148 |
| IEDL.DBID | UNPAY |
| ISSN | 1683-3198 2309-4524 |
| IngestDate | Tue Aug 19 22:48:18 EDT 2025 Tue Jul 01 02:06:48 EDT 2025 Thu Apr 24 23:01:50 EDT 2025 Tue Nov 26 17:10:02 EST 2024 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 1 |
| LCCallNum_Ident | T |
| Language | English |
| LinkModel | DirectLink |
| OpenAccessLink | https://doi.org/10.34028/iajit/17/1/5 |
| PageCount | 7 |
| ParticipantIDs | unpaywall_primary_10_34028_iajit_17_1_5 crossref_citationtrail_10_34028_iajit_17_1_5 crossref_primary_10_34028_iajit_17_1_5 emarefa_primary_955148 |
| ProviderPackageCode | CITATION AAYXX |
| PublicationCentury | 2000 |
| PublicationDate | 2020 |
| PublicationDateYYYYMMDD | 2020-01-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Zarqa, Jordan |
| PublicationTitle | International Arab Journal of Information Technology |
| PublicationYear | 2020 |
| Publisher | Zarqa University, Deanship of Scientific Research |
| SSID | ssj0000392677 |
| SourceID | unpaywall crossref emarefa |
| SourceType | Open Access Repository Enrichment Source Index Database Publisher |
| StartPage | 38 |
| SubjectTerms | Classification; Algorithms |
| Title | Designing Punjabi poetry classifiers using machine learning and different textual features |
| URI | https://search.emarefa.net/detail/BIM-955148 https://doi.org/10.34028/iajit/17/1/5 |
| UnpaywallVersion | publishedVersion |
| Volume | 17 |
| hasFullText | 1 |
| inHoldings | 1 |
| linkProvider | Unpaywall |
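
The abstract lists pre-processing (tokenization, noise removal, stop-word removal, special-symbol removal) followed by TF and TF-IDF weighting, which the authors carried out in Weka. The Python sketch below illustrates those steps outside Weka and is not the authors' implementation. It rests on stated assumptions: a token is a run of Gurmukhi characters (Unicode block U+0A00 to U+0A7F, minus Gurmukhi digits), the stop-word list `STOP_WORDS` is hypothetical and illustrative, TF is the raw term count, and TF-IDF is computed as tf * log(N / df), which is only one common variant of the formula. The helper names `preprocess`, `term_frequencies` and `tf_idf` are likewise hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a token is a run of Gurmukhi letters, vowel signs and marks
# (Unicode block U+0A00 to U+0A7F), excluding Gurmukhi digits U+0A66 to U+0A6F.
# Everything else (Latin text, digits, the danda "।", punctuation and other
# special symbols) acts as a separator, so noise and special-symbol removal
# happen during tokenization.
GURMUKHI_WORD = re.compile(r"[\u0A00-\u0A65\u0A70-\u0A7F]+")

# Hypothetical, illustrative stop-word list (common Punjabi function words);
# it is NOT the stop-word list used by the authors.
STOP_WORDS = {"ਦਾ", "ਦੀ", "ਦੇ", "ਨੂੰ", "ਅਤੇ", "ਹੈ", "ਹਨ"}


def preprocess(poem: str) -> list[str]:
    """Tokenize a poem into Gurmukhi words, then drop stop words."""
    tokens = GURMUKHI_WORD.findall(poem)
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(docs: list[list[str]]) -> list[dict[str, int]]:
    """TF scheme: raw count of each term in each document."""
    return [dict(Counter(doc)) for doc in docs]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF scheme: tf * log(N / df), one common variant of the formula."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [
        {term: tf * math.log(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_frequencies(docs)
    ]


# Usage on two toy lines (not taken from the paper's corpus).
poems = [
    "ਸੂਰਜ ਨਿਕਲਿਆ, ਖੇਤ ਹਰੇ ਹਨ।",
    "ਮਾਂ ਦਾ ਪਿਆਰ ਸੂਰਜ ਵਰਗਾ ਹੈ।",
]
docs = [preprocess(poem) for poem in poems]
weights = tf_idf(docs)
```

Because ਸੂਰਜ occurs in both toy lines, its TF-IDF weight is zero, while a term found in only one line receives log(2). In the study, weighted vectors of this kind were the input to the NB, SVM, HyperPipes and KNN classifiers.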