Designing Punjabi poetry classifiers using machine learning and different textual features

Bibliographic Details
Published in International Arab Journal of Information Technology Vol. 17; no. 1; pp. 38-44
Main Authors Kaur, Jasleen, Saini, Jatinderkumar
Format Journal Article
Language English
Published Zarqa, Jordan Zarqa University, Deanship of Scientific Research 2020
ISSN 1683-3198
2309-4524
DOI 10.34028/iajit/17/1/5

Abstract Analysing poetic text is challenging from a computational linguistics perspective, and classifying literary text, poetry in particular, is a difficult task. For a library recommendation system, poems can be classified by poet, time period, sentiment, or subject matter. In this work, a content-based Punjabi poetry classifier was developed using the Weka toolset. Four categories were manually populated with 2034 poems: Nature and Festival (NAFE), Linguistic and Patriotic (LIPA), Relation and Romantic (RORE), and Philosophy and Spiritual (PHSP), containing 505, 399, 529, and 601 poems, respectively. The poems passed through several pre-processing sub-phases: tokenization, noise removal, stop word removal, and special symbol removal. The 31938 extracted tokens were weighted using the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) schemes. Based on poetry elements, three kinds of textual features (lexical, syntactic, and semantic) were used to build classifiers with four machine learning algorithms: Naive Bayes (NB), Support Vector Machine (SVM), Hyper Pipes, and K-nearest neighbour. Semantic features outperformed lexical and syntactic features. The best-performing algorithm was SVM, and the highest accuracy (76.02%) was achieved by incorporating semantic information associated with words.
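The TF and TF-IDF weighting named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract states that Weka was used); the function names, the transliterated placeholder tokens, and the unsmoothed log(N/df) form of the inverse document frequency are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Relative frequency of each token within a single document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def inverse_document_frequency(documents):
    """log(N / df) per term, where df counts the documents containing the term.

    Assumption: plain, unsmoothed IDF; toolkits often use smoothed variants.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(documents):
    """TF-IDF weight of every term in every document."""
    idf = inverse_document_frequency(documents)
    return [
        {term: tf * idf[term] for term, tf in term_frequency(tokens).items()}
        for tokens in documents
    ]


# Hypothetical, already pre-processed toy "poems" (placeholder tokens only).
poems = [
    ["basant", "phull", "basant"],
    ["desh", "boli", "phull"],
    ["rabb", "man", "rabb", "desh"],
]
weights = tf_idf(poems)
```

Under this weighting, a token that occurs in every document receives a weight of zero, while a token concentrated in a few documents is weighted up, which is why TF-IDF is commonly preferred over raw TF for category discrimination.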
Discipline Engineering
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
SubjectTerms Classification
Algorithms
URI https://search.emarefa.net/detail/BIM-955148
https://doi.org/10.34028/iajit/17/1/5