A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects

Bibliographic Details
Published in Journal of Open Humanities Data Vol. 9; p. 9
Main Authors Sibeko, Johannes; van Zaanen, Menno
Format Journal Article
Language English
Published Ubiquity Press 05.07.2023
ISSN 2059-481X
DOI 10.5334/johd.108

Abstract This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made.
ORCID Sibeko, Johannes: 0000-0003-3586-7491
van Zaanen, Menno: 0000-0003-1841-2444
Open Access Yes
Peer Reviewed Yes
License CC BY
Subjects examination texts
final year high school
indigenous languages
linguistic corpus
reading comprehension
summary writing