A Data Set of Final Year High School Examination Texts of South African Home and First Additional Language Subjects
| Published in | Journal of open humanities data Vol. 9; p. 9 |
|---|---|
| Main Authors | Sibeko, Johannes; van Zaanen, Menno |
| Format | Journal Article |
| Language | English |
| Published | Ubiquity Press, 5 July 2023 |
| ISSN | 2059-481X |
| DOI | 10.5334/johd.108 |
| Abstract | This article describes a data set of reading comprehension and summary writing texts that were used in final-year high school examinations in South Africa between 2008 and 2020. It contains texts for eleven official South African languages. PDF versions of the texts stem from South Africa’s Department of Basic Education’s online public access repository. Plain text is extracted from the PDFs and the texts are tokenized. The data set contains 429 full-text files with 929 manually extracted comprehension and summary writing texts. The data is useful for studies investigating, e.g., linguistic properties, text readability, text properties, and linguistic complexity in any of the eleven languages. Furthermore, both intra-language and inter-language comparisons or investigations can be made. |
| Author ORCID | Sibeko, Johannes: 0000-0003-3586-7491; van Zaanen, Menno: 0000-0003-1841-2444 |
| Open Access | Yes |
| Peer Reviewed | Yes |
| License | CC BY |
| Subjects | examination texts; final year high school; indigenous languages; linguistic corpus; reading comprehension; summary writing |
| URI | https://storage.googleapis.com/jnl-up-j-johd-files/journals/1/articles/108/64a557ab7cd99.pdf https://doaj.org/article/5bf060e707a24bfdbece838e216442c4 |