GREEK-BERT: The Greeks visiting Sesame Street
Published in | arXiv.org |
---|---|
Main Authors | Koutsikakis, John; Chalkidis, Ilias; Malakasiotis, Prodromos; Androutsopoulos, Ion |
Format | Paper / Journal Article (Working Paper / Pre-Print) |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 03.09.2020 |
Subjects | Benchmarks; Computer Science - Computation and Language; English language; Language; Natural language; Natural language processing; Performance evaluation; Speech recognition; Transformers |
Online Access | Full text: https://arxiv.org/abs/2008.12014 |
Rights | 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”) |
ISSN | 2331-8422 |
DOI | 10.48550/arxiv.2008.12014 |
Abstract | Transformer-based language models, such as BERT and its variants, have achieved state-of-the-art performance in several downstream natural language processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQUAD, RACE). However, these models have mostly been applied to the resource-rich English language. In this paper, we present GREEK-BERT, a monolingual BERT-based language model for modern Greek. We evaluate its performance in three NLP tasks, i.e., part-of-speech tagging, named entity recognition, and natural language inference, obtaining state-of-the-art performance. Interestingly, in two of the benchmarks GREEK-BERT outperforms two multilingual Transformer-based models (M-BERT, XLM-R), as well as shallower neural baselines operating on pre-trained word embeddings, by a large margin (5%-10%). Most importantly, we make both GREEK-BERT and our training code publicly available, along with code illustrating how GREEK-BERT can be fine-tuned for downstream NLP tasks. We expect these resources to boost NLP research and applications for modern Greek. |
---|---|
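The abstract notes that GREEK-BERT itself is publicly available. A minimal usage sketch with the Hugging Face transformers library follows; the Hub identifier nlpaueb/bert-base-greek-uncased-v1 is an assumption, since this record only links the arXiv page and does not state where the weights are hosted.

```python
# Minimal sketch: loading GREEK-BERT for masked-token prediction with the
# Hugging Face transformers library. The Hub identifier below is an
# assumption; this record does not state where the weights are hosted.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# Predict a masked token in a modern Greek sentence.
text = f"Σήμερα είναι μια {tokenizer.mask_token} μέρα."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and report the five most likely fillers.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos[0]].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```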
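The abstract also mentions released code illustrating how GREEK-BERT can be fine-tuned for downstream tasks. Below is a hedged sketch of one such setup, framing the paper's natural language inference task as three-way sentence-pair classification; dataset loading is omitted, and the model identifier and dataset names are assumptions, not the authors' released code.

```python
# Sketch: fine-tuning GREEK-BERT for natural language inference, one of the
# three tasks named in the abstract, as three-way sentence-pair classification.
# MODEL_ID is an assumed Hub identifier; train_ds / dev_ds stand in for
# tokenized NLI splits that are not shown here.
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=3  # entailment / neutral / contradiction
)

def encode(batch):
    # NLI inputs are premise/hypothesis pairs packed into a single sequence.
    return tokenizer(
        batch["premise"], batch["hypothesis"], truncation=True, max_length=128
    )

args = TrainingArguments(
    output_dir="greek-bert-nli",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=dev_ds)
# trainer.train()  # train_ds / dev_ds: tokenized NLI splits (not shown)
```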