GREEK-BERT: The Greeks visiting Sesame Street

Bibliographic Details
Published in: arXiv.org
Main Authors: Koutsikakis, John; Chalkidis, Ilias; Malakasiotis, Prodromos; Androutsopoulos, Ion
Format: Paper; Journal Article (working paper / pre-print)
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 03.09.2020
Subjects: Benchmarks; Computer Science - Computation and Language; English language; Language; Natural language; Natural language processing; Performance evaluation; Speech recognition; Transformers
Online Access: https://arxiv.org/abs/2008.12014 (full text); https://www.proquest.com/docview/2438134599
ISSN: 2331-8422
DOI: 10.48550/arxiv.2008.12014

Abstract: Transformer-based language models, such as BERT and its variants, have achieved state-of-the-art performance in several downstream natural language processing (NLP) tasks on generic benchmark datasets (e.g., GLUE, SQuAD, RACE). However, these models have mostly been applied to the resource-rich English language. In this paper, we present GREEK-BERT, a monolingual BERT-based language model for modern Greek. We evaluate its performance in three NLP tasks, i.e., part-of-speech tagging, named entity recognition, and natural language inference, obtaining state-of-the-art performance. Interestingly, in two of the benchmarks GREEK-BERT outperforms two multilingual Transformer-based models (M-BERT, XLM-R), as well as shallower neural baselines operating on pre-trained word embeddings, by a large margin (5%-10%). Most importantly, we make both GREEK-BERT and our training code publicly available, along with code illustrating how GREEK-BERT can be fine-tuned for downstream NLP tasks. We expect these resources to boost NLP research and applications for modern Greek.
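The abstract notes that GREEK-BERT and code for fine-tuning it on downstream tasks are publicly available. As a minimal sketch of what such fine-tuning looks like (not the authors' released code; the checkpoint id below and the toy premise/hypothesis pair are assumptions), natural language inference can be framed as 3-way sequence classification with Hugging Face Transformers:

    # Minimal sketch, NOT the authors' released fine-tuning code.
    # Assumption: the public GREEK-BERT checkpoint is available on the
    # Hugging Face Hub as "nlpaueb/bert-base-greek-uncased-v1".
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_ID = "nlpaueb/bert-base-greek-uncased-v1"  # assumed checkpoint id
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # NLI as 3-way sequence classification: entailment / neutral / contradiction.
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=3)

    # Toy premise/hypothesis pair (hypothetical, not from the paper's data).
    premise = "Ο σκύλος τρέχει στο πάρκο."   # "The dog is running in the park."
    hypothesis = "Ένα ζώο κινείται."         # "An animal is moving."
    enc = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
    labels = torch.tensor([0])               # 0 = entailment

    # One gradient step of standard fine-tuning.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    loss = model(**enc, labels=labels).loss  # cross-entropy over the 3 classes
    loss.backward()
    optimizer.step()
    print(f"loss: {loss.item():.4f}")

The paper's other two tasks would follow the same pattern: part-of-speech tagging and named entity recognition are per-token predictions, so they would use AutoModelForTokenClassification with one label per input token instead of one label per sentence pair.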
Copyright: 2020. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.