Improving Skip-Gram Embeddings Using BERT

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 1318-1328
Main Authors: Wang, Yile; Cui, Leyang; Zhang, Yue
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2021.3065201

More Information
Summary: Contextualized embeddings such as BERT and GPT have been shown to give significant improvements on NLP tasks. On the other hand, static embeddings such as skip-gram and GloVe still have desirable characteristics, such as low computational cost, easy deployment, and freedom from severe contextual variation in representation. There have been recent attempts to enhance the skip-gram model by adding syntactic information about the context using a GCN. We instead investigate the use of BERT embeddings for stronger context representation, which contain not only syntactic and surface features but also rich knowledge from large-scale pre-training. Results show that BERT-enhanced skip-gram embeddings outperform GCN-enhanced embeddings on a range of tasks. Such embeddings also outperform recent efforts to distill BERT embeddings into context-independent vectors.
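
To make the idea described in the summary concrete, the following is a minimal, hypothetical sketch (not the paper's released code): a skip-gram objective with negative sampling in which the learned context-embedding matrix is replaced by frozen BERT representations of the context token. The class name, model name, embedding dimension, projection layer, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch: static center-word vectors trained against frozen
# BERT context representations instead of a learned context matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

class BertContextSkipGram(nn.Module):
    def __init__(self, vocab_size, dim=300, bert_name="bert-base-uncased"):
        super().__init__()
        self.center = nn.Embedding(vocab_size, dim)       # static vectors to learn
        self.bert = AutoModel.from_pretrained(bert_name)  # frozen context encoder
        self.bert.requires_grad_(False)
        self.proj = nn.Linear(self.bert.config.hidden_size, dim)

    def context_vec(self, input_ids, attention_mask, position):
        # BERT hidden state at the context token, projected into the static space.
        with torch.no_grad():
            hidden = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        return self.proj(hidden[torch.arange(hidden.size(0)), position])

    def loss(self, center_ids, ctx_repr, neg_ids):
        # Skip-gram with negative sampling: score static center vectors
        # against the contextualized representation of the context token.
        v_c = self.center(center_ids)                     # (B, dim)
        v_n = self.center(neg_ids)                        # (B, K, dim)
        pos = F.logsigmoid((v_c * ctx_repr).sum(-1))
        neg = F.logsigmoid(-(v_n * ctx_repr.unsqueeze(1)).sum(-1)).sum(-1)
        return -(pos + neg).mean()

# Illustrative usage on a toy batch.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertContextSkipGram(vocab_size=30000)
batch = tok(["static embeddings meet contextual encoders"], return_tensors="pt")
ctx = model.context_vec(batch["input_ids"], batch["attention_mask"],
                        position=torch.tensor([2]))       # index of the context token
loss = model.loss(center_ids=torch.tensor([17]),
                  ctx_repr=ctx,
                  neg_ids=torch.randint(0, 30000, (1, 5)))
loss.backward()                                           # updates only self.center and self.proj

In this sketch the BERT encoder stays frozen, so only the static center-word vectors and a small projection are learned; the resulting embeddings can be used on their own afterwards, preserving the low inference cost that motivates static embeddings in the first place.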