Improving Skip-Gram Embeddings Using BERT

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, pp. 1318-1328
Main Authors: Wang, Yile; Cui, Leyang; Zhang, Yue
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2021
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2021.3065201

More Information
Summary: Contextualized embeddings such as BERT and GPT have been shown to give significant improvements on NLP tasks. On the other hand, static embeddings such as skip-gram and GloVe still have desirable characteristics, such as low computational cost, easy deployment, and freedom from severe contextual variation in representation. There have been recent attempts to enhance the skip-gram model by adding syntactic information about the context using a GCN. We instead investigate the use of BERT embeddings for stronger context representation, which contain not only syntactic and surface features but also rich knowledge from large-scale pre-training. Results show that BERT-enhanced skip-gram embeddings outperform GCN-enhanced embeddings on a range of tasks. Such embeddings also outperform recent efforts to distill BERT embeddings into context-independent vectors.
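
To make the idea described in the summary concrete, the following is a minimal, hypothetical sketch (not the paper's released code): a skip-gram objective with negative sampling in which the learned context-embedding matrix is replaced by frozen BERT representations of the context token. The class name, model name, embedding dimension, projection layer, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch: static center-word vectors trained against frozen
# BERT context representations instead of a learned context matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

class BertContextSkipGram(nn.Module):
    def __init__(self, vocab_size, dim=300, bert_name="bert-base-uncased"):
        super().__init__()
        self.center = nn.Embedding(vocab_size, dim)       # static vectors to learn
        self.bert = AutoModel.from_pretrained(bert_name)  # frozen context encoder
        self.bert.requires_grad_(False)
        self.proj = nn.Linear(self.bert.config.hidden_size, dim)

    def context_vec(self, input_ids, attention_mask, position):
        # BERT hidden state at the context token, projected into the static space.
        with torch.no_grad():
            hidden = self.bert(input_ids=input_ids,
                               attention_mask=attention_mask).last_hidden_state
        return self.proj(hidden[torch.arange(hidden.size(0)), position])

    def loss(self, center_ids, ctx_repr, neg_ids):
        # Skip-gram with negative sampling: score static center vectors
        # against the contextualized representation of the context token.
        v_c = self.center(center_ids)                     # (B, dim)
        v_n = self.center(neg_ids)                        # (B, K, dim)
        pos = F.logsigmoid((v_c * ctx_repr).sum(-1))
        neg = F.logsigmoid(-(v_n * ctx_repr.unsqueeze(1)).sum(-1)).sum(-1)
        return -(pos + neg).mean()

# Illustrative usage on a toy batch.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertContextSkipGram(vocab_size=30000)
batch = tok(["static embeddings meet contextual encoders"], return_tensors="pt")
ctx = model.context_vec(batch["input_ids"], batch["attention_mask"],
                        position=torch.tensor([2]))       # index of the context token
loss = model.loss(center_ids=torch.tensor([17]),
                  ctx_repr=ctx,
                  neg_ids=torch.randint(0, 30000, (1, 5)))
loss.backward()                                           # updates only self.center and self.proj

In this sketch the BERT encoder stays frozen, so only the static center-word vectors and a small projection are learned; the resulting embeddings can be used on their own afterwards, preserving the low inference cost that motivates static embeddings in the first place.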