DefSent+: Improving sentence embeddings of language models by projecting definition sentences into a quasi-isotropic or isotropic vector space of unlimited dictionary entries
| Format | Journal Article |
|---|---|
| Language | English |
| Published | 25.05.2024 |
| DOI | 10.48550/arxiv.2405.16153 |
Summary: This paper presents a significant improvement over the previous conference paper known as DefSent. The prior study seeks to improve sentence embeddings of language models by projecting definition sentences into the vector space of dictionary entries. We find that this approach has not been fully explored, owing to the methodological limitation of using the word embeddings of language models to represent dictionary entries. This leads to two hindrances. First, dictionary entries are constrained to a single-word vocabulary and thus cannot be fully exploited. Second, semantic representations of language models are known to be anisotropic, yet pre-processing the word embeddings for DefSent is not possible because their weights are frozen during training and tied to the prediction layer. In this paper, we propose a novel method to progressively build entry embeddings that are not subject to these limitations. As a result, definition sentences can be projected into a quasi-isotropic or isotropic vector space of unlimited dictionary entries, so that sentence embeddings of noticeably better quality are attainable. We abbreviate our approach as DefSent+ (a plus version of DefSent), which has the following strengths: 1) task performance on measuring sentence similarities is significantly improved compared to DefSent; 2) when DefSent+ is used to further train data-augmented models such as SIMCSE, SNCSE, and SynCSE, state-of-the-art performance on measuring sentence similarities can be achieved among approaches that do not use manually labeled datasets; 3) DefSent+ is also competitive in feature-based transfer for NLP downstream tasks.
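The abstract's claim that language-model representations are anisotropic can be made concrete with a common diagnostic: the average pairwise cosine similarity of a set of embeddings, which is near 0 in an isotropic space and climbs toward 1 when vectors crowd into a narrow cone. The sketch below is illustrative only; the function name and toy data are not from the paper.

```python
import numpy as np

def average_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows.

    Values far above 0 indicate an anisotropic space (embeddings share
    a dominant direction); values near 0 suggest (quasi-)isotropy.
    """
    # L2-normalize each row so dot products become cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embeddings)
    # Exclude the diagonal (self-similarity, always 1.0) from the mean.
    return float((sims.sum() - n) / (n * (n - 1)))

# Toy check: isotropic Gaussian vectors score near 0, while adding a
# shared offset pushes all vectors into one cone (anisotropy).
rng = np.random.default_rng(0)
iso = rng.normal(size=(200, 64))
aniso = iso + 5.0  # common bias direction shared by every vector
print(average_pairwise_cosine(iso))    # close to 0
print(average_pairwise_cosine(aniso))  # close to 1
```

A diagnostic like this is what motivates post-processing embeddings (e.g., centering or whitening) before use, which the abstract notes is impossible in the original DefSent because the word-embedding weights are frozen and tied to the prediction layer.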