PSDVec: a Toolbox for Incremental and Scalable Word Embedding
| Published in | arXiv.org |
|---|---|
| Main Authors | , , |
| Format | Paper; Journal Article |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 10.06.2016 |
| ISSN | 2331-8422 |
| DOI | 10.48550/arxiv.1606.03192 |
| Summary: | PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and can learn the embeddings of new words without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners. |
|---|---|
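The abstract describes two ideas: approximating a word-correlation matrix by a low-rank positive semidefinite factorization, and embedding new words incrementally without refitting the existing vocabulary. The sketch below illustrates both in the simplest unweighted form; it is not PSDVec's actual algorithm (the paper's method is a *weighted* low-rank approximation with regularization, learned blockwise), and all matrix and variable names here are illustrative.

```python
import numpy as np

# Toy stand-in for a symmetric PMI-like word-correlation matrix G.
# (Illustrative only: built as low-rank signal plus noise.)
rng = np.random.default_rng(0)
n, d = 8, 3
A = rng.standard_normal((n, d))
G = A @ A.T + 0.1 * rng.standard_normal((n, n))
G = (G + G.T) / 2  # symmetrize

# Rank-d PSD approximation G ~= V V^T: eigendecompose, keep the top-d
# eigenpairs, and clip negative eigenvalues to zero to stay PSD.
w, U = np.linalg.eigh(G)
idx = np.argsort(w)[::-1][:d]
w_top = np.clip(w[idx], 0.0, None)
V = U[:, idx] * np.sqrt(w_top)  # n x d embedding matrix

# Incremental step: embed a "new" word given only its correlations
# g_new with the core vocabulary, keeping the core embeddings V fixed.
# Since G ~= V V^T, solve the least-squares problem min_v ||V v - g_new||^2.
g_new = G[0]  # pretend row 0 belongs to a newly seen word
v_new, *_ = np.linalg.lstsq(V, g_new, rcond=None)
```

The incremental step is what makes vocabulary growth cheap: each new word costs one small least-squares solve against the fixed core embeddings, rather than a re-factorization of the whole matrix.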