Deep learning methods for subject text classification of articles

Bibliographic Details
Published in: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 357–360
Main Authors: Semberecki, Piotr; Maciejewski, Henryk
Format: Conference Proceeding
Language: English
Published: Polish Information Processing Society (PIPS), 01.09.2017
ISSN: 2300-5963
DOI: 10.15439/2017F414

Summary: This work presents a method for classifying text documents using a deep neural network with LSTM (long short-term memory) units. We tested different approaches to building the feature vectors that represent the documents to be classified: we used feature vectors constructed as sequences of the words contained in the documents or, alternatively, we first converted words into vector representations using the word2vec tool and used sequences of these vectors as document features. We evaluated the feasibility of this approach for subject classification of documents using a collection of Wikipedia articles representing 7 subject categories. Our experiments show that the LSTM network with documents represented as sequences of word2vec-encoded words outperformed a standard bag-of-words approach in which documents are represented as word-frequency feature vectors.
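The two document representations compared in the summary can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes toy random vectors in place of trained word2vec embeddings, untrained random LSTM weights, and a single LSTM layer whose last hidden state feeds a softmax over the 7 subject categories. All function names (`bag_of_words`, `embed_sequence`, `init_lstm`, `lstm_last_hidden`) and the sizes `dim` and `hidden` are hypothetical choices made only for illustration.

```python
import math
import random
from collections import Counter

def bag_of_words(docs):
    """Baseline: one word-frequency vector per document (word order is lost)."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word, n in Counter(doc.split()).items():
            vec[index[word]] = n
        vectors.append(vec)
    return vocab, vectors

def embed_sequence(doc, embeddings, dim):
    """Sequence representation: each word replaced by its vector, order kept.
    Assumption: out-of-vocabulary words map to a zero vector."""
    return [embeddings.get(w, [0.0] * dim) for w in doc.split()]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def init_lstm(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights: one (W, U, b) triple per gate."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # i = input gate, f = forget gate, o = output gate, g = candidate cell state
    return {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for g in "ifog"}

def lstm_last_hidden(seq, params, hidden_dim):
    """Run a standard LSTM over the sequence and return the final hidden state."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in seq:
        pre = {g: [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
               for g, (W, U, b) in params.items()}
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        cand = [math.tanh(z) for z in pre["g"]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Usage on two toy documents (hypothetical data, not from the paper).
docs = ["the cell divides", "the striker scores a goal"]
vocab, bow = bag_of_words(docs)

dim, hidden, n_classes = 4, 8, 7          # 7 subject categories as in the summary
rng = random.Random(1)
toy_embeddings = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}

params = init_lstm(dim, hidden)
seq = embed_sequence(docs[1], toy_embeddings, dim)
h = lstm_last_hidden(seq, params, hidden)
V = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(n_classes)]
probs = softmax(matvec(V, h))             # untrained class probabilities
```

The contrast is the point of the sketch: `bow` keeps only counts over a shared vocabulary, while `seq` keeps one vector per word in document order, which is what lets the LSTM exploit word order. In practice the embeddings would come from a trained word2vec model and the weights would be learned by backpropagation, neither of which is shown here.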