Deep learning methods for subject text classification of articles

Bibliographic Details
Published in	2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 357-360
Main Authors	Semberecki, Piotr; Maciejewski, Henryk
Format	Conference Proceeding
Language	English
Published	Polish Information Processing Society (PIPS) 01.09.2017
Subjects	Encyclopedias; Natural language processing; Neural networks; Training
ISSN	2300-5963
DOI	10.15439/2017F414

Summary:	This work presents a method for classifying text documents using a deep neural network with LSTM (long short-term memory) units. We tested different approaches to building the feature vectors that represent the documents to be classified: we used feature vectors constructed as sequences of the words included in the documents or, alternatively, we first converted the words into vector representations using the word2vec tool and used sequences of these vector representations as document features. We evaluated the feasibility of this approach for the task of subject classification of documents using a collection of Wikipedia articles representing 7 subject categories. Our experiments show that the approach based on an LSTM network, with documents represented as sequences of words coded into word2vec vectors, outperformed a standard bag-of-words approach with documents represented as frequency-of-words feature vectors.
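
The summary contrasts two ways of representing a document: a frequency-of-words (bag-of-words) vector and a sequence of word vectors. The following is a minimal sketch of that difference only, not the authors' implementation. The toy embedding table, the whitespace tokenizer, the zero-vector fallback, and all function names are assumptions made for illustration; in the paper the word vectors come from word2vec and the sequences are fed to an LSTM network, neither of which is reproduced here.

```python
from collections import Counter

# Hypothetical toy embedding table standing in for trained word2vec vectors.
# The numbers are made up for illustration only.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "history": [0.0, 0.9, 0.3],
}
EMBEDDING_DIM = 3


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def bag_of_words(tokens, vocabulary):
    """Frequency-of-words vector: one count per vocabulary entry; word order is lost."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def embedding_sequence(tokens, embeddings, max_len):
    """Sequence of word vectors, truncated or zero-padded to max_len; word order is kept."""
    zero = [0.0] * EMBEDDING_DIM  # assumed fallback for unknown words and padding
    vectors = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    padding = [list(zero) for _ in range(max_len - len(vectors))]
    return vectors + padding


doc = "Neural network history network"
tokens = tokenize(doc)
vocabulary = sorted(TOY_EMBEDDINGS)  # ["history", "network", "neural"]

bow = bag_of_words(tokens, vocabulary)
seq = embedding_sequence(tokens, TOY_EMBEDDINGS, max_len=5)

print(bow)       # one count per vocabulary word
print(len(seq))  # fixed-length sequence of 3-dimensional vectors
```

The bag-of-words vector has one entry per vocabulary word regardless of document length, whereas the sequence representation has one vector per token position, which is the form a recurrent model such as an LSTM consumes.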