Malware classification with recurrent networks

Bibliographic Details
Published in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1916-1920
Main Authors Pascanu, Razvan; Stokes, Jack W.; Sanossian, Hermineh; Marinescu, Mady; Thomas, Anil
Format Conference Proceeding
Language English
Published IEEE 01.04.2015
ISSN 1520-6149
DOI 10.1109/ICASSP.2015.7178304

Summary: Attackers often create systems that automatically rewrite and reorder their malware to avoid detection. Typical machine learning approaches, which learn a classifier based on a handcrafted feature vector, are not sufficiently robust to such reorderings. We propose a different approach, which, similar to natural language modeling, learns the language of malware spoken through the executed instructions and extracts robust, time domain features. Echo state networks (ESNs) and recurrent neural networks (RNNs) are used for the projection stage that extracts the features. These models are trained in an unsupervised fashion. A standard classifier uses these features to detect malicious files. We explore a few variants of ESNs and RNNs for the projection stage, including Max-Pooling and Half-Frame models, which we propose. The best performing hybrid model uses an ESN for the recurrent model, Max-Pooling for non-linear sampling, and logistic regression for the final classification. Compared to the standard trigram of events model, it improves the true positive rate by 98.3% at a false positive rate of 0.1%.
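
The pipeline named in the summary (ESN projection, max-pooling over time, logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the reservoir is left as a fixed random network and the unsupervised training step mentioned in the summary is omitted. The event vocabulary size, hidden size, spectral radius, learning rate, and the toy "benign" and "malicious" sequences are all hypothetical choices made only so the example runs.

```python
import numpy as np


def make_reservoir(n_events, n_hidden, spectral_radius=0.9, seed=0):
    """Build fixed random ESN weights (sizes and scaling are assumptions)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_hidden, n_events))
    w_res = rng.uniform(-0.5, 0.5, size=(n_hidden, n_hidden))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))
    return w_in, w_res


def esn_max_pool(events, w_in, w_res):
    """Run the reservoir over an event-ID sequence and max-pool over time."""
    h = np.zeros(w_res.shape[0])
    pooled = np.full(w_res.shape[0], -np.inf)
    for e in events:
        # A one-hot input selects column e of w_in.
        h = np.tanh(w_in[:, e] + w_res @ h)
        pooled = np.maximum(pooled, h)
    return pooled


def predict_proba(x, w, b):
    """Logistic regression probability of the positive class."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def train_logistic_regression(x, y, lr=0.1, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = predict_proba(x, w, b) - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


# Toy data (hypothetical): the two classes draw event IDs from shifted ranges.
rng = np.random.default_rng(1)
n_events, n_hidden = 20, 16
w_in, w_res = make_reservoir(n_events, n_hidden)
benign = [rng.integers(0, 15, size=50) for _ in range(40)]
malicious = [rng.integers(5, 20, size=50) for _ in range(40)]

features = np.array([esn_max_pool(s, w_in, w_res) for s in benign + malicious])
labels = np.array([0.0] * 40 + [1.0] * 40)

w, b = train_logistic_regression(features, labels)
scores = predict_proba(features, w, b)
```

Each file is reduced to one fixed-length vector regardless of sequence length, because the pooling step keeps only the per-unit maximum of the hidden state over time. That vector is what the final classifier sees in this sketch.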