EnCoD: Distinguishing Compressed and Encrypted File Fragments

Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g....

Full description

Saved in:
Bibliographic Details
Published inNetwork and System Security Vol. 12570; pp. 42 - 62
Main Authors De Gaspari, Fabio, Hitaj, Dorjan, Pagnotta, Giulio, De Carli, Lorenzo, Mancini, Luigi V.
Format Book Chapter
LanguageEnglish
Published Switzerland Springer International Publishing AG 2020
Springer International Publishing
SeriesLecture Notes in Computer Science
Online AccessGet full text
ISBN9783030657444
3030657442
ISSN0302-9743
1611-3349
DOI10.1007/978-3-030-65745-1_3

Cover

More Information
Summary:Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, selected data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.
ISBN:9783030657444
3030657442
ISSN:0302-9743
1611-3349
DOI:10.1007/978-3-030-65745-1_3