Semi-supervised log anomaly detection based on bidirectional temporal convolution network

Bibliographic Details
Published in: Computers & Security, Vol. 140, p. 103808
Main Authors: Yin, Zhichao; Kong, Xian; Yin, Chunyong
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.05.2024
ISSN: 0167-4048, 1872-6208
DOI: 10.1016/j.cose.2024.103808

Summary: System logs record a system's operating status and important events, and they are an important basis for debugging system failures and analyzing their causes. However, low log-parsing accuracy and insufficient labeled samples lead to low anomaly detection precision. We therefore propose BTCNLog, a new log-based semi-supervised anomaly detection method. First, an improved dictionary-based log parsing method retains part of the parameter information in each log event. This increases the utilization of log information and improves log parsing accuracy. Next, BERT encodes the semantic information of each log template into a semantic vector. In addition, a clustering method estimates labels for unlabeled data to address the shortage of labeled samples, which improves the model's ability to detect unstable data. Finally, a bidirectional temporal convolution network (Bi-TCN) with residual blocks captures contextual information from both directions to improve the accuracy and efficiency of anomaly detection. BTCNLog is evaluated against six baselines on two datasets. The experimental results show that, compared with the three most recent baselines, LogBERT, PLELog, and LogEncoder, the proposed method improves the F1 score by an average of 7%, 14.1%, and 8.04%, respectively.
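The Bi-TCN step is the architectural core of the method. The summary does not give its configuration, so the pure-Python sketch below only illustrates the general idea under stated assumptions:

- Each residual block is simplified to a single causal dilated convolution, followed by ReLU and an identity shortcut. It omits weight normalization, dropout, and any 1x1 projection.
- The backward direction is a separate stack run on the time-reversed sequence.
- The two directions are merged by per-timestep concatenation.

The function names `causal_dilated_conv`, `residual_block`, and `bi_tcn` are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # weights of one tap, shape: out_channels x in_channels


def causal_dilated_conv(seq: List[Vector], taps: List[Matrix], dilation: int) -> List[Vector]:
    """Causal dilated 1-D convolution over a sequence of feature vectors.

    taps[k] is applied to the input k * dilation steps in the past, so output[t]
    only sees seq[t], seq[t - d], ..., seq[t - (K - 1) * d]. Positions before the
    start of the sequence are treated as zeros (left padding).
    """
    out_channels = len(taps[0])
    output = []
    for t in range(len(seq)):
        acc = [0.0] * out_channels
        for k, weights in enumerate(taps):
            src = t - k * dilation
            if src < 0:
                continue  # implicit zero padding on the left
            x = seq[src]
            for o in range(out_channels):
                acc[o] += sum(w * xi for w, xi in zip(weights[o], x))
        output.append(acc)
    return output


def residual_block(seq: List[Vector], taps: List[Matrix], dilation: int) -> List[Vector]:
    """Simplified residual block (assumption): ReLU(conv(x)) + x.

    Requires out_channels == in_channels so the identity shortcut can be added.
    """
    conv = causal_dilated_conv(seq, taps, dilation)
    return [[max(0.0, c) + xi for c, xi in zip(cv, xv)] for cv, xv in zip(conv, seq)]


def bi_tcn(
    seq: List[Vector],
    fwd_blocks: List[Tuple[List[Matrix], int]],
    bwd_blocks: List[Tuple[List[Matrix], int]],
) -> List[Vector]:
    """Run residual stacks left-to-right and right-to-left, then concatenate.

    fwd_blocks / bwd_blocks hold one (taps, dilation) pair per layer.
    Merging by concatenation is an assumption made for this sketch.
    """
    fwd = seq
    for taps, d in fwd_blocks:
        fwd = residual_block(fwd, taps, d)
    bwd = seq[::-1]
    for taps, d in bwd_blocks:
        bwd = residual_block(bwd, taps, d)
    bwd = bwd[::-1]  # realign backward outputs with the original time order
    return [f + b for f, b in zip(fwd, bwd)]  # per-timestep list concatenation
```

Example: use a single-channel input [1, 2, 3, 4], a kernel of two unit taps, and dilation 1 in both directions. For this input, `bi_tcn` returns [[2, 4], [5, 7], [8, 10], [11, 8]]. At step 0, the forward half sees only x0, while the backward half already sees x1. That future context is what the bidirectional design adds. Stacking blocks with growing dilations (1, 2, 4, ...) would widen the receptive field exponentially, which is the usual TCN motivation.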