LADDERS: Log Based Anomaly Detection and Diagnosis for Enterprise Systems

Enterprise software can fail due to not only malfunction of application servers, but also due to performance degradation or non-availability of other servers or middle layers. Consequently, valuable time and resources are wasted in trying to identify the root cause of software failures. To address t...

Full description

Saved in:
Bibliographic Details
Published inAnnals of data science Vol. 11; no. 4; pp. 1165 - 1183
Main Authors Mondal, Sakib A., Rv, Prashanth, Rao, Sagar, Menon, Arun
Format Journal Article
LanguageEnglish
Published Berlin/Heidelberg Springer Berlin Heidelberg 01.08.2024
Springer Nature B.V
Subjects
Online AccessGet full text
ISSN2198-5804
2198-5812
DOI10.1007/s40745-023-00471-7

Cover

More Information
Summary:Enterprise software can fail due to not only malfunction of application servers, but also due to performance degradation or non-availability of other servers or middle layers. Consequently, valuable time and resources are wasted in trying to identify the root cause of software failures. To address this, we have developed a framework called LADDERS. In LADDERS, anomalous incidents are detected from log events generated by various systems and KPIs (Key Performance Indicators) through an ensemble of supervised and unsupervised models. Without transaction identifiers, it is not possible to relate various events from different systems. LADDERS implements Recursive Parallel Causal Discovery (RPCD) to establish causal relationships among log events. The framework builds coresets using BICO to manage high volumes of log data during training and inferencing. An anomaly can cause a number of anomalies throughout the systems. LADDERS makes use of RPCD again to discover causal relationships among these anomalous events. Probable root causes are revealed from the causal graph and anomaly rating of events using a k-shortest path algorithm. We evaluated LADDERS using live logs from an enterprise system. The results demonstrate its effectiveness and efficiency for anomaly detection.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2198-5804
2198-5812
DOI:10.1007/s40745-023-00471-7