A System Fault Diagnosis Method with a Reclustering Algorithm

Bibliographic Details
Published in Scientific Programming Vol. 2021; pp. 1-8
Main Authors Yang, Zhe; Ying, Shi; Wang, Bingming; Li, Yiyao; Dong, Bo; Geng, Jiangyi; Zhang, Ting
Format Journal Article
Language English
Published New York Hindawi 09.03.2021
John Wiley & Sons, Inc
ISSN 1058-9244
EISSN 1875-919X
DOI 10.1155/2021/6617882

Abstract Log analysis-based system fault diagnosis helps engineers analyze the fault events that a system generates. The K-means algorithm performs log analysis well and requires little prior knowledge, but K-means-based system fault diagnosis needs improvement in both efficiency and accuracy. To address this problem, we propose a system fault diagnosis method based on a reclustering algorithm. First, we propose a log vectorization method based on the PV-DM language model to obtain low-dimensional log vectors that provide effective data support for the subsequent fault diagnosis. Then, we improve the K-means algorithm to improve the effect of K-means-based log clustering. Finally, we propose a reclustering method based on keyword extraction to improve the accuracy of fault diagnosis. We verify our method on system log data generated by two supercomputers. The experimental results show that, compared with the traditional K-means method, our method improves the accuracy of fault diagnosis while maintaining its efficiency.
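As a rough illustration of the clustering stage the abstract refers to, the sketch below runs the standard (Lloyd's) K-means procedure on a handful of toy vectors. It is not the improved K-means or the reclustering step proposed in the article, whose details are not given in this record. The 2-D points are hypothetical stand-ins for PV-DM log vectors, and the function name `kmeans` and its parameters are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(points, centroids, max_iter=100):
    """Standard K-means: alternate nearest-centroid assignment and mean update."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D stand-ins for log vectors (real PV-DM vectors would have
# more dimensions and would come from a trained language model).
log_vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(log_vectors, centroids=[log_vectors[0], log_vectors[3]])
```

With these toy inputs the first three vectors end up in one cluster and the last three in the other. Initial centroids are passed in explicitly so the run is deterministic; in practice they would be chosen by some initialization strategy.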
Author_xml – sequence: 1
  givenname: Zhe
  surname: Yang
  fullname: Yang, Zhe
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
– sequence: 2
  givenname: Shi
  orcidid: 0000-0002-0471-0021
  surname: Ying
  fullname: Ying, Shi
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
– sequence: 3
  givenname: Bingming
  orcidid: 0000-0002-8723-0970
  surname: Wang
  fullname: Wang, Bingming
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
– sequence: 4
  givenname: Yiyao
  surname: Li
  fullname: Li, Yiyao
  organization: School of Software Engineering, Tongji University, Shanghai, China, tongji.edu.cn
– sequence: 5
  givenname: Bo
  surname: Dong
  fullname: Dong, Bo
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
– sequence: 6
  givenname: Jiangyi
  surname: Geng
  fullname: Geng, Jiangyi
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
– sequence: 7
  givenname: Ting
  surname: Zhang
  fullname: Zhang, Ting
  organization: School of Computer Science, Wuhan University, Wuhan, China, whu.edu.cn
CitedBy_id crossref_primary_10_1111_coin_12646
crossref_primary_10_1360_SST_2022_0194
crossref_primary_10_1109_ACCESS_2021_3128283
crossref_primary_10_1007_s11219_024_09672_6
ContentType Journal Article
Copyright Copyright © 2021 Zhe Yang et al.
Copyright © 2021 Zhe Yang et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0
Discipline Computer Science
EISSN 1875-919X
Editor Wang, Pengwei
EndPage 8
ExternalDocumentID 10.1155/2021/6617882
10_1155_2021_6617882
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62072342; 61672392
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
https://creativecommons.org/licenses/by/4.0
cc-by
ORCID 0000-0002-0471-0021
0000-0002-8723-0970
OpenAccessLink https://dx.doi.org/10.1155/2021/6617882
PageCount 8
PublicationCentury 2000
PublicationDate 2021-03-09
PublicationDateYYYYMMDD 2021-03-09
PublicationDate_xml – month: 03
  year: 2021
  text: 2021-03-09
  day: 09
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle Scientific programming
PublicationYear 2021
Publisher Hindawi
John Wiley & Sons, Inc
Publisher_xml – name: Hindawi
– name: John Wiley & Sons, Inc
StartPage 1
SubjectTerms Accuracy
Algorithms
Clustering
Failure
Fault diagnosis
Machine learning
Neural networks
Semantics
Software
Supercomputers