Fault Monitoring with Sequential Matrix Factorization

Bibliographic Details
Published in ACM Transactions on Autonomous and Adaptive Systems, Vol. 10, no. 3, pp. 1-25
Main Authors Feng, Dawei; Germain, Cecile
Format Journal Article
Language English
Publisher Association for Computing Machinery (ACM), 01.10.2015
ISSN 1556-4665 (print); 1556-4703 (online)
DOI 10.1145/2797141

Abstract For real-world distributed systems, the knowledge component at the core of the MAPE-K loop has to be inferred, as it cannot realistically be assumed to be defined a priori. Accordingly, this paper treats fault monitoring as a latent-factor discovery problem. In the context of end-to-end probing, the goal is to devise an efficient sampling policy that makes the best use of a constrained sampling budget. Previous work addresses fault monitoring in a collaborative prediction framework, where the available information is a snapshot of the probe outcomes. Here, we take into account the fact that the system evolves dynamically at various time scales. We propose and evaluate Sequential Matrix Factorization (SMF), which exploits both recent advances in matrix factorization for the instantaneous information and a new sampling heuristic based on historical information. The effectiveness of SMF is demonstrated on datasets of increasing difficulty and compared with state-of-the-art history-based or snapshot-based methods. In all cases, strong adaptivity, in the specific form of active learning, is required to unleash the full potential of coupling the most-confident and most-uncertain sampling heuristics, which is the cornerstone of SMF.
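The abstract describes SMF only at a high level. The sketch below illustrates the two ingredients it names, under assumptions of our own: probe outcomes are stored as a matrix `Y` with entries +1 (success) and -1 (failure), with rows and columns standing for hypothetical probe sources and targets; the low-rank model is fitted with a plain L2-regularized logistic loss by stochastic gradient descent (a stand-in, not the paper's solver); and the sampling budget is split between the most uncertain unobserved entries (score nearest 0) and the most confidently predicted failures (most negative score). The names `factorize` and `select_probes`, the even split, and all hyperparameters are illustrative, and the history-based heuristic of SMF is not modeled.

```python
import math
import random


def factorize(Y, mask, rank=2, lam=0.1, lr=0.05, epochs=500, seed=0):
    """Fit sign(U[i] . V[j]) to Y[i][j] (+1 success, -1 failure) on the
    observed entries by SGD on an L2-regularized logistic loss.
    Illustrative stand-in solver (assumption), not the one used in the paper."""
    rng = random.Random(seed)
    m, n = len(Y), len(Y[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    observed = [(i, j) for i in range(m) for j in range(n) if mask[i][j]]
    for _ in range(epochs):
        rng.shuffle(observed)
        for i, j in observed:
            y = Y[i][j]
            s = sum(U[i][k] * V[j][k] for k in range(rank))
            # Derivative of log(1 + exp(-y*s)) with respect to s.
            g = -y / (1.0 + math.exp(y * s))
            for k in range(rank):
                u, v = U[i][k], V[j][k]
                U[i][k] -= lr * (g * v + lam * u)
                V[j][k] -= lr * (g * u + lam * v)
    return U, V


def select_probes(U, V, mask, budget):
    """Pick up to `budget` unobserved entries: half the budget (rounded down)
    goes to the most uncertain predictions, the rest to the most confident
    predicted failures. The even split is a hypothetical choice."""
    candidates = [
        (sum(a * b for a, b in zip(U[i], V[j])), i, j)
        for i in range(len(U))
        for j in range(len(V))
        if not mask[i][j]
    ]
    # Most uncertain: predicted score closest to the decision boundary at 0.
    by_margin = sorted(candidates, key=lambda t: abs(t[0]))
    chosen = {(i, j) for _, i, j in by_margin[: budget // 2]}
    # Most confident failures: most negative predicted scores first.
    for _, i, j in sorted(candidates):
        if len(chosen) >= budget:
            break
        chosen.add((i, j))
    return chosen
```

A sequential loop would alternate the two calls: fit on the entries probed so far, probe the selected entries, mark their outcomes as observed in `mask`, and refit. Keeping part of the budget on the decision boundary targets entries the model is least sure about, while the remainder goes to entries it already predicts as faulty; how SMF actually balances these, and how it uses history, is described in the paper itself.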
Affiliations Feng, Dawei: National University of Defense Technology, Université Paris Sud, INRIA and CNRS, Changsha, China
Germain, Cecile: Université Paris Sud, INRIA and CNRS, Orsay Cedex
HAL record https://inria.hal.science/hal-01176013
Copyright Distributed under a Creative Commons Attribution 4.0 International License
Discipline Computer Science
Peer reviewed Yes
Open access Yes
Keywords Machine Learning; Fault Inference; Matrix Factorization; Active Learning
Categories and Subject Descriptors [Computer systems organization]: Dependable and fault-tolerant systems and networks: Reliability; [Computer systems organization]: Dependable and fault-tolerant systems and networks: Availability
General Terms Grids and Clouds
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
SubjectTerms Autonomous
Computer Science
Distributed, Parallel, and Cluster Computing
Dynamical systems
Factorization
Faults
Heuristic
Machine Learning
Monitoring
Policies
Sampling
URI https://www.proquest.com/docview/1770364068
https://inria.hal.science/hal-01176013