Operation is the Hardest Teacher: Estimating DNN Accuracy Looking for Mispredictions


Bibliographic Details
Published in Proceedings / International Conference on Software Engineering, pp. 348-358
Main Authors Guerriero, Antonio; Pietrantuono, Roberto; Russo, Stefano
Format Conference Proceeding
Language English
Published IEEE, May 2021
ISBN 1665402962, 9781665402965
ISSN 1558-1225
DOI 10.1109/ICSE43902.2021.00042


Abstract Deep Neural Networks (DNNs) are typically tested for accuracy using a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as the test suite. This subset must be small (because manual labelling is costly) yet faithfully represent the operational context, so that the resulting test suite contains roughly the same proportion of examples causing mispredictions (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is also desirable to learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy may make it possible to intentionally include many misprediction-causing examples in the test suite, providing more valuable inputs for DNN improvement while preserving the ability to obtain trustworthy, unbiased estimates. This paper presents a test selection technique (DeepEST) that actively looks for failing test cases in the operational dataset of a DNN, with the goal of assessing the DNN's expected accuracy with a small and "informative" test suite (namely, one with a high number of mispredictions) for subsequent DNN improvement. Experiments with five subjects, combining four DNN models and three datasets, are described. The results show that DeepEST provides DNN accuracy estimates with precision close to (and often better than) that of existing sampling-based DNN testing techniques, while detecting 5 to 30 times more mispredictions with the same test suite size.
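
The abstract's central claim, that a test suite can be deliberately skewed toward mispredictions and still yield an unbiased accuracy estimate, can be illustrated with a generic unequal-probability sampling sketch. This is not the DeepEST algorithm, which the abstract does not detail. It is a minimal Hansen-Hurwitz-style example under stated assumptions: each operational input has a hypothetical auxiliary score (for instance, one minus the DNN's confidence) believed to correlate with failure, inputs are drawn with replacement in proportion to that score, and each labelled draw is reweighted by the inverse of its selection probability. The function names `draw_weighted_sample` and `estimate_accuracy` are illustrative only.

```python
import random


def draw_weighted_sample(scores, n, rng):
    """Draw n indices with replacement, with probability proportional to score.

    `scores` is a hypothetical auxiliary signal (assumption, not from the
    paper). Every score should be positive so that every input has a chance
    of being selected; otherwise the estimate below is no longer unbiased.
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    drawn = rng.choices(range(len(scores)), weights=probs, k=n)
    return drawn, probs


def estimate_accuracy(drawn, probs, is_failure):
    """Hansen-Hurwitz-style estimate of accuracy over the whole dataset.

    `is_failure[i]` is 1 if input i is mispredicted, else 0. In practice only
    the drawn inputs would be labelled; a full list is used here for brevity.
    """
    n_items = len(probs)
    # Each draw contributes y_i / p_i, an unbiased estimate of the total
    # number of failures; averaging over the draws reduces the variance.
    est_failures = sum(is_failure[i] / probs[i] for i in drawn) / len(drawn)
    return 1.0 - est_failures / n_items
```

Because likely-failing inputs receive higher selection probabilities, the labelled sample tends to contain more mispredictions than a uniform sample of the same size, while the inverse-probability weights correct for that skew in the estimate.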
Author Affiliations
Antonio Guerriero (antonio.guerriero@unina.it), University of Naples Federico II, Italy
Roberto Pietrantuono (roberto.pietrantuono@unina.it), University of Naples Federico II, Italy
Stefano Russo (stefano.russo@unina.it), University of Naples Federico II, Italy
Subject Terms Adaptation models; Artificial neural networks; Computer bugs; Labeling; Neural networks; Software; Software engineering; Software testing; Testing