Operation is the Hardest Teacher: Estimating DNN Accuracy Looking for Mispredictions
Published in | Proceedings / International Conference on Software Engineering pp. 348 - 358 |
---|---|
Main Authors | Guerriero, Antonio; Pietrantuono, Roberto; Russo, Stefano |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.05.2021 |
Subjects | |
ISBN | 1665402962 9781665402965 |
ISSN | 1558-1225 |
DOI | 10.1109/ICSE43902.2021.00042 |
Abstract | Deep Neural Networks (DNNs) are typically tested for accuracy using a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as a test suite. This subset must be small (because manual labelling is costly) yet faithfully represent the operational context, so that the resulting test suite contains roughly the same proportion of examples causing misprediction (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is also desirable to learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy may make it possible to intentionally include many examples causing misprediction in the test suite, thereby providing more valuable inputs for DNN improvement while preserving the ability to obtain trustworthy, unbiased estimates. This paper presents a test selection technique (DeepEST) that actively looks for failing test cases in the operational dataset of a DNN, with the goal of assessing the DNN's expected accuracy using a small and "informative" test suite (namely, one with a high number of mispredictions) for subsequent DNN improvement. Experiments with five subjects, combining four DNN models and three datasets, are described. The results show that DeepEST provides DNN accuracy estimates with precision close to (and often better than) that of existing sampling-based DNN testing techniques, while detecting 5 to 30 times more mispredictions with the same test suite size. |
---|---|
Author | Pietrantuono, Roberto; Russo, Stefano; Guerriero, Antonio |
Author_xml | – sequence: 1 givenname: Antonio surname: Guerriero fullname: Guerriero, Antonio email: antonio.guerriero@unina.it organization: University of Naples Federico II, Italy – sequence: 2 givenname: Roberto surname: Pietrantuono fullname: Pietrantuono, Roberto email: roberto.pietrantuono@unina.it organization: University of Naples Federico II, Italy – sequence: 3 givenname: Stefano surname: Russo fullname: Russo, Stefano email: stefano.russo@unina.it organization: University of Naples Federico II, Italy |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EndPage | 358 |
ExternalDocumentID | 9402144 |
Genre | orig-research |
GroupedDBID | -~X .4S .DC 123 23M 29O 5VS 6IE 6IF 6IH 6IK 6IL 6IM 6IN 8US AAJGR AAWTH ABLEC ADZIZ AFFNX ALMA_UNASSIGNED_HOLDINGS APO ARCSS AVWKF BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO EDO FEDTE I-F I07 IEGSK IJVOP IPLJI M43 OCL RIE RIL RIO RNS XOL |
ID | FETCH-LOGICAL-a285t-c896269a1663bd71a20bc83f6e6bc6cdf2de138b8d8d9b5822806f68c462648c3 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:21:08 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a285t-c896269a1663bd71a20bc83f6e6bc6cdf2de138b8d8d9b5822806f68c462648c3 |
PageCount | 11 |
ParticipantIDs | ieee_primary_9402144 |
PublicationCentury | 2000 |
PublicationDate | 2021-May |
PublicationDateYYYYMMDD | 2021-05-01 |
PublicationDate_xml | – month: 05 year: 2021 text: 2021-May |
PublicationDecade | 2020 |
PublicationTitle | Proceedings / International Conference on Software Engineering |
PublicationTitleAbbrev | ICSE |
PublicationYear | 2021 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssib044791064 ssj0006499 |
Score | 2.3327954 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 348 |
SubjectTerms | Adaptation models; Artificial Neural Networks; Computer bugs; Labeling; Neural networks; Software; Software engineering; Software testing; Testing |
Title | Operation is the Hardest Teacher: Estimating DNN Accuracy Looking for Mispredictions |
URI | https://ieeexplore.ieee.org/document/9402144 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
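
The abstract's central claim is that a test suite deliberately skewed toward mispredictions can still yield an unbiased accuracy estimate. The sketch below illustrates that idea only; it is not DeepEST's actual algorithm, which this record does not describe. It assumes a swapped-in textbook technique (probability-proportional-to-size sampling with replacement and the Hansen-Hurwitz estimator), and every name in it (`make_population`, `estimate_accuracy`, the `1 - confidence` weight) is hypothetical.

```python
import random


def make_population(size, rng):
    """Build a synthetic operational dataset (an assumption for illustration).

    Each example has a model confidence in [0.5, 1.0]; the prediction is
    correct with probability equal to that confidence. The sampling weight is
    higher for low-confidence examples, which are more likely to fail.
    """
    correct, weights = [], []
    for _ in range(size):
        confidence = rng.uniform(0.5, 1.0)
        correct.append(rng.random() < confidence)
        # The small constant keeps every example selectable (non-zero weight).
        weights.append(1.0 - confidence + 0.05)
    return correct, weights


def estimate_accuracy(correct, weights, n, rng):
    """Draw n examples with replacement, with probability proportional to weight.

    Returns (accuracy_estimate, failures_drawn). In practice `correct[i]`
    would only become known after manually labelling drawn example i.
    """
    size = len(correct)
    total_weight = sum(weights)
    probs = [w / total_weight for w in weights]
    drawn = rng.choices(range(size), weights=probs, k=n)
    # Hansen-Hurwitz: each drawn failure is re-weighted by 1 / p_i, which
    # cancels the bias introduced by oversampling likely failures.
    est_failures = sum((0.0 if correct[i] else 1.0) / probs[i] for i in drawn) / n
    failures_drawn = sum(1 for i in drawn if not correct[i])
    return 1.0 - est_failures / size, failures_drawn


if __name__ == "__main__":
    rng = random.Random(0)
    correct, weights = make_population(5000, rng)
    true_accuracy = sum(correct) / len(correct)
    estimate, found = estimate_accuracy(correct, weights, 200, rng)
    print(f"true accuracy: {true_accuracy:.3f}")
    print(f"weighted-sample estimate: {estimate:.3f} ({found} failures drawn)")
```

Passing uniform weights (all `1.0`) to `estimate_accuracy` reduces it to simple random sampling, which gives a baseline for comparing how many failing examples each strategy exposes at the same test suite size.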