Operation is the Hardest Teacher: Estimating DNN Accuracy Looking for Mispredictions

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering (ICSE), pp. 348-358
Main Authors	Guerriero, Antonio, Pietrantuono, Roberto, Russo, Stefano
Format	Conference Proceeding
Language	English
Published	IEEE, May 2021
Subjects	Adaptation models; Artificial Neural Networks; Computer bugs; Labeling; Neural networks; Software; Software engineering; Software testing; Testing
ISBN	1665402962 9781665402965
ISSN	1558-1225
DOI	10.1109/ICSE43902.2021.00042

Abstract	Deep Neural Networks (DNNs) are typically tested for accuracy by relying on a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as the test suite. This subset is required to be small (due to the cost of manual labelling) yet to faithfully represent the operational context, meaning the resulting test suite should contain roughly the same proportion of examples causing misprediction (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is also desirable to learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy may make it possible to intentionally include many examples causing misprediction in the test suite, thus providing more valuable inputs for DNN improvement while preserving the ability to obtain trustworthy, unbiased estimates. This paper presents a test selection technique (DeepEST) that actively looks for failing test cases in the operational dataset of a DNN, with the goal of assessing the DNN's expected accuracy with a small and "informative" test suite (namely, one with a high number of mispredictions) for subsequent DNN improvement. Experiments with five subjects, combining four DNN models and three datasets, are described. The results show that DeepEST provides DNN accuracy estimates with precision close to (and often better than) that of existing sampling-based DNN testing techniques, while detecting from 5 to 30 times more mispredictions with the same test suite size.
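The abstract's key claim is that a sampler may deliberately over-select likely mispredictions and still yield an unbiased accuracy estimate. The Python sketch below illustrates that general principle with textbook survey-sampling tools; it is not DeepEST, whose algorithm this record does not describe. It assumes a hypothetical per-example confidence score as the auxiliary variable guiding selection, a synthetic operational dataset, and a hypothetical weight function `(1 - c) ** 2 + floor`; the names `make_operational_dataset`, `srs_estimate` and `pps_estimate` are illustrative only. The weighted sampler draws with replacement, with probability proportional to the weight, and undoes the induced bias with the Hansen-Hurwitz estimator; it is compared with simple random sampling at the same test suite size.

```python
import random


def make_operational_dataset(size, rng):
    """Hypothetical synthetic stand-in for a labelled operational dataset.

    Each example gets a model confidence c in [0, 1); it is mispredicted with
    probability (1 - c) ** 9, so overall accuracy is about 0.9 and failures
    cluster at low confidence.
    """
    correct, confidence = [], []
    for _ in range(size):
        c = rng.random()
        correct.append(rng.random() >= (1.0 - c) ** 9)
        confidence.append(c)
    return correct, confidence


def srs_estimate(correct, n, rng):
    """Baseline: simple random sampling without replacement."""
    picked = rng.sample(range(len(correct)), n)
    hits = sum(correct[i] for i in picked)
    return hits / n, n - hits  # (accuracy estimate, mispredictions found)


def pps_estimate(correct, confidence, n, rng, floor=1e-3):
    """Oversample likely failures, then correct the bias (Hansen-Hurwitz).

    Each draw picks example i with probability p_i = w_i / sum(w), using the
    hypothetical weight w_i = (1 - c_i) ** 2 + floor. The floor keeps every
    p_i > 0, which the estimator needs to stay unbiased.
    """
    size = len(correct)
    weights = [(1.0 - c) ** 2 + floor for c in confidence]
    total = sum(weights)
    draws = rng.choices(range(size), weights=weights, k=n)
    # The mean of y_i / p_i over the draws is unbiased for the total number
    # of mispredictions (y_i = 1 if example i is mispredicted); since
    # p_i = w_i / total, y_i / p_i equals total / w_i for a misprediction.
    est_failures = sum(0.0 if correct[i] else total / weights[i] for i in draws) / n
    found = len({i for i in draws if not correct[i]})  # distinct examples to inspect
    return 1.0 - est_failures / size, found


rng = random.Random(2021)
correct, confidence = make_operational_dataset(10_000, rng)
true_acc = sum(correct) / len(correct)
srs_acc, srs_found = srs_estimate(correct, 500, rng)
pps_acc, pps_found = pps_estimate(correct, confidence, 500, rng)
print(f"true accuracy {true_acc:.3f}")
print(f"SRS:      estimate {srs_acc:.3f}, {srs_found} mispredictions found")
print(f"weighted: estimate {pps_acc:.3f}, {pps_found} mispredictions found")
```

Because the Hansen-Hurwitz estimator divides each observed misprediction by its draw probability, the oversampling of low-confidence inputs cancels out in expectation; the price is extra variance whenever a misprediction with a very small p_i is drawn, which is why the floor term matters. Sampling with replacement also means the number of distinct examples to label never exceeds the budget n. The printed figures depend on the seed and the synthetic data; they are not results from the paper.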
Affiliation	University of Naples Federico II, Italy (all three authors)
Contact	antonio.guerriero@unina.it; roberto.pietrantuono@unina.it; stefano.russo@unina.it
CODEN	IEEPAD
Discipline	Computer Science
Genre	orig-research
PageCount	11
PublicationTitleAbbrev	ICSE
URI	https://ieeexplore.ieee.org/document/9402144