Input prioritization for testing neural networks

Taejoon Byun; Vaibhav Sharma; Abhishek Vijayakumar; Sanjai Rayadurgam; Darren Cofer

doi:10.1109/AITest.2019.000-6

University of Minnesota

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

48 Scopus citations

Abstract

Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety- and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics, and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation to provide high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data (the expected output, a.k.a. label, for a given input) is high, which significantly limits the amount and quality of testing that can be performed. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way toward increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment, derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assess the efficacy of three such sentiment measures for prioritization (confidence, uncertainty, and surprise) and compare them in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%.
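The abstract does not define how each sentiment measure is computed, so the following is only a minimal sketch of the prioritization idea, not the authors' implementation. It assumes that confidence can be approximated by the largest softmax probability of a classifier's output, and it ranks unlabeled inputs so that the least confident ones are labeled and tested first. The function names `softmax`, `confidence`, and `prioritize` are hypothetical, and the logits are made-up illustration values.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs: Sequence[float]) -> float:
    """Assumed confidence score: the largest predicted class probability."""
    return max(probs)


def prioritize(batch_logits: Sequence[Sequence[float]]) -> List[int]:
    """Return input indices ordered from least to most confident."""
    scores = [confidence(softmax(logits)) for logits in batch_logits]
    return sorted(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Made-up logits for three unlabeled inputs of a 3-class model.
    logits = [
        [9.0, 0.5, 0.1],  # one class clearly dominates
        [1.0, 1.1, 0.9],  # near-uniform prediction
        [3.0, 2.5, 0.2],  # two competing classes
    ]
    print(prioritize(logits))  # prints [1, 2, 0]
```

Under this assumption, the near-uniform input is sent for labeling first. Uncertainty and surprise would need additional machinery (for example, repeated stochastic forward passes or access to internal activations), which this sketch does not attempt.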

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	63-70
Number of pages	8
ISBN (Electronic)	9781728104928
DOIs	https://doi.org/10.1109/AITest.2019.000-6
State	Published - May 17 2019
Event	1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019 - Newark, United States
Duration	Apr 4 2019 → Apr 9 2019

Publication series

Name	Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019

Conference

Conference	1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019
Country/Territory	United States
City	Newark
Period	4/4/19 → 4/9/19

Bibliographical note

Funding Information:
This work is supported by AFRL and DARPA under contract FA8750-18-C-0099.

Publisher Copyright:
© 2019 IEEE.

Keywords

Coverage criteria
Machine learning
Neural networks
Test prioritization

Cite this

Byun, T., Sharma, V., Vijayakumar, A., Rayadurgam, S., & Cofer, D. (2019). Input prioritization for testing neural networks. In Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019 (pp. 63-70). Article 8718224 (Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/AITest.2019.000-6

Input prioritization for testing neural networks. / Byun, Taejoon; Sharma, Vaibhav; Vijayakumar, Abhishek et al.
Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019. Institute of Electrical and Electronics Engineers Inc., 2019. p. 63-70 8718224 (Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019).

Byun, T, Sharma, V, Vijayakumar, A, Rayadurgam, S & Cofer, D 2019, Input prioritization for testing neural networks. in Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019, 8718224, Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019, Institute of Electrical and Electronics Engineers Inc., pp. 63-70, 1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019, Newark, United States, 4/4/19. https://doi.org/10.1109/AITest.2019.000-6

Byun T, Sharma V, Vijayakumar A, Rayadurgam S, Cofer D. Input prioritization for testing neural networks. In Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019. Institute of Electrical and Electronics Engineers Inc. 2019. p. 63-70. 8718224. (Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019). doi: 10.1109/AITest.2019.000-6

Byun, Taejoon ; Sharma, Vaibhav ; Vijayakumar, Abhishek et al. / Input prioritization for testing neural networks. Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 63-70 (Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019).

@inproceedings{8064eb9bdf454e808fd0c414dc469e57,

title = "Input prioritization for testing neural networks",

keywords = "Coverage criteria, Machine learning, Neural networks, Test prioritization",

author = "Taejoon Byun and Vaibhav Sharma and Abhishek Vijayakumar and Sanjai Rayadurgam and Darren Cofer",

note = "Funding Information: ACKNOWLEDGMENT This work is supported by AFRL and DARPA under contract FA8750-18-C-0099. Publisher Copyright: {\textcopyright} 2019 IEEE.; 1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019 ; Conference date: 04-04-2019 Through 09-04-2019",

year = "2019",

month = may,

day = "17",

doi = "10.1109/AITest.2019.000-6",

language = "English (US)",

series = "Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "63--70",

booktitle = "Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019",

}

TY - GEN

T1 - Input prioritization for testing neural networks

AU - Byun, Taejoon

AU - Sharma, Vaibhav

AU - Vijayakumar, Abhishek

AU - Rayadurgam, Sanjai

AU - Cofer, Darren

PY - 2019/5/17

Y1 - 2019/5/17

AB - Deep neural networks (DNNs) are increasingly being adopted for sensing and control functions in a variety of safety and mission-critical systems such as self-driving cars, autonomous air vehicles, medical diagnostics and industrial robotics. Failures of such systems can lead to loss of life or property, which necessitates stringent verification and validation for providing high assurance. Though formal verification approaches are being investigated, testing remains the primary technique for assessing the dependability of such systems. Due to the nature of the tasks handled by DNNs, the cost of obtaining test oracle data - the expected output, a.k.a. label, for a given input - is high, which significantly impacts the amount and quality of testing that can be performed. Thus, prioritizing input data for testing DNNs in meaningful ways to reduce the cost of labeling can go a long way in increasing testing efficacy. This paper proposes using gauges of the DNN's sentiment derived from the computation performed by the model, as a means to identify inputs that are likely to reveal weaknesses. We empirically assessed the efficacy of three such sentiment measures for prioritization - confidence, uncertainty and surprise - and compare their effectiveness in terms of their fault-revealing capability and retraining effectiveness. The results indicate that sentiment measures can effectively flag inputs that expose unacceptable DNN behavior. For MNIST models, the average percentage of inputs correctly flagged ranged from 88% to 94.8%.

KW - Coverage criteria

KW - Machine learning

KW - Neural networks

KW - Test prioritization

UR - http://www.scopus.com/inward/record.url?scp=85067097508&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067097508&partnerID=8YFLogxK

U2 - 10.1109/AITest.2019.000-6

DO - 10.1109/AITest.2019.000-6

M3 - Conference contribution

AN - SCOPUS:85067097508

T3 - Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019

SP - 63

EP - 70

BT - Proceedings - 2019 IEEE International Conference on Artificial Intelligence Testing, AITest 2019

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 1st IEEE International Conference on Artificial Intelligence Testing, AITest 2019

Y2 - 4 April 2019 through 9 April 2019

ER -
