The risks of coverage-directed test case generation

Gregory Gay; Matt Staats; Michael Whalen; Mats P.E. Heimdahl

doi:10.1109/TSE.2015.2421011

Computer Science and Engineering

Research output: Contribution to journal › Article › peer-review

84 Scopus citations

Abstract

A number of structural coverage criteria have been proposed to measure the adequacy of testing efforts. In the avionics and other critical systems domains, test suites satisfying structural coverage criteria are mandated by standards. With the advent of powerful automated test generation tools, it is tempting to simply generate test inputs to satisfy these structural coverage criteria. However, while techniques to produce coverage-providing tests are well established, the effectiveness of such approaches in terms of fault detection ability has not been adequately studied. In this work, we evaluate the effectiveness of test suites generated to satisfy four coverage criteria through counterexample-based test generation and a random generation approach - where tests are randomly generated until coverage is achieved - contrasted against purely random test suites of equal size. Our results yield three key conclusions. First, coverage criteria satisfaction alone can be a poor indication of fault finding effectiveness, with inconsistent results between the seven case examples (and random test suites of equal size often providing similar - or even higher - levels of fault finding). Second, the use of structural coverage as a supplement - rather than a target - for test generation can have a positive impact, with random test suites reduced to a coverage-providing subset detecting up to 13.5 percent more faults than test suites generated specifically to achieve coverage. Finally, Observable MC/DC, a criterion designed to account for program structure and the selection of the test oracle, can - in part - address the failings of traditional structural coverage criteria, allowing for the generation of test suites achieving higher levels of fault detection than random test suites of equal size. 
These observations point to risks inherent in the increase in test automation in critical systems, and the need for more research in how coverage criteria, test generation approaches, the test oracle used, and system structure jointly influence test effectiveness.

Original language	English (US)
Article number	7081779
Pages (from-to)	803-819
Number of pages	17
Journal	IEEE Transactions on Software Engineering
Volume	41
Issue number	8
DOIs	https://doi.org/10.1109/TSE.2015.2421011
State	Published - Aug 1 2015

Bibliographical note

Publisher Copyright:
© 2015 IEEE.

Keywords

Software Testing
System Testing

Cite this

@article{7352c1fbdeed4b82b8999dfbc3cee5e3,
  title = "The risks of coverage-directed test case generation",
  abstract = "A number of structural coverage criteria have been proposed to measure the adequacy of testing efforts. In the avionics and other critical systems domains, test suites satisfying structural coverage criteria are mandated by standards. With the advent of powerful automated test generation tools, it is tempting to simply generate test inputs to satisfy these structural coverage criteria. However, while techniques to produce coverage-providing tests are well established, the effectiveness of such approaches in terms of fault detection ability has not been adequately studied. In this work, we evaluate the effectiveness of test suites generated to satisfy four coverage criteria through counterexample-based test generation and a random generation approach - where tests are randomly generated until coverage is achieved - contrasted against purely random test suites of equal size. Our results yield three key conclusions. First, coverage criteria satisfaction alone can be a poor indication of fault finding effectiveness, with inconsistent results between the seven case examples (and random test suites of equal size often providing similar - or even higher - levels of fault finding). Second, the use of structural coverage as a supplement - rather than a target - for test generation can have a positive impact, with random test suites reduced to a coverage-providing subset detecting up to 13.5 percent more faults than test suites generated specifically to achieve coverage. Finally, Observable MC/DC, a criterion designed to account for program structure and the selection of the test oracle, can - in part - address the failings of traditional structural coverage criteria, allowing for the generation of test suites achieving higher levels of fault detection than random test suites of equal size. These observations point to risks inherent in the increase in test automation in critical systems, and the need for more research in how coverage criteria, test generation approaches, the test oracle used, and system structure jointly influence test effectiveness.",
  keywords = "Software Testing, System Testing",
  author = "Gay, Gregory and Staats, Matt and Whalen, Michael and Heimdahl, {Mats P.E.}",
  note = "Publisher Copyright: {\textcopyright} 2015 IEEE.",
  year = "2015",
  month = aug,
  day = "1",
  doi = "10.1109/TSE.2015.2421011",
  language = "English (US)",
  volume = "41",
  pages = "803--819",
  journal = "IEEE Transactions on Software Engineering",
  issn = "0098-5589",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "8",
}

TY - JOUR
T1 - The risks of coverage-directed test case generation
AU - Gay, Gregory
AU - Staats, Matt
AU - Whalen, Michael
AU - Heimdahl, Mats P.E.
PY - 2015/8/1
Y1 - 2015/8/1
N2 - A number of structural coverage criteria have been proposed to measure the adequacy of testing efforts. In the avionics and other critical systems domains, test suites satisfying structural coverage criteria are mandated by standards. With the advent of powerful automated test generation tools, it is tempting to simply generate test inputs to satisfy these structural coverage criteria. However, while techniques to produce coverage-providing tests are well established, the effectiveness of such approaches in terms of fault detection ability has not been adequately studied. In this work, we evaluate the effectiveness of test suites generated to satisfy four coverage criteria through counterexample-based test generation and a random generation approach - where tests are randomly generated until coverage is achieved - contrasted against purely random test suites of equal size. Our results yield three key conclusions. First, coverage criteria satisfaction alone can be a poor indication of fault finding effectiveness, with inconsistent results between the seven case examples (and random test suites of equal size often providing similar - or even higher - levels of fault finding). Second, the use of structural coverage as a supplement - rather than a target - for test generation can have a positive impact, with random test suites reduced to a coverage-providing subset detecting up to 13.5 percent more faults than test suites generated specifically to achieve coverage. Finally, Observable MC/DC, a criterion designed to account for program structure and the selection of the test oracle, can - in part - address the failings of traditional structural coverage criteria, allowing for the generation of test suites achieving higher levels of fault detection than random test suites of equal size. 
These observations point to risks inherent in the increase in test automation in critical systems, and the need for more research in how coverage criteria, test generation approaches, the test oracle used, and system structure jointly influence test effectiveness.
KW - Software Testing
KW - System Testing
UR - http://www.scopus.com/inward/record.url?scp=84939497216&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939497216&partnerID=8YFLogxK
U2 - 10.1109/TSE.2015.2421011
DO - 10.1109/TSE.2015.2421011
M3 - Article
AN - SCOPUS:84939497216
SN - 0098-5589
VL - 41
SP - 803
EP - 819
JO - IEEE Transactions on Software Engineering
JF - IEEE Transactions on Software Engineering
IS - 8
M1 - 7081779
ER -
