An Infrastructure for Mutation-based Evaluation of Testing Strategies

Software testing is an active field of research aimed at improving the quality of real-world software. Software is tested with test suites, which are composed of test cases; test cases are selected according to a testing strategy or test case selection criterion. New testing strategies and selection criteria are proposed frequently, and since each new publication aims to improve testing effectiveness, the question arises: how effective are the different strategies? To answer this question, either a theoretical model or an experiment with real-world faults can be used. Experiments on real-world faults entail a lot of manual work, so mutation testing is often used instead. Mutation testing is a technique in which artificial faults (mutations) are inserted into a software system. Such mutations have been found to be good substitutes for real faults and are continuously improved. With mutations, a large number of faults can be generated for a software system, significantly reducing the manual work required for each data point in an experiment. However, existing mutation testing frameworks do not support the whole evaluation workflow, leaving researchers to implement their own tooling. In particular, integration with build systems is often missing or restricted to a single build system. Furthermore, a large number of such frameworks exists, each supporting different parts of the overall process, and most of them provide new kinds of mutations that may be helpful for an evaluation. To evaluate test case selection criteria using these new kinds of mutations, a researcher has to adapt their process to each framework. To standardize the approach to evaluation, this thesis develops a process for conducting both the experiment and the subsequent analysis.
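To make the idea of a mutation concrete, the following minimal sketch shows a classic relational-operator mutant; the function names are hypothetical and do not come from any particular framework:

```python
# Illustrative sketch of a mutation: an artificial fault created by
# flipping ">" to ">=" (relational operator replacement).
# The names below are invented for illustration only.

def is_adult(age):
    # original implementation
    return age > 18

def is_adult_mutant(age):
    # mutant: ">" replaced by ">=" -- a typical off-by-one fault
    return age >= 18

# A test suite "kills" this mutant only if it checks the boundary value:
assert is_adult(18) != is_adult_mutant(18)  # boundary input exposes the fault
assert is_adult(30) == is_adult_mutant(30)  # non-boundary inputs do not
```

A test suite that detects many such mutants is, by the substitution argument above, likely to also detect many real faults.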
The tool Meter (Mutation-based Evaluation of Testing Strategies) was developed in this thesis to implement this process. Meter is designed to work with any mutation framework and build system: to use Meter with a new mutation framework or build system, only a small handler implementing a few basic operations is required. Meter covers the complete process of inserting faults, executing test suites, and analyzing the results, significantly reducing the work required for such an experiment. To demonstrate the viability of both the process and Meter, two experiments were conducted comparing the test case selection criteria Combinatorial Testing and Combinatorial Robustness Testing. In these experiments, Combinatorial Robustness Testing was found to be more effective than plain Combinatorial Testing, confirming previous results obtained from an experiment on a theoretical model.
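The handler abstraction described above could be sketched as follows. All interface and method names here are assumptions made for illustration; Meter's actual API may differ.

```python
# Hypothetical sketch of Meter-style handler interfaces: one per build
# system and one per mutation framework. Names are illustrative only.
from abc import ABC, abstractmethod

class BuildSystemHandler(ABC):
    """Adapts one build system (e.g. Maven, Gradle) to the evaluation tool."""

    @abstractmethod
    def compile(self, project_dir):
        """Build the (possibly mutated) project."""

    @abstractmethod
    def run_tests(self, project_dir):
        """Execute the test suite and report the results."""

class MutationFrameworkHandler(ABC):
    """Adapts one mutation framework to the evaluation tool."""

    @abstractmethod
    def generate_mutants(self, project_dir):
        """List the mutants the framework can create for this project."""

    @abstractmethod
    def apply_mutant(self, project_dir, mutant_id):
        """Insert a single mutation into the source code."""

# A concrete handler only fills in these basic operations, e.g.:
class MavenHandler(BuildSystemHandler):
    def compile(self, project_dir):
        return f"mvn -f {project_dir} compile"   # would invoke the build tool

    def run_tests(self, project_dir):
        return f"mvn -f {project_dir} test"      # would collect test outcomes
```

With such an interface, the evaluation loop (insert fault, build, run tests, record results) stays generic, and only the thin adapters vary per tool.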