Abstract
Regular expressions are used to characterize sets of strings (ie, languages) using a pattern-based syntax. They are
applied in different contexts, for example, data validation in Web forms. However, writing a regular expression that exactly
captures the desired set of strings could be particularly difficult, and techniques are sought to validate regular expressions
or test their use in applications. A common means of regular expression validation and testing is the generation of a set
of labelled strings (ie, strings together with their evaluation). Here we propose a fault-based approach for generating strings
usable as tests for regular expressions. We define some fault classes representing mistakes that could be made when writing
a regular expression, and we introduce the notion of distinguishing string, ie, a string that is able to expose a fault. Given
a regular expression, our approach generates a test suite composed of distinguishing strings that are able to detect possible
faults in the regular expression. We present different versions of the approach, which provide different results in terms
of test suite size and generation time. Experiments show that the proposed approach can generate compact test suites and that,
using suitable optimizations, the generation time is reasonable. Exploiting the proposed fault classes, we use the notion
of mutation score to assess the ability of a generic set of strings to expose possible faults contained in the regular expression
under test. A comparison with other test generation tools in terms of mutation score, size, and generation time shows the
advantages and limits of our approach.
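
The notions of distinguishing string and mutation score can be illustrated with a minimal Python sketch. It is not the approach of the paper: the regular expression, the three hand-written mutants, and the helper names (`accepts`, `is_distinguishing`, `mutation_score`) are all hypothetical, chosen only to show the idea. A string distinguishes a regular expression from a mutant when exactly one of the two accepts it, and the mutation score is taken here as the fraction of mutants distinguished by at least one string of the set.

```python
import re


def accepts(pattern: str, s: str) -> bool:
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None


def is_distinguishing(original: str, mutant: str, s: str) -> bool:
    """s exposes the fault if original and mutant evaluate it differently."""
    return accepts(original, s) != accepts(mutant, s)


def mutation_score(original: str, mutants: list[str], strings: list[str]) -> float:
    """Fraction of mutants distinguished by at least one string."""
    if not mutants:
        return 1.0
    killed = sum(
        1
        for m in mutants
        if any(is_distinguishing(original, m, s) for s in strings)
    )
    return killed / len(mutants)


# Hypothetical regular expression under test and hand-written mutants
# (illustrative faults only, not the fault classes defined in the paper).
original = r"[a-z]+[0-9]"
mutants = [
    r"[a-z]*[0-9]",   # quantifier changed from + to *
    r"[a-z]+[0-9]?",  # final digit made optional
    r"[A-Z]+[0-9]",   # character class case changed
]

tests = ["a1", "1"]
print(mutation_score(original, mutants, tests))          # 2 of 3 mutants exposed
print(mutation_score(original, mutants, tests + ["a"]))  # all 3 mutants exposed
```

In this sketch, the string `1` distinguishes the first mutant and `a1` the third, while the second mutant is exposed only once `a` is added to the set. Mutants that are equivalent to the original expression can never be exposed and are ignored here for simplicity.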