Abstract
Regular expressions are widely used to describe and document regular languages, and to identify a set of (valid) strings.
Often they are not available or known, and they must be learned or inferred. Classical approaches like L* make strong assumptions
that normally do not hold. More feasible are testing approaches in which it is possible only to generate strings and check
them with the underlying system acting as \emphoracle. In this paper, we devise a method that starting from an initial guess
of the regular expression, it repeatedly generates and feeds strings to the system to check whether they are accepted or not,
and it tries to repair consistently the alleged solution. Our approach is based on an evolutionary algorithm in which both
the population of possible solutions and the set of strings co-evolve. Mutation is used for the population evolution in order
to produce the offspring. We run a set of experiments showing that the string generation policy is effective and that the
evolutionary approach outperforms existing techniques for regular expression repair.
[download the pdf file] [DOI]