The inclusion problem for regular expressions

Hovland, Dag

Journal article; SubmittedVersion

Year

2011

Original version

Journal of computer and system sciences. 2011, 78 (6), 1795-1813, DOI: http://dx.doi.org/10.1016/j.jcss.2011.12.003

Abstract

This paper presents a polynomial-time algorithm for the inclusion problem for a large class of regular expressions. The algorithm is not based on the construction of finite automata, and can therefore be faster than the lower bound that the Myhill-Nerode theorem implies for automata-based methods. The algorithm automatically discards irrelevant parts of the right-hand expression, and these irrelevant parts may even be 1-ambiguous. For example, if r is a regular expression such that any DFA recognizing r is very large, the algorithm can still decide, in time independent of r, that the language of ab is included in that of (a + r)b. The algorithm is based on a syntax-directed inference system and takes arbitrary regular expressions as input. If the 1-ambiguity of the right-hand expression becomes a problem, the algorithm reports this; otherwise, it decides the inclusion problem for the input.
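The ab versus (a + r)b example can be illustrated with a minimal sketch. This is not the paper's inference system: it is a plain backtracking matcher that handles only the special case where the left-hand side is a single word, and it is not polynomial in general. All names (`Sym`, `Alt`, `Seq`, `Star`, `Lazy`, `ends`, `word_included`, `huge`) are hypothetical and introduced only for illustration. The `Lazy` wrapper stands in for a large subexpression r that is built only if the matcher actually needs it.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Union


@dataclass(frozen=True)
class Sym:
    char: str  # a single alphabet symbol


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Seq:
    first: "Regex"
    second: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


@dataclass(frozen=True)
class Lazy:
    # Hypothetical wrapper: the subexpression is built only on demand.
    build: Callable[[], "Regex"]


Regex = Union[Sym, Alt, Seq, Star, Lazy]


def ends(regex: Regex, word: str, start: int) -> Iterator[int]:
    """Lazily yield each index e such that word[start:e] matches regex."""
    if isinstance(regex, Sym):
        if word.startswith(regex.char, start):
            yield start + 1
    elif isinstance(regex, Alt):
        # The left branch is tried first; the right one only on demand.
        yield from ends(regex.left, word, start)
        yield from ends(regex.right, word, start)
    elif isinstance(regex, Seq):
        for mid in ends(regex.first, word, start):
            yield from ends(regex.second, word, mid)
    elif isinstance(regex, Star):
        yield start
        for mid in ends(regex.body, word, start):
            if mid > start:  # skip empty iterations to guarantee progress
                yield from ends(regex, word, mid)
    elif isinstance(regex, Lazy):
        yield from ends(regex.build(), word, start)


def word_included(word: str, regex: Regex) -> bool:
    """Decide whether the one-word language {word} is included in L(regex)."""
    return any(end == len(word) for end in ends(regex, word, 0))


def huge() -> Regex:
    # Stand-in for an expression r whose DFA would be very large.
    raise AssertionError("r was inspected")


# (a + r)b, with r never built unless the matcher asks for it.
rhs = Seq(Alt(Sym("a"), Lazy(huge)), Sym("b"))
result = word_included("ab", rhs)  # True, and huge() is never called
```

Because the generator stops at the first full match, the branch containing r is never expanded for the word ab. For a word such as bb, this sketch would have to build r, which is the point where an irrelevant part becomes relevant.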

NOTICE: this is the author’s version of a work that was accepted for publication in Journal of Computer and System Sciences. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Computer and System Sciences, 2011, http://dx.doi.org/10.1016/j.jcss.2011.12.003