Matching a set of strings with variable length don't cares

Gregory Kucherov, Michaël Rusinowitch

Research output: Contribution to journalArticlepeer-review

42 Citations (Scopus)

Abstract

Given an alphabet A, a pattern p is a word v1@ ⋯ @vm, where vi ∈ A* and @ ∉ A is a distinguished symbol called a variable length don't care symbol. Pattern p matches a text t ∈ A* if t = u0v1u1 ⋯ um - 1vmum for some u0, ⋯ , um ∈ A*. We address the following problem: given a set P of patterns and a text t, test whether one of the patterns of P matches t. We describe an algorithm that solves the problem in time O((|t| + |P|)log |P|). In contrast to most of the existing string matching algorithms (such as that of Aho-Corasick) our algorithm is not composed of two successive stages - preprocessing the pattern (resp. the text) and reading through the text (resp. the pattern) - but has these two stages essentially interleaved. Our approach is based on using the DAWG (Directed Acyclic Word Graph), a data structure studied by A. Blumer J. Blumer, Haussler, Ehrenfeucht, Crochemore, Chen, Seiferas.

Original languageEnglish
Pages (from-to)129-154
Number of pages26
JournalTheoretical Computer Science
Volume178
Issue number1-2
DOIs
Publication statusPublished - 30 May 1997
Externally publishedYes

Fingerprint

Dive into the research topics of 'Matching a set of strings with variable length don't cares'. Together they form a unique fingerprint.

Cite this