Light parsing as finite-state filtering
Gregory Grefenstette (Rank Xerox)

For a number of language processing tasks, such as information retrieval and information extraction, pertinent information can be extracted from text without fully parsing each sentence. The most common restriction placed on the parser is to adopt a non-recursive model of the language treated, which allows the parser to be implemented with efficient finite-state tools at the cost of some coverage. These light parsers successively introduce symbols into the input string wherever specified regular expressions over words and/or part-of-speech tags match. Recent advances in finite-state expression compilation make mark-up transducers simpler to write, so layered finite-state parsers can be implemented more quickly and are easier to create and maintain. In this article, we describe a light parsing method that uses recently created finite-state operators. Two applications of this parser are described: grouping adjacent syntactically related units, and extracting non-adjacent n-ary grammatical relations. A system for evaluating the parser over a large corpus is also described.
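As a rough illustration of the layered mark-up idea, the sketch below inserts bracket symbols around spans of a part-of-speech-tagged string wherever a regular expression over the tags matches. It is not the article's implementation: Python's `re` module stands in for the finite-state operators, and the tag set, the `word/TAG` encoding, the bracket symbols, and the function names (`mark`, `light_parse`) are all assumptions made for this example.

```python
import re

# Hypothetical tagged input: every token is written word/TAG.
TAGGED = "the/DET old/ADJ parser/NOUN marks/VERB every/DET noun/NOUN group/NOUN"

# Assumed noun-group pattern: optional determiner, any adjectives,
# then one or more nouns. The lookarounds keep matches on token boundaries.
NOUN_GROUP = re.compile(
    r"(?<!\S)(?:\S+/DET )?(?:\S+/ADJ )*\S+/NOUN(?: \S+/NOUN)*(?!\S)"
)

# Assumed verb-group pattern: a single verb token.
VERB_GROUP = re.compile(r"(?<!\S)\S+/VERB(?!\S)")


def mark(pattern, left, right, text):
    """Insert the symbols `left` and `right` around every match of `pattern`."""
    return pattern.sub(lambda m: f"{left} {m.group(0)} {right}", text)


def light_parse(text):
    """Apply the mark-up layers one after another (non-recursive filtering)."""
    text = mark(NOUN_GROUP, "[NG", "NG]", text)  # layer 1: noun groups
    text = mark(VERB_GROUP, "[VG", "VG]", text)  # layer 2: verb groups
    return text


if __name__ == "__main__":
    print(light_parse(TAGGED))
```

Each layer reads the output of the previous one, so later patterns could also refer to the inserted symbols, for instance to relate a noun group to a following verb group. Greedy matching in `re` only approximates a longest-match policy; a finite-state toolkit would compile each layer into a transducer instead.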


PS version (6 pages, 120k)

PDF version (6 pages, 160k)