Jean-Pierre Chanod (Rank Xerox) & Pasi Tapanainen (University of Helsinki)
This paper describes a non-deterministic tokeniser that was implemented and used to develop a French finite-state grammar. The tokeniser combines a finite-state automaton for simple tokens with a dedicated lexical transducer that encodes a wide variety of multiword expressions, each of which is associated with multiple lexical descriptions where required.
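To make "non-deterministic" concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a tiny hypothetical multiword lexicon (`MULTIWORDS`) with illustrative tags (`NOUN`, `ADV`) rather than the paper's tag set. It also uses plain whitespace splitting as a stand-in for the simple-token automaton, and it enumerates every segmentation by brute force instead of composing finite-state transducers. The point is only that a sequence such as "pomme de terre" yields both a simple-token reading and a multiword reading, and that a multiword such as "a priori" can carry more than one lexical description.

```python
from typing import Dict, List, Tuple

# Hypothetical multiword lexicon: each word sequence maps to one or more
# lexical descriptions. Entries and tags are illustrative only.
MULTIWORDS: Dict[Tuple[str, ...], List[str]] = {
    ("pomme", "de", "terre"): ["NOUN"],
    ("a", "priori"): ["ADV", "NOUN"],  # ambiguous: two descriptions
}


def simple_tokens(text: str) -> List[str]:
    """Stand-in for the simple-token automaton: split on whitespace."""
    return text.split()


def tokenise(text: str) -> List[List[str]]:
    """Return every segmentation of text, keeping all alternatives."""
    words = simple_tokens(text)
    results: List[List[str]] = []

    def walk(i: int, path: List[str]) -> None:
        if i == len(words):
            results.append(list(path))
            return
        # Alternative 1: read the current word as a simple token.
        path.append(words[i])
        walk(i + 1, path)
        path.pop()
        # Alternative 2: read any multiword expression starting here,
        # once per lexical description attached to it.
        for mwe, tags in MULTIWORDS.items():
            n = len(mwe)
            if tuple(words[i:i + n]) == mwe:
                for tag in tags:
                    path.append("_".join(mwe) + "/" + tag)
                    walk(i + n, path)
                    path.pop()

    walk(0, [])
    return results


if __name__ == "__main__":
    for reading in tokenise("a priori"):
        print(reading)
```

Running `tokenise("a priori")` returns three readings: the two simple tokens, then `a_priori/ADV` and `a_priori/NOUN`. A finite-state implementation would represent these alternatives compactly as paths through one automaton rather than as an explicit list, which matters once ambiguities multiply across a sentence.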