The paper discusses the problem of reconstructing the accent marks often missing from Hungarian electronic texts. Our program is based on a statistical analysis of a large corpus of properly accented text. It makes very little use of symbol-manipulation techniques, yet performs reasonably well (1.58% error) because of the breadth of data it stores about the relationship between the unaccented and the accented forms.
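The core idea of a corpus-driven approach can be illustrated with a minimal sketch. It is not the program described in the paper, and its details (the helper names `strip_accents`, `train` and `restore`, the use of Unicode NFD decomposition, and the "most frequent accented form wins" rule) are assumptions made for illustration. Every word of an accented corpus is stripped of its accents, and the tool counts how often each accented form appears under its unaccented key. An unaccented input word is then replaced by its most frequent accented counterpart. Tokens are assumed to be lowercase and in NFC form.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_accents(word: str) -> str:
    """Remove diacritics, e.g. 'ékezés' -> 'ekezes', 'őrült' -> 'orult'."""
    # NFD splits accented letters into base letter + combining mark(s);
    # dropping the combining marks leaves the unaccented form.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def train(corpus_words):
    """Hypothetical training step: map each unaccented form to a
    frequency count of the accented forms observed in the corpus."""
    table = defaultdict(Counter)
    for w in corpus_words:
        table[strip_accents(w)][w] += 1
    return table


def restore(word, table):
    """Return the most frequent accented form seen for `word`;
    words never seen in the corpus are left unchanged."""
    candidates = table.get(word)
    if not candidates:
        return word
    return candidates.most_common(1)[0][0]
```

For example, with `table = train(["kör", "kör", "kor", "ékezés", "szép"])`, the ambiguous input `kor` is restored as `kör` because that form is more frequent in this toy corpus, and `ekezes` becomes `ékezés`. A real system would need far more data than this, and possibly context, to resolve such ambiguities well. That is the "breadth of data" the abstract refers to.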
The full paper, Gépi ékezés ("Machine Accenting"), is available (in Hungarian) as gzipped PostScript or as (uncompressed) PDF.
The source is available in gzipped tar format under GNU copyleft. Sorry, all documentation is in Hungarian.
Binary distributions are no longer available.