MoA's features
Date : 95.8.13 (Sun), bgjang@csone
Readable code :
I think so, but would others agree? I am not sure, but I
have made an effort to write code that other people can read.
If you look at my code and recognize my programming habits and style,
it should not be difficult to fix my programs.
Generalized concepts :
There exist several difficult and troublesome parts in
morphological analysis (MA). I have little knowledge of this area,
so I thought hard about how to generalize and simplify the problem,
even though this requires more system resources (time/memory).
Irregular processing with tables and dictionary :
Anybody can easily add an irregular rule. See irrtbl.c
for more information. There also exists a dictionary called the
I-Type dictionary, which contains the words to which irregular
rules must be applied.
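
The interplay between a rule table and the I-Type dictionary can be
sketched as follows. This is only an illustration in Python, not the
format used in irrtbl.c: the rule layout, the names IRREGULAR_RULES,
I_TYPE_DICT and recover_irregular, and the sample words are all
assumptions. It restores a dropped final consonant for the Korean
b-irregular verbs and accepts the result only if the stem is listed
in the (stand-in) I-Type dictionary.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable in Unicode
HANGUL_COUNT = 11172      # number of precomposed Hangul syllables
FINAL_B = 17              # index of the final consonant "ㅂ" in Unicode composition

# Hypothetical rule table:
# (rule name, surface suffix, final consonant to restore, recovered ending)
IRREGULAR_RULES = [
    ("b-irregular", "워", FINAL_B, "어"),
    ("b-irregular", "와", FINAL_B, "아"),
]

# Stand-in for the I-Type dictionary: stems that take an irregular rule.
I_TYPE_DICT = {"덥", "춥", "돕"}


def has_no_final(syllable):
    """True if the character is a Hangul syllable without a final consonant."""
    code = ord(syllable) - HANGUL_BASE
    return 0 <= code < HANGUL_COUNT and code % 28 == 0


def recover_irregular(word):
    """Return (stem, ending, rule name) candidates for an irregular surface form."""
    results = []
    for name, suffix, final, ending in IRREGULAR_RULES:
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        body = word[:-len(suffix)]
        last = body[-1]
        if not has_no_final(last):
            continue
        # Put the dropped final consonant back on the last stem syllable.
        stem = body[:-1] + chr(ord(last) + final)
        # Only stems listed in the I-Type dictionary may use the rule.
        if stem in I_TYPE_DICT:
            results.append((stem, ending, name))
    return results
```

Under these assumptions, recover_irregular("더워") proposes the stem
"덥" with the ending "어", while the regular form "배워" yields no
candidate because "뱁" is not in the dictionary. Adding a rule means
adding one row to the table.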
In MoA, exceptional words and contractions are recovered to their
base forms in the same step.
Anybody can update the E-Type dictionary (exceptional word dictionary),
which contains exceptional words and contracted forms of words
together with their base forms.
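
A single lookup that covers both cases might look like the sketch
below. The entry layout (surface form mapped to a list of base
morphemes), the name recover_base_form, and the two sample
contractions are assumptions made for illustration; the real E-Type
dictionary format is not described here.

```python
# Stand-in for the E-Type dictionary: surface form -> base morphemes.
# The entries are illustrative contractions only.
E_TYPE_DICT = {
    "난": ["나", "는"],
    "뭘": ["무엇", "을"],
}


def recover_base_form(word):
    """Return the base morphemes for an exceptional or contracted word.

    Returns None when the word is not in the E-Type dictionary, so the
    caller can fall through to normal analysis.
    """
    return E_TYPE_DICT.get(word)
```

Updating the behaviour then only requires editing dictionary entries,
not program logic.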
In MoA, connectivity information between POS is used.
This table can be changed easily using the bundled utility
named chkcon. This utility and the information about
the connections between POS are borrowed from KTS.
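
As a rough picture of what a connectivity table does, the sketch below
stores the allowed (left POS, right POS) pairs and rejects any tag
sequence containing a pair that is not listed. The tag names and the
pairs are invented for illustration; they are not the KTS tag set or
the actual table edited by chkcon.

```python
# Hypothetical connectivity table: pairs of POS that may be adjacent.
CONNECTABLE = {
    ("NOUN", "PARTICLE"),
    ("NOUN", "COPULA"),
    ("COPULA", "ENDING"),
    ("VERB_STEM", "ENDING"),
}


def can_connect(left, right):
    """True if a morpheme tagged `left` may be followed by one tagged `right`."""
    return (left, right) in CONNECTABLE


def is_valid_sequence(tags):
    """True if every adjacent pair of tags is allowed by the table."""
    return all(can_connect(a, b) for a, b in zip(tags, tags[1:]))
```

Because the constraint lives in data, changing which POS may follow
which only means editing the table.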
Using POS defined by Jae-Hoon, Kim :
Like KTS, MoA uses the POS (part of speech) set defined
by Jae-Hoon, Kim. Refer to the PostScript file
morph-model.ps for a detailed description
of the meaning of each POS.
Hashing in Chart data structure :
An MA that uses a chart spends most of its time searching
the chart data structure, so I use hashing (strictly, it is a
modification of the hashing technique)
to search for an edge in the chart.
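
The author's modified hashing scheme is not described here, so the
following is only a plain hashed chart for orientation. It places each
edge in a bucket chosen from its start and end positions, so finding
the edges over a span touches one bucket instead of the whole chart.
The class name Chart, the hash function, and the edge layout
(start, end, label) are assumptions.

```python
class Chart:
    """A chart whose edges are stored in hash buckets keyed by their span."""

    def __init__(self, n_buckets=64):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, start, end):
        # Simple illustrative hash of the span; not MoA's actual function.
        return (start * 31 + end) % len(self.buckets)

    def add(self, start, end, label):
        """Insert an edge unless an identical one is already present."""
        bucket = self.buckets[self._index(start, end)]
        edge = (start, end, label)
        if edge not in bucket:
            bucket.append(edge)

    def find(self, start, end):
        """Return all edges covering exactly the span [start, end)."""
        bucket = self.buckets[self._index(start, end)]
        return [e for e in bucket if e[0] == start and e[1] == end]
```

Different spans may share a bucket, which is why find still compares
the start and end positions of each edge it returns.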
Using Lex for tokenization :
So anybody can easily define the token set
and modify the program accordingly.
Big dictionaries, sufficient information tables :
For the N-Type dictionary, we use the entire Korean dictionary
made by WHO(?). And for the I-Type dictionary, a corpus.
Jae-Hoon, Kim.
Irregular tables, unknown word connectivity information,
etc.
Dictionary management tool :
Included, although it has no help.
Memory Cache for N-Type dictionary :
This algorithm accesses the N-Type dictionary (normal
dictionary) more often than the previous MA algorithm did, so
some memory caching scheme for the N-Type dictionary is needed.
I devised a caching scheme
that is adapted from the hashing data structure and suited
to Korean-language dictionaries. Naturally,
I programmed it into MoA.
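
The details of the author's caching scheme are not given here. As a
generic illustration of why a cache helps when the same entries are
requested repeatedly, the sketch below puts a small direct-mapped hash
cache in front of a slow lookup function. The class name
DictionaryCache, the slot count, and the hash are assumptions and make
no attempt to reproduce the Korean-specific design.

```python
class DictionaryCache:
    """A direct-mapped cache in front of a slower dictionary lookup."""

    def __init__(self, lookup, n_slots=1024):
        self.lookup = lookup              # slow lookup: word -> entry or None
        self.slots = [None] * n_slots     # each slot holds (word, entry) or None
        self.hits = 0
        self.misses = 0

    def _slot(self, word):
        # Deterministic string hash; purely illustrative.
        h = 0
        for ch in word:
            h = (h * 31 + ord(ch)) % len(self.slots)
        return h

    def get(self, word):
        """Return the entry for `word`, consulting the slow lookup on a miss."""
        i = self._slot(word)
        cached = self.slots[i]
        if cached is not None and cached[0] == word:
            self.hits += 1
            return cached[1]
        self.misses += 1
        entry = self.lookup(word)
        # Failed lookups are cached too, so repeated misses stay cheap.
        self.slots[i] = (word, entry)
        return entry
```

Caching negative results matters for a chart-based analyzer, since
many candidate substrings are not dictionary words at all.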
Using syllable information :
Using syllable information when checking
normal words in the chart decreases the number of dictionary
lookups, so system efficiency increases.
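
Which syllable information MoA uses is not stated here. One possible
reading, sketched below under that assumption, is a cheap pre-filter:
record which syllables can begin and end a dictionary word, and skip
the dictionary lookup for any candidate that fails the check. The
class name FilteredDictionary and the sample words are hypothetical.

```python
class FilteredDictionary:
    """A word set guarded by first-syllable and last-syllable filters."""

    def __init__(self, words):
        self.words = set(words)
        self.first_syllables = {w[0] for w in self.words}
        self.last_syllables = {w[-1] for w in self.words}
        self.lookups = 0                  # counts real dictionary lookups

    def contains(self, candidate):
        """True if `candidate` is a dictionary word."""
        if not candidate:
            return False
        # Reject without a lookup if the boundary syllables never occur.
        if candidate[0] not in self.first_syllables:
            return False
        if candidate[-1] not in self.last_syllables:
            return False
        self.lookups += 1
        return candidate in self.words
```

The filter never rejects a real word, so it only saves work; the
saving grows with the cost of the underlying dictionary access.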
Checking the end of word :
When looking for irregular forms, only the end of the word is
checked, because irregular forms occur only at the end of a word.
This information decreases the number of chart entries generated by MA.
Unknown word handling :
This step is a very important part of MA.
Byoung-Gyu, Chang / bgjang@csone.kaist.ac.kr