
A Rule-Based Tagger Development Framework
Zoltán Alexin, Péter Leipold, János Csirik, Károly Bibok, and Tibor Gyimóthy.
In Lubos Popelínský and Miloslav Nepil, editors,
Proceedings of the 3rd Workshop on Learning Language in Logic, pages
1--10, Strasbourg, France, September 2001.
Abstract
POS (part-of-speech) tagging is an important step in natural language
processing because identically written words may have different meanings.
Part-of-speech tagging is the procedure by which the correct
morphological annotation (the correct tag) for an ambiguous word is selected.
Computer programs that perform this process automatically are called POS
taggers. This paper presents RTDF (a Rule-based Tagger Development
Framework), which is capable of identifying general tagging rules using
different machine learning tools, given a suitably large training data set.
The framework can combine the learned tagging rules and evaluate the
resulting taggers. The authors are participants in a project to develop a
medium-sized learning corpus for Hungarian. The corpus contains 1 million
words and, among other uses, can serve as a suitable medium on which
previously developed POS taggers can be tested. During the project, the
morphological analyzer for Hungarian has been thoroughly investigated and the
MSD encoding has been refined. The development of the corpus is ongoing and
will be completed at the end of 2001.
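
To make the idea of rule-based tag selection concrete, here is a minimal sketch of how a contextual rule might pick one tag for an ambiguous word. It is not the RTDF implementation: the toy English lexicon, the coarse tag names (rather than MSD codes), the hand-written rules, and the function name `tag` are all assumptions made for illustration, whereas the framework described above learns its rules from a training corpus.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: each word maps to its candidate tags,
# with the assumed most frequent tag listed first.
LEXICON: Dict[str, List[str]] = {
    "the": ["DET"],
    "we": ["PRON"],
    "can": ["AUX", "NOUN", "VERB"],
    "rusts": ["VERB", "NOUN"],
    "fish": ["NOUN", "VERB"],
}

# Hypothetical hand-written rules of the form
# (tag of the previous word, tag to select for the current word).
RULES: List[Tuple[str, str]] = [
    ("DET", "NOUN"),
    ("PRON", "AUX"),
    ("AUX", "VERB"),
    ("NOUN", "VERB"),
]


def tag(words: Sequence[str]) -> List[str]:
    """Assign one tag per word, resolving ambiguity with the rules above."""
    tags: List[str] = []
    for word in words:
        candidates = LEXICON.get(word.lower(), ["UNK"])
        chosen = candidates[0]  # fallback: first listed candidate
        if len(candidates) > 1:
            previous = tags[-1] if tags else "START"
            # The first rule that matches the left context and names
            # a tag the word can actually carry decides the outcome.
            for previous_tag, selected in RULES:
                if previous == previous_tag and selected in candidates:
                    chosen = selected
                    break
        tags.append(chosen)
    return tags
```

In this sketch the word "can" receives a different tag in "The can rusts" than in "We can fish", because the tag already assigned to the preceding word selects a different rule. Ordering the rules as a priority list is one simple way to combine them; other combination schemes are equally possible.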
Z Alexin,
alexin@inf.u-szeged.hu,
T Gyimothy,
gyimi@inf.u-szeged.hu. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2