
Unsupervised Learning of Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming
Dimitar Kazakov and Suresh Manandhar
Machine Learning, 43(1/2):121--162, April 2001.
Abstract
This article presents a combination of unsupervised and supervised learning
techniques for generating word segmentation rules from a raw list of
words. First, a language bias for word segmentation is introduced, and a
simple genetic algorithm is used to search for the segmentation that
yields the best bias value. In the second phase, the words segmented
by the genetic algorithm are used as input to the first-order decision
list learner CLOG. The result is a set of first-order rules that can be used
to segment unseen words. When applied to either the training data or
unseen data, these rules produce segmentations that are linguistically
meaningful and, to a large degree, conform to the annotation provided.
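The first, unsupervised phase can be illustrated with a small sketch. The abstract does not define the language bias, so the code below assumes one plausible choice: split each word once into a prefix and a suffix, and score a segmentation by the total number of characters in the sets of distinct prefixes and distinct suffixes, lower being better. The genetic-algorithm details (binary tournament selection, one-point crossover, per-gene mutation, elitism) and the names `bias`, `random_cuts` and `evolve` are hypothetical illustrations, not the authors' implementation, and the second phase (learning rules with CLOG) is not covered.

```python
import random


def bias(words, cuts):
    """Assumed bias (hypothetical): total characters in the distinct
    prefixes plus the distinct suffixes; lower is better."""
    prefixes = {w[:c] for w, c in zip(words, cuts)}
    suffixes = {w[c:] for w, c in zip(words, cuts)}
    return sum(map(len, prefixes)) + sum(map(len, suffixes))


def random_cuts(words, rng):
    # One gene per word: a split point in 1..len(w). The prefix is never
    # empty; cut == len(w) leaves the word unsplit (empty suffix).
    return [rng.randint(1, len(w)) for w in words]


def evolve(words, pop_size=50, generations=200, p_mut=0.05, seed=0):
    """Simple generational GA that minimizes bias(); illustrative only."""
    rng = random.Random(seed)
    pop = [random_cuts(words, rng) for _ in range(pop_size)]

    def fitness(cuts):
        return bias(words, cuts)

    def tournament():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) <= fitness(b) else b

    best = min(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: the best segmentation always survives
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            point = rng.randint(0, len(words))  # one-point crossover
            child = p1[:point] + p2[point:]
            for i, w in enumerate(words):  # per-gene mutation
                if rng.random() < p_mut:
                    child[i] = rng.randint(1, len(w))
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fitness)
    return best


if __name__ == "__main__":
    words = ["walked", "talked", "walking", "talking", "walks", "talks"]
    cuts = evolve(words)
    print(" ".join(f"{w[:c]}+{w[c:]}" for w, c in zip(words, cuts)))
```

Under this assumed bias, cutting each of the six sample words after its four-letter stem (walk+ed, talk+ing, and so on) costs 8 + 6 = 14 characters, whereas leaving all six unsplit costs 36, so the search is pushed toward segmentations with shared stems and endings. Such GA-segmented words would then serve as training examples for the rule learner in the second phase.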
D Kazakov,
kazakov@cs.york.ac.uk. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2