
Natural Language Processing Applications of Machine Learning
Dimitar Kazakov.
PhD thesis, Department of Cybernetics, Czech Technical University, Prague, May
1999.
Abstract
This thesis offers an overview of some of the machine learning techniques used
for natural language processing. It sketches the relationship between machine
learning and natural language processing and outlines the mutual importance of
the two AI areas. The thesis describes two original projects. The first
introduces the system Lapis, which aims at the inductive learning of LR parsers
of natural language from treebanks. Lexical semantic tags present in the
treebank are used to learn additional semantic constraints for the parsers. The
system combines standard tools for parser development with original software
for the automatic generation of efficient, domain-adapted natural language
parsers. In the second project, the author proposes a bias for unsupervised
word segmentation. To make the optimisation process efficient, a genetic
algorithm is applied to reduce the search space and produce a first draft of
the desired word segmentations. Then, a set of word segmentation rules is
learned by means of inductive logic programming techniques and expressed in an
expressive formalism (first-order logic); applying those rules to the training
data produces the final word segmentations. The most important advantages of
the method are the wide range of languages to which it is applicable, its
ability to produce useful results even from relatively small datasets, and the
fact that little or no preprocessing of the data (raw lists of words or
tokenised text) is required.
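The genetic-algorithm stage of the second project can be illustrated with a minimal sketch. The abstract does not state the bias or the GA settings, so everything here is an assumption: each word is split once into a stem and an ending, and the hypothetical cost function `cost` prefers segmentations whose distinct stems and distinct endings contain the fewest characters in total. The names `cost`, `evolve`, and all parameters are illustrative only, and the later inductive logic programming step is not shown.

```python
import random


def cost(words, cuts):
    """Hypothetical bias: characters in distinct stems plus distinct endings.

    cuts[i] is the split point for words[i]: stem = w[:k], ending = w[k:].
    """
    stems = {w[:k] for w, k in zip(words, cuts)}
    endings = {w[k:] for w, k in zip(words, cuts)}
    return sum(map(len, stems)) + sum(map(len, endings))


def random_cuts(words, rng):
    # Stem is always non-empty; the ending may be empty.
    return [rng.randint(1, len(w)) for w in words]


def mutate(cuts, words, rng, rate=0.2):
    # Re-draw each split point with probability `rate`.
    return [rng.randint(1, len(w)) if rng.random() < rate else k
            for w, k in zip(words, cuts)]


def crossover(a, b, rng):
    # One-point crossover over the per-word split points.
    point = rng.randint(0, len(a))
    return a[:point] + b[point:]


def evolve(words, pop_size=60, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [random_cuts(words, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: cost(words, c))
        elite = pop[: pop_size // 4]  # keep the cheapest quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), words, rng))
        pop = elite + children
    best = min(pop, key=lambda c: cost(words, c))
    return [(w[:k], w[k:]) for w, k in zip(words, best)]
```

For example, `evolve(["walk", "walks", "walked", "talk", "talks", "talked"])` searches for a draft segmentation; under this assumed bias, splitting every word after its fourth letter costs 11 (stems "walk", "talk"; endings "", "s", "ed"), whereas leaving all words unsplit costs 30. Elitism keeps the best draft found so far, which a rule learner could then generalise.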
D Kazakov,
kazakov@cs.york.ac.uk. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2