
Natural Language Processing Applications of Machine Learning

Dimitar Kazakov. PhD thesis, Department of Cybernetics, Czech Technical University, Prague, May 1999.

Abstract

This thesis surveys some of the machine learning techniques used in natural language processing. It sketches the relationship between the two fields and outlines their importance to each other. The thesis then describes two original projects. The first introduces Lapis, a system for the inductive learning of LR parsers for natural language from treebanks. Lexical semantic tags present in the treebank are used to learn additional semantic constraints for the parsers. The system combines standard parser development tools with original software to generate efficient, domain-adapted natural language parsers automatically.

In the second project, the author proposes a bias for unsupervised word segmentation. To make the optimisation efficient, a genetic algorithm is applied to reduce the search space and produce a first draft of the word segmentations sought. A set of word segmentation rules is then found and expressed in a high-level formalism (first-order logic) by means of inductive logic programming techniques; applying those rules to the training data produces the final word segmentations. The main advantages of the method are the wide range of languages to which it is applicable, its ability to produce useful results even from relatively small datasets, and the fact that the data used (raw lists of words or tokenised text) require minimal or no preprocessing.
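
As a rough illustration of the first stage only, the sketch below runs a small genetic algorithm over split positions for a list of words. The abstract does not state the bias, so the cost function used here (total number of characters in the distinct prefix and suffix lexicons) is an assumption made for the example, as are the names `lexicon_cost` and `evolve` and all parameter values. It is not the thesis implementation.

```python
import random


def lexicon_cost(words, splits):
    """Assumed bias (not stated in the abstract): total number of characters
    in the set of distinct prefixes plus the set of distinct suffixes."""
    prefixes = {w[:k] for w, k in zip(words, splits)}
    suffixes = {w[k:] for w, k in zip(words, splits)}
    return sum(len(p) for p in prefixes) + sum(len(s) for s in suffixes)


def evolve(words, pop_size=40, generations=200, mutation_rate=0.1, seed=0):
    """Hypothetical GA: each individual holds one split position per word.
    Requires pop_size >= 4 so that two parents can always be sampled."""
    rng = random.Random(seed)

    def random_split(word):
        # A split at 0 or len(word) leaves the word unsegmented.
        return rng.randint(0, len(word))

    population = [[random_split(w) for w in words] for _ in range(pop_size)]
    # Seed one individual with the trivial "no segmentation" solution.
    population[0] = [len(w) for w in words]

    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        population.sort(key=lambda s: lexicon_cost(words, s))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randint(0, len(words))  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally redraw a word's split position.
            child = [
                random_split(w) if rng.random() < mutation_rate else k
                for w, k in zip(words, child)
            ]
            children.append(child)
        population = survivors + children

    best = min(population, key=lambda s: lexicon_cost(words, s))
    return best, lexicon_cost(words, best)


if __name__ == "__main__":
    words = ["walked", "walking", "walks", "talked", "talking", "talks"]
    best, cost = evolve(words)
    for w, k in zip(words, best):
        print(w[:k] + "+" + w[k:])
    print("cost:", cost)
```

Because the best half of each generation is kept unchanged, the best cost never gets worse than that of the unsegmented word list. The second stage described above, learning first-order segmentation rules with inductive logic programming and applying them to the training data, is not sketched here.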



D Kazakov, kazakov@cs.york.ac.uk. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2