
Relational Learning Techniques for Natural Language Information
Extraction
Mary Elaine Califf.
PhD thesis, Department of Computer Sciences, University of Texas, Austin, TX,
August 1998.
Also appears as Artificial Intelligence Laboratory Technical Report AI 98-276
(see http://www.cs.utexas.edu/users/ai-lab).
Abstract
The recent growth of online information available in the form of natural
language documents creates a greater need for computing systems with the
ability to process those documents to simplify access to the information. One
type of processing appropriate for many tasks is information extraction, a
type of text skimming that retrieves specific types of information from text.
Although information extraction systems have existed for two decades, these
systems have generally been built by hand and contain domain-specific
information, making them difficult to port to other domains. A few
researchers have begun to apply machine learning to information extraction
tasks, but most of this work has involved applying learning to pieces of a
much larger system. This dissertation presents a novel rule representation
specific to natural language and a relational learning system, Rapier, which
learns information extraction rules. Rapier takes pairs of documents and
filled templates indicating the information to be extracted and learns
pattern-matching rules to extract fillers for the slots in the template. The
system is tested on several domains, showing its ability to learn rules for
different tasks. Rapier's performance is compared to a propositional learning
system for information extraction, demonstrating the superiority of
relational learning for some information extraction tasks. Because one
difficulty in using machine learning to develop natural language processing
systems is the necessity of providing annotated examples to supervised
learning systems, this dissertation also describes an attempt to reduce the
number of examples Rapier requires by employing a form of active learning.
Experimental results show that the number of examples required to achieve a
given level of performance can be significantly reduced by this method.
ILPnet2 librarian,
ilpnet2-lib@cs.bris.ac.uk. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2