
Context-Sensitive Models in Inductive Logic Programming
Ashwin Srinivasan.
Machine Learning, 43(3):301-324, September 2001.
Abstract
Given domain-specific background knowledge and data in the form of examples, an
Inductive Logic Programming (ILP) system extracts models in the data-analytic
sense. We view the model-selection step facing an ILP system as a decision
problem, the solution of which requires knowledge of the context in which the
model is to be deployed. In this paper, "context" is defined by the
current specification of the prior class distribution and the client's
preferences concerning errors of classification. Within this restricted
setting, we consider the use of an ILP system in situations where: (a)
contexts can change regularly, for example through changes to class
distributions or misclassification costs; and (b) the data are from
observational studies, that is, they may not have been collected with any
particular context in mind. Two repercussions of these are: (a) any one
model may not be the optimal choice for all contexts; and (b) not all the
background information provided may be relevant for all contexts. Using
results from the analysis of Receiver Operating Characteristic curves, we
investigate a technique that can equip an ILP system to reject those models
that cannot possibly be optimal in any context. We present empirical results
from using the technique to analyse two datasets concerned with the toxicity
of chemicals (in particular, their mutagenic and carcinogenic properties).
Clients can, and typically do, approach such datasets with quite different
requirements. For example, a synthetic chemist would require models with a
low rate of commission errors, which could be used to direct the synthesis
of new compounds efficiently. A toxicologist, on the other hand, would prefer
models with a low rate of omission errors. This would enable a more complete
identification of toxic chemicals at a calculated cost of misidentification
of non-toxic cases as toxic. The approach adopted here attempts to obtain a
solution that contains models that are optimal for each such user according
to the cost function that he or she wishes to apply. In doing so, it also
provides one solution to the problem of how the relevance of background
predicates is to be assessed in ILP.
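
The ROC-based rejection step mentioned above can be illustrated with a small sketch. It is not the paper's implementation. It assumes each candidate model is summarised by a single (false positive rate, true positive rate) point, that "toxic" is the positive class (so commission errors are read as false positives and omission errors as false negatives), and that a context is a positive-class prior plus two misclassification costs. All function names, model names and figures are hypothetical. Models that fall below the upper convex hull of ROC space cannot minimise expected cost under any such context, so they can be discarded.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _with_trivial(models: Dict[str, Point]) -> Dict[str, Point]:
    """Add the two trivial classifiers that anchor every ROC plot."""
    points = dict(models)
    points["always_negative"] = (0.0, 0.0)
    points["always_positive"] = (1.0, 1.0)
    return points


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of OA and OB; negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(models: Dict[str, Point]) -> List[str]:
    """Names of the models on the upper convex hull, left to right."""
    ordered = sorted(_with_trivial(models).items(), key=lambda kv: kv[1])
    hull: List[Tuple[str, Point]] = []
    for name, point in ordered:
        # Drop earlier points that lie on or below the line to the new point.
        while len(hull) >= 2 and _cross(hull[-2][1], hull[-1][1], point) >= 0:
            hull.pop()
        hull.append((name, point))
    return [name for name, _ in hull]


def best_for_context(models: Dict[str, Point], pos_prior: float,
                     cost_fp: float, cost_fn: float) -> str:
    """Hull model with the lowest expected misclassification cost."""
    points = _with_trivial(models)

    def expected_cost(name: str) -> float:
        fpr, tpr = points[name]
        return pos_prior * (1.0 - tpr) * cost_fn + (1.0 - pos_prior) * fpr * cost_fp

    return min(roc_convex_hull(models), key=expected_cost)


# Hypothetical models: B lies below the hull, so it is rejected outright.
candidates = {"A": (0.1, 0.6), "B": (0.3, 0.5), "C": (0.4, 0.9)}
hull = roc_convex_hull(candidates)
chemist_choice = best_for_context(candidates, pos_prior=0.5, cost_fp=5.0, cost_fn=1.0)
toxicologist_choice = best_for_context(candidates, pos_prior=0.5, cost_fp=1.0, cost_fn=5.0)
```

With these illustrative figures, the context that penalises false positives more heavily selects model A and the context that penalises false negatives more heavily selects model C, while B is never selected. Keeping the whole hull rather than one model is what lets the choice be deferred until the context is known.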
A Srinivasan,
Ashwin.Srinivasan@comlab.ox.ac.uk. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2