
Mining for Causes of Cancer: Machine Learning Experiments at Various Levels
of Detail
S. Kramer, B. Pfahringer, and C. Helma.
In Proceedings of the Third International Conference on Knowledge Discovery
and Data Mining (KDD-97), Menlo Park, CA, 1997. AAAI Press.
Abstract
This paper presents, from a methodological point of view, first results of an
interdisciplinary project in scientific data mining. We analyze data about
the carcinogenicity of chemicals derived from the carcinogenesis bioassay
program, a long-term research study performed by the US National Institute of
Environmental Health Sciences. The database contains detailed descriptions of
6823 tests performed with more than 330 compounds and animals of different
species, strains and sexes. The chemical structures are described at the atom
and bond level, and in terms of various relevant structural properties. The
goal of this paper is to investigate the effects that various levels of
detail and amounts of information have on the resulting hypotheses, both
quantitatively and qualitatively. We apply relational and propositional
machine learning algorithms to learning problems formulated as regression or
as classification tasks. In addition, these experiments have been conducted
with two learning problems which are at different levels of detail.
Quantitatively, our experiments indicate that additional information does not
necessarily improve accuracy. Qualitatively, a number of potential
discoveries have been made by the algorithm for Relational Regression,
because it is not forced to abstract from the details contained in the
relations of the database.
S. Kramer, stefan@ai.univie.ac.at
B. Pfahringer, bernhard@ai.univie.ac.at