Mining for Causes of Cancer: Machine Learning Experiments at Various Levels of Detail

S. Kramer, B. Pfahringer, and C. Helma. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Menlo Park, CA, 1997. AAAI Press.

Abstract

This paper presents, from a methodological point of view, the first results of an interdisciplinary project in scientific data mining. We analyze data about the carcinogenicity of chemicals derived from the carcinogenesis bioassay program, a long-term research study performed by the US National Institute of Environmental Health Sciences. The database contains detailed descriptions of 6823 tests performed with more than 330 compounds and animals of different species, strains and sexes. The chemical structures are described at the atom and bond level, and in terms of various relevant structural properties. The goal of this paper is to investigate the effects that various levels of detail and amounts of information have on the resulting hypotheses, both quantitatively and qualitatively. We apply relational and propositional machine learning algorithms to learning problems formulated as regression or as classification tasks. In addition, these experiments have been conducted with two learning problems that are at different levels of detail. Quantitatively, our experiments indicate that additional information does not necessarily improve accuracy. Qualitatively, a number of potential discoveries have been made by the algorithm for Relational Regression, because it is not forced to abstract from the details contained in the relations of the database.

S Kramer, stefan@ai.univie.ac.at,
B Pfahringer, bernhard@ai.univie.ac.at. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2