
Finding frequent substructures in chemical compounds
L. Dehaspe, H. Toivonen, and R. D. King.
In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, 4th
International Conference on Knowledge Discovery and Data Mining, pages
30-36. AAAI Press, August 1998.
Abstract
The discovery of the relationships between chemical structure and biological
function is central to biological science and medicine. In this paper we
apply data mining to the problem of predicting chemical carcinogenicity. This
toxicology application was launched at IJCAI'97 as a research challenge for
artificial intelligence. Our approach to the problem is descriptive rather
than classification-based: the goal is to find common substructures and
properties in chemical compounds, and in this way to contribute to scientific
insight. This approach contrasts with previous machine learning research on
this problem, which has mainly concentrated on predicting the toxicity of
unknown chemicals. Our contribution to the field of data mining is the
ability to discover useful frequent patterns that are beyond the complexity
of association rules or their known variants. This is vital to the problem,
which requires the discovery of patterns that are out of the reach of simple
transformations to frequent itemsets. We present a knowledge discovery method
for structured data, where patterns reflect the one-to-many and many-to-many
relationships of several tables. Background knowledge, represented in a
uniform manner in some of the tables, has an essential role here, unlike in
most data mining settings for the discovery of frequent patterns.
L Dehaspe,
ldh@cs.kuleuven.ac.be. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2