Logo[ ILPnet2 | Library | Newsletter | CSCW | Education | End-User Club | Events | Nodes | Systems | Applications | Members only ]

Finding frequent substructures in chemical compounds

L. Dehaspe, H. Toivonen, and R. D. King. In R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro, editors, 4th International Conference on Knowledge Discovery and Data Mining, pages 30--36. AAAI Press., August 1998. More behind this link.

Abstract

The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem is descriptive rather than based on classification; the goal being to find common substructures and properties in chemical compounds, and in this way to contribute to scientific insight. This approach contrasts with previous machine learning research on this problem, which has mainly concentrated on predicting the toxicity of unknown chemicals. Our contribution to the field of data mining is the ability to discover useful frequent patterns that are beyond the complexity of association rules or their known variants. This is vital to the problem, which requires the discovery of patterns that are out of the reach of simple transformations to frequent itemsets. We present a knowledge discovery method for structured data, where patterns reflect the one-to-many and many-to-many relationships of several tables. Background knowledge, represented in a uniform manner in some of the tables, has an essential role here, unlike in most data mining settings for the discovery of frequent patterns.

BibTeX entry.

Other publications


L Dehaspe, ldh@cs.kuleuven.ac.be. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2