
Genome scale prediction of protein functional class from sequence using data mining

R. D. King, A. Karwath, A. J. Clare, and L. Dehaspe. In R. Ramakrishnan, S. Stolfo, R. Bayardo, and I. Parsa, editors, The Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 384--389, New York, USA, August 2000. The Association for Computing Machinery.

Abstract

The ability to predict protein function from amino acid sequence is a central research goal of molecular biology. Such a capability would greatly aid the biological interpretation of genomic data and accelerate its medical exploitation. For the existing sequenced genomes, function can typically be assigned to only 40-60% of the genes [1-4]. The new science of functional genomics is dedicated to discovering the function of these genes and to further detailing gene function [5-8]. Here we present a novel data-mining [9-10] approach to predicting protein functional class from sequence. We demonstrate the effectiveness of this approach on the genome of the tubercle bacillus [2]. Biologically interpretable rules are identified that can predict protein function even in the absence of identifiable sequence homology. These rules make predictions for 65% of the tubercle bacillus genes with no assigned function, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules also give insight into the evolutionary history of the tubercle bacillus.

L Dehaspe, ldh@cs.kuleuven.ac.be. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2