
Genome scale prediction of protein functional class from sequence using data mining
R. D. King, A. Karwath, A. J. Clare, and L. Dehaspe.
In R. Ramakrishnan, S. Stolfo, R. Bayardo, and I. Parsa, editors, The Sixth
ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pages 384-389, New York, USA, August 2000. The Association
for Computing Machinery.
Abstract
The ability to predict protein function from amino acid sequence is a central
research goal of molecular biology. Such a capability would greatly aid the
biological interpretation of genomic data and accelerate its medical
exploitation. For existing sequenced genomes, function can typically be
assigned to only 40-60% of the genes [1-4]. The new science of functional
genomics is dedicated to discovering the function of these genes and to
characterizing gene function in greater detail [5-8]. Here we present a novel
data-mining [9-10] approach to predicting protein functional class from
sequence. We demonstrate the effectiveness of this approach on the genome of
the tubercle bacillus [2]. We identify biologically interpretable rules that
can predict protein function even in the absence of identifiable sequence
homology. These rules make predictions for 65% of the genes with no assigned
function in the tubercle bacillus, with an estimated accuracy of 60-80%
(depending on the level of functional assignment). The rules give insight into
the evolutionary history of the tubercle bacillus.
L. Dehaspe, ldh@cs.kuleuven.ac.be. Last modified on Wednesday 9 April 2003 at 18:31. © 2003 ILPnet2