
The utility of different representations of protein sequence for predicting
functional class
Ross D. King, Andreas Karwath, Amanda Clare, and Luc Dehaspe.
Bioinformatics, 17(5):445--454, May 2001.
Abstract
Motivation: Data Mining Prediction (DMP) is a novel approach to predicting
protein functional class from sequence. DMP works even in the absence of a
homologous protein of known function. We investigate the utility of different
ways of representing protein sequence in DMP (residue frequencies, phylogeny,
predicted structure) using the Escherichia coli genome as a model.
Results: With every type of representation, DMP learnt prediction rules that
were more accurate than the default at every level of function. The most
effective way to represent sequence was using
phylogeny (75% accuracy and 13% coverage of unassigned ORFs at the most
general level of function; 69% accuracy and 7% coverage at the most
detailed). We tested different methods for combining predictions from the
different types of representation. These improved both the accuracy and
coverage of predictions, e.g. 40% of all unassigned ORFs could be predicted
at an estimated accuracy of 60% and 5% of unassigned ORFs could be predicted
at an estimated accuracy of 86%.
Availability: The rules and data are
freely available. Warmr is free to academics.