Sumario: | Developing effective methods for predicting ontological annotations is an important goal in computational biology, yet evaluating their performance is difficult because of the structure of biomedical ontologies and the incomplete annotation of genes. This work proposes an information-theoretic framework for evaluating computational protein function prediction. A Bayesian network, structured according to the underlying ontology, models the prior probability of a protein's function. The concepts of misinformation and remaining uncertainty are then defined; these can be seen as analogs of precision and recall. Finally, semantic distance is proposed as a single statistic for ranking classification models. The approach is evaluated by analyzing three protein function predictors of Gene Ontology terms. The work addresses several weaknesses of current metrics and provides valuable insights into the performance of protein function prediction tools.
|