
Machine learning for protein subcellular localization prediction

For bioinformaticians, computational biologists, and wet-lab biologists, the authors present the latest machine learning approaches to protein subcellular localization prediction, together with a systematic scheme for improving predictor performance.

Bibliographic Details
Classification: Electronic Book
Main authors: Wan, Shibiao (Author), Mak, M. W. (Author)
Format: Electronic eBook
Language: English
Published: Berlin, Germany ; Boston, Massachusetts : De Gruyter, 2015.
Online access: Full text
Table of Contents:
  • Preface
  • Contents
  • List of Abbreviations
  • 1 Introduction
  • 1.1 Proteins and their subcellular locations
  • 1.2 Why computationally predict protein subcellular localization?
  • 1.2.1 Significance of the subcellular localization of proteins
  • 1.2.2 Conventional wet-lab techniques
  • 1.2.3 Computational prediction of protein subcellular localization
  • 1.3 Organization of this book
  • 2 Overview of subcellular localization prediction
  • 2.1 Sequence-based methods
  • 2.1.1 Composition-based methods
  • 2.1.2 Sorting signal-based methods
  • 2.1.3 Homology-based methods
  • 2.2 Knowledge-based methods
  • 2.2.1 GO-term extraction
  • 2.2.2 GO-vector construction
  • 2.3 Limitations of existing methods
  • 2.3.1 Limitations of sequence-based methods
  • 2.3.2 Limitations of knowledge-based methods
  • 3 Legitimacy of using gene ontology information
  • 3.1 Direct table lookup?
  • 3.1.1 Table lookup procedure for single-label prediction
  • 3.1.2 Table-lookup procedure for multi-label prediction
  • 3.1.3 Problems of table lookup
  • 3.2 Using only cellular component GO terms?
  • 3.3 Equivalent to homologous transfer?
  • 3.4 More reasons for using GO information
  • 4 Single-location protein subcellular localization
  • 4.1 Extracting GO from the Gene Ontology Annotation Database
  • 4.1.1 Gene Ontology Annotation Database
  • 4.1.2 Retrieval of GO terms
  • 4.1.3 Construction of GO vectors
  • 4.1.4 Multiclass SVM classification
  • 4.2 FusionSVM: Fusion of gene ontology and homology-based features
  • 4.2.1 InterProGOSVM: Extracting GO from InterProScan
  • 4.2.2 PairProSVM: A homology-based method
  • 4.2.3 Fusion of InterProGOSVM and PairProSVM
  • 4.3 Summary
  • 5 From single- to multi-location
  • 5.1 Significance of multi-location proteins
  • 5.2 Multi-label classification
  • 5.2.1 Algorithm-adaptation methods
  • 5.2.2 Problem transformation methods
  • 5.2.3 Multi-label classification in bioinformatics
  • 5.3 mGOASVM: A predictor for both single- and multi-location proteins
  • 5.3.1 Feature extraction
  • 5.3.2 Multi-label multiclass SVM classification
  • 5.4 AD-SVM: An adaptive decision multi-label predictor
  • 5.4.1 Multi-label SVM scoring
  • 5.4.2 Adaptive decision for SVM (AD-SVM)
  • 5.4.3 Analysis of AD-SVM
  • 5.5 mPLR-Loc: A multi-label predictor based on penalized logistic regression
  • 5.5.1 Single-label penalized logistic regression
  • 5.5.2 Multi-label penalized logistic regression
  • 5.5.3 Adaptive decision for LR (mPLR-Loc)
  • 5.6 Summary
  • 6 Mining deeper on GO for protein subcellular localization
  • 6.1 Related work
  • 6.2 SS-Loc: Using semantic similarity over GO
  • 6.2.1 Semantic similarity measures
  • 6.2.2 SS vector construction
  • 6.3 HybridGO-Loc: Hybridizing GO frequency and semantic similarity features
  • 6.3.1 Hybridization of two GO features
  • 6.3.2 Multi-label multiclass SVM classification
  • 6.4 Summary
  • 7 Ensemble random projection for large-scale predictions
  • 7.1 Random projection
  • 7.2 RP-SVM: A multi-label classifier with ensemble random projection
  • 7.2.1 Ensemble multi-label classifier
  • 7.2.2 Multi-label classification
  • 7.3 R3P-Loc: A compact predictor based on ridge regression and ensemble random projection
  • 7.3.1 Limitation of using current databases
  • 7.3.2 Creating compact databases
  • 7.3.3 Single-label ridge regression
  • 7.3.4 Multi-label ridge regression
  • 7.4 Summary
  • 8 Experimental setup
  • 8.1 Prediction of single-label proteins
  • 8.1.1 Datasets construction
  • 8.1.2 Performance metrics
  • 8.2 Prediction of multi-label proteins
  • 8.2.1 Dataset construction
  • 8.2.2 Datasets analysis
  • 8.2.3 Performance metrics
  • 8.3 Statistical evaluation methods
  • 8.4 Summary
  • 9 Results and analysis
  • 9.1 Performance of GOASVM
  • 9.1.1 Comparing GO vector construction methods
  • 9.1.2 Performance of successive-search strategy
  • 9.1.3 Comparing with methods based on other features
  • 9.1.4 Comparing with state-of-the-art GO methods
  • 9.1.5 GOASVM using old GOA databases
  • 9.2 Performance of FusionSVM
  • 9.2.1 Comparing GO vector construction and normalization methods
  • 9.2.2 Performance of PairProSVM
  • 9.2.3 Performance of FusionSVM
  • 9.2.4 Effect of the fusion weights on the performance of FusionSVM
  • 9.3 Performance of mGOASVM
  • 9.3.1 Kernel selection and optimization
  • 9.3.2 Term-frequency for mGOASVM
  • 9.3.3 Multi-label properties for mGOASVM
  • 9.3.4 Further analysis of mGOASVM
  • 9.3.5 Comparing prediction results of novel proteins
  • 9.4 Performance of AD-SVM
  • 9.5 Performance of mPLR-Loc
  • 9.5.1 Effect of adaptive decisions on mPLR-Loc
  • 9.5.2 Effect of regularization on mPLR-Loc
  • 9.6 Performance of HybridGO-Loc
  • 9.6.1 Comparing different features
  • 9.7 Performance of RP-SVM
  • 9.7.1 Performance of ensemble random projection
  • 9.7.2 Comparison with other dimension-reduction methods
  • 9.7.3 Performance of single random-projection
  • 9.7.4 Effect of dimensions and ensemble size
  • 9.8 Performance of R3P-Loc
  • 9.8.1 Performance on the compact databases
  • 9.8.2 Effect of dimensions and ensemble size
  • 9.8.3 Performance of ensemble random projection
  • 9.9 Comprehensive comparison of proposed predictors
  • 9.9.1 Comparison of benchmark datasets
  • 9.9.2 Comparison of novel datasets
  • 9.10 Summary
  • 10 Properties of the proposed predictors
  • 10.1 Noise data in the GOA Database
  • 10.2 Analysis of single-label predictors
  • 10.2.1 GOASVM vs FusionSVM
  • 10.2.2 Can GOASVM be combined with PairProSVM?
  • 10.3 Advantages of mGOASVM
  • 10.3.1 GO-vector construction
  • 10.3.2 GO subspace selection
  • 10.3.3 Capability of handling multi-label problems
  • 10.4 Analysis for HybridGO-Loc
  • 10.4.1 Semantic similarity measures
  • 10.4.2 GO-frequency features vs SS features
  • 10.4.3 Bias analysis
  • 10.5 Analysis for RP-SVM
  • 10.5.1 Legitimacy of using RP
  • 10.5.2 Ensemble random projection for robust performance
  • 10.6 Comparing the proposed multi-label predictors
  • 10.7 Summary
  • 11 Conclusions and future directions
  • 11.1 Conclusions
  • 11.2 Future directions
  • A Webservers for protein subcellular localization
  • A.1 GOASVM webserver
  • A.2 mGOASVM webserver
  • A.3 HybridGO-Loc webserver
  • A.4 mPLR-Loc webserver
  • B Support vector machines
  • B.1 Binary SVM classification
  • B.2 One-vs-rest SVM classification
  • C Proof of no bias in LOOCV
  • D Derivatives for penalized logistic regression
  • Bibliography
  • Index