Machine learning for protein subcellular localization prediction
For bioinformaticians, computational biologists, and wet-lab biologists, the authors present the latest machine learning approaches for protein subcellular localization prediction, together with a systematic scheme for improving predictor performance.
Classification: | Electronic book |
---|---|
Main authors: | , |
Format: | Electronic, eBook |
Language: | English |
Published: | Berlin, Germany ; Boston, Massachusetts : De Gruyter, 2015. |
Online access: | Full text |
Table of Contents:
- Preface
- Contents
- List of Abbreviations
- 1 Introduction
- 1.1 Proteins and their subcellular locations
- 1.2 Why computationally predict protein subcellular localization?
- 1.2.1 Significance of the subcellular localization of proteins
- 1.2.2 Conventional wet-lab techniques
- 1.2.3 Computational prediction of protein subcellular localization
- 1.3 Organization of this book
- 2 Overview of subcellular localization prediction
- 2.1 Sequence-based methods
- 2.1.1 Composition-based methods
- 2.1.2 Sorting signal-based methods
- 2.1.3 Homology-based methods
- 2.2 Knowledge-based methods
- 2.2.1 GO-term extraction
- 2.2.2 GO-vector construction
- 2.3 Limitations of existing methods
- 2.3.1 Limitations of sequence-based methods
- 2.3.2 Limitations of knowledge-based methods
- 3 Legitimacy of using Gene Ontology information
- 3.1 Direct table lookup?
- 3.1.1 Table-lookup procedure for single-label prediction
- 3.1.2 Table-lookup procedure for multi-label prediction
- 3.1.3 Problems of table lookup
- 3.2 Using only cellular component GO terms?
- 3.3 Equivalent to homologous transfer?
- 3.4 More reasons for using GO information
- 4 Single-location protein subcellular localization
- 4.1 Extracting GO from the Gene Ontology Annotation Database
- 4.1.1 Gene Ontology Annotation Database
- 4.1.2 Retrieval of GO terms
- 4.1.3 Construction of GO vectors
- 4.1.4 Multiclass SVM classification
- 4.2 FusionSVM: Fusion of Gene Ontology and homology-based features
- 4.2.1 InterProGOSVM: Extracting GO from InterProScan
- 4.2.2 PairProSVM: A homology-based method
- 4.2.3 Fusion of InterProGOSVM and PairProSVM
- 4.3 Summary
- 5 From single- to multi-location
- 5.1 Significance of multi-location proteins
- 5.2 Multi-label classification
- 5.2.1 Algorithm-adaptation methods
- 5.2.2 Problem-transformation methods
- 5.2.3 Multi-label classification in bioinformatics
- 5.3 mGOASVM: A predictor for both single- and multi-location proteins
- 5.3.1 Feature extraction
- 5.3.2 Multi-label multiclass SVM classification
- 5.4 AD-SVM: An adaptive decision multi-label predictor
- 5.4.1 Multi-label SVM scoring
- 5.4.2 Adaptive decision for SVM (AD-SVM)
- 5.4.3 Analysis of AD-SVM
- 5.5 mPLR-Loc: A multi-label predictor based on penalized logistic regression
- 5.5.1 Single-label penalized logistic regression
- 5.5.2 Multi-label penalized logistic regression
- 5.5.3 Adaptive decision for LR (mPLR-Loc)
- 5.6 Summary
- 6 Mining deeper into GO for protein subcellular localization
- 6.1 Related work
- 6.2 SS-Loc: Using semantic similarity over GO
- 6.2.1 Semantic similarity measures
- 6.2.2 SS vector construction
- 6.3 HybridGO-Loc: Hybridizing GO frequency and semantic similarity features
- 6.3.1 Hybridization of two GO features
- 6.3.2 Multi-label multiclass SVM classification
- 6.4 Summary
- 7 Ensemble random projection for large-scale predictions
- 7.1 Random projection
- 7.2 RP-SVM: A multi-label classifier with ensemble random projection
- 7.2.1 Ensemble multi-label classifier
- 7.2.2 Multi-label classification
- 7.3 R3P-Loc: A compact predictor based on ridge regression and ensemble random projection
- 7.3.1 Limitation of using current databases
- 7.3.2 Creating compact databases
- 7.3.3 Single-label ridge regression
- 7.3.4 Multi-label ridge regression
- 7.4 Summary
- 8 Experimental setup
- 8.1 Prediction of single-label proteins
- 8.1.1 Dataset construction
- 8.1.2 Performance metrics
- 8.2 Prediction of multi-label proteins
- 8.2.1 Dataset construction
- 8.2.2 Dataset analysis
- 8.2.3 Performance metrics
- 8.3 Statistical evaluation methods
- 8.4 Summary
- 9 Results and analysis
- 9.1 Performance of GOASVM
- 9.1.1 Comparing GO vector construction methods
- 9.1.2 Performance of successive-search strategy
- 9.1.3 Comparing with methods based on other features
- 9.1.4 Comparing with state-of-the-art GO methods
- 9.1.5 GOASVM using old GOA databases
- 9.2 Performance of FusionSVM
- 9.2.1 Comparing GO vector construction and normalization methods
- 9.2.2 Performance of PairProSVM
- 9.2.3 Performance of FusionSVM
- 9.2.4 Effect of the fusion weights on the performance of FusionSVM
- 9.3 Performance of mGOASVM
- 9.3.1 Kernel selection and optimization
- 9.3.2 Term frequency for mGOASVM
- 9.3.3 Multi-label properties for mGOASVM
- 9.3.4 Further analysis of mGOASVM
- 9.3.5 Comparing prediction results of novel proteins
- 9.4 Performance of AD-SVM
- 9.5 Performance of mPLR-Loc
- 9.5.1 Effect of adaptive decisions on mPLR-Loc
- 9.5.2 Effect of regularization on mPLR-Loc
- 9.6 Performance of HybridGO-Loc
- 9.6.1 Comparing different features
- 9.7 Performance of RP-SVM
- 9.7.1 Performance of ensemble random projection
- 9.7.2 Comparison with other dimension-reduction methods
- 9.7.3 Performance of single random projection
- 9.7.4 Effect of dimensions and ensemble size
- 9.8 Performance of R3P-Loc
- 9.8.1 Performance on the compact databases
- 9.8.2 Effect of dimensions and ensemble size
- 9.8.3 Performance of ensemble random projection
- 9.9 Comprehensive comparison of proposed predictors
- 9.9.1 Comparison on benchmark datasets
- 9.9.2 Comparison on novel datasets
- 9.10 Summary
- 10 Properties of the proposed predictors
- 10.1 Noisy data in the GOA database
- 10.2 Analysis of single-label predictors
- 10.2.1 GOASVM vs FusionSVM
- 10.2.2 Can GOASVM be combined with PairProSVM?
- 10.3 Advantages of mGOASVM
- 10.3.1 GO-vector construction
- 10.3.2 GO subspace selection
- 10.3.3 Capability of handling multi-label problems
- 10.4 Analysis of HybridGO-Loc
- 10.4.1 Semantic similarity measures
- 10.4.2 GO-frequency features vs SS features
- 10.4.3 Bias analysis
- 10.5 Analysis of RP-SVM
- 10.5.1 Legitimacy of using RP
- 10.5.2 Ensemble random projection for robust performance
- 10.6 Comparing the proposed multi-label predictors
- 10.7 Summary
- 11 Conclusions and future directions
- 11.1 Conclusions
- 11.2 Future directions
- A Webservers for protein subcellular localization
- A.1 GOASVM webserver
- A.2 mGOASVM webserver
- A.3 HybridGO-Loc webserver
- A.4 mPLR-Loc webserver
- B Support vector machines
- B.1 Binary SVM classification
- B.2 One-vs-rest SVM classification
- C Proof of no bias in LOOCV
- D Derivatives for penalized logistic regression
- Bibliography
- Index