Machine learning for protein subcellular localization prediction /

For bioinformaticians, computational biologists, and wet-lab biologists, the authors present the latest machine learning approaches to protein subcellular localization prediction, together with a systematic scheme for improving predictor performance.

Bibliographic Details
Classification:	Electronic Book
Main Authors:	Wan, Shibiao (Author), Mak, M. W. (Author)
Format:	Electronic eBook
Language:	English
Published:	Berlin, Germany ; Boston, Massachusetts : De Gruyter, 2015.
Subjects:	Proteins > Physiological transport > Data processing; Machine learning; Probabilities > Data processing; Carrier proteins; Artificial intelligence; Probabilities; Probability; Technology & Engineering > Signals & Signal Processing; Bioinformatics; Computer Science; Proteomics.
Online Access:	Full text

Table of Contents:

Preface
Contents
List of Abbreviations
1 Introduction
1.1 Proteins and their subcellular locations
1.2 Why computationally predict protein subcellular localization?
1.2.1 Significance of the subcellular localization of proteins
1.2.2 Conventional wet-lab techniques
1.2.3 Computational prediction of protein subcellular localization
1.3 Organization of this book
2 Overview of subcellular localization prediction
2.1 Sequence-based methods
2.1.1 Composition-based methods
2.1.2 Sorting signal-based methods
2.1.3 Homology-based methods
2.2 Knowledge-based methods
2.2.1 GO-term extraction
2.2.2 GO-vector construction
2.3 Limitations of existing methods
2.3.1 Limitations of sequence-based methods
2.3.2 Limitations of knowledge-based methods
3 Legitimacy of using gene ontology information
3.1 Direct table lookup?
3.1.1 Table lookup procedure for single-label prediction
3.1.2 Table-lookup procedure for multi-label prediction
3.1.3 Problems of table lookup
3.2 Using only cellular component GO terms?
3.3 Equivalent to homologous transfer?
3.4 More reasons for using GO information
4 Single-location protein subcellular localization
4.1 Extracting GO from the Gene Ontology Annotation Database
4.1.1 Gene Ontology Annotation Database
4.1.2 Retrieval of GO terms
4.1.3 Construction of GO vectors
4.1.4 Multiclass SVM classification
4.2 FusionSVM: Fusion of gene ontology and homology-based features
4.2.1 InterProGOSVM: Extracting GO from InterProScan
4.2.2 PairProSVM: A homology-based method
4.2.3 Fusion of InterProGOSVM and PairProSVM
4.3 Summary
5 From single- to multi-location
5.1 Significance of multi-location proteins
5.2 Multi-label classification
5.2.1 Algorithm-adaptation methods
5.2.2 Problem transformation methods
5.2.3 Multi-label classification in bioinformatics
5.3 mGOASVM: A predictor for both single- and multi-location proteins
5.3.1 Feature extraction
5.3.2 Multi-label multiclass SVM classification
5.4 AD-SVM: An adaptive decision multi-label predictor
5.4.1 Multi-label SVM scoring
5.4.2 Adaptive decision for SVM (AD-SVM)
5.4.3 Analysis of AD-SVM
5.5 mPLR-Loc: A multi-label predictor based on penalized logistic regression
5.5.1 Single-label penalized logistic regression
5.5.2 Multi-label penalized logistic regression
5.5.3 Adaptive decision for LR (mPLR-Loc)
5.6 Summary
6 Mining deeper on GO for protein subcellular localization
6.1 Related work
6.2 SS-Loc: Using semantic similarity over GO
6.2.1 Semantic similarity measures
6.2.2 SS vector construction
6.3 HybridGO-Loc: Hybridizing GO frequency and semantic similarity features
6.3.1 Hybridization of two GO features
6.3.2 Multi-label multiclass SVM classification
6.4 Summary
7 Ensemble random projection for large-scale predictions
7.1 Random projection
7.2 RP-SVM: A multi-label classifier with ensemble random projection
7.2.1 Ensemble multi-label classifier
7.2.2 Multi-label classification
7.3 R3P-Loc: A compact predictor based on ridge regression and ensemble random projection
7.3.1 Limitation of using current databases
7.3.2 Creating compact databases
7.3.3 Single-label ridge regression
7.3.4 Multi-label ridge regression
7.4 Summary
8 Experimental setup
8.1 Prediction of single-label proteins
8.1.1 Datasets construction
8.1.2 Performance metrics
8.2 Prediction of multi-label proteins
8.2.1 Dataset construction
8.2.2 Datasets analysis
8.2.3 Performance metrics
8.3 Statistical evaluation methods
8.4 Summary
9 Results and analysis
9.1 Performance of GOASVM
9.1.1 Comparing GO vector construction methods
9.1.2 Performance of successive-search strategy
9.1.3 Comparing with methods based on other features
9.1.4 Comparing with state-of-the-art GO methods
9.1.5 GOASVM using old GOA databases
9.2 Performance of FusionSVM
9.2.1 Comparing GO vector construction and normalization methods
9.2.2 Performance of PairProSVM
9.2.3 Performance of FusionSVM
9.2.4 Effect of the fusion weights on the performance of FusionSVM
9.3 Performance of mGOASVM
9.3.1 Kernel selection and optimization
9.3.2 Term-frequency for mGOASVM
9.3.3 Multi-label properties for mGOASVM
9.3.4 Further analysis of mGOASVM
9.3.5 Comparing prediction results of novel proteins
9.4 Performance of AD-SVM
9.5 Performance of mPLR-Loc
9.5.1 Effect of adaptive decisions on mPLR-Loc
9.5.2 Effect of regularization on mPLR-Loc
9.6 Performance of HybridGO-Loc
9.6.1 Comparing different features
9.7 Performance of RP-SVM
9.7.1 Performance of ensemble random projection
9.7.2 Comparison with other dimension-reduction methods
9.7.3 Performance of single random-projection
9.7.4 Effect of dimensions and ensemble size
9.8 Performance of R3P-Loc
9.8.1 Performance on the compact databases
9.8.2 Effect of dimensions and ensemble size
9.8.3 Performance of ensemble random projection
9.9 Comprehensive comparison of proposed predictors
9.9.1 Comparison of benchmark datasets
9.9.2 Comparison of novel datasets
9.10 Summary
10 Properties of the proposed predictors
10.1 Noise data in the GOA Database
10.2 Analysis of single-label predictors
10.2.1 GOASVM vs FusionSVM
10.2.2 Can GOASVM be combined with PairProSVM?
10.3 Advantages of mGOASVM
10.3.1 GO-vector construction
10.3.2 GO subspace selection
10.3.3 Capability of handling multi-label problems
10.4 Analysis for HybridGO-Loc
10.4.1 Semantic similarity measures
10.4.2 GO-frequency features vs SS features
10.4.3 Bias analysis
10.5 Analysis for RP-SVM
10.5.1 Legitimacy of using RP
10.5.2 Ensemble random projection for robust performance
10.6 Comparing the proposed multi-label predictors
10.7 Summary
11 Conclusions and future directions
11.1 Conclusions
11.2 Future directions
A Webservers for protein subcellular localization
A.1 GOASVM webserver
A.2 mGOASVM webserver
A.3 HybridGO-Loc webserver
A.4 mPLR-Loc webserver
B Support vector machines
B.1 Binary SVM classification
B.2 One-vs-rest SVM classification
C Proof of no bias in LOOCV
D Derivatives for penalized logistic regression
Bibliography
Index
