Introduction to Statistical Machine Learning
Classification: | E-Book |
Main Author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Waltham, MA : Morgan Kaufmann, 2016 |
Subjects: | |
Online Access: | Full text |
Table of Contents:
- Front Cover
- Introduction to Statistical Machine Learning
- Copyright
- Table of Contents
- Biography
- Preface
- PART 1 INTRODUCTION
- 1 Statistical Machine Learning
- 1.1 Types of Learning
- 1.2 Examples of Machine Learning Tasks
- 1.2.1 Supervised Learning
- 1.2.2 Unsupervised Learning
- 1.2.3 Further Topics
- 1.3 Structure of This Textbook
- PART 2 STATISTICS AND PROBABILITY
- 2 Random Variables and Probability Distributions
- 2.1 Mathematical Preliminaries
- 2.2 Probability
- 2.3 Random Variable and Probability Distribution
- 2.4 Properties of Probability Distributions
- 2.4.1 Expectation, Median, and Mode
- 2.4.2 Variance and Standard Deviation
- 2.4.3 Skewness, Kurtosis, and Moments
- 2.5 Transformation of Random Variables
- 3 Examples of Discrete Probability Distributions
- 3.1 Discrete Uniform Distribution
- 3.2 Binomial Distribution
- 3.3 Hypergeometric Distribution
- 3.4 Poisson Distribution
- 3.5 Negative Binomial Distribution
- 3.6 Geometric Distribution
- 4 Examples of Continuous Probability Distributions
- 4.1 Continuous Uniform Distribution
- 4.2 Normal Distribution
- 4.3 Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution
- 4.4 Beta Distribution
- 4.5 Cauchy Distribution and Laplace Distribution
- 4.6 t-Distribution and F-Distribution
- 5 Multidimensional Probability Distributions
- 5.1 Joint Probability Distribution
- 5.2 Conditional Probability Distribution
- 5.3 Contingency Table
- 5.4 Bayes' Theorem
- 5.5 Covariance and Correlation
- 5.6 Independence
- 6 Examples of Multidimensional Probability Distributions
- 6.1 Multinomial Distribution
- 6.2 Multivariate Normal Distribution
- 6.3 Dirichlet Distribution
- 6.4 Wishart Distribution
- 7 Sum of Independent Random Variables
- 7.1 Convolution
- 7.2 Reproductive Property
- 7.3 Law of Large Numbers
- 7.4 Central Limit Theorem
- 8 Probability Inequalities
- 8.1 Union Bound
- 8.2 Inequalities for Probabilities
- 8.2.1 Markov's Inequality and Chernoff's Inequality
- 8.2.2 Cantelli's Inequality and Chebyshev's Inequality
- 8.3 Inequalities for Expectation
- 8.3.1 Jensen's Inequality
- 8.3.2 Hölder's Inequality and Schwarz's Inequality
- 8.3.3 Minkowski's Inequality
- 8.3.4 Kantorovich's Inequality
- 8.4 Inequalities for the Sum of Independent Random Variables
- 8.4.1 Chebyshev's Inequality and Chernoff's Inequality
- 8.4.2 Hoeffding's Inequality and Bernstein's Inequality
- 8.4.3 Bennett's Inequality
- 9 Statistical Estimation
- 9.1 Fundamentals of Statistical Estimation
- 9.2 Point Estimation
- 9.2.1 Parametric Density Estimation
- 9.2.2 Nonparametric Density Estimation
- 9.2.3 Regression and Classification
- 9.2.4 Model Selection
- 9.3 Interval Estimation
- 9.3.1 Interval Estimation for Expectation of Normal Samples
- 9.3.2 Bootstrap Confidence Interval
- 9.3.3 Bayesian Credible Interval
- 10 Hypothesis Testing
- 10.1 Fundamentals of Hypothesis Testing
- 10.2 Test for Expectation of Normal Samples
- 10.3 Neyman-Pearson Lemma
- 10.4 Test for Contingency Tables
- 10.5 Test for Difference in Expectations of Normal Samples
- 10.5.1 Two Samples without Correspondence
- 10.5.2 Two Samples with Correspondence
- 10.6 Nonparametric Test for Ranks
- 10.6.1 Two Samples without Correspondence
- 10.6.2 Two Samples with Correspondence
- 10.7 Monte Carlo Test
- PART 3 GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION
- 11 Pattern Recognition via Generative Model Estimation
- 11.1 Formulation of Pattern Recognition
- 11.2 Statistical Pattern Recognition
- 11.3 Criteria for Classifier Training
- 11.3.1 MAP Rule
- 11.3.2 Minimum Misclassification Rate Rule
- 11.3.3 Bayes Decision Rule
- 11.3.4 Discussion
- 11.4 Generative and Discriminative Approaches
- 12 Maximum Likelihood Estimation
- 12.1 Definition
- 12.2 Gaussian Model
- 12.3 Computing the Class-Posterior Probability
- 12.4 Fisher's Linear Discriminant Analysis (FDA)
- 12.5 Hand-Written Digit Recognition
- 12.5.1 Preparation
- 12.5.2 Implementing Linear Discriminant Analysis
- 12.5.3 Multiclass Classification
- 13 Properties of Maximum Likelihood Estimation
- 13.1 Consistency
- 13.2 Asymptotic Unbiasedness
- 13.3 Asymptotic Efficiency
- 13.3.1 One-Dimensional Case
- 13.3.2 Multidimensional Cases
- 13.4 Asymptotic Normality
- 13.5 Summary
- 14 Model Selection for Maximum Likelihood Estimation
- 14.1 Model Selection
- 14.2 KL Divergence
- 14.3 AIC
- 14.4 Cross Validation
- 14.5 Discussion
- 15 Maximum Likelihood Estimation for Gaussian Mixture Model
- 15.1 Gaussian Mixture Model
- 15.2 MLE
- 15.3 Gradient Ascent Algorithm
- 15.4 EM Algorithm
- 16 Nonparametric Estimation
- 16.1 Histogram Method
- 16.2 Problem Formulation
- 16.3 KDE
- 16.3.1 Parzen Window Method
- 16.3.2 Smoothing with Kernels
- 16.3.3 Bandwidth Selection
- 16.4 NNDE
- 16.4.1 Nearest Neighbor Distance
- 16.4.2 Nearest Neighbor Classifier
- 17 Bayesian Inference
- 17.1 Bayesian Predictive Distribution
- 17.1.1 Definition
- 17.1.2 Comparison with MLE
- 17.1.3 Computational Issues
- 17.2 Conjugate Prior
- 17.3 MAP Estimation
- 17.4 Bayesian Model Selection
- 18 Analytic Approximation of Marginal Likelihood
- 18.1 Laplace Approximation
- 18.1.1 Approximation with Gaussian Density
- 18.1.2 Illustration
- 18.1.3 Application to Marginal Likelihood Approximation
- 18.1.4 Bayesian Information Criterion (BIC)
- 18.2 Variational Approximation
- 18.2.1 Variational Bayesian EM (VBEM) Algorithm
- 18.2.2 Relation to Ordinary EM Algorithm
- 19 Numerical Approximation of Predictive Distribution
- 19.1 Monte Carlo Integration
- 19.2 Importance Sampling
- 19.3 Sampling Algorithms
- 19.3.1 Inverse Transform Sampling
- 19.3.2 Rejection Sampling
- 19.3.3 Markov Chain Monte Carlo (MCMC) Method
- 20 Bayesian Mixture Models
- 20.1 Gaussian Mixture Models
- 20.1.1 Bayesian Formulation
- 20.1.2 Variational Inference
- 20.1.3 Gibbs Sampling
- 20.2 Latent Dirichlet Allocation (LDA)
- 20.2.1 Topic Models
- 20.2.2 Bayesian Formulation
- 20.2.3 Gibbs Sampling
- PART 4 DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING
- 21 Learning Models
- 21.1 Linear-in-Parameter Model
- 21.2 Kernel Model
- 21.3 Hierarchical Model
- 22 Least Squares Regression
- 22.1 Method of LS
- 22.2 Solution for Linear-in-Parameter Model
- 22.3 Properties of LS Solution
- 22.4 Learning Algorithm for Large-Scale Data
- 22.5 Learning Algorithm for Hierarchical Model
- 23 Constrained LS Regression
- 23.1 Subspace-Constrained LS
- 23.2 ℓ2-Constrained LS
- 23.3 Model Selection
- 24 Sparse Regression
- 24.1 ℓ1-Constrained LS
- 24.2 Solving ℓ1-Constrained LS
- 24.3 Feature Selection by Sparse Learning
- 24.4 Various Extensions
- 24.4.1 Generalized ℓ1-Constrained LS
- 24.4.2 ℓp-Constrained LS
- 24.4.3 ℓ1+ℓ2-Constrained LS
- 24.4.4 ℓ1,2-Constrained LS
- 24.4.5 Trace Norm Constrained LS
- 25 Robust Regression
- 25.1 Nonrobustness of ℓ2-Loss Minimization
- 25.2 ℓ1-Loss Minimization
- 25.3 Huber Loss Minimization
- 25.3.1 Definition
- 25.3.2 Stochastic Gradient Algorithm
- 25.3.3 Iteratively Reweighted LS
- 25.3.4 ℓ1-Constrained Huber Loss Minimization
- 25.4 Tukey Loss Minimization
- 26 Least Squares Classification
- 26.1 Classification by LS Regression
- 26.2 0/1-Loss and Margin
- 26.3 Multiclass Classification
- 27 Support Vector Classification
- 27.1 Maximum Margin Classification
- 27.1.1 Hard Margin Support Vector Classification
- 27.1.2 Soft Margin Support Vector Classification
- 27.2 Dual Optimization of Support Vector Classification
- 27.3 Sparseness of Dual Solution
- 27.4 Nonlinearization by Kernel Trick
- 27.5 Multiclass Extension
- 27.6 Loss Minimization View
- 27.6.1 Hinge Loss Minimization
- 27.6.2 Squared Hinge Loss Minimization
- 27.6.3 Ramp Loss Minimization
- 28 Probabilistic Classification
- 28.1 Logistic Regression
- 28.1.1 Logistic Model and MLE
- 28.1.2 Loss Minimization View
- 28.2 LS Probabilistic Classification
- 29 Structured Classification
- 29.1 Sequence Classification
- 29.2 Probabilistic Classification for Sequences
- 29.2.1 Conditional Random Field
- 29.2.2 MLE
- 29.2.3 Recursive Computation
- 29.2.4 Prediction for New Sample
- 29.3 Deterministic Classification for Sequences
- PART 5 FURTHER TOPICS
- 30 Ensemble Learning
- 30.1 Decision Stump Classifier
- 30.2 Bagging
- 30.3 Boosting
- 30.3.1 Adaboost
- 30.3.2 Loss Minimization View
- 30.4 General Ensemble Learning
- 31 Online Learning
- 31.1 Stochastic Gradient Descent
- 31.2 Passive-Aggressive Learning
- 31.2.1 Classification
- 31.2.2 Regression
- 31.3 Adaptive Regularization of Weight Vectors (AROW)
- 31.3.1 Uncertainty of Parameters
- 31.3.2 Classification
- 31.3.3 Regression
- 32 Confidence of Prediction
- 32.1 Predictive Variance for ℓ2-Regularized LS
- 32.2 Bootstrap Confidence Estimation
- 32.3 Applications
- 32.3.1 Time-series Prediction
- 32.3.2 Tuning Parameter Optimization
- 33 Semisupervised Learning
- 33.1 Manifold Regularization
- 33.1.1 Manifold Structure Brought by Input Samples
- 33.1.2 Computing the Solution
- 33.2 Covariate Shift Adaptation
- 33.2.1 Importance Weighted Learning
- 33.2.2 Relative Importance Weighted Learning
- 33.2.3 Importance Weighted Cross Validation
- 33.2.4 Importance Estimation