
Introduction to Statistical Machine Learning

Bibliographic Details
Classification: Electronic Book
Main Author: Sugiyama, Masashi (Author)
Format: Electronic eBook
Language: English
Published: Waltham, MA : Morgan Kaufmann, 2016.
Subjects:
Online Access: Full Text
Table of Contents:
  • Front Cover
  • Introduction to Statistical Machine Learning
  • Copyright
  • Table of Contents
  • Biography
  • Preface
  • Part 1: INTRODUCTION
  • 1 Statistical Machine Learning
  • 1.1 Types of Learning
  • 1.2 Examples of Machine Learning Tasks
  • 1.2.1 Supervised Learning
  • 1.2.2 Unsupervised Learning
  • 1.2.3 Further Topics
  • 1.3 Structure of This Textbook
  • Part 2: STATISTICS AND PROBABILITY
  • 2 Random Variables and Probability Distributions
  • 2.1 Mathematical Preliminaries
  • 2.2 Probability
  • 2.3 Random Variable and Probability Distribution
  • 2.4 Properties of Probability Distributions
  • 2.4.1 Expectation, Median, and Mode
  • 2.4.2 Variance and Standard Deviation
  • 2.4.3 Skewness, Kurtosis, and Moments
  • 2.5 Transformation of Random Variables
  • 3 Examples of Discrete Probability Distributions
  • 3.1 Discrete Uniform Distribution
  • 3.2 Binomial Distribution
  • 3.3 Hypergeometric Distribution
  • 3.4 Poisson Distribution
  • 3.5 Negative Binomial Distribution
  • 3.6 Geometric Distribution
  • 4 Examples of Continuous Probability Distributions
  • 4.1 Continuous Uniform Distribution
  • 4.2 Normal Distribution
  • 4.3 Gamma Distribution, Exponential Distribution, and Chi-Squared Distribution
  • 4.4 Beta Distribution
  • 4.5 Cauchy Distribution and Laplace Distribution
  • 4.6 t-Distribution and F-Distribution
  • 5 Multidimensional Probability Distributions
  • 5.1 Joint Probability Distribution
  • 5.2 Conditional Probability Distribution
  • 5.3 Contingency Table
  • 5.4 Bayes' Theorem
  • 5.5 Covariance and Correlation
  • 5.6 Independence
  • 6 Examples of Multidimensional Probability Distributions
  • 6.1 Multinomial Distribution
  • 6.2 Multivariate Normal Distribution
  • 6.3 Dirichlet Distribution
  • 6.4 Wishart Distribution
  • 7 Sum of Independent Random Variables
  • 7.1 Convolution
  • 7.2 Reproductive Property
  • 7.3 Law of Large Numbers
  • 7.4 Central Limit Theorem
  • 8 Probability Inequalities
  • 8.1 Union Bound
  • 8.2 Inequalities for Probabilities
  • 8.2.1 Markov's Inequality and Chernoff's Inequality
  • 8.2.2 Cantelli's Inequality and Chebyshev's Inequality
  • 8.3 Inequalities for Expectation
  • 8.3.1 Jensen's Inequality
  • 8.3.2 Hölder's Inequality and Schwarz's Inequality
  • 8.3.3 Minkowski's Inequality
  • 8.3.4 Kantorovich's Inequality
  • 8.4 Inequalities for the Sum of Independent Random Variables
  • 8.4.1 Chebyshev's Inequality and Chernoff's Inequality
  • 8.4.2 Hoeffding's Inequality and Bernstein's Inequality
  • 8.4.3 Bennett's Inequality
  • 9 Statistical Estimation
  • 9.1 Fundamentals of Statistical Estimation
  • 9.2 Point Estimation
  • 9.2.1 Parametric Density Estimation
  • 9.2.2 Nonparametric Density Estimation
  • 9.2.3 Regression and Classification
  • 9.2.4 Model Selection
  • 9.3 Interval Estimation
  • 9.3.1 Interval Estimation for Expectation of Normal Samples
  • 9.3.2 Bootstrap Confidence Interval
  • 9.3.3 Bayesian Credible Interval
  • 10 Hypothesis Testing
  • 10.1 Fundamentals of Hypothesis Testing
  • 10.2 Test for Expectation of Normal Samples
  • 10.3 Neyman-Pearson Lemma
  • 10.4 Test for Contingency Tables
  • 10.5 Test for Difference in Expectations of Normal Samples
  • 10.5.1 Two Samples without Correspondence
  • 10.5.2 Two Samples with Correspondence
  • 10.6 Nonparametric Test for Ranks
  • 10.6.1 Two Samples without Correspondence
  • 10.6.2 Two Samples with Correspondence
  • 10.7 Monte Carlo Test
  • Part 3: GENERATIVE APPROACH TO STATISTICAL PATTERN RECOGNITION
  • 11 Pattern Recognition via Generative Model Estimation
  • 11.1 Formulation of Pattern Recognition
  • 11.2 Statistical Pattern Recognition
  • 11.3 Criteria for Classifier Training
  • 11.3.1 MAP Rule
  • 11.3.2 Minimum Misclassification Rate Rule
  • 11.3.3 Bayes Decision Rule
  • 11.3.4 Discussion
  • 11.4 Generative and Discriminative Approaches
  • 12 Maximum Likelihood Estimation
  • 12.1 Definition
  • 12.2 Gaussian Model
  • 12.3 Computing the Class-Posterior Probability
  • 12.4 Fisher's Linear Discriminant Analysis (FDA)
  • 12.5 Hand-Written Digit Recognition
  • 12.5.1 Preparation
  • 12.5.2 Implementing Linear Discriminant Analysis
  • 12.5.3 Multiclass Classification
  • 13 Properties of Maximum Likelihood Estimation
  • 13.1 Consistency
  • 13.2 Asymptotic Unbiasedness
  • 13.3 Asymptotic Efficiency
  • 13.3.1 One-Dimensional Case
  • 13.3.2 Multidimensional Cases
  • 13.4 Asymptotic Normality
  • 13.5 Summary
  • 14 Model Selection for Maximum Likelihood Estimation
  • 14.1 Model Selection
  • 14.2 KL Divergence
  • 14.3 AIC
  • 14.4 Cross Validation
  • 14.5 Discussion
  • 15 Maximum Likelihood Estimation for Gaussian Mixture Model
  • 15.1 Gaussian Mixture Model
  • 15.2 MLE
  • 15.3 Gradient Ascent Algorithm
  • 15.4 EM Algorithm
  • 16 Nonparametric Estimation
  • 16.1 Histogram Method
  • 16.2 Problem Formulation
  • 16.3 KDE
  • 16.3.1 Parzen Window Method
  • 16.3.2 Smoothing with Kernels
  • 16.3.3 Bandwidth Selection
  • 16.4 NNDE
  • 16.4.1 Nearest Neighbor Distance
  • 16.4.2 Nearest Neighbor Classifier
  • 17 Bayesian Inference
  • 17.1 Bayesian Predictive Distribution
  • 17.1.1 Definition
  • 17.1.2 Comparison with MLE
  • 17.1.3 Computational Issues
  • 17.2 Conjugate Prior
  • 17.3 MAP Estimation
  • 17.4 Bayesian Model Selection
  • 18 Analytic Approximation of Marginal Likelihood
  • 18.1 Laplace Approximation
  • 18.1.1 Approximation with Gaussian Density
  • 18.1.2 Illustration
  • 18.1.3 Application to Marginal Likelihood Approximation
  • 18.1.4 Bayesian Information Criterion (BIC)
  • 18.2 Variational Approximation
  • 18.2.1 Variational Bayesian EM (VBEM) Algorithm
  • 18.2.2 Relation to Ordinary EM Algorithm
  • 19 Numerical Approximation of Predictive Distribution
  • 19.1 Monte Carlo Integration
  • 19.2 Importance Sampling
  • 19.3 Sampling Algorithms
  • 19.3.1 Inverse Transform Sampling
  • 19.3.2 Rejection Sampling
  • 19.3.3 Markov Chain Monte Carlo (MCMC) Method
  • 20 Bayesian Mixture Models
  • 20.1 Gaussian Mixture Models
  • 20.1.1 Bayesian Formulation
  • 20.1.2 Variational Inference
  • 20.1.3 Gibbs Sampling
  • 20.2 Latent Dirichlet Allocation (LDA)
  • 20.2.1 Topic Models
  • 20.2.2 Bayesian Formulation
  • 20.2.3 Gibbs Sampling
  • Part 4: DISCRIMINATIVE APPROACH TO STATISTICAL MACHINE LEARNING
  • 21 Learning Models
  • 21.1 Linear-in-Parameter Model
  • 21.2 Kernel Model
  • 21.3 Hierarchical Model
  • 22 Least Squares Regression
  • 22.1 Method of LS
  • 22.2 Solution for Linear-in-Parameter Model
  • 22.3 Properties of LS Solution
  • 22.4 Learning Algorithm for Large-Scale Data
  • 22.5 Learning Algorithm for Hierarchical Model
  • 23 Constrained LS Regression
  • 23.1 Subspace-Constrained LS
  • 23.2 ℓ2-Constrained LS
  • 23.3 Model Selection
  • 24 Sparse Regression
  • 24.1 ℓ1-Constrained LS
  • 24.2 Solving ℓ1-Constrained LS
  • 24.3 Feature Selection by Sparse Learning
  • 24.4 Various Extensions
  • 24.4.1 Generalized ℓ1-Constrained LS
  • 24.4.2 ℓp-Constrained LS
  • 24.4.3 ℓ1+ℓ2-Constrained LS
  • 24.4.4 ℓ1,2-Constrained LS
  • 24.4.5 Trace Norm Constrained LS
  • 25 Robust Regression
  • 25.1 Nonrobustness of ℓ2-Loss Minimization
  • 25.2 ℓ1-Loss Minimization
  • 25.3 Huber Loss Minimization
  • 25.3.1 Definition
  • 25.3.2 Stochastic Gradient Algorithm
  • 25.3.3 Iteratively Reweighted LS
  • 25.3.4 ℓ1-Constrained Huber Loss Minimization
  • 25.4 Tukey Loss Minimization
  • 26 Least Squares Classification
  • 26.1 Classification by LS Regression
  • 26.2 0/1-Loss and Margin
  • 26.3 Multiclass Classification
  • 27 Support Vector Classification
  • 27.1 Maximum Margin Classification
  • 27.1.1 Hard Margin Support Vector Classification
  • 27.1.2 Soft Margin Support Vector Classification
  • 27.2 Dual Optimization of Support Vector Classification
  • 27.3 Sparseness of Dual Solution
  • 27.4 Nonlinearization by Kernel Trick
  • 27.5 Multiclass Extension
  • 27.6 Loss Minimization View
  • 27.6.1 Hinge Loss Minimization
  • 27.6.2 Squared Hinge Loss Minimization
  • 27.6.3 Ramp Loss Minimization
  • 28 Probabilistic Classification
  • 28.1 Logistic Regression
  • 28.1.1 Logistic Model and MLE
  • 28.1.2 Loss Minimization View
  • 28.2 LS Probabilistic Classification
  • 29 Structured Classification
  • 29.1 Sequence Classification
  • 29.2 Probabilistic Classification for Sequences
  • 29.2.1 Conditional Random Field
  • 29.2.2 MLE
  • 29.2.3 Recursive Computation
  • 29.2.4 Prediction for New Sample
  • 29.3 Deterministic Classification for Sequences
  • Part 5: FURTHER TOPICS
  • 30 Ensemble Learning
  • 30.1 Decision Stump Classifier
  • 30.2 Bagging
  • 30.3 Boosting
  • 30.3.1 AdaBoost
  • 30.3.2 Loss Minimization View
  • 30.4 General Ensemble Learning
  • 31 Online Learning
  • 31.1 Stochastic Gradient Descent
  • 31.2 Passive-Aggressive Learning
  • 31.2.1 Classification
  • 31.2.2 Regression
  • 31.3 Adaptive Regularization of Weight Vectors (AROW)
  • 31.3.1 Uncertainty of Parameters
  • 31.3.2 Classification
  • 31.3.3 Regression
  • 32 Confidence of Prediction
  • 32.1 Predictive Variance for ℓ2-Regularized LS
  • 32.2 Bootstrap Confidence Estimation
  • 32.3 Applications
  • 32.3.1 Time-series Prediction
  • 32.3.2 Tuning Parameter Optimization
  • 33 Semisupervised Learning
  • 33.1 Manifold Regularization
  • 33.1.1 Manifold Structure Brought by Input Samples
  • 33.1.2 Computing the Solution
  • 33.2 Covariate Shift Adaptation
  • 33.2.1 Importance Weighted Learning
  • 33.2.2 Relative Importance Weighted Learning
  • 33.2.3 Importance Weighted Cross Validation
  • 33.2.4 Importance Estimation