Machine Learning: A Bayesian and Optimization Perspective
This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book...
Classification: Electronic book
Format: Electronic eBook
Language: English
Published: London; San Diego: Elsevier, Academic Press, 2015
Series: .NET Developers Series
Online access: Full text
Table of Contents:
- Front Cover
- Machine Learning: A Bayesian and Optimization Perspective
- Copyright
- Contents
- Preface
- Acknowledgments
- Notation
- Dedication
- Chapter 1: Introduction
- 1.1 What Machine Learning is About
- 1.1.1 Classification
- 1.1.2 Regression
- 1.2 Structure and a Road Map of the Book
- References
- Chapter 2: Probability and Stochastic Processes
- 2.1 Introduction
- 2.2 Probability and Random Variables
- 2.2.1 Probability
- Relative frequency definition
- Axiomatic definition
- 2.2.2 Discrete Random Variables
- Joint and conditional probabilities
- Bayes theorem
- 2.2.3 Continuous Random Variables
- 2.2.4 Mean and Variance
- Complex random variables
- 2.2.5 Transformation of Random Variables
- 2.3 Examples of Distributions
- 2.3.1 Discrete Variables
- The Bernoulli distribution
- The Binomial distribution
- The Multinomial distribution
- 2.3.2 Continuous Variables
- The uniform distribution
- The Gaussian distribution
- The central limit theorem
- The exponential distribution
- The beta distribution
- The gamma distribution
- The Dirichlet distribution
- 2.4 Stochastic Processes
- 2.4.1 First and Second Order Statistics
- 2.4.2 Stationarity and Ergodicity
- 2.4.3 Power Spectral Density
- Properties of the autocorrelation sequence
- Power spectral density
- Transmission through a linear system
- Physical interpretation of the PSD
- 2.4.4 Autoregressive Models
- 2.5 Information Theory
- 2.5.1 Discrete Random Variables
- Information
- Mutual and conditional information
- Entropy and average mutual information
- 2.5.2 Continuous Random Variables
- Average mutual information and conditional information
- Relative entropy or Kullback-Leibler divergence
- 2.6 Stochastic Convergence
- Convergence everywhere
- Convergence almost everywhere
- Convergence in the mean-square sense
- Convergence in probability
- Convergence in distribution
- Problems
- References
- Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
- 3.1 Introduction
- 3.2 Parameter Estimation: The Deterministic Point of View
- 3.3 Linear Regression
- 3.4 Classification
- Generative versus discriminative learning
- Supervised, semisupervised, and unsupervised learning
- 3.5 Biased Versus Unbiased Estimation
- 3.5.1 Biased or Unbiased Estimation?
- 3.6 The Cramér-Rao Lower Bound
- 3.7 Sufficient Statistic
- 3.8 Regularization
- Inverse problems: Ill-conditioning and overfitting
- 3.9 The Bias-Variance Dilemma
- 3.9.1 Mean-Square Error Estimation
- 3.9.2 Bias-Variance Tradeoff
- 3.10 Maximum Likelihood Method
- 3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
- 3.11 Bayesian Inference
- 3.11.1 The Maximum A Posteriori Probability Estimation Method
- 3.12 Curse of Dimensionality
- 3.13 Validation
- Cross-validation
- 3.14 Expected and Empirical Loss Functions
- 3.15 Nonparametric Modeling and Estimation
- Problems
- References
- Chapter 4: Mean-Square Error Linear Estimation
- 4.1 Introduction
- 4.2 Mean-Square Error Linear Estimation: The Normal Equations
- 4.2.1 The Cost Function Surface
- 4.3 A Geometric Viewpoint: Orthogonality Condition
- 4.4 Extension to Complex-Valued Variables
- 4.4.1 Widely Linear Complex-Valued Estimation
- Circularity conditions
- 4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
- 4.5 Linear Filtering
- 4.6 MSE Linear Filtering: A Frequency Domain Point of View
- Deconvolution: image deblurring
- 4.7 Some Typical Applications
- 4.7.1 Interference Cancellation
- 4.7.2 System Identification
- 4.7.3 Deconvolution: Channel Equalization
- 4.8 Algorithmic Aspects
- Forward and backward MSE optimal predictors
- 4.8.1 The Lattice-Ladder Scheme
- Orthogonality of the optimal backward errors
- 4.9 Mean-Square Error Estimation of Linear Models
- 4.9.1 The Gauss-Markov Theorem
- 4.9.2 Constrained Linear Estimation: The Beamforming Case
- 4.10 Time-Varying Statistics: Kalman Filtering
- Problems
- MATLAB Exercises
- References
- Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
- 5.1 Introduction
- 5.2 The Steepest Descent Method
- 5.3 Application to the Mean-Square Error Cost Function
- Time-varying step-sizes
- 5.3.1 The Complex-Valued Case
- 5.4 Stochastic Approximation
- Application to the MSE linear estimation
- 5.5 The Least-Mean-Squares Adaptive Algorithm
- 5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
- Convergence of the parameter error vector
- 5.5.2 Cumulative Loss Bounds
- 5.6 The Affine Projection Algorithm
- Geometric interpretation of APA
- Orthogonal projections
- 5.6.1 The Normalized LMS
- 5.7 The Complex-Valued Case
- The widely linear LMS
- The widely linear APA
- 5.8 Relatives of the LMS
- The sign-error LMS
- The least-mean-fourth (LMF) algorithm
- Transform-domain LMS
- 5.9 Simulation Examples
- 5.10 Adaptive Decision Feedback Equalization
- 5.11 The Linearly Constrained LMS
- 5.12 Tracking Performance of the LMS in Nonstationary Environments
- 5.13 Distributed Learning: The Distributed LMS
- 5.13.1 Cooperation Strategies
- Centralized networks
- Decentralized networks
- 5.13.2 The Diffusion LMS
- 5.13.3 Convergence and Steady-State Performance: Some Highlights
- 5.13.4 Consensus-Based Distributed Schemes
- 5.14 A Case Study: Target Localization
- 5.15 Some Concluding Remarks: Consensus Matrix
- Problems
- MATLAB Exercises
- References
- Chapter 6: The Least-Squares Family
- 6.1 Introduction
- Protein folding prediction as a classification task
- Classification of folding prediction via decision trees
- Problems
- MATLAB Exercises
- References
- Chapter 8: Parameter Learning: A Convex Analytic Path
- 8.1 Introduction
- 8.2 Convex Sets and Functions
- 8.2.1 Convex Sets
- 8.2.2 Convex Functions
- 8.3 Projections onto Convex Sets
- 8.3.1 Properties of Projections
- 8.4 Fundamental Theorem of Projections onto Convex Sets
- 8.5 A Parallel Version of POCS
- 8.6 From Convex Sets to Parameter Estimation and Machine Learning
- 8.6.1 Regression
- 8.6.2 Classification
- 8.7 Infinitely Many Closed Convex Sets: The Online Learning Case
- 8.7.1 Convergence of APSM
- Some practical hints
- 8.8 Constrained Learning
- 8.9 The Distributed APSM
- 8.10 Optimizing Nonsmooth Convex Cost Functions
- 8.10.1 Subgradients and Subdifferentials
- 8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case
- The subgradient method
- The generic projected subgradient scheme
- The projected gradient method (PGM)
- Projected subgradient method
- 8.10.3 Online Learning for Convex Optimization
- The PEGASOS algorithm
- 8.11 Regret Analysis
- Regret analysis of the subgradient algorithm
- 8.12 Online Learning and Big Data Applications: A Discussion
- Approximation, estimation and optimization errors
- Batch versus online learning
- 8.13 Proximal Operators
- 8.13.1 Properties of the Proximal Operator
- 8.13.2 Proximal Minimization
- Resolvent of the subdifferential mapping
- 8.14 Proximal Splitting Methods for Optimization
- The proximal forward-backward splitting operator
- Alternating direction method of multipliers (ADMM)
- Mirror descent algorithms
- Problems
- MATLAB Exercises
- 8.15 Appendix to Chapter 8
- References
- Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations