
Machine Learning: A Bayesian and Optimization Perspective

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book...


Bibliographic Details
Classification: Electronic book
Main Author: Theodoridis, Sergios, 1951-
Format: Electronic eBook
Language: English
Published: London; San Diego: Elsevier: Academic Press, 2015.
Online Access: Full text
Table of Contents:
  • Front Cover
  • Machine Learning: A Bayesian and Optimization Perspective
  • Copyright
  • Contents
  • Preface
  • Acknowledgments
  • Notation
  • Dedication
  • Chapter 1: Introduction
  • 1.1 What Machine Learning is About
  • 1.1.1 Classification
  • 1.1.2 Regression
  • 1.2 Structure and a Road Map of the Book
  • References
  • Chapter 2: Probability and Stochastic Processes
  • 2.1 Introduction
  • 2.2 Probability and Random Variables
  • 2.2.1 Probability
  • Relative frequency definition
  • Axiomatic definition
  • 2.2.2 Discrete Random Variables
  • Joint and conditional probabilities
  • Bayes theorem
  • 2.2.3 Continuous Random Variables
  • 2.2.4 Mean and Variance
  • Complex random variables
  • 2.2.5 Transformation of Random Variables
  • 2.3 Examples of Distributions
  • 2.3.1 Discrete Variables
  • The Bernoulli distribution
  • The Binomial distribution
  • The Multinomial distribution
  • 2.3.2 Continuous Variables
  • The uniform distribution
  • The Gaussian distribution
  • The central limit theorem
  • The exponential distribution
  • The beta distribution
  • The gamma distribution
  • The Dirichlet distribution
  • 2.4 Stochastic Processes
  • 2.4.1 First and Second Order Statistics
  • 2.4.2 Stationarity and Ergodicity
  • 2.4.3 Power Spectral Density
  • Properties of the autocorrelation sequence
  • Power spectral density
  • Transmission through a linear system
  • Physical interpretation of the PSD
  • 2.4.4 Autoregressive Models
  • 2.5 Information Theory
  • 2.5.1 Discrete Random Variables
  • Information
  • Mutual and conditional information
  • Entropy and average mutual information
  • 2.5.2 Continuous Random Variables
  • Average mutual information and conditional information
  • Relative entropy or Kullback-Leibler divergence
  • 2.6 Stochastic Convergence
  • Convergence everywhere
  • Convergence almost everywhere
  • Convergence in the mean-square sense
  • Convergence in probability
  • Convergence in distribution
  • Problems
  • References
  • Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
  • 3.1 Introduction
  • 3.2 Parameter Estimation: The Deterministic Point of View
  • 3.3 Linear Regression
  • 3.4 Classification
  • Generative versus discriminative learning
  • Supervised, semisupervised, and unsupervised learning
  • 3.5 Biased Versus Unbiased Estimation
  • 3.5.1 Biased or Unbiased Estimation?
  • 3.6 The Cramér-Rao Lower Bound
  • 3.7 Sufficient Statistic
  • 3.8 Regularization
  • Inverse problems: Ill-conditioning and overfitting
  • 3.9 The Bias-Variance Dilemma
  • 3.9.1 Mean-Square Error Estimation
  • 3.9.2 Bias-Variance Tradeoff
  • 3.10 Maximum Likelihood Method
  • 3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
  • 3.11 Bayesian Inference
  • 3.11.1 The Maximum A Posteriori Probability Estimation Method
  • 3.12 Curse of Dimensionality
  • 3.13 Validation
  • Cross-validation
  • 3.14 Expected and Empirical Loss Functions
  • 3.15 Nonparametric Modeling and Estimation
  • Problems
  • References
  • Chapter 4: Mean-Square Error Linear Estimation
  • 4.1 Introduction
  • 4.2 Mean-Square Error Linear Estimation: The Normal Equations
  • 4.2.1 The Cost Function Surface
  • 4.3 A Geometric Viewpoint: Orthogonality Condition
  • 4.4 Extension to Complex-Valued Variables
  • 4.4.1 Widely Linear Complex-Valued Estimation
  • Circularity conditions
  • 4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
  • 4.5 Linear Filtering
  • 4.6 MSE Linear Filtering: A Frequency Domain Point of View
  • Deconvolution: image deblurring
  • 4.7 Some Typical Applications
  • 4.7.1 Interference Cancellation
  • 4.7.2 System Identification
  • 4.7.3 Deconvolution: Channel Equalization
  • 4.8 Algorithmic Aspects
  • Forward and backward MSE optimal predictors
  • 4.8.1 The Lattice-Ladder Scheme
  • Orthogonality of the optimal backward errors
  • 4.9 Mean-Square Error Estimation of Linear Models
  • 4.9.1 The Gauss-Markov Theorem
  • 4.9.2 Constrained Linear Estimation: The Beamforming Case
  • 4.10 Time-Varying Statistics: Kalman Filtering
  • Problems
  • MATLAB Exercises
  • References
  • Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
  • 5.1 Introduction
  • 5.2 The Steepest Descent Method
  • 5.3 Application to the Mean-Square Error Cost Function
  • Time-varying step-sizes
  • 5.3.1 The Complex-Valued Case
  • 5.4 Stochastic Approximation
  • Application to the MSE linear estimation
  • 5.5 The Least-Mean-Squares Adaptive Algorithm
  • 5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
  • Convergence of the parameter error vector
  • 5.5.2 Cumulative Loss Bounds
  • 5.6 The Affine Projection Algorithm
  • Geometric interpretation of APA
  • Orthogonal projections
  • 5.6.1 The Normalized LMS
  • 5.7 The Complex-Valued Case
  • The widely linear LMS
  • The widely linear APA
  • 5.8 Relatives of the LMS
  • The sign-error LMS
  • The least-mean-fourth (LMF) algorithm
  • Transform-domain LMS
  • 5.9 Simulation Examples
  • 5.10 Adaptive Decision Feedback Equalization
  • 5.11 The Linearly Constrained LMS
  • 5.12 Tracking Performance of the LMS in Nonstationary Environments
  • 5.13 Distributed Learning: The Distributed LMS
  • 5.13.1 Cooperation Strategies
  • Centralized networks
  • Decentralized networks
  • 5.13.2 The Diffusion LMS
  • 5.13.3 Convergence and Steady-State Performance: Some Highlights
  • 5.13.4 Consensus-Based Distributed Schemes
  • 5.14 A Case Study: Target Localization
  • 5.15 Some Concluding Remarks: Consensus Matrix
  • Problems
  • MATLAB Exercises
  • References
  • Chapter 6: The Least-Squares Family
  • 6.1 Introduction
  • Protein folding prediction as a classification task
  • Classification of folding prediction via decision trees
  • Problems
  • MATLAB Exercises
  • References
  • Chapter 8: Parameter Learning: A Convex Analytic Path
  • 8.1 Introduction
  • 8.2 Convex Sets and Functions
  • 8.2.1 Convex Sets
  • 8.2.2 Convex Functions
  • 8.3 Projections onto Convex Sets
  • 8.3.1 Properties of Projections
  • 8.4 Fundamental Theorem of Projections onto Convex Sets
  • 8.5 A Parallel Version of POCS
  • 8.6 From Convex Sets to Parameter Estimation and Machine Learning
  • 8.6.1 Regression
  • 8.6.2 Classification
  • 8.7 Infinitely Many Closed Convex Sets: The Online Learning Case
  • 8.7.1 Convergence of APSM
  • Some practical hints
  • 8.8 Constrained Learning
  • 8.9 The Distributed APSM
  • 8.10 Optimizing Nonsmooth Convex Cost Functions
  • 8.10.1 Subgradients and Subdifferentials
  • 8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case
  • The subgradient method
  • The generic projected subgradient scheme
  • The projected gradient method (PGM)
  • Projected subgradient method
  • 8.10.3 Online Learning for Convex Optimization
  • The PEGASOS algorithm
  • 8.11 Regret Analysis
  • Regret analysis of the subgradient algorithm
  • 8.12 Online Learning and Big Data Applications: A Discussion
  • Approximation, estimation and optimization errors
  • Batch versus online learning
  • 8.13 Proximal Operators
  • 8.13.1 Properties of the Proximal Operator
  • 8.13.2 Proximal Minimization
  • Resolvent of the subdifferential mapping
  • 8.14 Proximal Splitting Methods for Optimization
  • The proximal forward-backward splitting operator
  • Alternating direction method of multipliers (ADMM)
  • Mirror descent algorithms
  • Problems
  • MATLAB Exercises
  • 8.15 Appendix to Chapter 8
  • References
  • Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations