Machine Learning: A Bayesian and Optimization Perspective
This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches (which are based on optimization techniques) together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book...
Classification: Electronic book
Format: Electronic eBook
Language: English
Published: London; San Diego: Elsevier, Academic Press, 2015
Series: .NET Developers Series
Online access: Full text
Table of Contents:
- Front Cover
- Machine Learning: A Bayesian and Optimization Perspective
- Copyright
- Contents
- Preface
- Acknowledgments
- Notation
- Dedication
- Chapter 1: Introduction
- 1.1 What Machine Learning is About
- 1.1.1 Classification
- 1.1.2 Regression
- 1.2 Structure and a Road Map of the Book
- References
- Chapter 2: Probability and Stochastic Processes
- 2.1 Introduction
- 2.2 Probability and Random Variables
- 2.2.1 Probability
- Relative frequency definition
- Axiomatic definition
- 2.2.2 Discrete Random Variables
- Joint and conditional probabilities
- Bayes theorem
- 2.2.3 Continuous Random Variables
- 2.2.4 Mean and Variance
- Complex random variables
- 2.2.5 Transformation of Random Variables
- 2.3 Examples of Distributions
- 2.3.1 Discrete Variables
- The Bernoulli distribution
- The Binomial distribution
- The Multinomial distribution
- 2.3.2 Continuous Variables
- The uniform distribution
- The Gaussian distribution
- The central limit theorem
- The exponential distribution
- The beta distribution
- The gamma distribution
- The Dirichlet distribution
- 2.4 Stochastic Processes
- 2.4.1 First and Second Order Statistics
- 2.4.2 Stationarity and Ergodicity
- 2.4.3 Power Spectral Density
- Properties of the autocorrelation sequence
- Power spectral density
- Transmission through a linear system
- Physical interpretation of the PSD
- 2.4.4 Autoregressive Models
- 2.5 Information Theory
- 2.5.1 Discrete Random Variables
- Information
- Mutual and conditional information
- Entropy and average mutual information
- 2.5.2 Continuous Random Variables
- Average mutual information and conditional information
- Relative entropy or Kullback-Leibler divergence
- 2.6 Stochastic Convergence
- Convergence everywhere
- Convergence almost everywhere
- Convergence in the mean-square sense
- Convergence in probability
- Convergence in distribution
- Problems
- References
- Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions
- 3.1 Introduction
- 3.2 Parameter Estimation: The Deterministic Point of View
- 3.3 Linear Regression
- 3.4 Classification
- Generative versus discriminative learning
- Supervised, semisupervised, and unsupervised learning
- 3.5 Biased Versus Unbiased Estimation
- 3.5.1 Biased or Unbiased Estimation?
- 3.6 The Cramér-Rao Lower Bound
- 3.7 Sufficient Statistic
- 3.8 Regularization
- Inverse problems: Ill-conditioning and overfitting
- 3.9 The Bias-Variance Dilemma
- 3.9.1 Mean-Square Error Estimation
- 3.9.2 Bias-Variance Tradeoff
- 3.10 Maximum Likelihood Method
- 3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case
- 3.11 Bayesian Inference
- 3.11.1 The Maximum A Posteriori Probability Estimation Method
- 3.12 Curse of Dimensionality
- 3.13 Validation
- Cross-validation
- 3.14 Expected and Empirical Loss Functions
- 3.15 Nonparametric Modeling and Estimation
- Problems
- References
- Chapter 4: Mean-Square Error Linear Estimation
- 4.1 Introduction
- 4.2 Mean-Square Error Linear Estimation: The Normal Equations
- 4.2.1 The Cost Function Surface
- 4.3 A Geometric Viewpoint: Orthogonality Condition
- 4.4 Extension to Complex-Valued Variables
- 4.4.1 Widely Linear Complex-Valued Estimation
- Circularity conditions
- 4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus
- 4.5 Linear Filtering
- 4.6 MSE Linear Filtering: A Frequency Domain Point of View
- Deconvolution: image deblurring
- 4.7 Some Typical Applications
- 4.7.1 Interference Cancellation
- 4.7.2 System Identification
- 4.7.3 Deconvolution: Channel Equalization
- 4.8 Algorithmic Aspects
- Forward and backward MSE optimal predictors
- 4.8.1 The Lattice-Ladder Scheme
- Orthogonality of the optimal backward errors
- 4.9 Mean-Square Error Estimation of Linear Models
- 4.9.1 The Gauss-Markov Theorem
- 4.9.2 Constrained Linear Estimation: The Beamforming Case
- 4.10 Time-Varying Statistics: Kalman Filtering
- Problems
- MATLAB Exercises
- References
- Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family
- 5.1 Introduction
- 5.2 The Steepest Descent Method
- 5.3 Application to the Mean-Square Error Cost Function
- Time-varying step-sizes
- 5.3.1 The Complex-Valued Case
- 5.4 Stochastic Approximation
- Application to the MSE linear estimation
- 5.5 The Least-Mean-Squares Adaptive Algorithm
- 5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments
- Convergence of the parameter error vector
- 5.5.2 Cumulative Loss Bounds
- 5.6 The Affine Projection Algorithm
- Geometric interpretation of APA
- Orthogonal projections
- 5.6.1 The Normalized LMS
- 5.7 The Complex-Valued Case
- The widely linear LMS
- The widely linear APA
- 5.8 Relatives of the LMS
- The sign-error LMS
- The least-mean-fourth (LMF) algorithm
- Transform-domain LMS
- 5.9 Simulation Examples
- 5.10 Adaptive Decision Feedback Equalization
- 5.11 The Linearly Constrained LMS
- 5.12 Tracking Performance of the LMS in Nonstationary Environments
- 5.13 Distributed Learning: The Distributed LMS
- 5.13.1 Cooperation Strategies
- Centralized networks
- Decentralized networks
- 5.13.2 The Diffusion LMS
- 5.13.3 Convergence and Steady-State Performance: Some Highlights
- 5.13.4 Consensus-Based Distributed Schemes
- 5.14 A Case Study: Target Localization
- 5.15 Some Concluding Remarks: Consensus Matrix
- Problems
- MATLAB Exercises
- References
- Chapter 6: The Least-Squares Family
- 6.1 Introduction
- Protein folding prediction as a classification task
- Classification of folding prediction via decision trees
- Problems
- MATLAB Exercises
- References
- Chapter 8: Parameter Learning: A Convex Analytic Path
- 8.1 Introduction
- 8.2 Convex Sets and Functions
- 8.2.1 Convex Sets
- 8.2.2 Convex Functions
- 8.3 Projections onto Convex Sets
- 8.3.1 Properties of Projections
- 8.4 Fundamental Theorem of Projections onto Convex Sets
- 8.5 A Parallel Version of POCS
- 8.6 From Convex Sets to Parameter Estimation and Machine Learning
- 8.6.1 Regression
- 8.6.2 Classification
- 8.7 Infinitely Many Closed Convex Sets: The Online Learning Case
- 8.7.1 Convergence of APSM
- Some practical hints
- 8.8 Constrained Learning
- 8.9 The Distributed APSM
- 8.10 Optimizing Nonsmooth Convex Cost Functions
- 8.10.1 Subgradients and Subdifferentials
- 8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case
- The subgradient method
- The generic projected subgradient scheme
- The projected gradient method (PGM)
- Projected subgradient method
- 8.10.3 Online Learning for Convex Optimization
- The PEGASOS algorithm
- 8.11 Regret Analysis
- Regret analysis of the subgradient algorithm
- 8.12 Online Learning and Big Data Applications: A Discussion
- Approximation, estimation and optimization errors
- Batch versus online learning
- 8.13 Proximal Operators
- 8.13.1 Properties of the Proximal Operator
- 8.13.2 Proximal Minimization
- Resolvent of the subdifferential mapping
- 8.14 Proximal Splitting Methods for Optimization
- The proximal forward-backward splitting operator
- Alternating direction method of multipliers (ADMM)
- Mirror descent algorithms
- Problems
- MATLAB Exercises
- 8.15 Appendix to Chapter 8
- References
- Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations