
Machine learning : a Bayesian and optimization perspective /

This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which are based on optimization techniques, together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book...


Bibliographic Details
Classification: Electronic Book
Main author: Theodoridis, Sergios, 1951-
Format: Electronic eBook
Language: English
Published: London ; San Diego : Elsevier : Academic Press, 2015.
Series: .NET Developers Series.
Subjects:
Online access: Full text

MARC

LEADER 00000cam a2200000 i 4500
001 SCIDIR_ocn907475626
003 OCoLC
005 20231120111951.0
006 m o d
007 cr cnu---unuuu
008 150416s2015 enka ob 001 0 eng d
040 |a OPELS  |b eng  |e rda  |e pn  |c OPELS  |d IDEBK  |d N$T  |d TEFOD  |d YDXCP  |d EBLCP  |d COO  |d CDX  |d B24X7  |d TEFOD  |d DEBSZ  |d OCLCF  |d ORE  |d Z5A  |d LIV  |d OCLCQ  |d BUF  |d UUM  |d DEBBG  |d U3W  |d D6H  |d UKMGB  |d ESEHU  |d AU@  |d OCLCQ  |d WYU  |d LQU  |d DCT  |d ERF  |d OCLCQ  |d BRF  |d OCLCQ  |d OCLCO  |d COM  |d OCLCQ  |d OCLCO 
066 |c (S 
015 |a GBB519023  |2 bnb 
016 7 |a 017050419  |2 Uk 
019 |a 907310425  |a 907560179  |a 965129166  |a 965469772  |a 1014402055  |a 1105183455  |a 1105563629 
020 |a 9780128017227  |q (electronic bk.) 
020 |a 0128017228  |q (electronic bk.) 
020 |z 9780128015223 
020 |z 0128015225 
035 |a (OCoLC)907475626  |z (OCoLC)907310425  |z (OCoLC)907560179  |z (OCoLC)965129166  |z (OCoLC)965469772  |z (OCoLC)1014402055  |z (OCoLC)1105183455  |z (OCoLC)1105563629 
050 4 |a Q325.5  |b .T43 2015eb 
072 7 |a COM  |x 000000  |2 bisacsh 
082 0 4 |a 006.3/1  |2 23 
100 1 |a Theodoridis, Sergios,  |d 1951- 
245 1 0 |a Machine learning :  |b a Bayesian and optimization perspective /  |c Sergios Theodoridis. 
264 1 |a London ;  |a San Diego :  |b Elsevier :  |b Academic Press,  |c 2015. 
300 |a 1 online resource (xxi, 1050 pages) :  |b illustrations (some color) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
490 1 |a .NET Developers Series 
504 |a Includes bibliographical references and index. 
520 |a This tutorial text gives a unifying perspective on machine learning by covering both probabilistic and deterministic approaches, which are based on optimization techniques, together with the Bayesian inference approach, whose essence lies in the use of a hierarchy of probabilistic models. The book presents the major machine learning methods as they have been developed in different disciplines, such as statistics, statistical and adaptive signal processing and computer science. Focusing on the physical reasoning behind the mathematics, all the various methods and techniques are explained in depth, supported by examples and problems, giving an invaluable resource to the student and researcher for understanding and applying machine learning concepts. The book builds carefully from the basic classical methods to the most recent trends, with chapters written to be as self-contained as possible, making the text suitable for different courses: pattern recognition, statistical/adaptive signal processing, statistical/Bayesian learning, as well as short courses on sparse modeling, deep learning, and probabilistic graphical models. All major classical techniques: Mean/Least-Squares regression and filtering, Kalman filtering, stochastic approximation and online learning, Bayesian classification, decision trees, logistic regression and boosting methods. The latest trends: Sparsity, convex analysis and optimization, online distributed algorithms, learning in RKH spaces, Bayesian inference, graphical and hidden Markov models, particle filtering, deep learning, dictionary learning and latent variables modeling. Case studies (protein folding prediction, optical character recognition, text authorship identification, fMRI data analysis, change point detection, hyperspectral image unmixing, target localization, channel equalization and echo cancellation) show how the theory can be applied. 
MATLAB code for all the main algorithms is available on an accompanying website, enabling the reader to experiment with the code. 
588 0 |a Print version record. 
505 0 |a Front Cover -- Machine Learning: A Bayesian and Optimization Perspective -- Copyright -- Contents -- Preface -- Acknowledgments -- Notation -- Dedication -- Chapter 1: Introduction -- 1.1 What Machine Learning is About -- 1.1.1 Classification -- 1.1.2 Regression -- 1.2 Structure and a Road Map of the Book -- References -- Chapter 2: Probability and Stochastic Processes -- 2.1 Introduction -- 2.2 Probability and Random Variables -- 2.2.1 Probability -- Relative frequency definition -- Axiomatic definition -- 2.2.2 Discrete Random Variables -- Joint and conditional probabilities -- Bayes theorem -- 2.2.3 Continuous Random Variables -- 2.2.4 Mean and Variance -- Complex random variables -- 2.2.5 Transformation of Random Variables -- 2.3 Examples of Distributions -- 2.3.1 Discrete Variables -- The Bernoulli distribution -- The Binomial distribution -- The Multinomial distribution -- 2.3.2 Continuous Variables -- The uniform distribution -- The Gaussian distribution -- The central limit theorem -- The exponential distribution -- The beta distribution -- The gamma distribution -- The Dirichlet distribution -- 2.4 Stochastic Processes -- 2.4.1 First and Second Order Statistics -- 2.4.2 Stationarity and Ergodicity -- 2.4.3 Power Spectral Density -- Properties of the autocorrelation sequence -- Power spectral density -- Transmission through a linear system -- Physical interpretation of the PSD -- 2.4.4 Autoregressive Models -- 2.5 Information Theory -- 2.5.1 Discrete Random Variables -- Information -- Mutual and conditional information -- Entropy and average mutual information -- 2.5.2 Continuous Random Variables -- Average mutual information and conditional information -- Relative entropy or Kullback-Leibler divergence -- 2.6 Stochastic Convergence -- Convergence everywhere -- Convergence almost everywhere -- Convergence in the mean-square sense. 
505 8 |a Convergence in probability -- Convergence in distribution -- Problems -- References -- Chapter 3: Learning in Parametric Modeling: Basic Concepts and Directions -- 3.1 Introduction -- 3.2 Parameter Estimation: The Deterministic Point of View -- 3.3 Linear Regression -- 3.4 Classification -- Generative versus discriminative learning -- Supervised, semisupervised, and unsupervised learning -- 3.5 Biased Versus Unbiased Estimation -- 3.5.1 Biased or Unbiased Estimation? -- 3.6 The Cramér-Rao Lower Bound -- 3.7 Sufficient Statistic -- 3.8 Regularization -- Inverse problems: Ill-conditioning and overfitting -- 3.9 The Bias-Variance Dilemma -- 3.9.1 Mean-Square Error Estimation -- 3.9.2 Bias-Variance Tradeoff -- 3.10 Maximum Likelihood Method -- 3.10.1 Linear Regression: The Nonwhite Gaussian Noise Case -- 3.11 Bayesian Inference -- 3.11.1 The Maximum A Posteriori Probability Estimation Method -- 3.12 Curse of Dimensionality -- 3.13 Validation -- Cross-validation -- 3.14 Expected and Empirical Loss Functions -- 3.15 Nonparametric Modeling and Estimation -- Problems -- References -- Chapter 4: Mean-Square Error Linear Estimation -- 4.1 Introduction -- 4.2 Mean-Square Error Linear Estimation: The Normal Equations -- 4.2.1 The Cost Function Surface -- 4.3 A Geometric Viewpoint: Orthogonality Condition -- 4.4 Extension to Complex-Valued Variables -- 4.4.1 Widely Linear Complex-Valued Estimation -- Circularity conditions -- 4.4.2 Optimizing with Respect to Complex-Valued Variables: Wirtinger Calculus -- 4.5 Linear Filtering -- 4.6 MSE Linear Filtering: A Frequency Domain Point of View -- Deconvolution: image deblurring -- 4.7 Some Typical Applications -- 4.7.1 Interference Cancellation -- 4.7.2 System Identification -- 4.7.3 Deconvolution: Channel Equalization -- 4.8 Algorithmic Aspects -- Forward and backward MSE optimal predictors. 
505 8 |a 4.8.1 The Lattice-Ladder Scheme -- Orthogonality of the optimal backward errors -- 4.9 Mean-Square Error Estimation of Linear Models -- 4.9.1 The Gauss-Markov Theorem -- 4.9.2 Constrained Linear Estimation: The Beamforming Case -- 4.10 Time-Varying Statistics: Kalman Filtering -- Problems -- MATLAB Exercises -- References -- Chapter 5: Stochastic Gradient Descent: The LMS Algorithm and its Family -- 5.1 Introduction -- 5.2 The Steepest Descent Method -- 5.3 Application to the Mean-Square Error Cost Function -- Time-varying step-sizes -- 5.3.1 The Complex-Valued Case -- 5.4 Stochastic Approximation -- Application to the MSE linear estimation -- 5.5 The Least-Mean-Squares Adaptive Algorithm -- 5.5.1 Convergence and Steady-State Performance of the LMS in Stationary Environments -- Convergence of the parameter error vector -- 5.5.2 Cumulative Loss Bounds -- 5.6 The Affine Projection Algorithm -- Geometric interpretation of APA -- Orthogonal projections -- 5.6.1 The Normalized LMS -- 5.7 The Complex-Valued Case -- The widely linear LMS -- The widely linear APA -- 5.8 Relatives of the LMS -- The sign-error LMS -- The least-mean-fourth (LMF) algorithm -- Transform-domain LMS -- 5.9 Simulation Examples -- 5.10 Adaptive Decision Feedback Equalization -- 5.11 The Linearly Constrained LMS -- 5.12 Tracking Performance of the LMS in Nonstationary Environments -- 5.13 Distributed Learning: The Distributed LMS -- 5.13.1 Cooperation Strategies -- Centralized networks -- Decentralized networks -- 5.13.2 The Diffusion LMS -- 5.13.3 Convergence and Steady-State Performance: Some Highlights -- 5.13.4 Consensus-Based Distributed Schemes -- 5.14 A Case Study: Target Localization -- 5.15 Some Concluding Remarks: Consensus Matrix -- Problems -- MATLAB Exercises -- References -- Chapter 6: The Least-Squares Family -- 6.1 Introduction. 
505 8 |a Protein folding prediction as a classification task -- Classification of folding prediction via decision trees -- Problems -- MATLAB Exercises -- References -- Chapter 8: Parameter Learning: A Convex Analytic Path -- 8.1 Introduction -- 8.2 Convex Sets and Functions -- 8.2.1 Convex Sets -- 8.2.2 Convex Functions -- 8.3 Projections onto Convex Sets -- 8.3.1 Properties of Projections -- 8.4 Fundamental Theorem of Projections onto Convex Sets -- 8.5 A Parallel Version of POCS -- 8.6 From Convex Sets to Parameter Estimation and Machine Learning -- 8.6.1 Regression -- 8.6.2 Classification -- 8.7 Infinite Many Closed Convex Sets: The Online Learning Case -- 8.7.1 Convergence of APSM -- Some practical hints -- 8.8 Constrained Learning -- 8.9 The Distributed APSM -- 8.10 Optimizing Nonsmooth Convex Cost Functions -- 8.10.1 Subgradients and Subdifferentials -- 8.10.2 Minimizing Nonsmooth Continuous Convex Loss Functions: The Batch Learning Case -- The subgradient method -- The generic projected subgradient scheme -- The projected gradient method (PGM) -- Projected subgradient method -- 8.10.3 Online Learning for Convex Optimization -- The PEGASOS algorithm -- 8.11 Regret Analysis -- Regret analysis of the subgradient algorithm -- 8.12 Online Learning and Big Data Applications: A Discussion -- Approximation, estimation and optimization errors -- Batch versus online learning -- 8.13 Proximal Operators -- 8.13.1 Properties of the Proximal Operator -- 8.13.2 Proximal Minimization -- Resolvent of the subdifferential mapping -- 8.14 Proximal Splitting Methods for Optimization -- The proximal forward-backward splitting operator -- Alternating direction method of multipliers (ADMM) -- Mirror descent algorithms -- Problems -- MATLAB Exercises -- 8.15 Appendix to Chapter 8 -- References -- Chapter 9: Sparsity-Aware Learning: Concepts and Theoretical Foundations. 
650 0 |a Machine learning. 
650 0 |a Bayesian statistical decision theory. 
650 0 |a Mathematical optimization. 
650 6 |a Apprentissage automatique.  |0 (CaQQLa)201-0131435 
650 6 |a Théorie de la décision bayésienne.  |0 (CaQQLa)000272233 
650 6 |a Optimisation mathématique.  |0 (CaQQLa)201-0007680 
650 7 |a COMPUTERS  |x General.  |2 bisacsh 
650 7 |a Bayesian statistical decision theory  |2 fast  |0 (OCoLC)fst00829019 
650 7 |a Machine learning  |2 fast  |0 (OCoLC)fst01004795 
650 7 |a Mathematical optimization  |2 fast  |0 (OCoLC)fst01012099 
776 0 8 |i Print version:  |a Theodoridis, Sergios, 1951-  |t Machine learning  |z 9780128015223  |w (OCoLC)893899296 
830 0 |a .NET Developers Series. 
856 4 0 |u https://sciencedirect.uam.elogim.com/science/book/9780128015223  |z Full text 
880 8 |6 505-00/(S  |a 6.2 Least-Squares Linear Regression: A Geometric Perspective -- 6.3 Statistical Properties of the LS Estimator -- The LS estimator is unbiased -- Covariance matrix of the LS estimator -- The LS estimator is BLUE in the presence of white noise -- The LS estimator achieves the Cramér-Rao bound for white Gaussian noise -- Asymptotic distribution of the LS estimator -- 6.4 Orthogonalizing the Column Space of X: The SVD Method -- Pseudo-inverse matrix and SVD -- 6.5 Ridge Regression -- Principal components regression -- 6.6 The Recursive Least-Squares Algorithm -- Time-iterative computations of Φn, pn -- Time updating of θn -- 6.7 Newton's Iterative Minimization Method -- 6.7.1 RLS and Newton's Method -- 6.8 Steady-State Performance of the RLS -- 6.9 Complex-Valued Data: The Widely Linear RLS -- 6.10 Computational Aspects of the LS Solution -- Cholesky factorization -- QR factorization -- Fast RLS versions -- 6.11 The Coordinate and Cyclic Coordinate Descent Methods -- 6.12 Simulation Examples -- 6.13 Total-Least-Squares -- Geometric interpretation of the total-least-squares method -- Problems -- MATLAB Exercises -- References -- Chapter 7: Classification: A Tour of the Classics -- 7.1 Introduction -- 7.2 Bayesian Classification -- The Bayesian classifier minimizes the misclassification error -- 7.2.1 Average Risk -- 7.3 Decision (Hyper)Surfaces -- 7.3.1 The Gaussian Distribution Case -- Minimum distance classifiers -- 7.4 The Naive Bayes Classifier -- 7.5 The Nearest Neighbor Rule -- 7.6 Logistic Regression -- 7.7 Fisher's Linear Discriminant -- 7.8 Classification Trees -- 7.9 Combining Classifiers -- Experimental comparisons -- Schemes for combining classifiers -- 7.10 The Boosting Approach -- The AdaBoost algorithm -- The log-loss function -- 7.11 Boosting Trees -- 7.12 A Case Study: Protein Folding Prediction.