Mastering predictive analytics with R : master the craft of predictive modeling by developing strategy, intuition, and a solid foundation in essential concepts

This book is intended for the budding data scientist, predictive modeler, or quantitative analyst with only a basic exposure to R and statistics. It is also designed to be a reference for experienced professionals wanting to brush up on the details of a particular type of predictive model. Mastering...

Bibliographic Details
Classification: Electronic Book
Main author: Forte, Rui Miguel (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2015.
Series: Community experience distilled.
Subjects:
Online access: Full text (requires prior registration with an institutional email)
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • Acknowledgments
  • About the Reviewers
  • www.PacktPub.com
  • Preface
  • Chapter 1: Gearing Up for Predictive Modeling
  • Models
  • Learning from data
  • The core components of a model
  • Our first model: k-nearest neighbors
  • Types of models
  • Supervised, unsupervised, semi-supervised, and reinforcement learning models
  • Parametric and nonparametric models
  • Regression and classification models
  • Real time and batch machine learning models
  • The process of predictive modeling
  • Defining the model's objective
  • Collecting the data
  • Picking a model
  • Pre-processing the data
  • Exploratory data analysis
  • Feature transformations
  • Encoding categorical features
  • Missing data
  • Outliers
  • Removing problematic features
  • Feature engineering and dimensionality reduction
  • Training and assessing the model
  • Repeating with different models and final model selection
  • Deploying the model
  • Performance metrics
  • Assessing regression models
  • Assessing classification models
  • Assessing binary classification models
  • Summary
  • Chapter 2: Linear Regression
  • Linear regression
  • Assumptions of linear regression
  • Simple linear regression
  • Estimating the regression coefficients
  • Multiple linear regression
  • Predicting CPU performance
  • Predicting the price of used cars
  • Assessing linear regression models
  • Residual analysis
  • Significance tests for linear regression
  • Performance metrics for linear regression
  • Comparing different regression models
  • Test set performance
  • Problems with linear regression
  • Multicollinearity
  • Outliers
  • Feature selection
  • Regularization
  • Ridge regression
  • Least absolute shrinkage and selection operator (lasso)
  • Implementing regularization in R
  • Summary
  • Chapter 3: Logistic Regression
  • Classifying with linear regression
  • Logistic regression
  • Generalized linear models
  • Interpreting coefficients in logistic regression
  • Assumptions of logistic regression
  • Maximum likelihood estimation
  • Predicting heart disease
  • Assessing logistic regression models
  • Model deviance
  • Test set performance
  • Regularization with the lasso
  • Classification metrics
  • Extensions of the binary logistic classifier
  • Multinomial logistic regression
  • Predicting glass type
  • Ordinal logistic regression
  • Predicting wine quality
  • Summary
  • Chapter 4: Neural Networks
  • The biological neuron
  • The artificial neuron
  • Stochastic gradient descent
  • Gradient descent and local minima
  • The perceptron algorithm
  • Linear separation
  • The logistic neuron
  • Multilayer perceptron networks
  • Training multilayer perceptron networks
  • Predicting the energy efficiency of buildings
  • Evaluating multilayer perceptrons for regression
  • Predicting glass type revisited
  • Predicting handwritten digits
  • Receiver operating characteristic curves
  • Summary
  • Chapter 5: Support Vector Machines
  • Maximal margin classification
  • Support vector classification
  • Inner products
  • Kernels and support vector machines
  • Predicting chemical biodegradation
  • Cross-validation
  • Predicting credit scores
  • Multi-class classification with support vector machines
  • Summary
  • Chapter 6: Tree-based Methods
  • The intuition for tree models
  • Algorithms for training decision trees
  • Classification and regression trees
  • CART regression trees
  • Tree pruning
  • Missing data
  • Regression model trees
  • CART classification trees
  • C5.0
  • Predicting class membership on synthetic 2D data
  • Predicting the authenticity of banknotes
  • Predicting complex skill learning
  • Tuning model parameters in CART trees
  • Variable importance in tree models
  • Regression model trees in action
  • Summary
  • Chapter 7: Ensemble Methods
  • Bagging
  • Margins and out-of-bag observations
  • Predicting complex skill learning with bagging
  • Predicting heart disease with bagging
  • Limitations of bagging
  • Boosting
  • AdaBoost
  • Predicting atmospheric gamma ray radiation
  • Predicting complex skill learning with boosting
  • Limitations of boosting
  • Random forests
  • The importance of variables in random forests
  • Summary
  • Chapter 8: Probabilistic Graphical Models
  • A Little Graph Theory
  • Bayes' Theorem
  • Conditional independence
  • Bayesian networks
  • The Naïve Bayes classifier
  • Predicting the sentiment of movie reviews
  • Hidden Markov models
  • Predicting promoter gene sequences
  • Predicting letter patterns in English words
  • Summary
  • Chapter 9: Time Series Analysis
  • Fundamental concepts of time series
  • Time series summary functions
  • Some fundamental time series
  • White noise
  • Fitting a white noise time series
  • Random walk
  • Fitting a random walk
  • Stationarity
  • Stationary time series models
  • Moving average models
  • Autoregressive models
  • Autoregressive moving average models
  • Non-stationary time series models
  • Autoregressive integrated moving average models
  • Autoregressive conditional heteroscedasticity models
  • Generalized autoregressive heteroscedasticity models
  • Predicting intense earthquakes
  • Predicting lynx trappings
  • Predicting foreign exchange rates
  • Other time series models
  • Summary
  • Chapter 10: Topic Modeling
  • An overview of topic modeling
  • Latent Dirichlet Allocation
  • The Dirichlet distribution
  • The generative process
  • Fitting an LDA model
  • Modeling the topics of online news stories
  • Model stability
  • Finding the number of topics
  • Topic distributions
  • Word distributions
  • LDA extensions
  • Summary
  • Chapter 11: Recommendation Systems
  • Rating matrix
  • Measuring user similarity
  • Collaborative filtering
  • User-based collaborative filtering
  • Item-based collaborative filtering
  • Singular value decomposition
  • R and Big Data
  • Predicting recommendations for movies and jokes
  • Loading and preprocessing the data
  • Exploring the data
  • Evaluating binary top-N recommendations
  • Evaluating non-binary top-N recommendations
  • Evaluating individual predictions
  • Other approaches to recommendation systems
  • Summary
  • Index