Practical Data Science with R, Second Edition
Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you'll face on the job, this friendly guide is comfortable both for business analysts an...
Main Authors: | Nina Zumel, John Mount |
---|---|
Format: | Electronic eBook |
Language: | English |
Published: | Manning Publications, 2019 |
Edition: | 2nd edition |
Online Access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Intro
- Practical Data Science with R, Second Edition
- Nina Zumel and John Mount
- Copyright
- Dedication
- Brief Table of Contents
- Table of Contents
- Praise for the First Edition
- front matter
- Foreword
- Preface
- Acknowledgments
- About This Book
- What is data science?
- Roadmap
- Audience
- What is not in this book?
- Code conventions and downloads
- Working with this book
- Downloading the book's supporting materials/repository
- Book forum
- About the Authors
- About the Foreword Authors
- About the Cover Illustration
- Part 1. Introduction to data science
- Chapter 1. The data science process
- 1.1. The roles in a data science project
- 1.1.1. Project roles
- 1.2. Stages of a data science project
- 1.2.1. Defining the goal
- 1.2.2. Data collection and management
- 1.2.3. Modeling
- 1.2.4. Model evaluation and critique
- 1.2.5. Presentation and documentation
- 1.2.6. Model deployment and maintenance
- 1.3. Setting expectations
- 1.3.1. Determining lower bounds on model performance
- Summary
- Chapter 2. Starting with R and data
- 2.1. Starting with R
- 2.1.1. Installing R, tools, and examples
- 2.1.2. R programming
- 2.2. Working with data from files
- 2.2.1. Working with well-structured data from files or URLs
- 2.2.2. Using R with less-structured data
- 2.3. Working with relational databases
- 2.3.1. A production-size example
- Summary
- Chapter 3. Exploring data
- 3.1. Using summary statistics to spot problems
- 3.1.1. Typical problems revealed by data summaries
- 3.2. Spotting problems using graphics and visualization
- 3.2.1. Visually checking distributions for a single variable
- 3.2.2. Visually checking relationships between two variables
- Summary
- Chapter 4. Managing data
- 4.1. Cleaning data
- 4.1.1. Domain-specific data cleaning
- 4.1.2. Treating missing values
- 4.1.3. The vtreat package for automatically treating missing variables
- 4.2. Data transformations
- 4.2.1. Normalization
- 4.2.2. Centering and scaling
- 4.2.3. Log transformations for skewed and wide distributions
- 4.3. Sampling for modeling and validation
- 4.3.1. Test and training splits
- 4.3.2. Creating a sample group column
- 4.3.3. Record grouping
- 4.3.4. Data provenance
- Summary
- Chapter 5. Data engineering and data shaping
- 5.1. Data selection
- 5.1.1. Subsetting rows and columns
- 5.1.2. Removing records with incomplete data
- 5.1.3. Ordering rows
- 5.2. Basic data transforms
- 5.2.1. Adding new columns
- 5.2.2. Other simple operations
- 5.3. Aggregating transforms
- 5.3.1. Combining many rows into summary rows
- 5.4. Multitable data transforms
- 5.4.1. Combining two or more ordered data frames quickly
- 5.4.2. Principal methods to combine data from multiple tables
- 5.5. Reshaping transforms
- 5.5.1. Moving data from wide to tall form
- 5.5.2. Moving data from tall to wide form
- 5.5.3. Data coordinates
- Summary
- Part 2. Modeling methods
- Chapter 6. Choosing and evaluating models
- 6.1. Mapping problems to machine learning tasks
- 6.1.1. Classification problems
- 6.1.2. Scoring problems
- 6.1.3. Grouping: working without known targets
- 6.1.4. Problem-to-method mapping
- 6.2. Evaluating models
- 6.2.1. Overfitting
- 6.2.2. Measures of model performance
- 6.2.3. Evaluating classification models
- 6.2.4. Evaluating scoring models
- 6.2.5. Evaluating probability models
- 6.3. Local interpretable model-agnostic explanations (LIME) for explaining model predictions
- 6.3.1. LIME: Automated sanity checking
- 6.3.2. Walking through LIME: A small example
- 6.3.3. LIME for text classification
- 6.3.4. Training the text classifier
- 6.3.5. Explaining the classifier's predictions
- Summary
- Chapter 7. Linear and logistic regression
- 7.1. Using linear regression
- 7.1.1. Understanding linear regression
- 7.1.2. Building a linear regression model
- 7.1.3. Making predictions
- 7.1.4. Finding relations and extracting advice
- 7.1.5. Reading the model summary and characterizing coefficient quality
- 7.1.6. Linear regression takeaways
- 7.2. Using logistic regression
- 7.2.1. Understanding logistic regression
- 7.2.2. Building a logistic regression model
- 7.2.3. Making predictions
- 7.2.4. Finding relations and extracting advice from logistic models
- 7.2.5. Reading the model summary and characterizing coefficients
- 7.2.6. Logistic regression takeaways
- 7.3. Regularization
- 7.3.1. An example of quasi-separation
- 7.3.2. The types of regularized regression
- 7.3.3. Regularized regression with glmnet
- Summary
- Chapter 8. Advanced data preparation
- 8.1. The purpose of the vtreat package
- 8.2. KDD and KDD Cup 2009
- 8.2.1. Getting started with KDD Cup 2009 data
- 8.2.2. The bull-in-the-china-shop approach
- 8.3. Basic data preparation for classification
- 8.3.1. The variable score frame
- 8.4. Advanced data preparation for classification
- 8.4.1. Using mkCrossFrameCExperiment()
- 8.4.2. Building a model
- Building a multivariable model
- Evaluating the model
- 8.5. Preparing data for regression modeling
- 8.6. Mastering the vtreat package
- 8.6.1. The vtreat phases
- 8.6.2. Missing values
- 8.6.3. Indicator variables
- 8.6.4. Impact coding
- 8.6.5. The treatment plan
- 8.6.6. The cross-frame
- Summary
- Chapter 9. Unsupervised methods
- 9.1. Cluster analysis
- 9.1.1. Distances
- 9.1.2. Preparing the data
- 9.1.3. Hierarchical clustering with hclust
- 9.1.4. The k-means algorithm
- 9.1.5. Assigning new points to clusters
- 9.1.6. Clustering takeaways
- 9.2. Association rules
- 9.2.1. Overview of association rules
- 9.2.2. The example problem
- 9.2.3. Mining association rules with the arules package
- 9.2.4. Association rule takeaways
- Summary
- Chapter 10. Exploring advanced methods
- 10.1. Tree-based methods
- 10.1.1. A basic decision tree
- 10.1.2. Using bagging to improve prediction
- 10.1.3. Using random forests to further improve prediction
- 10.1.4. Gradient-boosted trees
- 10.1.5. Tree-based model takeaways
- 10.2. Using generalized additive models (GAMs) to learn non-monotone relationships
- 10.2.1. Understanding GAMs
- 10.2.2. A one-dimensional regression example
- 10.2.3. Extracting the non-linear relationships
- 10.2.4. Using GAM on actual data
- 10.2.5. Using GAM for logistic regression
- 10.2.6. GAM takeaways
- 10.3. Solving "inseparable" problems using support vector machines
- 10.3.1. Using an SVM to solve a problem
- 10.3.2. Understanding support vector machines
- 10.3.3. Understanding kernel functions
- 10.3.4. Support vector machine and kernel methods takeaways
- Summary
- Part 3. Working in the real world
- Chapter 11. Documentation and deployment
- 11.1. Predicting buzz
- 11.2. Using R markdown to produce milestone documentation
- 11.2.1. What is R markdown?
- 11.2.2. knitr technical details
- 11.2.3. Using knitr to document the Buzz data and produce the model
- 11.3. Using comments and version control for running documentation
- 11.3.1. Writing effective comments
- 11.3.2. Using version control to record history
- 11.3.3. Using version control to explore your project
- 11.3.4. Using version control to share work
- 11.4. Deploying models
- 11.4.1. Deploying demonstrations using Shiny
- 11.4.2. Deploying models as HTTP services
- 11.4.3. Deploying models by export
- 11.4.4. What to take away
- Summary
- Chapter 12. Producing effective presentations
- 12.1. Presenting your results to the project sponsor
- 12.1.1. Summarizing the project's goals
- 12.1.2. Stating the project's results
- 12.1.3. Filling in the details
- 12.1.4. Making recommendations and discussing future work
- 12.1.5. Project sponsor presentation takeaways
- 12.2. Presenting your model to end users
- 12.2.1. Summarizing the project goals
- 12.2.2. Showing how the model fits user workflow
- 12.2.3. Showing how to use the model
- 12.2.4. End user presentation takeaways
- 12.3. Presenting your work to other data scientists
- 12.3.1. Introducing the problem
- 12.3.2. Discussing related work
- 12.3.3. Discussing your approach
- 12.3.4. Discussing results and future work
- 12.3.5. Peer presentation takeaways
- Summary
- Appendix A. Starting with R and other tools
- A.1. Installing the tools
- A.1.1. Installing tools
- A.1.2. The R package system
- A.1.3. Installing Git
- A.1.4. Installing RStudio
- A.1.5. R resources
- A.2. Starting with R
- A.2.1. Primary features of R
- A.2.2. Primary R data types
- A.3. Using databases with R
- A.3.1. Running database queries using a query generator
- A.3.2. How to think relationally about data
- A.4. The takeaway
- Appendix B. Important statistical concepts
- B.1. Distributions
- B.1.1. Normal distribution
- B.1.2. Summarizing R's distribution naming conventions
- B.1.3. Lognormal distribution
- B.1.4. Binomial distribution
- B.1.5. More R tools for distributions
- B.2. Statistical theory
- B.2.1. Statistical philosophy
- B.2.2. A/B tests
- B.2.3. Power of tests
- B.2.4. Specialized statistical tests
- B.3. Examples of the statistical view of data
- B.3.1. Sampling bias
- B.3.2. Omitted variable bias
- B.4. The takeaway
- Appendix C. Bibliography
- Practical Data Science with R
- Index
- List of Figures
- List of Tables
- List of Listings