Data Mining and Predictive Analytics
Main author: | |
---|---|
Format: | Electronic eBook |
Language: | English |
Published: | Newark : John Wiley & Sons, Incorporated, 2015. |
Series: | New York Academy of Sciences Ser. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Contents
- Preface
- Acknowledgments
- Part I Data Preparation
- Chapter 1 An Introduction to Data Mining and Predictive Analytics
- 1.1 What is Data Mining? What is Predictive Analytics?
- 1.2 Wanted: Data Miners
- 1.3 The Need for Human Direction of Data Mining
- 1.4 The Cross-Industry Standard Process for Data Mining: CRISP-DM
- 1.4.1 CRISP-DM: The Six Phases
- 1.5 Fallacies of Data Mining
- 1.6 What Tasks Can Data Mining Accomplish
- 1.6.1 Description
- 1.6.2 Estimation
- 1.6.3 Prediction
- 1.6.4 Classification
- 1.6.5 Clustering
- 1.6.6 Association
- The R Zone
- R References
- Exercises
- Chapter 2 Data Preprocessing
- 2.1 Why do We Need to Preprocess the Data?
- 2.2 Data Cleaning
- 2.3 Handling Missing Data
- 2.4 Identifying Misclassifications
- 2.5 Graphical Methods for Identifying Outliers
- 2.6 Measures of Center and Spread
- 2.7 Data Transformation
- 2.8 Min-Max Normalization
- 2.9 Z-Score Standardization
- 2.10 Decimal Scaling
- 2.11 Transformations to Achieve Normality
- 2.12 Numerical Methods for Identifying Outliers
- 2.13 Flag Variables
- 2.14 Transforming Categorical Variables into Numerical Variables
- 2.15 Binning Numerical Variables
- 2.16 Reclassifying Categorical Variables
- 2.17 Adding an Index Field
- 2.18 Removing Variables that are not Useful
- 2.19 Variables that Should Probably not be Removed
- 2.20 Removal of Duplicate Records
- 2.21 A Word About ID Fields
- The R Zone
- R Reference
- Exercises
- Chapter 3 Exploratory Data Analysis
- 3.1 Hypothesis Testing Versus Exploratory Data Analysis
- 3.2 Getting to Know the Data Set
- 3.3 Exploring Categorical Variables
- 3.4 Exploring Numeric Variables
- 3.5 Exploring Multivariate Relationships
- 3.6 Selecting Interesting Subsets of the Data for Further Investigation
- 3.7 Using EDA to Uncover Anomalous Fields
- 3.8 Binning Based on Predictive Value
- 3.9 Deriving New Variables: Flag Variables
- 3.10 Deriving New Variables: Numerical Variables
- 3.11 Using EDA to Investigate Correlated Predictor Variables
- 3.12 Summary of Our EDA
- The R Zone
- R References
- Exercises
- Chapter 4 Dimension-Reduction Methods
- 4.1 Need for Dimension-Reduction in Data Mining
- 4.2 Principal Components Analysis
- 4.3 Applying PCA to the Houses Data Set
- 4.4 How Many Components Should We Extract?
- 4.4.1 The Eigenvalue Criterion
- 4.4.2 The Proportion of Variance Explained Criterion
- 4.4.3 The Minimum Communality Criterion
- 4.4.4 The Scree Plot Criterion
- 4.5 Profiling the Principal Components
- 4.6 Communalities
- 4.6.1 Minimum Communality Criterion
- 4.7 Validation of the Principal Components
- 4.8 Factor Analysis
- 4.9 Applying Factor Analysis to the Adult Data Set
- 4.10 Factor Rotation
- 4.11 User-Defined Composites
- 4.12 An Example of a User-Defined Composite
- The R Zone
- R References
- Exercises