Data Mining and Predictive Analytics

Bibliographic Details
Main author: Larose, Daniel T.
Format: Electronic eBook
Language: English
Published: Newark : John Wiley & Sons, Incorporated, 2015.
Series: New York Academy of Sciences Ser.
Subjects:
Online access: Full text
Table of Contents:
  • Cover
  • Contents
  • Preface
  • Acknowledgments
  • Part I Data Preparation
  • Chapter 1 An Introduction to Data Mining and Predictive Analytics
  • 1.1 What is Data Mining? What is Predictive Analytics?
  • 1.2 Wanted: Data Miners
  • 1.3 The Need for Human Direction of Data Mining
  • 1.4 The Cross-Industry Standard Process for Data Mining: CRISP-DM
  • 1.4.1 CRISP-DM: The Six Phases
  • 1.5 Fallacies of Data Mining
  • 1.6 What Tasks Can Data Mining Accomplish
  • 1.6.1 Description
  • 1.6.2 Estimation
  • 1.6.3 Prediction
  • 1.6.4 Classification
  • 1.6.5 Clustering
  • 1.6.6 Association
  • The R Zone
  • R References
  • Exercises
  • Chapter 2 Data Preprocessing
  • 2.1 Why do We Need to Preprocess the Data?
  • 2.2 Data Cleaning
  • 2.3 Handling Missing Data
  • 2.4 Identifying Misclassifications
  • 2.5 Graphical Methods for Identifying Outliers
  • 2.6 Measures of Center and Spread
  • 2.7 Data Transformation
  • 2.8 Min-Max Normalization
  • 2.9 Z-Score Standardization
  • 2.10 Decimal Scaling
  • 2.11 Transformations to Achieve Normality
  • 2.12 Numerical Methods for Identifying Outliers
  • 2.13 Flag Variables
  • 2.14 Transforming Categorical Variables into Numerical Variables
  • 2.15 Binning Numerical Variables
  • 2.16 Reclassifying Categorical Variables
  • 2.17 Adding an Index Field
  • 2.18 Removing Variables that are not Useful
  • 2.19 Variables that Should Probably not be Removed
  • 2.20 Removal of Duplicate Records
  • 2.21 A Word About ID Fields
  • The R Zone
  • R Reference
  • Exercises
  • Chapter 3 Exploratory Data Analysis
  • 3.1 Hypothesis Testing Versus Exploratory Data Analysis
  • 3.2 Getting to Know the Data Set
  • 3.3 Exploring Categorical Variables
  • 3.4 Exploring Numeric Variables
  • 3.5 Exploring Multivariate Relationships
  • 3.6 Selecting Interesting Subsets of the Data for Further Investigation
  • 3.7 Using EDA to Uncover Anomalous Fields
  • 3.8 Binning Based on Predictive Value
  • 3.9 Deriving New Variables: Flag Variables
  • 3.10 Deriving New Variables: Numerical Variables
  • 3.11 Using EDA to Investigate Correlated Predictor Variables
  • 3.12 Summary of Our EDA
  • The R Zone
  • R References
  • Exercises
  • Chapter 4 Dimension-Reduction Methods
  • 4.1 Need for Dimension-Reduction in Data Mining
  • 4.2 Principal Components Analysis
  • 4.3 Applying PCA to the Houses Data Set
  • 4.4 How Many Components Should We Extract?
  • 4.4.1 The Eigenvalue Criterion
  • 4.4.2 The Proportion of Variance Explained Criterion
  • 4.4.3 The Minimum Communality Criterion
  • 4.4.4 The Scree Plot Criterion
  • 4.5 Profiling the Principal Components
  • 4.6 Communalities
  • 4.6.1 Minimum Communality Criterion
  • 4.7 Validation of the Principal Components
  • 4.8 Factor Analysis
  • 4.9 Applying Factor Analysis to the Adult Data Set
  • 4.10 Factor Rotation
  • 4.11 User-Defined Composites
  • 4.12 An Example of a User-Defined Composite
  • The R Zone
  • R References
  • Exercises