Data mining for the social sciences : an introduction /
"We live, today, in a world of big data. The amount of information collected on human behavior every day is staggering, and exponentially greater than at any time in the past. At the same time, we are inundated by stories of powerful algorithms capable of churning through this sea of data and unc...
Classification: | Electronic Book |
---|---|
Main authors: | , |
Format: | Electronic eBook |
Language: | English |
Published: | Oakland, California : University of California Press, [2015] |
Edition: | First edition. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Title; Copyright; Contents; Acknowledgments; PART 1. CONCEPTS; 1. What Is Data Mining?; The Goals of This Book; Software and Hardware for Data Mining; Basic Terminology; 2. Contrasts with the Conventional Statistical Approach; Predictive Power in Conventional Statistical Modeling; Hypothesis Testing in the Conventional Approach; Heteroscedasticity as a Threat to Validity in Conventional Modeling; The Challenge of Complex and Nonrandom Samples; Bootstrapping and Permutation Tests; Nonlinearity in Conventional Predictive Models; Statistical Interactions in Conventional Models; Conclusion.
- 3. Some General Strategies Used in Data Mining; Cross-Validation; Overfitting; Boosting; Calibrating; Measuring Fit: The Confusion Matrix and ROC Curves; Identifying Statistical Interactions and Effect Heterogeneity in Data Mining; Bagging and Random Forests; The Limits of Prediction; Big Data Is Never Big Enough; 4. Important Stages in a Data Mining Project; When to Sample Big Data; Building a Rich Array of Features; Feature Selection; Feature Extraction; Constructing a Model; PART 2. WORKED EXAMPLES; 5. Preparing Training and Test Datasets; The Logic of Cross-Validation.
- Cross-Validation Methods: An Overview; 6. Variable Selection Tools; Stepwise Regression; The LASSO; VIF Regression; 7. Creating New Variables Using Binning and Trees; Discretizing a Continuous Predictor; Continuous Outcomes and Continuous Predictors; Binning Categorical Predictors; Using Partition Trees to Study Interactions; 8. Extracting Variables; Principal Component Analysis; Independent Component Analysis; 9. Classifiers; K-Nearest Neighbors; Naive Bayes; Support Vector Machines; Optimizing Prediction across Multiple Classifiers; 10. Classification Trees; Partition Trees.
- Boosted Trees and Random Forests; 11. Neural Networks; 12. Clustering; Hierarchical Clustering; K-Means Clustering; Normal Mixtures; Self-Organized Maps; 13. Latent Class Analysis and Mixture Models; Latent Class Analysis; Latent Class Regression; Mixture Models; 14. Association Rules; Conclusion; Bibliography; Notes; Index; A; B; C; D; E; F; G; H; I; J; K; L; M; N; O; P; R; S; T; U; V; W; X; Y; Z.