Apache Spark Machine Learning Blueprints.
Develop a range of cutting-edge machine learning projects with Apache Spark using this actionable guide. About This Book: Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development. Develop a set of practical Mac...
Classification: | eBook |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Packt Publishing, 2016. |
Edition: | 1st |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Spark for Machine Learning; Spark overview and Spark advantages; Spark overview; Spark advantages; Spark computing for machine learning; Machine learning algorithms; MLlib; Other ML libraries; Spark RDD and dataframes; Spark RDD; Spark dataframes; Dataframes API for R; ML frameworks, RM4Es and Spark computing; ML frameworks; RM4Es; The Spark computing framework; ML workflows and Spark pipelines; ML as a step-by-step workflow; ML workflow examples; Spark notebooks.
- Notebook approach for ML; Step 1: Getting the software ready; Step 2: Installing the Knitr package; Step 3: Creating a simple report; Spark notebooks; Summary; Chapter 2: Data Preparation for Spark ML; Accessing and loading datasets; Accessing publicly available datasets; Loading datasets into Spark; Exploring and visualizing datasets; Data cleaning; Dealing with data incompleteness; Data cleaning in Spark; Data cleaning made easy; Identity matching; Identity issues; Identity matching on Spark; Entity resolution; Short string comparison; Long string comparison; Record deduplication.
- Identity matching made better; Crowdsourced deduplication; Configuring the crowd; Using the crowd; Dataset reorganizing; Dataset reorganizing tasks; Dataset reorganizing with Spark SQL; Dataset reorganizing with R on Spark; Dataset joining; Dataset joining and its tool: the Spark SQL; Dataset joining in Spark; Dataset joining with the R data table package; Feature extraction; Feature development challenges; Feature development with Spark MLlib; Feature development with R; Repeatability and automation; Dataset preprocessing workflows; Spark pipelines for dataset preprocessing.
- Dataset preprocessing automation; Summary; Chapter 3: A Holistic View on Spark; Spark for a holistic view; The use case; Fast and easy computing; Methods for a holistic view; Regression modeling; The SEM approach; Decision trees; Feature preparation; PCA; Grouping by category to use subject knowledge; Feature selection; Model estimation; MLlib implementation; The R notebooks' implementation; Model evaluation; Quick evaluations; RMSE; ROC curves; Results explanation; Impact assessments; Deployment; Dashboard; Rules; Summary; Chapter 4: Fraud Detection on Spark; Spark for fraud detection.
- The use case; Distributed computing; Methods for fraud detection; Random forest; Decision trees; Feature preparation; Feature extraction from LogFile; Data merging; Model estimation; MLlib implementation; R notebooks implementation; Model evaluation; A quick evaluation; Confusion matrix and false positive ratios; Results explanation; Big influencers and their impacts; Deploying fraud detection; Rules; Scoring; Summary; Chapter 5: Risk Scoring on Spark; Spark for risk scoring; The use case; Apache Spark notebooks; Methods of risk scoring; Logistic regression; Preparing coding in R.