
Apache Spark Machine Learning Blueprints.

Develop a range of cutting-edge machine learning projects with Apache Spark using this actionable guide. About This Book: Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development. Develop a set of practical Mac...


Bibliographic Details
Classification:	Electronic Book
Main Author:	Liu, Alex
Format:	Electronic eBook
Language:	English
Published:	Packt Publishing, 2016.
Edition:	1.
Subjects:	Spark (Electronic resource : Apache Software Foundation); Machine learning; Big data; Information retrieval; Apprentissage automatique; Données volumineuses; Recherche de l'information
Online Access:	Full text

Table of Contents:

Cover; Copyright; Credits; About the Author; About the Reviewer; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Spark for Machine Learning; Spark overview and Spark advantages; Spark overview; Spark advantages; Spark computing for machine learning; Machine learning algorithms; MLlib; Other ML libraries; Spark RDD and dataframes; Spark RDD; Spark dataframes; Dataframes API for R; ML frameworks, RM4Es and Spark computing; ML frameworks; RM4Es; The Spark computing framework; ML workflows and Spark pipelines; ML as a step-by-step workflow; ML workflow examples; Spark notebooks.
Notebook approach for ML; Step 1: Getting the software ready; Step 2: Installing the Knitr package; Step 3: Creating a simple report; Spark notebooks; Summary; Chapter 2: Data Preparation for Spark ML; Accessing and loading datasets; Accessing publicly available datasets; Loading datasets into Spark; Exploring and visualizing datasets; Data cleaning; Dealing with data incompleteness; Data cleaning in Spark; Data cleaning made easy; Identity matching; Identity issues; Identity matching on Spark; Entity resolution; Short string comparison; Long string comparison; Record deduplication.
Identity matching made better; Crowdsourced deduplication; Configuring the crowd; Using the crowd; Dataset reorganizing; Dataset reorganizing tasks; Dataset reorganizing with Spark SQL; Dataset reorganizing with R on Spark; Dataset joining; Dataset joining and its tool - the Spark SQL; Dataset joining in Spark; Dataset joining with the R data table package; Feature extraction; Feature development challenges; Feature development with Spark MLlib; Feature development with R; Repeatability and automation; Dataset preprocessing workflows; Spark pipelines for dataset preprocessing.
Dataset preprocessing automation; Summary; Chapter 3: A Holistic View on Spark; Spark for a holistic view; The use case; Fast and easy computing; Methods for a holistic view; Regression modeling; The SEM approach; Decision trees; Feature preparation; PCA; Grouping by category to use subject knowledge; Feature selection; Model estimation; MLlib implementation; The R notebooks' implementation; Model evaluation; Quick evaluations; RMSE; ROC curves; Results explanation; Impact assessments; Deployment; Dashboard; Rules; Summary; Chapter 4: Fraud Detection on Spark; Spark for fraud detection.
The use case; Distributed computing; Methods for fraud detection; Random forest; Decision trees; Feature preparation; Feature extraction from LogFile; Data merging; Model estimation; MLlib implementation; R notebooks implementation; Model evaluation; A quick evaluation; Confusion matrix and false positive ratios; Results explanation; Big influencers and their impacts; Deploying fraud detection; Rules; Scoring; Summary; Chapter 5: Risk Scoring on Spark; Spark for risk scoring; The use case; Apache Spark notebooks; Methods of risk scoring; Logistic regression; Preparing coding in R.
