
Spark cookbook : over 60 recipes on Spark, covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries

If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.

Bibliographic Details
Classification: Electronic Book
Main author: Yadav, Rishi (Author)
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing, 2015.
Series: Quick answers to common problems.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Getting Started with Apache Spark
  • Introduction
  • Installing Spark from binaries
  • Building the Spark source code with Maven
  • Launching Spark on Amazon EC2
  • Deploying on a cluster in standalone mode
  • Deploying on a cluster with Mesos
  • Deploying on a cluster with YARN
  • Using Tachyon as an off-heap storage layer
  • Chapter 2: Developing Applications with Spark
  • Introduction
  • Exploring the Spark shell
  • Developing Spark applications in Eclipse with Maven
  • Developing Spark applications in Eclipse with SBT
  • Developing a Spark application in IntelliJ IDEA with Maven
  • Developing a Spark application in IntelliJ IDEA with SBT
  • Chapter 3: External Data Sources
  • Introduction
  • Loading data from the local filesystem
  • Loading data from HDFS
  • Loading data from HDFS using a custom InputFormat
  • Loading data from Amazon S3
  • Loading data from Apache Cassandra
  • Loading data from relational databases
  • Chapter 4: Spark SQL
  • Introduction
  • Understanding Catalyst optimizer
  • Creating HiveContext
  • Inferring schema using case classes
  • Programmatically specifying the schema
  • Loading and saving data using the Parquet format
  • Loading and saving data using the JSON format
  • Loading and saving data from relational databases
  • Loading and saving data from an arbitrary source
  • Chapter 5: Spark Streaming
  • Introduction
  • Word count using Streaming
  • Streaming Twitter data
  • Streaming using Kafka
  • Chapter 6: Getting Started with Machine Learning using MLlib
  • Introduction
  • Creating vectors
  • Creating a labeled point
  • Creating matrices
  • Calculating summary statistics
  • Calculating correlation
  • Doing hypothesis testing
  • Creating machine learning pipelines using ML
  • Chapter 7: Supervised Learning with MLlib – Regression
  • Introduction
  • Using linear regression
  • Understanding cost function
  • Doing linear regression with lasso
  • Doing ridge regression
  • Chapter 8: Supervised Learning with MLlib – Classification
  • Introduction
  • Doing classification using logistic regression
  • Doing binary classification using SVM
  • Doing classification using decision trees
  • Doing classification using Random Forests
  • Doing classification using Gradient Boosted Trees
  • Doing classification with Naïve Bayes
  • Chapter 9: Unsupervised Learning
  • Introduction
  • Clustering using k-means
  • Dimensionality reduction with principal component analysis
  • Dimensionality reduction with singular value decomposition
  • Chapter 10: Recommender Systems
  • Introduction
  • Collaborative filtering using explicit feedback