Spark cookbook : over 60 recipes on Spark, covering Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX libraries
If you are a data engineer, an application developer, or a data scientist who would like to leverage the power of Apache Spark to get better insights from big data, then this is the book for you.
Classification: | Electronic Book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK : Packt Publishing, 2015. |
Series: | Quick answers to common problems. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Getting Started with Apache Spark
- Introduction
- Installing Spark from binaries
- Building the Spark source code with Maven
- Launching Spark on Amazon EC2
- Deploying on a cluster in standalone mode
- Deploying on a cluster with Mesos
- Deploying on a cluster with YARN
- Using Tachyon as an off-heap storage layer
- Chapter 2: Developing Applications with Spark
- Introduction
- Exploring the Spark shell
- Developing Spark applications in Eclipse with Maven
- Developing Spark applications in Eclipse with SBT
- Developing a Spark application in IntelliJ IDEA with Maven
- Developing a Spark application in IntelliJ IDEA with SBT
- Chapter 3: External Data Sources
- Introduction
- Loading data from the local filesystem
- Loading data from HDFS
- Loading data from HDFS using a custom InputFormat
- Loading data from Amazon S3
- Loading data from Apache Cassandra
- Loading data from relational databases
- Chapter 4: Spark SQL
- Introduction
- Understanding Catalyst optimizer
- Creating HiveContext
- Inferring schema using case classes
- Programmatically specifying the schema
- Loading and saving data using the Parquet format
- Loading and saving data using the JSON format
- Loading and saving data from relational databases
- Loading and saving data from an arbitrary source
- Chapter 5: Spark Streaming
- Introduction
- Word count using Streaming
- Streaming Twitter data
- Streaming using Kafka
- Chapter 6: Getting Started with Machine Learning using MLlib
- Introduction
- Creating vectors
- Creating a labeled point
- Creating matrices
- Calculating summary statistics
- Calculating correlation
- Doing hypothesis testing
- Creating machine learning pipelines using ML
- Chapter 7: Supervised Learning with MLlib - Regression
- Introduction
- Using linear regression
- Understanding cost function
- Doing linear regression with lasso
- Doing ridge regression
- Chapter 8: Supervised Learning with MLlib - Classification
- Introduction
- Doing classification using logistic regression
- Doing binary classification using SVM
- Doing classification using decision trees
- Doing classification using Random Forests
- Doing classification using Gradient Boosted Trees
- Doing classification with Naïve Bayes
- Chapter 9: Unsupervised Learning
- Introduction
- Clustering using k-means
- Dimensionality reduction with principal component analysis
- Dimensionality reduction with singular value decomposition
- Chapter 10: Recommender Systems
- Introduction
- Collaborative filtering using explicit feedback