Beginning Apache Spark 3: with DataFrame, Spark SQL, structured streaming, and Spark machine learning library
Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise in the powerful and efficient distributed data processing engine inside Apache Spark: its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming…
| Field | Value |
|---|---|
| Classification | Electronic book |
| Format | Electronic eBook |
| Language | English |
| Published | New York: Apress, 2021 |
| Edition | Second edition |
| Online access | Full text (requires prior registration with an institutional email) |
Table of Contents:
- Intro
- Table of Contents
- About the Author
- About the Technical Reviewers
- Acknowledgments
- Introduction
- Chapter 1: Introduction to Apache Spark
- Overview
- History
- Spark Core Concepts and Architecture
- Spark Cluster and Resource Management System
- Spark Applications
- Spark Drivers and Executors
- Spark Unified Stack
- Spark Core
- Spark SQL
- Spark Structured Streaming
- Spark MLlib
- Spark GraphX
- SparkR
- Apache Spark 3.0
- Adaptive Query Execution Framework
- Dynamic Partition Pruning (DPP)
- Accelerator-aware Scheduler
- Apache Spark Applications
- Spark Example Applications
- Apache Spark Ecosystem
- Delta Lake
- Koalas
- MLflow
- Summary
- Chapter 2: Working with Apache Spark
- Downloading and Installation
- Downloading Spark
- Installing Spark
- Spark Scala Shell
- Spark Python Shell
- Having Fun with the Spark Scala Shell
- Useful Spark Scala Shell Commands and Tips
- Basic Interactions with Scala and Spark
- Basic Interactions with Scala
- Spark UI and Basic Interactions with Spark
- Spark UI
- Basic Interactions with Spark
- Introduction to Collaborative Notebooks
- Create a Cluster
- Create a Folder
- Create a Notebook
- Setting up Spark Source Code
- Summary
- Chapter 3: Spark SQL: Foundation
- Understanding RDD
- Introduction to the DataFrame API
- Creating a DataFrame
- Creating a DataFrame from RDD
- Creating a DataFrame from a Range of Numbers
- Creating a DataFrame from Data Sources
- Creating a DataFrame by Reading Text Files
- Creating a DataFrame by Reading CSV Files
- Creating a DataFrame by Reading JSON Files
- Creating a DataFrame by Reading Parquet Files
- Creating a DataFrame by Reading ORC Files
- Creating a DataFrame from JDBC
- Working with Structured Operations
- Working with Columns
- Working with Structured Transformations
- select(columns)
- selectExpr(expressions)
- filter(condition), where(condition)
- distinct, dropDuplicates
- sort(columns), orderBy(columns)
- limit(n)
- union(otherDataFrame)
- withColumn(colName, column)
- withColumnRenamed(existingColName, newColName)
- drop(columnName1, columnName2)
- sample(fraction), sample(fraction, seed), sample(withReplacement, fraction, seed)
- randomSplit(weights)
- Working with Missing or Bad Data
- Working with Structured Actions
- describe(columnNames)
- Introduction to Datasets
- Creating Datasets
- Working with Datasets
- Using SQL in Spark SQL
- Running SQL in Spark
- Writing Data Out to Storage Systems
- The Trio: DataFrame, Dataset, and SQL
- DataFrame Persistence
- Summary
- Chapter 4: Spark SQL: Advanced
- Aggregations
- Aggregation Functions
- Common Aggregation Functions
- count(col)
- countDistinct(col)
- min(col), max(col)
- sum(col)
- sumDistinct(col)
- avg(col)
- skewness(col), kurtosis(col)
- variance(col), stddev(col)
- Aggregation with Grouping
- Multiple Aggregations per Group