
Spark : the definitive guide : big data processing made simple /

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sec...


Bibliographic Details
Classification:	Electronic Book
Main authors:	Chambers, Bill (William Andrew) (Author), Zaharia, Matei (Author)
Format:	Electronic eBook
Language:	English
Published:	Sebastopol, CA : O'Reilly Media, [2018]
Edition:	First edition.
Subjects:	Spark (Electronic resource : Apache Software Foundation) Data mining. Information retrieval. Big data. Exploration de données (Informatique) Recherche de l'information. Données volumineuses. information retrieval. COMPUTERS > Computer Literacy. COMPUTERS > Computer Science. COMPUTERS > Data Processing. COMPUTERS > Hardware > General. COMPUTERS > Information Technology. COMPUTERS > Machine Theory. COMPUTERS > Reference.
Online access:	Full text (requires prior registration with an institutional email address)

Table of Contents:

Part 1. Gentle overview of big data and Spark. What is Apache Spark?
A gentle introduction to Spark
A tour of Spark's toolset
Part 2. Structured APIs : DataFrames, SQL, and datasets. Structured API overview
Basic structured operations
Working with different types of data
Aggregations
Joins
Data sources
Spark SQL
Datasets
Part 3. Low-level APIs. Resilient distributed datasets (RDDs)
Advanced RDDs
Distributed shared variables
Part 4. Production applications. How Spark runs on a cluster
Developing Spark applications
Deploying Spark
Monitoring and debugging
Performance tuning
Part 5. Streaming. Stream processing fundamentals
Structured streaming basics
Event-time and stateful processing
Structured streaming in production
Part 6. Advanced analytics and machine learning. Advanced analytics and machine learning overview
Preprocessing and feature engineering
Classification
Regression
Recommendation
Unsupervised learning
Graph analytics
Deep learning
Part 7. Ecosystem. Language specifics : Python (PySpark) and R (SparkR and sparklyr)
Ecosystem and community.
