Spark : the definitive guide : big data processing made simple
Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sec...
Classification: | Electronic book |
---|---|
Main authors: | Bill Chambers, Matei Zaharia |
Format: | Electronic eBook |
Language: | English |
Published: | Sebastopol, CA : O'Reilly Media, [2018] |
Edition: | First edition. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Part 1. Gentle overview of big data and Spark. What is Apache Spark?
- A gentle introduction to Spark
- A tour of Spark's toolset
- Part 2. Structured APIs : DataFrames, SQL, and datasets. Structured API overview
- Basic structured operations
- Working with different types of data
- Aggregations
- Joins
- Data sources
- Spark SQL
- Datasets
- Part 3. Low-level APIs. Resilient distributed datasets (RDDs)
- Advanced RDDs
- Distributed shared variables
- Part 4. Production applications. How Spark runs on a cluster
- Developing Spark applications
- Deploying Spark
- Monitoring and debugging
- Performance tuning
- Part 5. Streaming. Stream processing fundamentals
- Structured streaming basics
- Event-time and stateful processing
- Structured streaming in production
- Part 6. Advanced analytics and machine learning. Advanced analytics and machine learning overview
- Preprocessing and feature engineering
- Classification
- Regression
- Recommendation
- Unsupervised learning
- Graph analytics
- Deep learning
- Part 7. Ecosystem. Language specifics : Python (PySpark) and R (SparkR and sparklyr)
- Ecosystem and community.