
Interactive Spark using PySpark

Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark using an interactive shell called PySpark. Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java.
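The abstract below mentions working with RDDs (resilient distributed datasets) in PySpark and a worked example that counts frequent airline delays. As a rough sketch of what that looks like, not code from the lesson itself, a standalone script might resemble the following; the flights.csv file name and its carrier,delay-minutes layout are hypothetical assumptions:

    from pyspark import SparkConf, SparkContext

    # Hypothetical example: count delayed flights per carrier from a
    # CSV whose lines look like "AA,37.0" (carrier, delay in minutes).
    conf = SparkConf().setMaster("local[*]").setAppName("delay-counts")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("flights.csv")               # hypothetical input file
    records = lines.map(lambda line: line.split(","))
    delayed = records.filter(lambda f: float(f[1]) > 0)

    # Map each delayed flight to (carrier, 1), then sum counts per carrier.
    counts = delayed.map(lambda f: (f[0], 1)).reduceByKey(lambda a, b: a + b)

    # Print the five carriers with the most delayed flights.
    for carrier, n in counts.takeOrdered(5, key=lambda pair: -pair[1]):
        print(carrier, n)

    sc.stop()

The same script, saved to a file, could be handed to a cluster with the spark-submit command, which is the job-submission workflow the abstract refers to.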


Bibliographic Details
Main Authors: Bengfort, Benjamin (Author), Kim, Jenny (Author)
Format: Electronic eBook
Language: English
Published: O'Reilly Media, Inc., 2016.
Edition: 1st edition
Subjects: Python (Computer program language)
Online Access: Full text (requires prior registration with an institutional email)

MARC

LEADER 00000cam a22000007i 4500
001 OR_on1019733704
003 OCoLC
005 20231017213018.0
006 m o d
007 cr |n|||||||||
008 180116s2016 xx o 000 0 eng
040 |a UIU  |b eng  |e pn  |c UIU  |d OCLCO  |d OCLCF  |d C6I  |d OCLCQ  |d OCLCO  |d OCLCQ 
020 |a 9781491965313 
020 |a 1491965312 
020 |z 9781491966181 
029 1 |a AU@  |b 000067114063 
035 |a (OCoLC)1019733704 
049 |a UAMI 
100 1 |a Bengfort, Benjamin,  |e author. 
245 1 0 |a Interactive Spark using PySpark /  |c Bengfort, Benjamin. 
250 |a 1st edition 
264 1 |b O'Reilly Media, Inc.,  |c 2016. 
300 |a 1 online resource (20 pages) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
520 |a Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark using an interactive shell called PySpark. Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java. This also allows for reuse of a wide variety of Python libraries for machine learning, data visualization, numerical analysis, etc.
What you'll learn, and how you can apply it:
- Compare the different components provided by Spark, and what use cases they fit.
- Learn how to use RDDs (resilient distributed datasets) with PySpark.
- Write Spark applications in Python and submit them to the cluster as Spark jobs.
- Get an introduction to the Spark computing framework.
- Apply this approach to a worked example to determine the most frequent airline delays in a specific month and year.
This lesson is for you because:
- You're a data scientist, familiar with Python coding, who needs to get up and running with PySpark.
- You're a Python developer who needs to leverage the distributed computing resources available on a Hadoop cluster, without learning Java or Scala first.
Prerequisites:
- Familiarity with writing Python applications.
- Some familiarity with bash command-line operations.
- Basic understanding of how to use simple functional programming constructs in Python, such as closures, lambdas, maps, etc.
Materials or downloads needed in advance: Apache Spark.
This lesson is taken from Data Analytics with Hadoop by Jenny Kim and Benjamin Bengfort.
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a Python (Computer program language) 
650 6 |a Python (Langage de programmation) 
650 7 |a Python (Computer program language)  |2 fast  |0 (OCoLC)fst01084736 
700 1 |a Kim, Jenny,  |e author. 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781491965313/?ar  |z Full text (requires prior registration with an institutional email) 
994 |a 92  |b IZTAP