Creating an extensible 100+ PB real-time big data platform by unifying storage and serving

"Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs. You'll get a behind-the-scenes look...

Bibliographic Details
Classification:	Electronic Book
Format:	Electronic Video
Language:	English
Published:	[Place of publication not identified] : O'Reilly Media, 2019.
Subjects:	O'Reilly Strata Data Conference (2019 : New York, N.Y.); Real-time data processing; Application program interfaces (Computer software); Electronic data processing > Distributed processing; Application software > Development; Big data; Information storage and retrieval systems; Information Systems; Temps réel (Informatique); Interfaces de programmation d'applications; Traitement réparti; Logiciels d'application > Développement; Données volumineuses; Systèmes d'information; APIs (interfaces)
Online Access:	Full text (requires prior registration with an institutional email address)

MARC


LEADER	00000cgm a2200000 i 4500
001	OR_on1177147929
003	OCoLC
005	20231017213018.0
006	m o c
007	cr cna\|\|\|\|\|\|\|\|
007	vz czazuu
008	200724s2019 xx 042 o vleng d
040			\|a UMI \|b eng \|e rda \|e pn \|c UMI \|d OCLCF \|d OCLCQ \|d OCLCO
029	1		\|a AU@ \|b 000071521818
035			\|a (OCoLC)1177147929
037			\|a CL0501000125 \|b Safari Books Online
050		4	\|a QA76.54
049			\|a UAMI
100	1		\|a Shiftehfar, Reza, \|e on-screen presenter.
245	1	0	\|a Creating an extensible 100+ PB real-time big data platform by unifying storage and serving / \|c Reza Shiftehfar.
246	3		\|a Creating an extensible one hundred plus petabyte real-time big data platform by unifying storage and serving
264		1	\|a [Place of publication not identified] : \|b O'Reilly Media, \|c 2019.
300			\|a 1 online resource (1 streaming video file (41 min., 49 sec.))
336			\|a two-dimensional moving image \|b tdi \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
337			\|a video \|b v \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
511	0		\|a Presenter, Reza Shiftehfar.
500			\|a Title from title screen (viewed July 24, 2020).
520			\|a "Reza Shiftehfar reflects on the challenges faced and proposes architectural solutions to scale a big data platform to ingest, store, and serve 100+ PB of data with minute-level latency while efficiently utilizing the hardware and meeting security needs. You'll get a behind-the-scenes look at the current big data technology landscape, including various existing open source technologies (e.g., Hadoop, Spark, Hive, Presto, Kafka, and Avro) as well as Uber's tools such as Hudi and Marmaray. Hudi is an open source analytical storage system created at Uber to manage petabytes of data on HDFS-like distributed storage. Hudi provides near-real-time ingestion and different views of the data: a read-optimized view for batch analytics, a real-time view for driving dashboards, and an incremental view for powering data pipelines. Hudi also effectively manages files on underlying storage to maximize operational health and reliability. Reza details how Hudi lowers data latency across the board while simultaneously achieving orders of magnitude of efficiency over traditional batch ingestion. He then makes the case for near-real-time dashboards built on top of Hudi datasets, which can be cheaper than pure streaming architectures. Marmaray is an open source plug-in based pipeline platform connecting any arbitrary data source to any data sink. It allows unified and efficient ingestion of raw data from a variety of sources to Hadoop as well as the dispersal of the derived analysis result out of Hadoop to any online data store. Reza explains how Uber built and designed a common set of abstractions to handle both the ingestion and dispersal use cases, along with the challenges and lessons learned from developing the core library and setting up an on-demand self-service workflow. Along the way, you'll see how Uber scaled the platform to move around billions of records per day.
You'll also dive into the technical aspects of how to rearchitect the ingestion platform to bring in 10+ trillion events per day at minute-level latency, how to scale the storage platform, and how to redesign the processing platform to efficiently serve millions of queries and jobs per day. You'll leave with greater insight into how things work in an extensible modern big data platform and inspired to reenvision your own data platform to make it more generic and flexible for future new requirements. This session is from the 2019 O'Reilly Strata Conference in New York, NY."--Resource description page
590			\|a O'Reilly \|b O'Reilly Online Learning: Academic/Public Library Edition
611	2	0	\|a O'Reilly Strata Data Conference \|d (2019 : \|c New York, N.Y.)
650		0	\|a Real-time data processing.
650		0	\|a Application program interfaces (Computer software)
650		0	\|a Electronic data processing \|x Distributed processing.
650		0	\|a Application software \|x Development.
650		0	\|a Big data.
650		0	\|a Information storage and retrieval systems.
650		2	\|a Information Systems
650		6	\|a Temps réel (Informatique)
650		6	\|a Interfaces de programmation d'applications.
650		6	\|a Traitement réparti.
650		6	\|a Logiciels d'application \|x Développement.
650		6	\|a Données volumineuses.
650		6	\|a Systèmes d'information.
650		7	\|a APIs (interfaces) \|2 aat
650		7	\|a Application program interfaces (Computer software) \|2 fast \|0 (OCoLC)fst00811704
650		7	\|a Application software \|x Development \|2 fast \|0 (OCoLC)fst00811707
650		7	\|a Big data \|2 fast \|0 (OCoLC)fst01892965
650		7	\|a Electronic data processing \|x Distributed processing \|2 fast \|0 (OCoLC)fst00906987
650		7	\|a Information storage and retrieval systems \|2 fast \|0 (OCoLC)fst00972781
650		7	\|a Real-time data processing \|2 fast \|0 (OCoLC)fst01091219
856	4	0	\|u https://learning.oreilly.com/videos/~/0636920372158/?ar \|z Full text (requires prior registration with an institutional email address)
994			\|a 92 \|b IZTAP