Disrupting data discovery

Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common wa...

Bibliographic Details
Classification: E-book
Main authors: Grover, Mark (Author), Feng, Tao (Author)
Corporate author: Safari, an O'Reilly Media Company
Format: Electronic Video
Language: English
Published: O'Reilly Media, Inc., 2019.
Edition: 1st edition.
Subjects:
Online access: Full text (requires prior registration with an institutional email)

MARC

LEADER 00000cgm a22000007a 4500
001 OR_on1129470898
003 OCoLC
005 20231017213018.0
006 m o c
007 cr cnu||||||||
007 vz czazuu
008 191204s2019 xx 042 vleng
040 |a AU@  |b eng  |c AU@  |d UMI  |d OCLCO  |d OCLCF  |d OCLCQ  |d TOH  |d UAB  |d OCLCO  |d FZL  |d OCLCQ 
019 |a 1137352838  |a 1142794455  |a 1163570075  |a 1224594248  |a 1232114304  |a 1304266596  |a 1304350143  |a 1351600817  |a 1380765069 
020 |z 0636920340003 
024 8 |a 0636920340027 
029 0 |a AU@  |b 000066261580 
035 |a (OCoLC)1129470898  |z (OCoLC)1137352838  |z (OCoLC)1142794455  |z (OCoLC)1163570075  |z (OCoLC)1224594248  |z (OCoLC)1232114304  |z (OCoLC)1304266596  |z (OCoLC)1304350143  |z (OCoLC)1351600817  |z (OCoLC)1380765069 
037 |a CL0501000091  |b Safari Books Online 
050 4 |a HD30.37 
082 0 4 |a E VIDEO 
049 |a UAMI 
100 1 |a Grover, Mark,  |e author. 
245 1 0 |a Disrupting data discovery  |h [electronic resource] /  |c Grover, Mark. 
250 |a 1st edition. 
264 1 |b O'Reilly Media, Inc.,  |c 2019. 
300 |a 1 online resource (1 video file, approximately 42 min.) 
336 |a two-dimensional moving image  |b tdi  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
344 |a digital  |2 rdatr 
347 |a video file 
520 |a Before any analysis can begin, a data scientist needs to discover the right data sources to analyze, understand them, and determine whether they can trust them. Unfortunately, data discovery is very inefficient today. Countless hours are lost trying to find the right data to use. (The most common way still remains to ask a coworker.) Gaining trust in data requires running a bunch of queries (max timestamp, counts per day, count distincts, etc.) that waste time and add unnecessary load on the databases. There's no clear way to know how to find folks to answer questions about the table. And worst of all, many times analysis is redone and models are rebuilt because previous work isn't discoverable. Lyft has reduced the time it takes to discover data by 10x by building its own data portal, Amundsen. Amundsen is built on three key pillars: an augmented data graph, an intuitive user experience, and centralized metadata. Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that it treats people as a first-class data asset; in other words, there's a graph node for each person in the organization that connects to other nodes (like tables and dashboards). In addition, Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet. Finally, Amundsen gathers metadata from various sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress. Mark Grover and Tao Feng (Lyft) offer a demo of Amundsen and lead a deep dive into its architecture, covering how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal. They also explore the future roadmap, unsolved problems, and its collaboration model. 
This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco. 
538 |a Mode of access: World Wide Web. 
542 |f Copyright © O'Reilly Media, Inc. 
550 |a Made available through: Safari, an O'Reilly Media Company. 
588 |a Online resource; Title from title screen (viewed October 31, 2019) 
511 0 |a Presenters, Mark Grover, Tao Feng. 
533 |a Electronic reproduction.  |b Boston, MA :  |c Safari.  |n Available via World Wide Web. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
655 4 |a Electronic videos. 
700 1 |a Feng, Tao,  |e author. 
710 2 |a Safari, an O'Reilly Media Company. 
856 4 0 |u https://learning.oreilly.com/videos/~/0636920340027/?ar  |z Full text (requires prior registration with an institutional email) 
936 |a BATCHLOAD 
994 |a 92  |b IZTAP