Data munging with Hadoop

The Example-Rich, Hands-On Guide to Data Munging with Apache Hadoop™. Data scientists spend much of their time “munging” data: handling day-to-day tasks such as data cleansing, normalization, aggregation, sampling, and transformation. These tasks are both critical and surprisingly interesting. Most...

Bibliographic Details
Classification: Electronic Book
Main Authors: Mendelevitch, Ofer (Author), Stella, Casey (Author)
Format: Electronic eBook
Language: English
Published: Boston : Addison-Wesley, 2015.
Subjects:
Online Access: Full text (requires prior registration with an institutional email address)

MARC

LEADER 00000cam a2200000Ii 4500
001 OR_ocn931716562
003 OCoLC
005 20231017213018.0
006 m o d
007 cr unu||||||||
008 151208t20152016mau o 000 0 eng d
040 |a UMI  |b eng  |e rda  |e pn  |c UMI  |d OCLCF  |d DEBBG  |d DEBSZ  |d VT2  |d CEF  |d OCLCQ  |d OCLCO  |d WYU  |d CNCEN  |d OCLCO  |d UKAHL  |d OCLCQ 
019 |a 948566932 
020 |a 9780134435534 
020 |a 0134435532 
020 |a 0134435486 
020 |a 9780134435480 
020 |z 9780134435480 
029 1 |a DEBBG  |b BV043968181 
029 1 |a DEBSZ  |b 485786729 
029 1 |a GBVCP  |b 876243715 
035 |a (OCoLC)931716562  |z (OCoLC)948566932 
037 |a CL0500000684  |b Safari Books Online 
050 4 |a QA76.9.D5 
049 |a UAMI 
100 1 |a Mendelevitch, Ofer,  |e author. 
245 1 0 |a Data munging with Hadoop /  |c Ofer Mendelevitch, Casey Stella. 
264 1 |a Boston :  |b Addison-Wesley,  |c 2015. 
264 4 |c ©2016 
300 |a 1 online resource (1 volume) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
588 0 |a Online resource; title from title page (Safari, viewed December 7, 2015). 
520 |a The Example-Rich, Hands-On Guide to Data Munging with Apache Hadoop™. Data scientists spend much of their time “munging” data: handling day-to-day tasks such as data cleansing, normalization, aggregation, sampling, and transformation. These tasks are both critical and surprisingly interesting. Most important, they deepen your understanding of your data's structure and limitations: crucial insight for improving accuracy and mitigating risk in any analytical project. Now, two leading Hortonworks data scientists, Ofer Mendelevitch and Casey Stella, bring together powerful, practical insights for effective Hadoop-based data munging of large datasets. Drawing on extensive experience with advanced analytics, the authors offer realistic examples that address the common issues you're most likely to face. They describe each task in detail, presenting example code based on widely used tools such as Pig, Hive, and Spark. This concise, hands-on eBook is valuable for every data scientist, data engineer, and architect who wants to master data munging: not just in theory, but in practice with the field's #1 platform, Hadoop. Coverage includes: a framework for understanding the various types of data quality checks, including cell-based rules, distribution validation, and outlier analysis; assessing tradeoffs in common approaches to imputing missing values; implementing quality checks with Pig or Hive UDFs; transforming raw data into “feature matrix” format for machine learning algorithms; choosing features and instances; implementing text features via “bag-of-words” and NLP techniques; handling time-series data via frequency- or time-domain methods; manipulating feature values to prepare for modeling. Data Munging with Hadoop is part of a larger, forthcoming work entitled Data Science Using Hadoop.
To be notified when the larger work is available, register your purchase of Data Munging with Hadoop at informit.com/register and check the box “I would like to hear from InformIT and its family of brands about products and special offers.”
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
630 0 0 |a Apache Hadoop. 
630 0 7 |a Apache Hadoop.  |2 fast  |0 (OCoLC)fst01911570 
650 0 |a Data mining. 
650 0 |a Data structures (Computer science) 
650 0 |a Data transmission systems. 
650 2 |a Data Mining 
650 6 |a Exploration de données (Informatique) 
650 6 |a Structures de données (Informatique) 
650 7 |a Data mining.  |2 fast  |0 (OCoLC)fst00887946 
650 7 |a Data structures (Computer science)  |2 fast  |0 (OCoLC)fst00887978 
650 7 |a Data transmission systems.  |2 fast  |0 (OCoLC)fst00887993 
700 1 |a Stella, Casey,  |e author. 
856 4 0 |u https://learning.oreilly.com/library/view/~/9780134435534/?ar  |z Full text (requires prior registration with an institutional email address) 
938 |a Askews and Holts Library Services  |b ASKH  |n AH37828494 
994 |a 92  |b IZTAP