Cleaning data for effective data science : doing the other 80% of the work with Python, R, and command-line tools

A comprehensive guide for data scientists to master effective data cleaning tools and techniques. Key Features: Master data cleaning techniques in a language-agnostic manner; learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series...

Bibliographic Details
Classification:	Electronic Book
Main author:	Mertz, David (Author)
Format:	Electronic eBook
Language:	English
Published:	[S.l.] : Packt Publishing Limited, 2021.
Subjects:	Python (Programming language); Computational biology > Methods; Database management; Data integrity; Python (Computer program language); R (Computer program language); Computational Biology > methods; Data Analysis; Data Accuracy; Bases de données > Gestion; Intégrité des données; Python (Langage de programmation); R (Langage de programmation); Qualité des données; Database design & theory; Data capture & analysis; Mathematical theory of computation; Machine learning; Information architecture; Computers > Data Processing; Computers > Machine Theory; Computers > Data Modeling & Design; Computational biology; Fulltext; Internet Resources; Methods (Music)
Online access:	Full text (requires prior registration with an institutional email address)

MARC


LEADER	00000cam a2200000Mi 4500
001	OR_on1244742334
003	OCoLC
005	20231017213018.0
006	m o d
007	cr \|n\|\|\|\|\|\|\|\|\|
008	210404s2021 xx o 00\| 0 eng d
040			\|a YDX \|b eng \|c YDX \|d CASUM \|d OCLCO \|d N$T \|d OCLCO \|d NLW \|d OCLCF \|d OCLCO \|d OCLCQ \|d IEEEE
019			\|a 1394053423
020			\|a 9781801074407 \|q (electronic bk.)
020			\|a 1801074402 \|q (electronic bk.)
020			\|z 1801071292
020			\|z 9781801071291
029	1		\|a AU@ \|b 000068941303
035			\|a (OCoLC)1244742334 \|z (OCoLC)1394053423
037			\|a 10162982 \|b IEEE
050		4	\|a QA76.9.D345
082	0	4	\|a 005.7 \|2 23
049			\|a UAMI
100	1		\|a Mertz, David, \|e author.
245	1	0	\|a Cleaning data for effective data science : \|b doing the other 80% of the work with Python, R, and command-line tools / \|c David Mertz.
264		1	\|a [S.l.] : \|b Packt Publishing Limited, \|c 2021.
300			\|a 1 online resource
336			\|a text \|b txt \|2 rdacontent
337			\|a computer \|b c \|2 rdamedia
338			\|a online resource \|b cr \|2 rdacarrier
520			\|a A comprehensive guide for data scientists to master effective data cleaning tools and techniques. Key Features: Master data cleaning techniques in a language-agnostic manner; learn from intriguing hands-on examples from numerous domains, such as biology, weather data, demographics, physics, time series, and image processing; work with detailed, commented, well-tested code samples in Python and R. Book Description: It is something of a truism in data science, data analysis, or machine learning that most of the effort needed to achieve your actual purpose lies in cleaning your data. Written in David's signature friendly and humorous style, this book discusses in detail the essential steps performed in every production data science or data analysis pipeline and prepares you for data visualization and modeling results. The book dives into the practical application of tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It also offers long-form exercises at the end of each chapter to practice the skills acquired. You will begin by looking at data ingestion of data formats such as JSON, CSV, SQL RDBMSes, HDF5, NoSQL databases, files in image formats, and binary serialized data structures. Further, the book provides numerous example data sets and data files, which are available for download and independent exploration. Moving on from formats, you will impute missing values, detect unreliable data and statistical anomalies, and generate synthetic features that are necessary for successful data analysis and visualization goals. By the end of this book, you will have acquired a firm understanding of the data cleaning process necessary to perform real-world data science and machine learning tasks.
What you will learn: How to think carefully about your data and ask the right questions; identify problem data pertaining to individual data points; detect problem data in the systematic "shape" of the data; remediate data integrity and hygiene problems; prepare data for analytic and machine learning tasks; impute values into missing or unreliable data; generate synthetic features that are more amenable to data science, data analysis, or visualization goals. Who this book is for: This book is designed to benefit software developers, data scientists, aspiring data scientists, and students who are interested in data analysis or scientific computing. Basic familiarity with statistics, general concepts in machine learning,...
505	0		\|a Table of Contents: Data Ingestion -- Tabular Formats; Data Ingestion -- Hierarchical Formats; Data Ingestion -- Repurposing Data Sources; The Vicissitudes of Error -- Anomaly Detection; The Vicissitudes of Error -- Data Quality; Rectification and Creation -- Value Imputation; Rectification and Creation -- Feature Engineering; Ancillary Matters -- Closure/Glossary.
590			\|a O'Reilly \|b O'Reilly Online Learning: Academic/Public Library Edition
630	0	4	\|a Python (Programming language).
650		0	\|a Computational biology \|x Methods.
650		0	\|a Database management.
650		0	\|a Data integrity.
650		0	\|a Python (Computer program language)
650		0	\|a R (Computer program language)
650	1	2	\|a Computational Biology \|x methods
650	1	2	\|a Data Analysis
650	2	2	\|a Data Accuracy
650		6	\|a Bases de données \|x Gestion.
650		6	\|a Intégrité des données.
650		6	\|a Python (Langage de programmation)
650		6	\|a R (Langage de programmation)
650		6	\|a Qualité des données.
650		7	\|a Database design & theory. \|2 bicssc
650		7	\|a Data capture & analysis. \|2 bicssc
650		7	\|a Mathematical theory of computation. \|2 bicssc
650		7	\|a Machine learning. \|2 bicssc
650		7	\|a Information architecture. \|2 bicssc
650		7	\|a Computers \|x Data Processing. \|2 bisacsh
650		7	\|a Computers \|x Machine Theory. \|2 bisacsh
650		7	\|a Computers \|x Data Modeling & Design. \|2 bisacsh
650		7	\|a Computational biology. \|2 fast \|0 (OCoLC)fst00871990
650		7	\|a Data integrity. \|2 fast \|0 (OCoLC)fst01746571
650		7	\|a Database management. \|2 fast \|0 (OCoLC)fst00888037
650		7	\|a Python (Computer program language) \|2 fast \|0 (OCoLC)fst01084736
650		7	\|a R (Computer program language) \|2 fast \|0 (OCoLC)fst01086207
655		4	\|a Fulltext.
655		4	\|a Internet Resources.
655		7	\|a Methods (Music) \|2 fast \|0 (OCoLC)fst01423850
776	0	8	\|i Print version: \|z 1801071292 \|z 9781801071291 \|w (OCoLC)1242107775
856	4	0	\|u https://learning.oreilly.com/library/view/~/9781801071291/?ar \|z Full text (requires prior registration with an institutional email address)
938			\|a YBP Library Services \|b YANK \|n 302030707
938			\|a EBSCOhost \|b EBSC \|n 2902696
994			\|a 92 \|b IZTAP
