Clean data: save time by discovering effortless strategies for cleaning, organizing, and manipulating your data
If you are a data scientist of any level, beginners included, and interested in cleaning up your data, this is the book for you! Experience with Python or PHP is assumed, but no previous knowledge of data cleaning is needed.
Classification: | Electronic book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Birmingham, UK: Packt Publishing, 2015. |
Series: | Community experience distilled. |
Subjects: | |
Online access: | Full text |
Table of Contents:
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Why Do You Need Clean Data?
- A fresh perspective
- The data science process
- Communicating about data cleaning
- Our data cleaning environment
- An introductory example
- Summary
- Chapter 2: Fundamentals - Formats, Types, and Encodings
- File formats
- Text files versus binary files
- Opening and reading files
- Peeking inside files
- Common formats for text files
- The delimited format
- Seeing invisible characters
- Enclosing values to trap errant characters
- Escaping characters
- The JSON format
- The HTML format
- Archiving and compression
- Archive files
- tar
- Compressed files
- How to compress files
- How to uncompress files
- Which compression program should I use?
- Data types, nulls, and encodings
- Data types
- Numeric data
- Dates and time
- Strings
- Other data types
- Converting between data types
- Data loss
- Strategies for conversion
- Type conversion at the SQL level
- Type conversion at the file level
- If a null falls in a forest...
- Zero
- Empties
- Null
- Character encodings
- Example one - finding multibyte characters in MySQL data
- Example two - finding the UTF-8 and Latin-1 equivalents of Unicode characters stored in MySQL
- Example three - handling UTF-8 encoding at the file level
- Summary
- Chapter 3: Workhorses of Clean Data - Spreadsheets and Text Editors
- Spreadsheet data cleaning
- Text to columns in Excel
- Splitting strings
- Concatenating strings
- Conditional formatting to find unusual values
- Sorting to find unusual values
- Importing spreadsheet data into MySQL
- Text editor data cleaning
- Text tweaking
- The column mode
- Heavy duty find and replace
- A word of caution
- Text sorting and processing duplicates
- Process Lines Containing
- An example project
- Step one - state the problem
- Step two - data collection
- Download the data
- Get familiar with the data
- Step three - data cleaning
- Extracting relevant lines
- Transform the lines
- Step four - data analysis
- Summary
- Chapter 4: Speaking the Lingua Franca - Data Conversions
- Quick tool-based conversions
- Spreadsheet to CSV
- Spreadsheet to JSON
- Step one - publish Google spreadsheet to the Web
- Step two - create the correct URL
- SQL to CSV or JSON using phpMyAdmin
- Converting with PHP
- SQL to JSON using PHP
- SQL to CSV using PHP
- JSON to CSV using PHP
- CSV to JSON using PHP
- Converting with Python
- CSV to JSON using Python
- CSV to JSON using csvkit
- Python JSON to CSV
- The example project