Data science at the command line : obtain, scrub, explore, and model data with Unix power tools
This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, au...
Classification: | Electronic book |
---|---|
Main author: | |
Format: | Electronic (eBook) |
Language: | English |
Published: | Sebastopol, CA : O'Reilly Media, Inc., 2021. |
Edition: | Second edition. |
Subjects: | |
Online access: | Full text (requires prior registration with an institutional email address) |
Table of Contents:
- Cover
- Copyright
- Table of Contents
- Foreword
- Preface
- What to Expect from This Book
- Changes for the Second Edition
- How to Read This Book
- Who This Book Is For
- Conventions Used in This Book
- O'Reilly Online Learning
- How to Contact Us
- Acknowledgments for the Second Edition (2021)
- Acknowledgments for the First Edition (2014)
- Chapter 1. Introduction
- Data Science Is OSEMN
- Obtaining Data
- Scrubbing Data
- Exploring Data
- Modeling Data
- Interpreting Data
- Intermezzo Chapters
- What Is the Command Line?
- Why Data Science at the Command Line?
- The Command Line Is Agile
- The Command Line Is Augmenting
- The Command Line Is Scalable
- The Command Line Is Extensible
- The Command Line Is Ubiquitous
- Summary
- For Further Exploration
- Chapter 2. Getting Started
- Getting the Data
- Installing the Docker Image
- Essential Unix Concepts
- The Environment
- Executing a Command-Line Tool
- Five Types of Command-Line Tools
- Combining Command-Line Tools
- Redirecting Input and Output
- Working with Files and Directories
- Managing Output
- Help!
- Summary
- For Further Exploration
- Chapter 3. Obtaining Data
- Overview
- Copying Local Files to the Docker Container
- Downloading from the Internet
- Introducing curl
- Saving
- Other Protocols
- Following Redirects
- Decompressing Files
- Converting Microsoft Excel Spreadsheets to CSV
- Querying Relational Databases
- Calling Web APIs
- Authentication
- Streaming APIs
- Summary
- For Further Exploration
- Chapter 4. Creating Command-Line Tools
- Overview
- Converting One-Liners into Shell Scripts
- Step 1: Create a File
- Step 2: Give Permission to Execute
- Step 3: Define a Shebang
- Step 4: Remove the Fixed Input
- Step 5: Add Arguments
- Step 6: Extend Your PATH
- Creating Command-Line Tools with Python and R
- Porting the Shell Script
- Processing Streaming Data from Standard Input
- Summary
- For Further Exploration
- Chapter 5. Scrubbing Data
- Overview
- Transformations, Transformations Everywhere
- Plain Text
- Filtering Lines
- Extracting Values
- Replacing and Deleting Values
- CSV
- Bodies and Headers and Columns, Oh My!
- Performing SQL Queries on CSV
- Extracting and Reordering Columns
- Filtering Rows
- Merging Columns
- Combining Multiple CSV Files
- Working with XML/HTML and JSON
- Summary
- For Further Exploration
- Chapter 6. Project Management with Make
- Overview
- Introducing Make
- Running Tasks
- Building, for Real
- Adding Dependencies
- Summary
- For Further Exploration
- Chapter 7. Exploring Data
- Overview
- Inspecting Data and Its Properties
- Header or Not, Here I Come
- Inspect All the Data
- Feature Names and Data Types
- Unique Identifiers, Continuous Variables, and Factors
- Computing Descriptive Statistics
- Column Statistics
- R One-Liners on the Shell
- Creating Visualizations
- Displaying Images from the Command Line
- Plotting in a Rush