Data science at the command line: obtain, scrub, explore, and model data with Unix power tools

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, au...

Bibliographic Details
Classification: Electronic Book
Main author: Janssens, Jeroen (Data scientist) (Author)
Format: Electronic eBook
Language: English
Published: Sebastopol, CA : O'Reilly Media, Inc., 2021.
Edition: Second edition.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Cover
  • Copyright
  • Table of Contents
  • Foreword
  • Preface
  • What to Expect from This Book
  • Changes for the Second Edition
  • How to Read This Book
  • Who This Book Is For
  • Conventions Used in This Book
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments for the Second Edition (2021)
  • Acknowledgments for the First Edition (2014)
  • Chapter 1. Introduction
  • Data Science Is OSEMN
  • Obtaining Data
  • Scrubbing Data
  • Exploring Data
  • Modeling Data
  • Interpreting Data
  • Intermezzo Chapters
  • What Is the Command Line?
  • Why Data Science at the Command Line?
  • The Command Line Is Agile
  • The Command Line Is Augmenting
  • The Command Line Is Scalable
  • The Command Line Is Extensible
  • The Command Line Is Ubiquitous
  • Summary
  • For Further Exploration
  • Chapter 2. Getting Started
  • Getting the Data
  • Installing the Docker Image
  • Essential Unix Concepts
  • The Environment
  • Executing a Command-Line Tool
  • Five Types of Command-Line Tools
  • Combining Command-Line Tools
  • Redirecting Input and Output
  • Working with Files and Directories
  • Managing Output
  • Help!
  • Summary
  • For Further Exploration
  • Chapter 3. Obtaining Data
  • Overview
  • Copying Local Files to the Docker Container
  • Downloading from the Internet
  • Introducing curl
  • Saving
  • Other Protocols
  • Following Redirects
  • Decompressing Files
  • Converting Microsoft Excel Spreadsheets to CSV
  • Querying Relational Databases
  • Calling Web APIs
  • Authentication
  • Streaming APIs
  • Summary
  • For Further Exploration
  • Chapter 4. Creating Command-Line Tools
  • Overview
  • Converting One-Liners into Shell Scripts
  • Step 1: Create a File
  • Step 2: Give Permission to Execute
  • Step 3: Define a Shebang
  • Step 4: Remove the Fixed Input
  • Step 5: Add Arguments
  • Step 6: Extend Your PATH
  • Creating Command-Line Tools with Python and R
  • Porting the Shell Script
  • Processing Streaming Data from Standard Input
  • Summary
  • For Further Exploration
  • Chapter 5. Scrubbing Data
  • Overview
  • Transformations, Transformations Everywhere
  • Plain Text
  • Filtering Lines
  • Extracting Values
  • Replacing and Deleting Values
  • CSV
  • Bodies and Headers and Columns, Oh My!
  • Performing SQL Queries on CSV
  • Extracting and Reordering Columns
  • Filtering Rows
  • Merging Columns
  • Combining Multiple CSV Files
  • Working with XML/HTML and JSON
  • Summary
  • For Further Exploration
  • Chapter 6. Project Management with Make
  • Overview
  • Introducing Make
  • Running Tasks
  • Building, for Real
  • Adding Dependencies
  • Summary
  • For Further Exploration
  • Chapter 7. Exploring Data
  • Overview
  • Inspecting Data and Its Properties
  • Header or Not, Here I Come
  • Inspect All the Data
  • Feature Names and Data Types
  • Unique Identifiers, Continuous Variables, and Factors
  • Computing Descriptive Statistics
  • Column Statistics
  • R One-Liners on the Shell
  • Creating Visualizations
  • Displaying Images from the Command Line
  • Plotting in a Rush