
Agile data science 2.0 : building full-stack data analytics applications with Spark /

Bibliographic Details
Classification: Electronic Book
Main author: Jurney, Russell (Author)
Format: Electronic eBook
Language: English
Published: Boston, MA : O'Reilly Media, 2017.
Subjects:
Online access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • Copyright; Table of Contents; Preface; Agile Data Science Mailing List; Data Syndrome, Product Analytics Consultancy; Live Training; Who This Book Is For; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Part I. Setup; Chapter 1. Theory; Introduction; Definition; Methodology as Tweet; Agile Data Science Manifesto; The Problem with the Waterfall; Research Versus Application Development; The Problem with Agile Software; Eventual Quality: Financing Technical Debt; The Pull of the Waterfall; The Data Science Process.
  • Setting Expectations; Data Science Team Roles; Recognizing the Opportunity and the Problem; Adapting to Change; Notes on Process; Code Review and Pair Programming; Agile Environments: Engineering Productivity; Realizing Ideas with Large-Format Printing; Chapter 2. Agile Tools; Scalability = Simplicity; Agile Data Science Data Processing; Local Environment Setup; System Requirements; Setting Up Vagrant; Downloading the Data; EC2 Environment Setup; Downloading the Data; Getting and Running the Code; Getting the Code; Running the Code; Jupyter Notebooks; Touring the Toolset.
  • Agile Stack Requirements; Python 3; Serializing Events with JSON Lines and Parquet; Collecting Data; Data Processing with Spark; Publishing Data with MongoDB; Searching Data with Elasticsearch; Distributed Streams with Apache Kafka; Processing Streams with PySpark Streaming; Machine Learning with scikit-learn and Spark MLlib; Scheduling with Apache Airflow (Incubating); Reflecting on Our Workflow; Lightweight Web Applications; Presenting Our Data; Conclusion; Chapter 3. Data; Air Travel Data; Flight On-Time Performance Data; OpenFlights Database; Weather Data.
  • Data Processing in Agile Data Science; Structured Versus Semistructured Data; SQL Versus NoSQL; SQL; NoSQL and Dataflow Programming; Spark: SQL + NoSQL; Schemas in NoSQL; Data Serialization; Extracting and Exposing Features in Evolving Schemas; Conclusion; Part II. Climbing the Pyramid; Chapter 4. Collecting and Displaying Records; Putting It All Together; Collecting and Serializing Flight Data; Processing and Publishing Flight Records; Publishing Flight Records to MongoDB; Presenting Flight Records in a Browser; Serving Flights with Flask and pymongo; Rendering HTML5 with Jinja2.
  • Agile Checkpoint; Listing Flights; Listing Flights with MongoDB; Paginating Data; Searching for Flights; Creating Our Index; Publishing Flights to Elasticsearch; Searching Flights on the Web; Conclusion; Chapter 5. Visualizing Data with Charts and Tables; Chart Quality: Iteration Is Essential; Scaling a Database in the Publish/Decorate Model; First Order Form; Second Order Form; Third Order Form; Choosing a Form; Exploring Seasonality; Querying and Presenting Flight Volume; Extracting Metal (Airplanes [Entities]); Extracting Tail Numbers; Assessing Our Airplanes; Data Enrichment.