
Spark in Action: Covers Apache Spark 3 with Examples in Java, Python, and Scala.

Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you'll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you'll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
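The book's opening example (see "1.5 Your first example" and "1.5.4 Your first code" in the contents note below) builds a small Java application that ingests a CSV file into a dataframe. As a rough illustration only, not the book's actual code (the class name, file path, and CSV options here are assumptions), such an application written against the standard Spark SQL Java API looks approximately like this:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Minimal CSV-to-dataframe sketch; class name and input path are illustrative.
public class CsvToDataframeSketch {
    public static void main(String[] args) {
        // Start a local Spark session, as the book's early examples run locally.
        SparkSession spark = SparkSession.builder()
                .appName("CSV to dataframe")
                .master("local[*]")
                .getOrCreate();

        // Ingest a CSV file with a header row into a dataframe.
        Dataset<Row> df = spark.read()
                .format("csv")
                .option("header", true)
                .load("data/sample.csv"); // hypothetical input file

        df.show(5); // display the first five rows
        spark.stop();
    }
}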


Bibliographic Details
Classification: Electronic Book
Main Author: Perrin, Jean Georges
Format: Electronic eBook
Language: English
Published: New York : Manning Publications Co. LLC, 2020.
Series: ITpro collection
Subjects: Big data; Data mining > Computer programs
Online Access: Full text (requires prior registration with an institutional email)

MARC

LEADER 00000cam a2200000Mu 4500
001 OR_on1257078258
003 OCoLC
005 20231017213018.0
006 m o d
007 cr |||||||||||
008 210619s2020 nyu o ||| 0 eng d
040 |a EBLCP  |b eng  |c EBLCP  |d YDX  |d TOH  |d BRF  |d K6U  |d N$T  |d OCLCF  |d OCLCO  |d CZL  |d OCLCO  |d OCLCQ  |d OCLCO 
019 |a 1256806298  |a 1272905424  |a 1273711114  |a 1277199230  |a 1281183707 
020 |a 9781638351306 
020 |a 1638351309 
020 |z 1617295523 
020 |z 9781617295522 
024 8 |a 9781617295522 
029 1 |a AU@  |b 000069849303 
029 1 |a AU@  |b 000071519844 
035 |a (OCoLC)1257078258  |z (OCoLC)1256806298  |z (OCoLC)1272905424  |z (OCoLC)1273711114  |z (OCoLC)1277199230  |z (OCoLC)1281183707 
037 |a itpro 
050 1 4 |a QA76.73.S59  |b .P47 2020eb 
082 0 4 |a 006.3/12  |2 23 
049 |a UAMI 
100 1 |a Perrin, Jean Georges. 
245 1 0 |a Spark in Action  |h [electronic resource] :  |b Covers Apache Spark 3 with Examples in Java, Python, and Scala. 
260 |a New York :  |b Manning Publications Co. LLC,  |c 2020. 
300 |a 1 online resource (498 p.) 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
347 |a text file 
490 0 |a ITpro collection 
500 |a Description based upon print version of record. 
504 |a Includes bibliographical references. 
520 |a Spark in Action, Second Edition , teaches you to create end-to-end analytics applications. In this entirely new book, you'll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you'll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms. 
542 |f © 2020 Manning Publications Co. All rights reserved.  |g 2020 
505 0 |a Intro -- Copyright -- brief contents -- contents -- front matter -- foreword -- The analytics operating system -- preface -- acknowledgments -- about this book -- Who should read this book -- What will you learn in this book? -- How this book is organized -- About the code -- liveBook discussion forum -- about the author -- about the cover illustration -- Part 1. The theory crippled by awesome examples -- 1. So, what is Spark, anyway? -- 1.1 The big picture: What Spark is and what it does -- 1.1.1 What is Spark? -- 1.1.2 The four pillars of mana -- 1.2 How can you use Spark? -- 1.2.1 Spark in a data processing/engineering scenario -- 1.2.2 Spark in a data science scenario -- 1.3 What can you do with Spark? -- 1.3.1 Spark predicts restaurant quality at NC eateries -- 1.3.2 Spark allows fast data transfer for Lumeris -- 1.3.3 Spark analyzes equipment logs for CERN -- 1.3.4 Other use cases -- 1.4 Why you will love the dataframe -- 1.4.1 The dataframe from a Java perspective -- 1.4.2 The dataframe from an RDBMS perspective -- 1.4.3 A graphical representation of the dataframe -- 1.5 Your first example -- 1.5.1 Recommended software -- 1.5.2 Downloading the code -- 1.5.3 Running your first application -- Command line -- Eclipse -- 1.5.4 Your first code -- Summary -- 2. Architecture and flow -- 2.1 Building your mental model -- 2.2 Using Java code to build your mental model -- 2.3 Walking through your application -- 2.3.1 Connecting to a master -- 2.3.2 Loading, or ingesting, the CSV file -- 2.3.3 Transforming your data -- 2.3.4 Saving the work done in your dataframe to a database -- Summary -- 3. The majestic role of the dataframe -- 3.1 The essential role of the dataframe in Spark -- 3.1.1 Organization of a dataframe -- 3.1.2 Immutability is not a swear word -- 3.2 Using dataframes through examples -- 3.2.1 A dataframe after a simple CSV ingestion. 
505 8 |a 6.2.2 Setting up the environment -- 6.3 Building your application to run on the cluster -- 6.3.1 Building your application's uber JAR -- 6.3.2 Building your application by using Git and Maven -- 6.4 Running your application on the cluster -- 6.4.1 Submitting the uber JAR -- 6.4.2 Running the application -- 6.4.3 The Spark user interface -- Summary -- Part 2. Ingestion -- 7. Ingestion from files -- 7.1 Common behaviors of parsers -- 7.2 Complex ingestion from CSV -- 7.2.1 Desired output -- 7.2.2 Code -- 7.3 Ingesting a CSV with a known schema -- 7.3.1 Desired output -- 7.3.2 Code -- 7.4 Ingesting a JSON file -- 7.4.1 Desired output -- 7.4.2 Code -- 7.5 Ingesting a multiline JSON file -- 7.5.1 Desired output -- 7.5.2 Code -- 7.6 Ingesting an XML file -- 7.6.1 Desired output -- 7.6.2 Code -- 7.7 Ingesting a text file -- 7.7.1 Desired output -- 7.7.2 Code -- 7.8 File formats for big data -- 7.8.1 The problem with traditional file formats -- 7.8.2 Avro is a schema-based serialization format -- 7.8.3 ORC is a columnar storage format -- 7.8.4 Parquet is also a columnar storage format -- 7.8.5 Comparing Avro, ORC, and Parquet -- 7.9 Ingesting Avro, ORC, and Parquet files -- 7.9.1 Ingesting Avro -- 7.9.2 Ingesting ORC -- 7.9.3 Ingesting Parquet -- 7.9.4 Reference table for ingesting Avro, ORC, or Parquet -- Summary -- 8. Ingestion from databases -- 8.1 Ingestion from relational databases -- 8.1.1 Database connection checklist -- 8.1.2 Understanding the data used in the examples -- 8.1.3 Desired output -- 8.1.4 Code -- 8.1.5 Alternative code -- 8.2 The role of the dialect -- 8.2.1 What is a dialect, anyway? -- 8.2.2 JDBC dialects provided with Spark -- 8.2.3 Building your own dialect -- 8.3 Advanced queries and ingestion -- 8.3.1 Filtering by using a WHERE clause -- 8.3.2 Joining data in the database -- 8.3.3 Performing ingestion and partitioning.
505 8 |a 8.3.4 Summary of advanced features -- 8.4 Ingestion from Elasticsearch -- 8.4.1 Data flow -- 8.4.2 The New York restaurants dataset digested by Spark -- 8.4.3 Code to ingest the restaurant dataset from Elasticsearch -- Summary -- 9. Advanced ingestion: finding data sources and building your own -- 9.1 What is a data source? -- 9.2 Benefits of a direct connection to a data source -- 9.2.1 Temporary files -- 9.2.2 Data quality scripts -- 9.2.3 Data on demand -- 9.3 Finding data sources at Spark Packages -- 9.4 Building your own data source -- 9.4.1 Scope of the example project -- 9.4.2 Your data source API and options -- 9.5 Behind the scenes: Building the data source itself -- 9.6 Using the register file and the advertiser class -- 9.7 Understanding the relationship between the data and schema -- 9.7.1 The data source builds the relation -- 9.7.2 Inside the relation -- 9.8 Building the schema from a JavaBean -- 9.9 Building the dataframe is magic with the utilities -- 9.10 The other classes -- Summary -- 10. Ingestion through structured streaming -- 10.1 What's streaming? -- 10.2 Creating your first stream -- 10.2.1 Generating a file stream -- 10.2.2 Consuming the records -- 10.2.3 Getting records, not lines -- 10.3 Ingesting data from network streams -- 10.4 Dealing with multiple streams -- 10.5 Differentiating discretized and structured streaming -- Summary -- Part 3. Transforming your data -- 11. Working with SQL -- 11.1 Working with Spark SQL -- 11.2 The difference between local and global views -- 11.3 Mixing the dataframe API and Spark SQL -- 11.4 Don't DELETE it! -- 11.5 Going further with SQL -- Summary -- 12. Transforming your data -- 12.1 What is data transformation? -- 12.2 Process and example of record-level transformation -- 12.2.1 Data discovery to understand the complexity -- 12.2.2 Data mapping to draw the process.
505 8 |a 12.2.3 Writing the transformation code -- 12.2.4 Reviewing your data transformation to ensure a quality process -- What about sorting? -- Wrapping up your first Spark transformation -- 12.3 Joining datasets -- 12.3.1 A closer look at the datasets to join -- 12.3.2 Building the list of higher education institutions per county -- Initialization of Spark -- Loading and preparing the data -- 12.3.3 Performing the joins -- Joining the FIPS county identifier with the higher ed dataset using a join -- Joining the census data to get the county name -- 12.4 Performing more transformations -- Summary -- 13. Transforming entire documents -- 13.1 Transforming entire documents and their structure -- 13.1.1 Flattening your JSON document -- 13.1.2 Building nested documents for transfer and storage -- 13.2 The magic behind static functions -- 13.3 Performing more transformations -- Summary -- 14. Extending transformations with user-defined functions -- 14.1 Extending Apache Spark -- 14.2 Registering and calling a UDF -- 14.2.1 Registering the UDF with Spark -- 14.2.2 Using the UDF with the dataframe API -- 14.2.3 Manipulating UDFs with SQL -- 14.2.4 Implementing the UDF -- 14.2.5 Writing the service itself -- 14.3 Using UDFs to ensure a high level of data quality -- 14.4 Considering UDFs' constraints -- Summary -- 15. Aggregating your data -- 15.1 Aggregating data with Spark -- 15.1.1 A quick reminder on aggregations -- 15.1.2 Performing basic aggregations with Spark -- Performing an aggregation using the dataframe API -- Performing an aggregation using Spark SQL -- 15.2 Performing aggregations with live data -- 15.2.1 Preparing your dataset -- 15.2.2 Aggregating data to better understand the schools -- What is the average enrollment for each school? -- What is the evolution of the number of students? -- What is the higher enrollment per school and year?. 
590 |a O'Reilly  |b O'Reilly Online Learning: Academic/Public Library Edition 
650 0 |a Big data. 
650 0 |a Data mining  |x Computer programs. 
650 6 |a Données volumineuses. 
650 6 |a Exploration de données (Informatique)  |x Logiciels. 
650 7 |a Big data  |2 fast 
776 0 8 |i Print version:  |a Perrin, Jean Georges  |t Spark in Action  |d New York : Manning Publications Co. LLC,c2020  |z 9781617295522 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781617295522/?ar  |z Full text (requires prior registration with an institutional email)
938 |a ProQuest Ebook Central  |b EBLB  |n EBL6642610 
938 |a YBP Library Services  |b YANK  |n 302272683 
938 |a EBSCOhost  |b EBSC  |n 2948770 
994 |a 92  |b IZTAP