Mastering Spark for data science /

"Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products."

Bibliographic Details
Classification: Electronic Book
Main Author: Morgan, Andrew
Other Authors: Amend, Antoine; George, David; Hallett, Matthew
Format: Electronic eBook
Language: English
Published: Birmingham, UK : Packt Publishing Ltd., 2017.
Subjects:
Online Access: Full text

MARC

LEADER 00000cam a2200000Ii 4500
001 EBSCO_ocn981985497
003 OCoLC
005 20231017213018.0
006 m o d
007 cr |n|||||||||
008 170407s2017 enk o 001 0 eng d
040 |a IDEBK  |b eng  |e pn  |c IDEBK  |d YDX  |d MERUC  |d N$T  |d EBLCP  |d OCLCF  |d COO  |d IDEBK  |d OCLCQ  |d OCLCO  |d OCLCQ  |d OCLCO  |d LVT  |d UKAHL  |d OCLCQ  |d OCLCO  |d OCLCQ  |d OCLCO 
019 |a 981591538  |a 981844508  |a 982010852 
020 |a 1785888285  |q (electronic bk.) 
020 |a 9781785888281  |q (electronic bk.) 
020 |z 1785882147 
029 1 |a AU@  |b 000066231506 
029 1 |a CHNEW  |b 000953095 
029 1 |a CHVBK  |b 484641344 
029 1 |a AU@  |b 000067024705 
029 1 |a AU@  |b 000067103185 
035 |a (OCoLC)981985497  |z (OCoLC)981591538  |z (OCoLC)981844508  |z (OCoLC)982010852 
037 |a 1003903  |b MIL 
050 4 |a QA76.9.D343 
072 7 |a COM  |x 021030  |2 bisacsh 
082 0 4 |a 005.75/85  |2 23 
049 |a UAMI 
100 1 |a Morgan, Andrew. 
245 1 0 |a Mastering Spark for data science /  |c Andrew Morgan, Antoine Amend, Matthew Hallett, David George ; foreword by Harry Powell. 
260 |a Birmingham, UK :  |b Packt Publishing Ltd.,  |c 2017. 
300 |a 1 online resource 
336 |a text  |b txt  |2 rdacontent 
337 |a computer  |b c  |2 rdamedia 
338 |a online resource  |b cr  |2 rdacarrier 
500 |a Includes index. 
520 |a "Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products." 
588 0 |a Print version record. 
505 0 |a Cover; Copyright; Credits; Foreword; About the Authors; About the Reviewer; www.PacktPub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: The Big Data Science Ecosystem; Introducing the Big Data ecosystem; Data management; Data management responsibilities; The right tool for the job; Overall architecture; Data Ingestion; Data Lake; Reliable storage; Scalable data processing capability; Data science platform; Data Access; Data technologies; The role of Apache Spark; Companion tools; Apache HDFS; Advantages; Disadvantages; Installation; Amazon S3; Advantages; Disadvantages. 
505 8 |a Installation; Apache Kafka; Advantages; Disadvantages; Installation; Apache Parquet; Advantages; Disadvantages; Installation; Apache Avro; Advantages; Disadvantages; Installation; Apache NiFi; Advantages; Disadvantages; Installation; Apache YARN; Advantages; Disadvantages; Installation; Apache Lucene; Advantages; Disadvantages; Installation; Kibana; Advantages; Disadvantages; Installation; Elasticsearch; Advantages; Disadvantages; Installation; Accumulo; Advantages; Disadvantages; Installation; Summary; Chapter 2: Data Acquisition; Data pipelines; Universal ingestion framework. 
505 8 |a Introducing the GDELT news stream; Discovering GDELT in real-time; Our first GDELT feed; Improving with publish and subscribe; Content registry; Choices and more choices; Going with the flow; Metadata model; Kibana dashboard; Quality assurance; [Example 1 -- Basic quality checking, no contending users]; Example 1 -- Basic quality checking, no contending users; Example 2 -- Advanced quality checking, no contending users; Example 3 -- Basic quality checking, 50% utility due to contending users; Summary; Chapter 3: Input Formats and Schema; A structured life is a good life; GDELT dimensional modeling. 
505 8 |a GDELT model; First look at the data; Core global knowledge graph model; Hidden complexity; Denormalized models; Challenges with flattened data; Issue 1 -- Loss of contextual information; Issue 2: Re-establishing dimensions; Issue 3: Including reference data; Loading your data; Schema agility; Reality check; GKG ELT; Position matters; Avro; Spark-Avro method; Pedagogical method; When to perform Avro transformation; Parquet; Summary; Chapter 4: Exploratory Data Analysis; The problem, principles and planning; Understanding the EDA problem; Design principles; General plan of exploration; Preparation. 
505 8 |a Introducing mask based data profiling; Introducing character class masks; Building a mask based profiler; Setting up Apache Zeppelin; Constructing a reusable notebook; Exploring GDELT; GDELT GKG datasets; The files; Special collections; Reference data; Exploring the GKG v2.1; The Translingual files; A configurable GCAM time series EDA; Plot.ly charting on Apache Zeppelin; Exploring translation sourced GCAM sentiment with plot.ly; Concluding remarks; A configurable GCAM Spatio-Temporal EDA; Introducing GeoGCAM; Does our spatial pivot work?; Summary; Chapter 5: Spark for Geographic Analysis. 
590 |a eBooks on EBSCOhost  |b EBSCO eBook Subscription Academic Collection - Worldwide 
630 0 0 |a Spark (Electronic resource : Apache Software Foundation) 
630 0 7 |a Spark (Electronic resource : Apache Software Foundation)  |2 fast 
650 0 |a Data mining. 
650 0 |a Machine learning. 
650 0 |a Big data. 
650 6 |a Exploration de données (Informatique) 
650 6 |a Apprentissage automatique. 
650 6 |a Données volumineuses. 
650 7 |a COMPUTERS  |x Databases  |x Data Mining.  |2 bisacsh 
650 7 |a Big data  |2 fast 
650 7 |a Data mining  |2 fast 
650 7 |a Machine learning  |2 fast 
700 1 |a Amend, Antoine. 
700 1 |a George, David. 
700 1 |a Hallett, Matthew. 
776 0 8 |i Print version:  |a Morgan, Andrew.  |t Mastering Spark for Data Science.  |d Birmingham : Packt Publishing, ©2017 
856 4 0 |u https://ebsco.uam.elogim.com/login.aspx?direct=true&scope=site&db=nlebk&AN=1495812  |z Full text 
938 |a Askews and Holts Library Services  |b ASKH  |n AH30656483 
938 |a EBL - Ebook Library  |b EBLB  |n EBL4833930 
938 |a EBSCOhost  |b EBSC  |n 1495812 
938 |a ProQuest MyiLibrary Digital eBook Collection  |b IDEB  |n cis34561627 
938 |a YBP Library Services  |b YANK  |n 13953597 
994 |a 92  |b IZTAP