Hadoop Blueprints.
About This Book: Solve real-world business problems using Hadoop and other Big Data technologies. Build efficient data lakes in Hadoop, and develop systems for various business cases like improving marketing campaigns, fraud detection, and more. Power packed with six case studies to get you going with Had...
Call number: | Electronic book |
---|---|
Main author: | |
Format: | Electronic eBook |
Language: | English |
Published: | Packt Publishing, 2016. |
Edition: | 1. |
Subjects: | |
Online access: | Full text |
Table of contents:
- Cover; Copyright; Credits; About the Authors; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Hadoop and Big Data; The beginning of the big data problem; Limitations of RDBMS systems; Scaling out a database on Google; Parallel processing of large datasets; Building open source Hadoop; Enterprise Hadoop; Social media and mobile channels; Data storage cost reduction; Enterprise software vendors; Pure Play Hadoop vendors; Cloud Hadoop vendors; The design of the Hadoop system; The Hadoop Distributed File System (HDFS); Data organization in HDFS.
- HDFS file management commands; NameNode and DataNodes; Metadata store in NameNode; Preventing a single point of failure with Hadoop HA; Checkpointing process; Data Store on a DataNode; Handshakes and heartbeats; MapReduce; The execution model of MapReduce Version 1; Apache YARN; Building a MapReduce Version 2 program; Problem statement; Solution workflow; Getting the dataset; Studying the dataset; Cleaning the dataset; Loading the dataset on the HDFS; Starting with a MapReduce program; Installing Eclipse; Creating a project in Eclipse; Coding and building a MapReduce program.
- Run the MapReduce program locally; Examine the result; Run the MapReduce program on Hadoop; Further processing of results; Hadoop platform tools; Data ingestion tools; Data access tools; Monitoring tools; Data governance tools; Big data use cases; Creating a 360 degree view of a customer; Fraud detection systems for banks; Marketing campaign planning; Churn detection in telecom; Analyzing sensor data; Building a data lake; The architecture of Hadoop-based systems; Lambda architecture; Summary; Chapter 2: A 360-Degree View of the Customer; Capturing business information.
- Collecting data from data sources; Creating a data processing approach; Presenting the results; Setting up the technology stack; Tools used; Installing Hortonworks Sandbox; Creating user accounts; Exploring HUE; Exploring MYSQL and the HIVE command line; Exploring Sqoop at the command line; Test driving Hive and Sqoop; Querying data using Hive; Importing data in Hive using Sqoop; Engineering the solution; Datasets; Loading customer master data into Hadoop; Loading web logs into Hadoop; Loading tweets into Hadoop; Creating the 360-degree view; Exporting data from Hadoop; Presenting the view.
- Building a web application; Installing Node.js; Coding the web application in Node.js; Summary; Chapter 3: Building a Fraud Detection System; Understanding the business problem; Selecting and cleansing the dataset; Finding relevant fields; Machine learning for fraud detection; Clustering as an unsupervised machine learning method; Designing the high-level architecture; Introducing Apache Spark; Apache Spark architecture; Resilient Distributed Datasets; Transformation functions; Actions; Test driving Apache Spark; Calculating the yearly average stock prices using Spark; Apache Spark 2.X.