Processing big data with Azure HDInsight : building real-world big data systems on Azure HDInsight using the Hadoop ecosystem /

Get a jump start on using Azure HDInsight and Hadoop Ecosystem components. As most Hadoop and Big Data projects are written in either Java, Scala, or Python, this book minimizes the effort to learn another language and is written from the perspective of a .NET developer. Hadoop components are covere...

Bibliographic Details
Classification: Electronic Book
Main Author: Yadav, Vinit
Format: Electronic eBook
Language: English
Published: [New York] : Apress, [2017]
©2017
Online Access: Full text (requires prior registration with an institutional email address)
Table of Contents:
  • At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Big Data, Hadoop, and HDInsight; What Is Big Data?; The Scale-Up and Scale-Out Approaches; Apache Hadoop; A Brief History of Hadoop; HDFS; MapReduce; YARN; Hadoop Cluster Components; HDInsight; The Advantages of HDInsight; Summary; Chapter 2: Provisioning an HDInsight Cluster; An Azure Subscription; Creating the First Cluster; Basic Configuration Options; Creating a Cluster Using the Azure Portal; Connecting to a Cluster Using RDP; Connecting to a Cluster Using SSH.
  • Creating a Cluster Using PowerShell; Creating a Cluster Using an Azure Command-Line Interface; Creating a Cluster Using .NET SDK; The Resource Manager Template; HDInsight in a Sandbox Environment; Hadoop on a Virtual Machine; Hadoop on Windows; Preparing the Host Machine; Installing and Configuring Java JDK; Installing and configuring Python 2.7.x; Download and Install HDP for Windows; Summary; Chapter 3: Working with Data in HDInsight; Azure Blob Storage; The Benefits of Blob Storage; Uploading Data; Using Azure Command-Line Interface; Using Windows PowerShell.
  • Using Microsoft Azure Storage Explorer; Running MapReduce Jobs; Using PowerShell; Using .NET SDK; Hadoop Streaming; Streaming Mapper and Reducer; Serialization with Avro Library; Data Serialization; Binary Encoding; JSON Encoding; Using Microsoft Avro Library; Summary; Chapter 4: Querying Data with Hive; Hive Essentials; Hive Architecture; Submitting a Hive Query; Using Hive View; Using Secure Shell (SSH); Using Visual Studio; Using .NET SDK; Writing HiveQL; Data Types; Create/Drop/Alter/Use Database; The Hive Table; Internal Tables; External Tables; Storage Formats; Row Formats and SerDe.
  • Partitioned Tables; Create Table Options; Temporary Tables; Data Retrieval; Hive Metastore; Apache Tez; Connecting to Hive Using ODBC and Power BI; ODBC and Power BI Configuration; Prepare Data for Analysis; Creating Hive Tables; Analyzing Data Using Power BI; Hive UDFs in C#; User Defined Function (UDF); User Defined Aggregate Functions (UDAF); User Defined Tabular Functions (UDTF); Summary; Chapter 5: Using Pig with HDInsight; Understanding Relations, Bags, Tuples, and Fields; Data Types; Connecting to Pig; Operators and Commands; Executing Pig Scripts; Summary; Chapter 6: Working with HBase.
  • Overview; Where to Use HBase?; The Architecture of HBase; HBase HMaster; HRegion and HRegion Server; ZooKeeper; HBase Meta Table; Read and Write to an HBase Cluster; HFile; Major and Minor Compaction; Creating an HBase Cluster; Working with HBase; HBase Shell; Create Tables and Insert Data; HBase Shell Commands; Using .NET SDK to read/write Data; Writing Data; Reading/Querying Data; Summary; Chapter 7: Real-Time Analytics with Storm; Overview; Storm Topology; Stream Groupings; Storm Architecture; Nimbus; Supervisor Node; ZooKeeper; Worker, Executor, and Task; Creating a Storm Cluster.