Apache Hive essentials : immerse yourself on a fantastic journey to discover the attributes of big data by using Hive

Bibliographic Details
Classification: Electronic Book
Main Author: Du, Dayong (Author)
Other Authors: Siddiqui, Sameen (Editor), Subramanian, Laxmi (Editor)
Format: Electronic eBook
Language: English
Published: Birmingham, England ; Mumbai [India] : Packt Publishing, 2015.
Series: Community experience distilled.
Subjects:
Online Access: Full text
Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
  • Chapter 1: Overview of Big Data and Hive; A short history; Introducing big data; Relational and NoSQL database versus Hadoop; Batch, real-time, and stream processing; Overview of the Hadoop ecosystem; Hive overview; Summary
  • Chapter 2: Setting Up the Hive Environment; Installing Hive from Apache; Installing Hive from vendor packages; Starting Hive in the cloud; Using the Hive command line and Beeline; The Hive-integrated development environment; Summary
  • Chapter 3: Data Definition and Description; Understanding Hive data types; Data type conversions; Hive Data Definition Language; Hive database; Hive internal and external tables; Hive partitions; Hive buckets; Hive views; Summary
  • Chapter 4: Data Selection and Scope; The SELECT statement; The INNER JOIN statement; The OUTER JOIN and CROSS JOIN statements; Special JOIN - MAPJOIN; Set operation - UNION ALL; Summary
  • Chapter 5: Data Manipulation; Data exchange - LOAD; Data exchange - INSERT; Data exchange - EXPORT and IMPORT; ORDER and SORT; Operators and functions; Transactions; Summary
  • Chapter 6: Data Aggregation and Sampling; Basic aggregation - GROUP BY; Advanced aggregation - GROUPING SETS; Advanced aggregation - ROLLUP and CUBE; Aggregation condition - HAVING; Analytic functions; Sampling; Summary
  • Chapter 7: Performance Considerations; Performance utilities; The EXPLAIN statement; The ANALYZE statement; Design optimization; Partition tables; Bucket tables; Index; Data file optimization; File format; Compression; Storage optimization; Job and query optimization; Local mode; JVM reuse; Parallel execution; Join optimization; Common join; Map join; Bucket map join; Sort merge bucket (SMB) join; Sort merge bucket map (SMBM) join; Skew join; Summary
  • Chapter 8: Extensibility Considerations; User-defined functions; The UDF code template; The UDAF code template; The UDTF code template; Development and deployment; Streaming; SerDe; Summary
  • Chapter 9: Security Considerations; Authentication; Metastore server authentication; HiveServer2 authentication; Authorization; Legacy mode; Storage-based mode; SQL standard-based mode; Encryption; Summary
  • Chapter 10: Working with Other Tools; JDBC/ODBC connector; HBase; Hue; HCatalog; ZooKeeper; Oozie; Hive roadmap