Getting Started with Hive: Optimizing Query Executions with Partitioning





Overview/Description

Continue to explore the versatility of Apache Hive, one of today’s most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query execution, including the powerful technique of partitioning data sets. This hands-on course assumes previous experience working with Hive tables using the Hive query language and processing complex data types, along with a theoretical understanding of how partitioning very large data sets improves query performance. Demonstrations focus on the basics of partitioning, including how to create partitions and load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each, then watch how to navigate to the shell of the Hadoop master node and create new directories in the Hadoop file system. Next, observe dynamic partitioning of tables and how it simplifies loading data into partitions. Finally, explore how multiple columns of a table can be used to partition its data. By the end of the course, learners will have a sound understanding of how large data sets can be partitioned into smaller chunks to improve query performance.
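As a rough illustration of the idea (not taken from the course materials), the Python sketch below models how Hive stores partitioned data in `column=value` directories, and how a filter on a partition column lets a query skip the directories it does not need. The `sales` table, its columns, and the helper functions are all hypothetical, and the in-memory dictionary only stands in for directories on HDFS.

```python
from collections import defaultdict

# Hypothetical rows for an imaginary `sales` table.
# `country` and `year` play the role of partition columns.
ROWS = [
    {"id": 1, "amount": 10.0, "country": "US", "year": 2019},
    {"id": 2, "amount": 25.5, "country": "US", "year": 2020},
    {"id": 3, "amount": 7.25, "country": "IN", "year": 2020},
    {"id": 4, "amount": 12.0, "country": "IN", "year": 2020},
]


def partition_path(table_dir, row, partition_cols):
    """Build a Hive-style partition directory, e.g. sales/country=US/year=2020."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_dir] + parts)


def write_partitions(table_dir, rows, partition_cols):
    """Route each row to a directory chosen by its partition column values.

    This mimics dynamic partitioning, where the target partition is derived
    from the data instead of being named explicitly for every load.
    """
    layout = defaultdict(list)
    for row in rows:
        # Partition values are encoded in the path, not in the stored record.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_path(table_dir, row, partition_cols)].append(data)
    return dict(layout)


def scan(layout, **filters):
    """Read only the directories whose path matches every partition filter."""
    wanted = [f"{col}={val}" for col, val in filters.items()]
    scanned = [p for p in layout if all(w in p.split("/") for w in wanted)]
    rows = [r for p in scanned for r in layout[p]]
    return scanned, rows


layout = write_partitions("sales", ROWS, ["country", "year"])
scanned, rows = scan(layout, country="IN")
print(sorted(layout))      # every partition directory that was created
print(scanned, len(rows))  # directories actually read for country=IN
```

Partitioning on two columns yields nested directories, one level per column, and a filter on `country` alone still prunes every directory belonging to other countries. That pruning is where the query performance gain comes from.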



Expected Duration (hours)
1.0

Lesson Objectives


  • Course Overview
  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)
  • define a table which will contain data partitioned based on the value in one of its columns
  • insert data into partitions of a Hive table and explore the partition and its data on HDFS
  • load data into table partitions from files
  • create and populate partitions in an external table
  • alter the definition of a partition to modify its contents
  • define and work with dynamic partitions on your Hive tables
  • configure a table to use more than one column to define partitions and explore the partition on HDFS
  • use partitioning to boost the performance of queries on data stored in HDFS

Course Number
it_dsgshvdj_05_enus

Expertise Level
Intermediate