Getting Started with Hive: Optimizing Query Executions with Partitioning

Overview/Description

Continue exploring the versatility of Apache Hive, one of today’s most popular data warehouses, in this 10-video Skillsoft Aspire course. Learners are shown ways to optimize query execution, including the powerful technique of partitioning data sets. This hands-on course assumes previous experience working with Hive tables using the Hive query language and processing complex data types, along with a theoretical understanding of how partitioning very large data sets improves query performance. Demonstrations focus on the basics of partitioning and on how to create partitions and load data into them. Learners work with both Hive-managed tables and external tables to see how partitioning works for each, then watch how to navigate to the shell of the Hadoop master node and create new directories in the Hadoop file system. Next, observe dynamic partitioning of tables and how it simplifies loading data into partitions. Finally, explore how multiple columns of a table can be used to partition its data. By the end of the course, learners will have a sound understanding of how large data sets can be partitioned into smaller chunks to improve query performance.

Expected Duration (hours)
1.0

Lesson Objectives

discover the key concepts covered in this course

use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster (not required if you already have a Hadoop environment set up with Hive)

define a table which will contain data partitioned based on the value in one of its columns

insert data into partitions of a Hive table and explore the partition and its data on HDFS

load data into table partitions from files

create and populate partitions in an external table

alter the definition of a partition to modify its contents

define and work with dynamic partitions on your Hive tables

configure a table to use more than one column to define partitions and explore the partition on HDFS

use partitioning to boost query performance in HDFS

Course Number
it_dsgshvdj_05_enus

Expertise Level
Intermediate