Getting Started with Hive: Loading and Querying Data


Overview/Description

Among the market’s most popular data warehouses used for data science, Apache Hive simplifies working with large data sets stored in files by representing them as tables. In this 12-video Skillsoft Aspire course, learners explore how to create, load, and query Hive tables. For this hands-on course, learners should have a conceptual understanding of Hive and its basic components, prior experience querying data from tables using SQL (Structured Query Language), and familiarity with the command line. Key concepts covered include provisioning a Hadoop cluster, joining tables, and modifying tables. Demonstrations include using the Beeline client for Hive to perform simple operations: creating tables, loading them with data, and running queries against them. Only tables with primitive data types are used here, with data loaded into these tables from HDFS (Hadoop Distributed File System) and from the local file system of the Hive client. Learners will also work with the Hive metastore and with temporary tables, and see how each can be used. By the end of the course, learners will be familiar with the basics of the Hive query language and comfortable working with HDFS.



Expected Duration (hours)
1.3

Lesson Objectives

Getting Started with Hive: Loading and Querying Data

  • Course Overview
  • use the Google Cloud Platform's Dataproc service to provision a Hadoop cluster
  • define and create a simple table in Hive using the Beeline client
  • load a few rows of data into a table and query it with simple select statements
  • run Hive queries from the shell of a host where a Hive client is installed
  • define and run a join query involving two related tables
  • describe the structure of the Hive Metastore on the Hadoop Distributed File System (HDFS)
  • create, load data into, and query an external table in Hive and contrast it with a Hive-managed table
  • use the alter table statement to change the definition of a Hive table
  • work with temporary tables that are only valid for a single Hive session and recognize how they differ from regular tables
  • populate Hive tables with data in files on both HDFS and the file system of the Hive client
  • load data into multiple tables from the contents of another table
  • use the Hadoop shell to execute Hive query scripts and work with Hive tables

Course Number
it_dsgshvdj_02_enus

Expertise Level
Beginner