Getting Started with Hive: Introduction





Overview/Description

This 9-video Skillsoft Aspire course focuses solely on theory and involves no programming or query execution. Learners begin by examining what a data warehouse is and how it differs from a relational database. The distinction matters because Apache Hive is primarily a data warehouse, even though it offers a SQL-like interface for querying data. Hive facilitates work on very large data sets stored as files in the Hadoop Distributed File System (HDFS), and it lets users operate on the data in these files in parallel by effectively transforming Hive queries into MapReduce operations. Next, the course covers the types of data and operations that data warehouses and relational databases handle, before moving on to the basic components of the Hadoop architecture. Finally, the course discusses the features of Hive that make it popular among data analysts. In the concluding exercise, learners recall the differences between online transaction processing (OLTP) and online analytical processing (OLAP) systems; identify Hadoop’s three major components; list Hadoop offerings on three major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform); and list the benefits of Hive for data analysts.



Expected Duration (hours)
0.9

Lesson Objectives

Getting Started with Hive: Introduction

  • Course Overview
  • define what a data warehouse is and identify its characteristics
  • describe the functions served by relational databases and the features they offer
  • distinguish between Online Transaction Processing and Online Analytical Processing and identify the specific problems they are meant to solve
  • identify where Hive fits in the Hadoop ecosystem and how it simplifies working with Hadoop
  • describe the architecture of Hive and the functions served by HiveServer and the Metastore
  • identify the services and features offered by AWS, Azure, and GCP to run Hadoop and Hive on their infrastructure
  • describe the different primitive and complex data types available in Hive
  • compare managed and external tables in Hive and how they relate to the underlying data
  • contrast OLTP and OLAP systems, identify the major components of Hadoop, and explore the benefits of Hive for data analysis

Course Number
it_dsgshvdj_01_enus

Expertise Level
Beginner