Getting Started with Hive: Introduction

Overview/Description

This 9-video Skillsoft Aspire course focuses solely on theory and involves no programming or query execution. Learners begin by examining what a data warehouse is and how it differs from a relational database. The distinction matters because Apache Hive is primarily a data warehouse, even though it offers a SQL-like interface for querying data. Hive facilitates work on very large data sets stored as files in the Hadoop Distributed File System, and lets users operate on the data in these files in parallel by effectively transforming Hive queries into MapReduce operations. Next, learners hear about the types of data and operations that data warehouses and relational databases handle, before moving on to the basic components of the Hadoop architecture. Finally, the course discusses the features of Hive that make it popular among data analysts. In the concluding exercise, learners recall the differences between online transaction processing and online analytical processing systems; identify Hadoop’s three major components; list the Hadoop offerings on three major cloud platforms (AWS, Microsoft Azure, and Google Cloud Platform); and list the benefits of Hive for data analysts.

Expected Duration (hours)
0.9

Lesson Objectives

discover the key concepts covered in this course

define what a data warehouse is and identify its characteristics

describe the functions served by relational databases and the features they offer

distinguish between Online Transaction Processing and Online Analytical Processing and identify the specific problems they are meant to solve

identify where Hive fits in the Hadoop ecosystem and how it simplifies working with Hadoop

describe the architecture of Hive and the functions served by HiveServer and the Metastore

identify the services and features offered by AWS, Azure, and GCP to run Hadoop and Hive on their infrastructure

describe the different primitive and complex data types available in Hive

compare managed and external tables in Hive and how they relate to the underlying data

contrast OLTP and OLAP systems, identify the major components of Hadoop, and explore the benefits of Hive for data analysis

Course Number
it_dsgshvdj_01_enus

Expertise Level
Beginner