Data Factory with Hive

Overview/Description
Apache Hadoop is an open-source software framework for distributed storage and distributed processing of big data on computer clusters built from commodity hardware. All Hadoop modules are designed on the fundamental assumption that hardware failures are commonplace and should be handled automatically in software by the framework. In this course, you'll explore Hive as a SQL-like tool for interfacing with Hadoop. The course demonstrates the installation and configuration of Hive, followed by a demonstration of Hive in action. Finally, you'll learn about extracting and loading data between Hive and an RDBMS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
2.1

Lesson Objectives

Data Factory with Hive

start the course

recall the key attributes of Hive

describe the configuration files

install and configure Hive

create a table in Derby using Hive

create a table in MySQL using Hive

recall the unique delimiter that Hive uses

describe the different operators in Hive

use basic SQL commands in Hive

use SELECT statements in Hive

use more complex HiveQL

write and use Hive scripts

recall what types of joins Hive can support

use Hive to perform joins

recall that a Hive partition schema must be created before loading the data

write a Hive partition script

recall how buckets are used to improve performance

create Hive buckets

recall some best practices for user-defined functions

create a user-defined function for Hive

recall the standard error code ranges and what they mean

use a Hive explain plan

understand configuration options, data loading, and querying

Course Number
df_ahec_a07_it_enus

Expertise Level
Intermediate