Data Factory with Hive


Overview/Description
Apache Hadoop is an open-source software framework for the distributed storage and distributed processing of big data on clusters of commodity hardware. All Hadoop modules are designed on the assumption that hardware failures are commonplace and should therefore be handled automatically in software by the framework. In this course, you'll explore Hive as a SQL-like tool for interfacing with Hadoop. The course demonstrates how to install and configure Hive, then shows Hive in action. Finally, you'll learn about extracting and loading data between Hive and an RDBMS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
2.1

Lesson Objectives

Data Factory with Hive

  • start the course
  • recall the key attributes of Hive
  • describe the configuration files
  • install and configure Hive
  • create a table in Derby using Hive
  • create a table in MySQL using Hive
  • recall the unique delimiter that Hive uses
  • describe the different operators in Hive
  • use basic SQL commands in Hive
  • use SELECT statements in Hive
  • use more complex HiveQL
  • write and use Hive scripts
  • recall what types of joins Hive can support
  • use Hive to perform joins
  • recall that a Hive partition schema must be created before loading the data
  • write a Hive partition script
  • recall how buckets are used to improve performance
  • create Hive buckets
  • recall some best practices for user-defined functions
  • create a user-defined function for Hive
  • recall the standard error code ranges and what they mean
  • use a Hive explain plan
  • understand configuration options, data loading, and querying

Course Number
df_ahec_a07_it_enus

Expertise Level
Intermediate