Introduction to Hadoop


Overview/Description
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling and who wish to understand the key concepts and features of Hadoop and its tools

Prerequisites
None

Expected Duration (hours)
1.5

Lesson Objectives

Introduction to Hadoop

  • start the course
  • recognize what Big Data is, its sources and data types, its evolution and characteristics, and its use cases
  • identify Big Data infrastructure issues, and explain the benefits of Hadoop
  • recognize the basics of Hadoop, including its history, milestones, and core components
  • set up a virtual machine
  • install Linux on a virtual machine
  • recognize the basic and most useful UNIX commands
  • identify Hadoop components
  • define HDFS components
  • recognize how to read and write in HDFS
  • use HDFS
  • recognize basics of YARN
  • define basics of MapReduce
  • identify how MapReduce processes information
  • use code that runs on Hadoop
  • define Pig, Hive, and HBase
  • define Sqoop, Flume, Mahout, and Oozie
  • recognize storing and modeling data in Hadoop
  • identify available commercial distributions for Hadoop
  • recognize Spark and its benefits over traditional MapReduce
  • filter information in Hadoop

Course Number
df_dmhp_a01_it_enus

Expertise Level
Beginner