Introduction to Hadoop

Overview/Description
Hadoop is an open-source, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. This course introduces Hadoop, its key tools, and their applications.

Target Audience
Individuals who are new to big data, Hadoop, and data modeling and who wish to understand the key concepts and features of Hadoop and its tools.

Prerequisites
None

Expected Duration (hours)
1.5

Lesson Objectives

Introduction to Hadoop

start the course

recognize what Big Data is, its sources and types, its evolution and characteristics, and its use cases

identify Big Data infrastructure issues and explain the benefits of Hadoop

recognize the basics of Hadoop, including its history, milestones, and core components

set up a virtual machine

install Linux on a virtual machine

recognize basic and most useful UNIX commands

identify Hadoop components

define HDFS components

recognize how to read and write in HDFS

use HDFS

recognize basics of YARN

define basics of MapReduce

identify how MapReduce processes information

use code that runs on Hadoop

define Pig, Hive, and HBase

define Sqoop, Flume, Mahout, and Oozie

recognize how data is stored and modeled in Hadoop

identify available commercial distributions for Hadoop

recognize Spark and its benefits over traditional MapReduce

filter information in Hadoop

Course Number
df_dmhp_a01_it_enus

Expertise Level
Beginner