MapReduce Essentials

Overview/Description
MapReduce is a programming model and framework for processing parallelizable problems across huge datasets. This course defines MapReduce programming and explains the basics of programming in MapReduce, Hive, and Pig.

Target Audience
This path is designed for developers, managers, database developers, and anyone with a basic knowledge of Java who is interested in learning the basics of programming in MapReduce.

Prerequisites
None

Expected Duration (hours)
2.0

Lesson Objectives

MapReduce Essentials

start the course

describe the job components and the steps of Hadoop MapReduce

identify how each MapReduce process is vital to the overall MapReduce algorithm through a conceptual example

configure Java to write Hadoop MapReduce jobs and identify the functionality of the classes within additional JARs

create and execute Hadoop MapReduce jobs, and perform compilation and running of MapReduce programs

describe the basic features and functions of the programmatic steps in a Hadoop MapReduce job

describe the concept of MapReduce chaining and compare the input and output steps in MapReduce jobs

identify the precompile, compile, and run commands, and specify different techniques to package and run MapReduce jobs

describe how MapReduce stores and reads Big Data, and how MapReduce and Hadoop data is handled with HDFS across a distributed processing system

compare persistence in HDFS with other file storage systems, and describe the specifics of reading and writing data in HDFS and the redundancy of HDFS across the cluster

describe the basics of Apache Hive and HiveQL

classify the usage of the four file formats supported in Hive – TEXTFILE, SEQUENCEFILE, ORC, and RCFILE

describe how to write Hive jobs by using the complex Hive data types – arrays and maps

describe how Pig is used to obtain data with Pig Latin, a query language comparable to SQL

write Pig scripts, and describe the Pig, Local, MapReduce, and Batch modes

list Pig Latin operators for reading and writing data, such as LOAD, LIMIT, DUMP, and STORE

compare and contrast the internals and performance, and analyze the strengths and weaknesses of MapReduce, Hive, and Pig

describe the jobs run in MapReduce, and the unit testing process, tools, and techniques

recognize MapReduce job status, and review and understand the log files of different Hadoop distributions

identify the scenarios where a MapReduce job would need to be terminated, and apply the "-list" and "-kill" commands

define JUnit and JUnit configuration scripts, and identify testing techniques and test cases using JUnit

describe Cloudera MRUnit, unit testing process, and unit testing files, and compare unit testing with MRUnit and without MRUnit

use a dummy cluster for unit and integration testing, and describe the basics of a mini HDFS and a mini MapReduce cluster

define the basics of the Hadoop LocalJobRunner

describe the basics of programming in MapReduce, Hive, and Pig

Course Number
df_ahmr_a02_it_enus

Expertise Level
Beginner