Getting Started with Hadoop: Advanced Operations Using MapReduce

Overview/Description

In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a data set and to build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type that represents the automobiles to be stored in a Java PriorityQueue, then configure a Mapper that uses a PriorityQueue to keep the five most expensive automobiles it has processed from the data set. Learn how to use a PriorityQueue in the Reducer to receive the five most expensive automobiles from each Mapper and write the top five overall to the output, then execute the application to verify the results. Next, explore how the MapReduce framework can generate an inverted index: define the Mapper, then configure the Reducer and Driver for the inverted index application. Finally, run the application and examine the inverted index on HDFS (Hadoop Distributed File System). The concluding exercise covers advanced operations using MapReduce.
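
The top-five pattern described above can be illustrated with a minimal sketch in plain Java. It does not use any Hadoop classes, and it is not the course's own code: the Vehicle class, its fields, and the topN helper are hypothetical names chosen for illustration. The assumption is that each Mapper keeps a bounded PriorityQueue (a min-heap by price) and that the Reducer applies the same logic to the candidates it receives from all Mappers.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch of the top-N pattern; no Hadoop classes are involved.
// Vehicle and topN are hypothetical names, not part of the course material.
class TopNSketch {

    // Hypothetical vehicle type holding only what the ranking needs.
    static final class Vehicle {
        final String model;
        final double price;

        Vehicle(String model, double price) {
            this.model = model;
            this.price = price;
        }
    }

    // Returns the n most expensive vehicles, most expensive first.
    // The queue is a min-heap by price, so its head is always the
    // cheapest retained vehicle and is evicted when the queue overflows.
    static List<Vehicle> topN(Iterable<Vehicle> vehicles, int n) {
        PriorityQueue<Vehicle> queue =
            new PriorityQueue<>(Comparator.comparingDouble((Vehicle v) -> v.price));
        for (Vehicle v : vehicles) {
            queue.add(v);
            if (queue.size() > n) {
                queue.poll(); // drop the cheapest vehicle seen so far
            }
        }
        List<Vehicle> result = new ArrayList<>(queue);
        result.sort(Comparator.comparingDouble((Vehicle v) -> v.price).reversed());
        return result;
    }

    public static void main(String[] args) {
        // Two input splits stand in for the records seen by two Mappers.
        List<Vehicle> split1 = List.of(
            new Vehicle("A", 10.0), new Vehicle("B", 50.0), new Vehicle("C", 30.0));
        List<Vehicle> split2 = List.of(
            new Vehicle("D", 40.0), new Vehicle("E", 20.0), new Vehicle("F", 60.0));

        // "Map" phase: each split contributes only its local top 2.
        List<Vehicle> candidates = new ArrayList<>(topN(split1, 2));
        candidates.addAll(topN(split2, 2));

        // "Reduce" phase: the same logic picks the overall top 2.
        for (Vehicle v : topN(candidates, 2)) {
            System.out.println(v.model + " " + v.price);
        }
    }
}
```

Because each Mapper forwards at most N candidates, the Reducer only has to rank N values per Mapper rather than the whole data set. In a real job, this is typically paired with a single Reducer so that one task sees every candidate.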

Expected Duration (hours)
0.8

Lesson Objectives

discover the key concepts covered in this course

define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue

configure a Mapper to use a PriorityQueue to store the five most expensive vehicles it has processed from the data set

use a PriorityQueue in the Reducer of the application to receive the five most expensive automobiles from each Mapper and write the top five automobiles overall to the output

execute the application and examine the output on HDFS to confirm that the five most expensive automobiles have been written out

define the Mapper for a MapReduce application to build an inverted index from a set of text files

configure the Reducer and the Driver for the inverted index application

run the application and examine the inverted index on HDFS

recognize the data structures and configurations involved when extracting the top N values from a data set
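
The inverted index objectives above can be illustrated with a minimal plain-Java sketch. It uses no Hadoop classes and is not the course's implementation: the class name, the build method, the lowercasing, and the tokenization on non-word characters are all assumptions made for illustration. The map step emits a (word, file name) pair for each token, and the reduce step groups the file names under each word.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java sketch of an inverted index; no Hadoop classes are involved.
// InvertedIndexSketch and build are hypothetical names.
class InvertedIndexSketch {

    // files maps a file name to its text content.
    // The result maps each lowercased word to the sorted set of
    // file names in which that word appears.
    static Map<String, Set<String>> build(Map<String, String> files) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            String fileName = file.getKey();
            // "Map" step: split the text into tokens (assumed rule:
            // lowercase, split on runs of non-word characters).
            for (String word : file.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                // "Reduce" step: collect the distinct file names per word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(fileName);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> files = Map.of(
            "a.txt", "Hadoop stores data",
            "b.txt", "MapReduce processes data");
        build(files).forEach((word, names) ->
            System.out.println(word + " -> " + names));
    }
}
```

In an actual MapReduce job the grouping is done by the shuffle between Mapper and Reducer, so the Reducer only needs to deduplicate and write the file names it receives for each word.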

Course Number
it_dshpfddj_05_enus

Expertise Level
Intermediate