Getting Started with Hadoop: Advanced Operations Using MapReduce





Overview/Description

In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a data set and then to build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type to represent the automobiles stored in a Java PriorityQueue, then configure a Mapper that uses a PriorityQueue to keep the five most expensive automobiles it has processed from the data set. Learn how to use a PriorityQueue in the Reducer to receive the five most expensive automobiles from each mapper and write the top five overall to the output, then execute the application to verify the results. Next, explore how the MapReduce framework can be used to generate an inverted index: define the Mapper, configure the Reducer and Driver, run the application, and examine the resulting inverted index on HDFS (Hadoop Distributed File System). The course concludes with an exercise on advanced operations using MapReduce.
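The two techniques named above can be sketched in plain Java, without the Hadoop API. This is only an illustration under stated assumptions: the class names TopVehicles and InvertedIndex, the Vehicle fields, and every method name are hypothetical and are not taken from the course material. The first class shows the bounded PriorityQueue idea (keep N items per mapper, then merge the local results in the reducer); the second shows the word-to-files grouping that an inverted index performs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical names throughout; plain Java, not the Hadoop Mapper/Reducer API.
class TopVehicles {
    static final class Vehicle {
        final String name;
        final double price;
        Vehicle(String name, double price) { this.name = name; this.price = price; }
    }

    // Cheapest vehicle first, so the head of the queue is the one to evict.
    static final Comparator<Vehicle> BY_PRICE = Comparator.comparingDouble(v -> v.price);

    // Mapper-side idea: keep only the n most expensive vehicles seen so far.
    static List<Vehicle> topN(Iterable<Vehicle> vehicles, int n) {
        PriorityQueue<Vehicle> queue = new PriorityQueue<>(BY_PRICE);
        for (Vehicle v : vehicles) {
            queue.add(v);
            if (queue.size() > n) {
                queue.poll(); // drop the cheapest of the n + 1 candidates
            }
        }
        List<Vehicle> result = new ArrayList<>(queue);
        result.sort(BY_PRICE.reversed()); // most expensive first
        return result;
    }

    // Reducer-side idea: merge each mapper's local top n into a global top n.
    static List<Vehicle> mergeTopN(List<List<Vehicle>> perMapper, int n) {
        List<Vehicle> candidates = new ArrayList<>();
        for (List<Vehicle> local : perMapper) {
            candidates.addAll(local);
        }
        return topN(candidates, n);
    }
}

class InvertedIndex {
    // Maps each lowercased word to the sorted set of file names containing it.
    static Map<String, TreeSet<String>> build(Map<String, String> textByFile) {
        Map<String, TreeSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> file : textByFile.entrySet()) {
            // "Map" step: emit a (word, fileName) pair for every word in the file.
            for (String word : file.getValue().toLowerCase().split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Reduce" step: group the file names emitted for the same word.
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(file.getKey());
            }
        }
        return index;
    }
}
```

Because each mapper forwards at most N candidates, the reducer only has to compare N items per mapper rather than the whole data set. In a real Hadoop job the same logic would live inside Mapper and Reducer subclasses, with the queue contents written out once each task has consumed all of its input.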



Expected Duration (hours)
0.8

Lesson Objectives

Getting Started with Hadoop: Advanced Operations Using MapReduce

  • Course Overview
  • define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
  • configure a Mapper to use a PriorityQueue to store the five most expensive vehicles it has processed from the dataset
  • use a PriorityQueue in the Reducer of the application to receive the five most expensive automobiles from each mapper and write the top five automobiles overall to the output
  • execute the application and examine the output on HDFS to confirm that the five most expensive automobiles have been written out
  • define the Mapper for a MapReduce application to build an inverted index from a set of text files
  • configure the Reducer and the Driver for the inverted index application
  • run the application and examine the inverted index on HDFS
  • recognize the data structures and configurations involved when extracting the top N values from a data set

Course Number
it_dshpfddj_05_enus

Expertise Level
Intermediate