In this Skillsoft Aspire course, explore how MapReduce can be used to extract the five most expensive vehicles in a dataset and then to build an inverted index for the words appearing in a set of text files. Begin by defining a vehicle type that can represent automobiles stored in a Java PriorityQueue, then configure a Mapper that uses a PriorityQueue to hold the five most expensive automobiles it has processed from the dataset. Learn how to use a PriorityQueue in the application's Reducer to receive the five most expensive automobiles from each Mapper and write the top five automobiles overall to the output, then execute the application to verify the results. Next, explore how to use the MapReduce framework to generate an inverted index, and configure the Reducer and Driver for the inverted index application. You then run the application and examine the inverted index on HDFS (Hadoop Distributed File System). The course concludes with an exercise on advanced operations using MapReduce.
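The top-N technique described above relies on a min-heap: each Mapper keeps only its local five most expensive vehicles, and the Reducer merges those local lists and keeps the global top five. The sketch below shows that logic in plain Java with no Hadoop dependency, so it can run on its own. The class name `TopNVehicles`, the `Vehicle` fields (`name`, a whole-unit `long` price), and the helper methods `mapSplit` and `reduce` are hypothetical stand-ins, not the course's actual code. In a real Hadoop job, the Mapper commonly fills its PriorityQueue in `map()`, emits the survivors in `cleanup()`, and the job is run with a single reduce task (`job.setNumReduceTasks(1)`) so one Reducer sees every Mapper's candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopNVehicles {

    // Hypothetical vehicle type; the course's real field names are not given.
    static class Vehicle {
        private final String name;
        private final long price; // assumed: whole currency units

        Vehicle(String name, long price) {
            this.name = name;
            this.price = price;
        }

        String getName() { return name; }
        long getPrice() { return price; }
    }

    // Min-heap ordered by price: the head is always the cheapest vehicle kept so far.
    static PriorityQueue<Vehicle> newHeap() {
        return new PriorityQueue<>(Comparator.comparingLong(Vehicle::getPrice));
    }

    // Add a vehicle, then evict the cheapest once the heap holds more than n entries.
    static void offerTopN(PriorityQueue<Vehicle> heap, Vehicle v, int n) {
        heap.offer(v);
        if (heap.size() > n) {
            heap.poll();
        }
    }

    // "Mapper" side: each input split keeps only its local top n (order unspecified).
    static List<Vehicle> mapSplit(List<Vehicle> split, int n) {
        PriorityQueue<Vehicle> heap = newHeap();
        for (Vehicle v : split) {
            offerTopN(heap, v, n);
        }
        return new ArrayList<>(heap);
    }

    // "Reducer" side: merge every mapper's local top n and return the global
    // top n, most expensive first.
    static List<Vehicle> reduce(List<List<Vehicle>> mapperOutputs, int n) {
        PriorityQueue<Vehicle> heap = newHeap();
        for (List<Vehicle> local : mapperOutputs) {
            for (Vehicle v : local) {
                offerTopN(heap, v, n);
            }
        }
        List<Vehicle> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingLong(Vehicle::getPrice).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Vehicle> split1 = Arrays.asList(
                new Vehicle("v1", 30000), new Vehicle("v2", 85000), new Vehicle("v3", 120000),
                new Vehicle("v4", 45000), new Vehicle("v5", 15000), new Vehicle("v6", 70000));
        List<Vehicle> split2 = Arrays.asList(
                new Vehicle("v7", 250000), new Vehicle("v8", 60000), new Vehicle("v9", 95000),
                new Vehicle("v10", 20000), new Vehicle("v11", 110000), new Vehicle("v12", 40000));
        List<Vehicle> top = reduce(Arrays.asList(mapSplit(split1, 5), mapSplit(split2, 5)), 5);
        for (Vehicle v : top) {
            System.out.println(v.getName() + "\t" + v.getPrice());
        }
    }
}
```

A min-heap rather than a max-heap is the usual choice here because the element to discard is always the cheapest one currently kept, and that is exactly the head of a min-heap. One Hadoop-specific caution: the framework typically reuses the same Writable instance while iterating over a Reducer's values, so each value should be copied before it is stored in the PriorityQueue.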
Getting Started with Hadoop: Advanced Operations Using MapReduce
Course Overview
define a vehicle type that can be used to represent automobiles to be stored in a Java PriorityQueue
configure a Mapper to use a PriorityQueue to store the five most expensive vehicles it has processed from the dataset
use a PriorityQueue in the Reducer of the application to receive the five most expensive automobiles from each Mapper and write the top five automobiles overall to the output
execute the application and examine the output on HDFS to confirm that the five most expensive automobiles have been written out
define the Mapper for a MapReduce application to build an inverted index from a set of text files
configure the Reducer and the Driver for the inverted index application
run the application and examine the inverted index on HDFS
recognize the data structures and configurations involved when extracting the top N values from a dataset
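The inverted index objectives above follow the same map, shuffle, and reduce flow: the Mapper emits a (word, file name) pair for every word, the framework groups pairs by word, and the Reducer collapses each word's file names into one list. The sketch below simulates those three phases in plain Java so it runs without a cluster. The class `InvertedIndexSketch`, its methods, the lowercase tokenization on non-word characters, and the `word<TAB>file1,file2` output line are all assumptions for illustration; the course's own tokenization and output format may differ. In a Hadoop Mapper, one common way to obtain the current file name is `((FileSplit) context.getInputSplit()).getPath().getName()`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // "Mapper": emit a (word, fileName) pair for every word in one file's text.
    static List<Map.Entry<String, String>> map(String fileName, String text) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // a leading delimiter yields an empty first token
                out.add(new AbstractMap.SimpleEntry<>(token, fileName));
            }
        }
        return out;
    }

    // "Shuffle": group values by key; a TreeMap keeps keys sorted, as reducer input keys are.
    static Map<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // "Reducer": collapse one word's file names into a sorted, de-duplicated list.
    static String reduce(String word, List<String> fileNames) {
        return word + "\t" + String.join(",", new TreeSet<>(fileNames));
    }

    // Run all three phases over a map of fileName -> file contents.
    static List<String> buildIndex(Map<String, String> files) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> f : files.entrySet()) {
            pairs.addAll(map(f.getKey(), f.getValue()));
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(pairs).entrySet()) {
            lines.add(reduce(e.getKey(), e.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> files = new TreeMap<>();
        files.put("a.txt", "Hadoop stores data");
        files.put("b.txt", "MapReduce processes data in Hadoop");
        for (String line : buildIndex(files)) {
            System.out.println(line);
        }
    }
}
```

With this input, the line for `hadoop` lists both `a.txt` and `b.txt`, while `stores` lists only `a.txt`. Using a set in the reduce step means a word that appears several times in one file is still listed against that file only once.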