Ecosystem for Hadoop

Ecosystem for Hadoop

Overview/Description
Target Audience
Prerequisites
Expected Duration
Lesson Objectives
Course Number
Expertise Level

Overview/Description
Hadoop's HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high throughput access to application data and is suitable for applications that have large data sets. This course examines the Hadoop ecosystem by demonstrating all of the commonly used open source software components. You'll explore a Big Data model to understand how these tools combine to create a supercomputing platform. You'll also learn how the principles of supercomputing apply to Hadoop and how this yields an affordable supercomputing environment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop Engineering team in roles such as Hadoop developer, data architect, or data engineer or roles related to technical project management, cluster operations, or data analysis

Prerequisites
None

Expected Duration (hours)
1.6

Lesson Objectives

Ecosystem for Hadoop

start the course

describe supercomputing

recall three major functions of data analytics

define Big Data

describe the two different types of data

describe the components of the Big Data stack

identify the data repository components

identify the data refinery components

identify the data factory components

recall the design principles of Hadoop

describe the design principles of sharing nothing

describe the design principles of embracing failure

describe the components of the Hadoop Distributed File System (HDFS)

describe the four main HDFS daemons

describe Hadoop YARN

describe the roles of the Resource Manager daemon

describe the YARN NodeManager and ApplicationMaster daemons

define MapReduce and describe its relations to YARN

describe data analytics

describe the reasons for the complexities of the Hadoop Ecosystem

describe the components of the Hadoop ecosystem

Course Number:
df_ahec_a01_it_enus

Expertise Level
Intermediate