Ecosystem for Hadoop





Overview/Description
Hadoop's HDFS is a highly fault-tolerant distributed file system that, like Hadoop in general, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suited to applications with large data sets. This course examines the Hadoop ecosystem by demonstrating all of the commonly used open source software components. You'll explore a Big Data model to understand how these tools combine to create a supercomputing platform. You'll also learn how the principles of supercomputing apply to Hadoop and how this yields an affordable supercomputing environment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
1.6

Lesson Objectives

Ecosystem for Hadoop

  • start the course
  • describe supercomputing
  • recall three major functions of data analytics
  • define Big Data
  • describe the two different types of data
  • describe the components of the Big Data stack
  • identify the data repository components
  • identify the data refinery components
  • identify the data factory components
  • recall the design principles of Hadoop
  • describe the design principles of sharing nothing
  • describe the design principles of embracing failure
  • describe the components of the Hadoop Distributed File System (HDFS)
  • describe the four main HDFS daemons
  • describe Hadoop YARN
  • describe the roles of the ResourceManager daemon
  • describe the YARN NodeManager and ApplicationMaster daemons
  • define MapReduce and describe its relationship to YARN
  • describe data analytics
  • describe the reasons for the complexities of the Hadoop Ecosystem
  • describe the components of the Hadoop ecosystem

Course Number
df_ahec_a01_it_enus

Expertise Level
Intermediate