Designing Hadoop Clusters


Overview/Description
Hadoop is an Apache Software Foundation project and open-source software platform for scalable, distributed computing. Hadoop can provide fast and reliable analysis of both structured and unstructured data. In this course, you will learn about the design principles, the cluster architecture, considerations for servers and operating systems, and how to plan for a deployment. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Developers interested in expanding their knowledge of Hadoop from the operations perspective

Prerequisites
None

Expected Duration (hours)
2.2

Lesson Objectives

Designing Hadoop Clusters

  • start the course
  • describe the principles of supercomputing
  • recall the roles and skills needed for the Hadoop engineering team
  • recall the advantages and shortcomings of using Hadoop as a supercomputing platform
  • describe the three axioms of supercomputing
  • describe the dumb hardware/smart software and share-nothing design principles
  • describe the move processing not data, embrace failure, and build applications not infrastructure design principles
  • describe the different rack architectures for Hadoop
  • describe the best practices for scaling a Hadoop cluster
  • recall the best practices for different types of network clusters
  • recall the primary responsibilities for the master, data, and edge servers
  • recall some of the recommendations for a master server and edge server
  • recall some of the recommendations for a data server
  • recall some of the recommendations for an operating system
  • recall some of the recommendations for hostnames and DNS entries
  • describe the recommendations for hard disk drives (HDDs)
  • calculate the correct number of disks required for a storage solution (a worked sketch follows this list)
  • compare the use of commodity hardware with enterprise disks
  • plan for the deployment of a Hadoop cluster
  • set up flash drives as boot media
  • set up a kickstart file as boot media
  • set up a network installer
  • identify the hardware and networking recommendations for a Hadoop cluster
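
As a rough illustration of the disk-count calculation listed above, the Python sketch below estimates how many data-node disks are needed for a given amount of raw data. The replication factor, working-space overhead, disk size, and fill target are illustrative assumptions, not figures taken from the course.

import math

def disks_needed(raw_data_tb: float,
                 replication: int = 3,        # assumed HDFS default replication factor
                 temp_overhead: float = 0.25, # assumed headroom for shuffle/temporary data
                 disk_capacity_tb: float = 4.0,
                 fill_target: float = 0.70):  # keep disks at most ~70% full
    """Estimate the number of data-node disks needed to hold a data set."""
    # Total storage is raw data times the replication factor, plus working
    # space for intermediate job output.
    required_tb = raw_data_tb * replication * (1 + temp_overhead)
    # Usable capacity per disk, leaving headroom so disks never run full.
    usable_per_disk_tb = disk_capacity_tb * fill_target
    return math.ceil(required_tb / usable_per_disk_tb)

# Example: 100 TB of raw data -> 100 * 3 * 1.25 = 375 TB required,
# 375 / (4 * 0.7) = about 134 disks.
print(disks_needed(100))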

Course Number
df_ahop_a01_it_enus

Expertise Level
Beginner