Performance Tuning of Hadoop Clusters



Overview/Description
The Apache Hadoop software library is a framework that allows for the distributed processing of large datasets across clusters of computers using a simple programming model. Hadoop can scale up from single servers to thousands of machines, each offering local computation and storage. This course focuses on performance tuning of Hadoop clusters. It examines best practices and recommendations for tuning the operating system, memory, HDFS, YARN, and MapReduce. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Administrators looking to expand their skill sets to include performance tuning of Hadoop clusters

Prerequisites
None

Expected Duration (hours)
2.7

Lesson Objectives

Performance Tuning of Hadoop Clusters

  • start the course
  • recall the three main functions of service capacity
  • describe different strategies of performance tuning
  • list some of the best practices for network tuning
  • install compression
  • describe the configuration files and parameters used in performance tuning of the operating system
  • describe the purpose of Java tuning
  • recall some of the rules for tuning the DataNode
  • describe the configuration files and parameters used in performance tuning of memory for daemons
  • describe the purpose of memory tuning for YARN
  • recall why the NodeManager kills containers
  • performance tune memory for the Hadoop cluster
  • describe the configuration files and parameters used in performance tuning of HDFS
  • describe the sizing and balancing of the HDFS data blocks
  • describe the use of TestDFSIO
  • performance tune HDFS
  • describe the configuration files and parameters used in performance tuning of YARN
  • configure speculative execution
  • describe the configuration files and parameters used in performance tuning of MapReduce
  • tune MapReduce for performance
  • describe the practice of benchmarking on a Hadoop cluster
  • describe the different tools used for benchmarking a cluster
  • perform a benchmark of a Hadoop cluster
  • describe the purpose of application modeling
  • optimize memory and benchmark a Hadoop cluster

Course Number
df_ahop_a09_it_enus

Expertise Level
Expert