Operating Hadoop Clusters





Overview/Description
Hadoop is a framework written in Java for running applications on large clusters of commodity hardware. This course examines many of the HDFS administration and operational processes required to operate and maintain a Hadoop cluster, including how to balance a cluster, manage jobs, and perform backup and recovery for HDFS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Administrators looking to expand their skills and knowledge of operational activities for Hadoop clusters

Prerequisites
None

Expected Duration (hours)
2.9

Lesson Objectives

Operating Hadoop Clusters

  • start the course
  • monitor and improve service levels
  • deploy a Hadoop release
  • describe the purpose of change management
  • describe rack awareness
  • write configuration files for rack awareness
  • start and stop a Hadoop cluster
  • write init scripts for Hadoop
  • describe the tools fsck and dfsadmin
  • use fsck to check the HDFS file system
  • set quotas for the HDFS file system
  • install and configure trash
  • manage an HDFS DataNode
  • use include and exclude files to replace a DataNode
  • describe the operations for scaling a Hadoop cluster
  • add a DataNode to a Hadoop cluster
  • describe the process for balancing a Hadoop cluster
  • balance a Hadoop cluster
  • describe the operations involved for backing up data
  • use distcp to copy data from one cluster to another
  • describe MapReduce job management on a Hadoop cluster
  • perform MapReduce job management on a Hadoop cluster
  • plan an upgrade of a Hadoop cluster
  • write and complete a plan to install HBase with high availability

Course Number
df_ahop_a06_it_enus

Expertise Level
Expert