Operating Hadoop Clusters

Overview/Description
Hadoop is a Java-based framework for running applications on large clusters of commodity hardware. This course examines many of the HDFS administration and operational processes required to operate and maintain a Hadoop cluster, including how to balance a cluster, manage jobs, and perform backup and recovery for HDFS. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Administrators looking to expand their skills and knowledge of operating Hadoop clusters

Prerequisites
None

Expected Duration (hours)
2.9

Lesson Objectives

Operating Hadoop Clusters

start the course

monitor and improve service levels

deploy a Hadoop release

describe the purpose of change management

describe rack awareness

write configuration files for rack awareness

start and stop a Hadoop cluster

write init scripts for Hadoop

describe the tools fsck and dfsadmin

use fsck to check the HDFS file system

set quotas for the HDFS file system

install and configure trash

manage an HDFS DataNode

use include and exclude files to replace a DataNode

describe the operations for scaling a Hadoop cluster

add a DataNode to a Hadoop cluster

describe the process for balancing a Hadoop cluster

balance a Hadoop cluster

describe the operations involved for backing up data

use distcp to copy data from one cluster to another

describe MapReduce job management on a Hadoop cluster

perform MapReduce job management on a Hadoop cluster

plan an upgrade of a Hadoop cluster

write and complete a plan to install HBase with high availability

Course Number
df_ahop_a06_it_enus

Expertise Level
Expert