Data Repository with Flume



Overview/Description
Hadoop is an open source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Rather than relying on high-end hardware, the resiliency of these clusters comes from the software's ability to detect and handle failures at the application layer. In this course, you'll learn the theory behind Flume as a tool for extracting and loading unstructured data. You'll explore Flume agents in detail and see a demonstration of them in action. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
2.0

Lesson Objectives

Data Repository with Flume

  • start the course
  • describe the three key attributes of Flume
  • recall some of the protocols cURL supports
  • use cURL to download web server data
  • recall some best practices for the Agent Conf files
  • install and configure Flume
  • create a Flume agent
  • describe a Flume agent in detail
  • use a Flume agent to load data into HDFS
  • identify popular sources
  • identify popular sinks
  • describe Flume channels
  • describe what is happening during a file roll
  • recall that Avro can be used as both a sink and a source
  • use Avro to capture a remote file
  • create multiple-hop Flume agents
  • describe interceptors
  • create a Flume agent with a TimeStampInterceptor
  • describe multifunction Flume agents
  • configure Flume agents for multiflow
  • create multi-source Flume agents
  • compare replicating to multiplexing
  • create a Flume agent for multiple data sinks
  • recall some common reasons for Flume failures
  • use the logger to troubleshoot Flume agents
  • configure the various Flume agents
Course Number
df_ahec_a04_it_enus

Expertise Level
Intermediate