Data Repository with Flume

Overview/Description
Hadoop is an open-source software project that enables distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines with a very high degree of fault tolerance. Rather than relying on high-end hardware, these clusters get their resiliency from the software's ability to detect and handle failures at the application layer. In this course, you'll learn the theory behind Flume as a tool for extracting and loading unstructured data. You'll explore Flume agents in detail and see a demonstration of them in action. This learning path can be used as part of your preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
2.0

Lesson Objectives

Data Repository with Flume

start the course

describe the three key attributes of Flume

recall some of the protocols cURL supports

use cURL to download web server data

recall some best practices for agent configuration files

install and configure Flume

create a Flume agent

describe a Flume agent in detail

use a Flume agent to load data into HDFS

identify popular sources

identify popular sinks

describe Flume channels

describe what happens during a file roll

recall that Avro can be used as both a sink and a source

use Avro to capture a remote file

create multiple-hop Flume agents

describe interceptors

create a Flume agent with a TimestampInterceptor

describe multifunction Flume agents

configure Flume agents for multiflow

create multi-source Flume agents

compare replicating to multiplexing

create a Flume agent for multiple data sinks

recall some common reasons for Flume failures

use the logger to troubleshoot Flume agents

configure the various Flume agents

Course Number
df_ahec_a04_it_enus

Expertise Level
Intermediate