Data Repository with Sqoop

Overview/Description
Hadoop is an open-source software framework for storing and processing big data in a distributed fashion on large clusters of commodity hardware. Essentially, it accomplishes two tasks: massive data storage and faster processing. This course explains the theory of Sqoop as a tool for extracting structured data from an RDBMS and loading it. You'll explore Hive SQL statements and see a demonstration of Hive in action. This learning path can be used as part of the preparation for the Cloudera Certified Administrator for Apache Hadoop (CCA-500) exam.

Target Audience
Technical personnel with a background in Linux, SQL, and programming who intend to join a Hadoop engineering team in roles such as Hadoop developer, data architect, or data engineer, or in roles related to technical project management, cluster operations, or data analysis.

Prerequisites
None

Expected Duration (hours)
1.4

Lesson Objectives

Data Repository with Sqoop

start the course

describe MySQL

install MySQL

create a database in MySQL

create MySQL tables and load data

describe Sqoop

describe Sqoop's architecture

recall the dependencies for Sqoop installation

install Sqoop

recall why it's important for the primary key to be numeric

perform a Sqoop import from MySQL into HDFS

recall what concerns the developers should be aware of

perform a Sqoop export from HDFS into MySQL

recall that you must execute a Sqoop import statement for each data element

perform a Sqoop import from MySQL into HBase

recall how to use chain troubleshooting to resolve Sqoop issues

use the log files to identify common Sqoop errors and their resolutions

use Sqoop to extract data from an RDBMS and load the data into HDFS

Course Number
df_ahec_a05_it_enus

Expertise Level
Intermediate