Data Silos, Lakes, and Streams: Data Lakes on AWS

Overview/Description

This course discusses the transition of data warehousing to cloud-based solutions using the AWS (Amazon Web Services) cloud platform. In 11 videos, the course explores how data lakes store data in a flat structure and tag it, making it easy to search and query. You will learn how to build a data lake on the AWS cloud by storing data in Amazon S3 (Simple Storage Service) buckets, and how to set up your data lake architecture using AWS Glue, a fully managed ETL (extract, transform, load) service. You will learn to configure and run Glue crawlers, examine how crawlers merge data stored in an S3 folder path, and use data in S3 to generate metadata tables in Glue. You will use Athena, Amazon's interactive query service, as a simple way to analyze data in S3 using standard SQL. Finally, you will examine how to merge the data crawled by the CSV (comma-separated values) crawler into a single table.

Expected Duration (hours)
1.2

Lesson Objectives

Course Overview

configure a custom role with specific permissions on AWS

create an S3 bucket and upload files

recognize the different operations that can be performed using the AWS Glue console

create metadata tables in Glue using the web console

perform queries on the Glue data catalog using Athena

perform data crawling on S3 to automatically detect schemas

execute queries on data in crawled tables

perform crawling operations with multiple files in the same path

merge data stored in multiple files in the same folder path

merge data when files have the exact same schema

recall the roles and features of the different AWS services used in the data lake architecture

Course Number
it_dsdslsdj_02_enus

Expertise Level
Intermediate