Data Silos, Lakes, and Streams: Data Lakes on AWS





Overview/Description

This course discusses the transition of data warehousing to cloud-based solutions using the Amazon Web Services (AWS) cloud platform. In 11 videos, the course explores how data lakes store data in a flat structure and tag it, making it easy to search and query. You will learn how to build a data lake on the AWS cloud by storing data in Simple Storage Service (S3) buckets, and how to set up your data lake architecture using AWS Glue, a fully managed extract, transform, load (ETL) service. You will learn to configure and run Glue crawlers, examine how crawlers merge data stored in an S3 folder path, and use S3 data to generate metadata tables in Glue. You will use Athena, Amazon's interactive query service, as a simple way to analyze data in S3 using standard SQL. Finally, you will examine how to merge the data crawled by a CSV (comma-separated values) crawler into a single table.
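
As a rough illustration of the workflow described above (crawl an S3 folder with Glue, then query the resulting table with Athena), the following Python sketch builds the request parameters for both steps. It is not taken from the course, which works in the web console. The bucket, role ARN, database, crawler, and table names are hypothetical placeholders, and the sketch assumes boto3 is installed with AWS credentials configured before either `start_crawl` or `start_query` is called.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # The crawler scans every file under this S3 folder path.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_request(sql, database, output_location):
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, chosen only for illustration.
CRAWLER = crawler_request(
    name="example-csv-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_lake_db",
    s3_path="s3://example-data-lake/sales/",
)
QUERY = athena_request(
    sql="SELECT * FROM sales LIMIT 10",  # assumes the crawler created a table named "sales"
    database="example_lake_db",
    output_location="s3://example-data-lake/athena-results/",
)


def start_crawl(region="us-east-1"):
    """Create and start the crawler (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without it

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**CRAWLER)
    glue.start_crawler(Name=CRAWLER["Name"])


def start_query(region="us-east-1"):
    """Submit the SQL query to Athena and return its execution ID."""
    import boto3

    athena = boto3.client("athena", region_name=region)
    response = athena.start_query_execution(**QUERY)
    return response["QueryExecutionId"]
```

A crawler runs asynchronously, so `start_query` should only be called once the crawler has finished and the table exists in the Glue data catalog.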



Expected Duration (hours)
1.2

Lesson Objectives

Data Silos, Lakes, and Streams: Data Lakes on AWS

  • Course Overview
  • configure a custom role with specific permissions on AWS
  • create an S3 bucket and upload files
  • recognize the different operations that can be performed using the AWS Glue console
  • create metadata tables in Glue using the web console
  • perform queries on the Glue data catalog using Athena
  • perform data crawling on S3 to automatically detect schemas
  • execute queries on data in crawled tables
  • perform crawling operations with multiple files in the same path
  • merge data stored in multiple files in the same folder path
  • merge data when files have the exact same schema
  • recall the roles and features of the different AWS services used in the data lake architecture

Course Number
it_dsdslsdj_02_enus

Expertise Level
Intermediate