Data Lake: Architectures & Data Management Principles

Overview/Description

A key component of data wrangling is the data lake framework. In this 9-video Skillsoft Aspire course, learners discover how to implement data lakes for real-time data management. Explore data ingestion, data processing, and data lifecycle management with Amazon Web Services (AWS) and open-source ecosystem products. Begin by examining real-time big data architectures and how to implement the Lambda and Kappa architectures to manage real-time big data. View the benefits of adopting the Zaloni data lake reference architecture. Examine the essential approaches to data ingestion and the comparative benefits of the Avro and Parquet file formats. Explore data ingestion with Sqoop and the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data in data lakes. Learn how to derive value from data lakes and describe the benefits of critical roles. Learners will explore the steps involved in the data lifecycle and the significance of archival policies. Finally, learn how to implement an archival policy that transitions data between S3 and Glacier, depending on the adopted policies. Close the course with an exercise on ingesting data and implementing an archival policy.
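As a rough illustration of the archival policy mentioned above, the sketch below builds an S3 lifecycle rule that moves objects to Glacier after a set number of days. It is not taken from the course material: the function name `build_archival_policy`, the rule ID, the `raw/` prefix, and the day counts are all hypothetical, and the dict only mirrors the general shape of an S3 lifecycle configuration.

```python
import json


def build_archival_policy(rule_id, prefix, days_to_glacier, days_to_expire=None):
    """Return an S3 lifecycle configuration as a plain dict (illustrative helper)."""
    if days_to_glacier < 0:
        raise ValueError("days_to_glacier must be non-negative")

    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},   # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": days_to_glacier, "StorageClass": "GLACIER"},
        ],
    }

    if days_to_expire is not None:
        # Deleting before the transition would make the transition pointless.
        if days_to_expire <= days_to_glacier:
            raise ValueError("days_to_expire must exceed days_to_glacier")
        rule["Expiration"] = {"Days": days_to_expire}

    return {"Rules": [rule]}


# Hypothetical values: archive raw data after 30 days, delete after 365.
policy = build_archival_policy("archive-raw", "raw/", 30, 365)
print(json.dumps(policy, indent=2))
```

A dict of this shape could be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`; the call is left out here so the sketch runs without AWS credentials.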

Expected Duration (hours)
0.6

Lesson Objectives

discover the key concepts covered in this course

implement Lambda and Kappa architectures to manage real-time big data

identify the benefits of adopting Zaloni data lake reference architecture

describe data ingestion approaches and compare Avro and Parquet file format benefits

demonstrate how to ingest data using Sqoop

describe the data processing strategies provided by MapReduce V2, Hive, Pig, and YARN for processing data in data lakes

recognize how to derive value from data lakes and describe the benefits of critical roles

describe the steps involved in the data life cycle and the significance of archival policies

implement an archival policy to transition between S3 and Glacier, depending on adopted policies

ingest data using Sqoop and implement an archival policy to transition from S3 to Glacier, depending on adopted policies

Course Number
it_dsdlipdj_02_enus

Expertise Level
Intermediate