Building Data Pipelines


Overview/Description
Expected Duration
Lesson Objectives
Course Number
Expertise Level



Overview/Description

Explore data pipelines and methods of processing them with and without ETL (extract, transform, load). In this course, you will learn to create data pipelines using the Apache Airflow workflow management platform. Key concepts covered here include the data pipeline as an application that sits between raw data and a transformed data set, between a data source and a data target; how to build a traditional ETL pipeline with batch processing; and how to build an ETL pipeline with stream processing. Next, learn how to set up and install Apache Airflow, the key concepts behind it, and how to instantiate a directed acyclic graph (DAG) in Airflow. You will also see how to use tasks and include arguments in Airflow, how to define dependencies between tasks in Airflow, how to build an ETL pipeline with Airflow, and how to build an automated pipeline without using ETL. Finally, learn how to test Airflow tasks with the airflow command line utility and how to use Apache Airflow to create a complete data pipeline.
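To make the Airflow concepts above concrete, the following is a minimal, hypothetical sketch of a DAG that wires extract, transform, and load tasks together. It assumes Airflow 2.x and its PythonOperator; the DAG id, schedule, and sample records are invented for illustration and are not taken from the course materials.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (stubbed with literal data here).
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


def transform(ti, **context):
    # Read the upstream task's output via XCom and reshape it.
    rows = ti.xcom_pull(task_ids="extract")
    return [{"id": row["id"], "value": row["value"] * 2} for row in rows]


def load(ti, **context):
    # Write the transformed rows to a target (printed here for brevity).
    rows = ti.xcom_pull(task_ids="transform")
    print(f"Loading {len(rows)} rows")


default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="etl_example",  # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: extract must finish before transform, which must finish before load.
    extract_task >> transform_task >> load_task

Running the scheduler against a file like this would execute extract, then transform, then load once per day; passing rows through XCom keeps the sketch self-contained, though real pipelines would normally stage data in external storage.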



Expected Duration (hours)
1.2

Lesson Objectives

Building Data Pipelines

  • Course Overview
  • describe data pipelines and automation
  • build a traditional ETL pipeline with batch processing
  • build an ETL pipeline with stream processing (a batch versus stream sketch follows this list)
  • set up and install Apache Airflow
  • describe the key concepts of Apache Airflow
  • create and instantiate a directed acyclic graph in Airflow
  • use tasks and include arguments in Airflow
  • use dependencies in Airflow
  • build an ETL pipeline with Airflow
  • build an automated pipeline without using ETL
  • test Airflow tasks using the airflow command line utility
  • use Apache Airflow to create a data pipeline
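
As noted in the stream-processing objective above, here is a small, self-contained Python sketch contrasting batch processing (the whole data set is extracted before anything is transformed) with stream processing (each record flows through the pipeline as it arrives). The record source and the doubling transform are invented placeholders, not course code.

def extract_batch():
    # Batch: read the full data set up front.
    return [{"id": i, "value": i * 10} for i in range(5)]


def extract_stream():
    # Stream: yield records one at a time as they "arrive".
    for i in range(5):
        yield {"id": i, "value": i * 10}


def transform(record):
    return {**record, "value": record["value"] * 2}


def load(record, target):
    target.append(record)


# Batch pipeline: extract everything, transform everything, then load.
batch_target = []
for record in [transform(r) for r in extract_batch()]:
    load(record, batch_target)

# Stream pipeline: each record is transformed and loaded as soon as it arrives,
# so the target fills incrementally instead of all at once.
stream_target = []
for record in extract_stream():
    load(transform(record), stream_target)

For the command line testing objective, Airflow 2.x provides an "airflow tasks test <dag_id> <task_id> <logical_date>" command (older 1.x releases used "airflow test") that runs a single task instance without checking dependencies or recording state in the metadata database, which is a convenient way to try out individual tasks while building a pipeline.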
Course Number
it_dsbdpidj_01_enus

Expertise Level
Intermediate