SRE Data Pipelines & Integrity: Data Pipelines


Overview/Description

Site reliability engineers often find data processing complex as demands for faster, more reliable, and more cost-effective results continue to evolve. In this course, you'll explore techniques and best practices for managing a data pipeline. You'll start by examining the various pipeline application models and their recommended uses. You'll then learn how to define and measure service-level objectives (SLOs), plan for dependency failures, and create and maintain pipeline documentation. Next, you'll outline the phases of a typical release flow in the pipeline development lifecycle before investigating more challenging topics, such as managing data processing pipelines, using big data with simple data pipelines, and using periodic pipeline patterns. Lastly, you'll delve into the components of Google Workflow and recognize how to work with this system.



Expected Duration (hours)
1.2

Lesson Objectives

  • discover the key concepts covered in this course
  • describe the characteristics of and rationale for using data processing pipelines
  • recognize characteristics of the Extract Transform Load (ETL) pipeline model
  • define business intelligence and data analytics in the context of data processing and give an example data analytics use case
  • list characteristics of machine learning (ML) applications
  • define what is meant by service-level objectives (SLOs) and describe how they relate to pipeline data
  • outline how to plan for dependency failures
  • recognize how to create and maintain pipeline documentation
  • outline the stages of a typical development lifecycle
  • describe how to reduce hotspotting
  • recognize how to implement autoscaling to handle spikes in workloads
  • describe how best to adhere to access control and security policies
  • plan escalation paths that ensure quick and proactive communication
  • describe the effect big data can have on simple pipeline patterns
  • list the challenges with using the periodic pipeline pattern
  • describe the issues that can occur due to uneven work distribution
  • list the potential drawbacks of periodic pipelines in distributed environments
  • describe what comprises Google Workflow and outline how it works
  • outline the stages of execution in Google Workflow, describing what they entail
  • recognize the key factors in ensuring business continuity in big data pipelines using Google Workflow
  • summarize the key concepts covered in this course

Course Number
it_sredpindj_01_enus

Expertise Level
Intermediate