Site Reliability Engineer: Managing Overloads


Overview/Description
Expected Duration
Lesson Objectives
Course Number
Expertise Level



Overview/Description

Site reliability engineers (SREs) are typically responsible for preventing and managing overloads. A common misconception is that overloads affect only computer systems. However, overload also affects people: operational overload is a form of occupational stress, and it invariably has a negative effect on an organization.

In this course, you'll explore the fundamental concepts and methods involved in managing overloads. You'll start by identifying types of operational load and how they relate to performance. Next, you'll outline how to mitigate overloads and prioritize work, and you'll recognize the specific consequences of overloads. You'll then describe how to manage client-side traffic using per-customer limits and client-side throttling, and you'll examine tools such as criticality values and utilization signals. Finally, you'll explore approaches for handling overload errors and learn how to identify issues caused by load from connections.



Expected Duration (hours)
1.2

Lesson Objectives

  • discover the key concepts covered in this course
  • define what is meant by operational loads, list their types, and describe how they relate to optimal performance
  • outline the purpose of pages (on-call alerts) and how to manage them
  • recognize the benefits of using tickets
  • outline the activities involved in ongoing operational responsibilities
  • identify how operational overload occurs and name considerations related to the operational threshold
  • outline steps to mitigate overloads
  • list the potential consequences of overloads, including serious illness among staff
  • recognize the importance of prioritizing work and tasks
  • recognize the pitfalls of the queries-per-second (QPS) metric
  • name capacity options, such as per-customer limits
  • recognize the benefits of client-side throttling
  • define the concept of criticality, name the four criticality values, and identify the purpose of criticality and of each value
  • describe the purpose and characteristics of utilization signals
  • outline processes for working with overload errors
  • describe mechanisms for limiting request retries, such as per-request and per-client retry budgets
  • outline how counters can help prevent overloads
  • describe how load from connections contributes to overload, and how to recognize and prevent it
  • identify potential problems caused by new connection bursts
  • summarize the key concepts covered in this course

Course Number
it_sreolcfdj_01_enus

Expertise Level
Intermediate