Site Reliability Engineering: Scenario Planning


Overview/Description

Scenario planning helps site reliability engineers prepare strategically for uncertainties that may disrupt or degrade services. In this course, you'll explore scenario planning use cases and the strategies used to prepare for disasters.

You'll examine the functions of Disaster Recovery Testing (DiRT) and Customer Reliability Engineering teams, which help manage the impact of a disaster or disruption. Next, you'll identify disaster recovery testing events and recognize how to plan and design tests for DiRT.

You'll then describe the production incident lifecycle and how to minimize production incidents. You'll identify unmanaged responses, learn how to rectify untrained responses, and review the activities used to train response teams.

Finally, you'll examine how to use role-playing and test scenarios to test people and the ways they self-organize and interact.



Expected Duration (hours)
1.2

Lesson Objectives

Site Reliability Engineering: Scenario Planning

  • discover the key concepts covered in this course
  • define scenario planning and identify why it should be part of your strategic plan
  • describe how to use scenario planning and how to create scenarios
  • recognize considerations when scenario planning for a disaster
  • identify potential scenarios to test and prepare for, such as the loss of technical infrastructure or environmental issues
  • list common data-related disaster recovery scenarios to plan for
  • list common applications-related disaster recovery scenarios to plan for
  • provide an overview of disaster recovery testing events and how they can help identify vulnerabilities in critical systems
  • list what to test when designing tests for DiRT
  • recognize how to minimize the potential damage of disruptive DiRT tests
  • provide an overview of the DiRT technical team and the coordination team
  • list common components of a DiRT test plan and how creating a template is useful for future test plan proposals
  • outline the functions of a Customer Reliability Engineering team and their role in scenario planning
  • outline the production incident lifecycle and how to lay the foundations to shrink production incidents
  • provide an overview of unmanaged responses
  • describe how to rectify untrained responses
  • recognize hands-on activities used to train response teams
  • describe how DiRT exercises should also test how people organize themselves and interact with each other
  • provide an overview of the "Wheel of Misfortune" role-playing scenario
  • provide an overview of the Dungeon/Scenario Master and their role in running a test scenario
  • summarize the key concepts covered in this course

Course Number
it_srescpldj_01_enus

Expertise Level
Intermediate