SRE Troubleshooting: Tools


Overview/Description

Site reliability engineers (SREs) are typically good problem solvers. They need to think logically to identify problems, correct them, and prevent them from happening again.

In this course, you'll explore several built-in and open-source troubleshooting tools SREs can use to resolve system issues. You'll start by examining logging, whitebox monitoring, and blackbox monitoring as techniques for tracking system events. You'll then work with three built-in Windows troubleshooting tools: Event Viewer, Resource Monitor, and System Information.

Next, you'll use Google Cloud Dataflow to process logs, before outlining the purpose and benefits of the StatsD standard and the /api/search endpoint. Lastly, you'll identify how Google's Dapper is used to troubleshoot distributed systems and how the open-source toolkit Prometheus is used for instrumenting software and exposing metrics.
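
As a rough illustration of what "instrumenting software and exposing metrics" can look like with StatsD, the sketch below builds metric lines in the StatsD plain-text format (`name:value|type`, with an optional `|@sample_rate`) and sends them over UDP. The metric names, the helper functions, and the assumption of a StatsD daemon listening on 127.0.0.1:8125 (the conventional default port) are hypothetical and are not taken from the course material.

```python
import socket
from typing import Optional


def format_statsd_metric(name: str, value: float, metric_type: str,
                         sample_rate: Optional[float] = None) -> str:
    """Build one StatsD line: <name>:<value>|<type>[|@<sample_rate>]."""
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, chosen only for illustration.
    send_metric(format_statsd_metric("checkout.requests", 1, "c"))     # counter
    send_metric(format_statsd_metric("checkout.latency", 320, "ms"))   # timer
    send_metric(format_statsd_metric("checkout.queue_depth", 12, "g")) # gauge
```

Because UDP is connectionless, the application is not blocked if no daemon is listening, which is one reason this style of instrumentation tends to be inexpensive to add to existing code.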



Expected Duration (hours)
0.7

Lesson Objectives

SRE Troubleshooting: Tools

  • discover the key concepts covered in this course
  • outline the process and purpose of logging and name the benefits of text logs
  • describe the characteristics and purpose of whitebox monitoring
  • describe the characteristics and purpose of blackbox monitoring
  • access and navigate the Windows Event Viewer
  • open the System Information panel in Windows and use it to view and collect system information
  • use Windows Resource Monitor to display real-time hardware and software usage information
  • summarize the characteristics of Dapper and outline how it can be used to troubleshoot a distributed system
  • process logs using the Google Cloud Dataflow workflow tool
  • recognize how the StatsD standard is used for instrumenting software and exposing metrics
  • outline the characteristics, components, and purpose of the Prometheus open-source systems monitoring and alerting toolkit
  • outline how to manually send a request to the /api/search endpoint to identify failures
  • summarize the key concepts covered in this course

Course Number
it_sreeftsdj_02_enus

Expertise Level
Intermediate