SRE Metric Management: Software Reliability Monitoring and Reporting


Overview/Description

Once SRE metrics have been identified, site reliability engineers (SREs) must know how to perform fault analysis on a system, classify defects, and monitor and report data. In this course, you'll explore the tools and best practices for carrying out these procedures.

You'll begin by identifying various fault analysis methods and tools. You'll then classify software defects and bugs with a focus on severity and priority.

Next, you'll investigate strategies for monitoring APIs and explore some tools used for this task. You'll then examine in detail several tools for collecting, analyzing, and reporting metric data using a customizable dashboard, including those that make up the ELK Stack: Elasticsearch, Logstash, and Kibana. Finally, you'll explore Beats, a family of lightweight data shippers, and use cases for Elasticsearch notifications.



Expected Duration (hours)
1.3

Lesson Objectives

  • discover the key concepts covered in this course
  • outline various methods for analyzing the effects of faults in a system
  • outline how to use fault tree analysis to determine the cause of faults in a system
  • name the tools that can be used to perform fault tree analysis
  • outline how to classify software defects
  • describe the various types of software bugs and recognize why they occur
  • differentiate between the severity and priority of software bugs
  • outline best practices for defining API monitoring strategies
  • state the key characteristics of API monitoring strategies
  • list API monitoring tools and their strengths and weaknesses
  • identify the components of the ELK Stack and how they work together for data reporting
  • describe the features and benefits of Elasticsearch for storing log data
  • describe the features and benefits of Kibana for viewing data
  • describe the features and benefits of Beats for data collection
  • describe the features and benefits of Logstash for data processing
  • outline how to use Elasticsearch notifications to alert staff when API services have issues
  • summarize the key concepts covered in this course

Course Number
it_sremetsdj_02_enus

Expertise Level
Intermediate