Final Exam: Chaos Engineer


Overview/Description

Final Exam: Chaos Engineer will test your knowledge and application of the topics presented throughout the Chaos Engineer track of the Skillsoft Aspire Network Admin to Site Reliability Engineer Journey.



Expected Duration (hours)
0.0

Lesson Objectives

Final Exam: Chaos Engineer

  • describe challenges for maintaining data integrity
  • describe CRON and how to use it for scheduling jobs
  • describe CRON jobs and their components
  • describe deterministic and non-deterministic algorithms and how they relate to distributed systems
  • describe frontend load balancing and the importance of using it to increase performance
  • describe how engineers think differently from novices when it comes to troubleshooting
  • describe how load balancing needs to be performed taking into consideration virtualization, the cloud, and containers
  • describe how loads can be balanced using External HTTPS Load Balancing
  • describe how loads can be balanced using SSL Proxy Load Balancing
  • describe how loads can be balanced using TCP Proxy Load Balancing
  • describe how server overloads can lead to cascading failure
  • describe load balancing techniques and algorithms
  • describe operational loads and how they relate to optimal performance
  • describe steps to ensure efficient queue management
  • describe the benefits of client-side throttling
  • describe the characteristics and purpose of blackbox monitoring
  • describe the characteristics and purpose of whitebox monitoring
  • describe the CRON syntax and provide syntax examples
  • describe the importance of incident response training
  • describe the mean time between failures metric
  • describe the mean time to respond metric
  • describe the system models that can be used with distributed systems
  • describe when to use acceptance testing
  • differentiate between idempotent and two-phase mutations
  • differentiate between the various pipeline features
  • discuss software testing at scale
  • identify the system models that can be used with distributed systems
  • list characteristics of machine learning (ML) applications
  • list CPU considerations as they relate to failures and overutilization
  • list data integrity requirements
  • list potential pitfalls to avoid, such as looking for symptoms that are not relevant
  • list the main roles in incident response (Incident Commander, Communications Lead, Operations Lead)
  • outline an idealized troubleshooting model (e.g., report, triage, examine, diagnose, test/treat, and cure)
  • outline best practices and approaches to troubleshooting and how to keep those skills sharp
  • outline the benefits of using tickets
  • outline the process and purpose of logging and name the benefits of text logs
  • provide a general overview of the six steps involved in developing a plan
  • provide an overview of a typical development lifecycle
  • provide an overview of backup and recovery methods
  • provide an overview of business continuity and describe why business continuity planning matters
  • provide an overview of data integrity
  • provide an overview of Google Workflow
  • provide an overview of key principles SREs need to be familiar with for emergency response and recognize key steps to take when a system breaks
  • provide an overview of pages
  • provide an overview of resource exhaustion
  • provide an overview of the checkpointing technique
  • provide an overview of the maturity matrix
  • provide an overview of the mean time to failure metric
  • provide an overview of the production readiness review process
  • recognize aspects of the SRE engagement model
  • recognize best practices for handling unmanaged incidents
  • recognize how to create and maintain pipeline documentation
  • recognize how to create a test and build environment
  • recognize how to develop a launch checklist
  • recognize how to identify cascading failures
  • recognize key factors in ensuring business continuity
  • recognize the importance of encouraging proactive testing
  • recognize the importance of incident response planning
  • recognize the importance of testing SRE-developed tools
  • recognize the pitfalls of the queries per second metric

Course Number
it_fesre_03_enus

Expertise Level
Intermediate