Continuous Automation - The Reliability Engineering Edition


Scott Griffiths / March 19, 2021


Container Checks

Question

What's the minimum amount of testing we can do to provide the most coverage?

This is a look at how a solid Performance Engineering strategy can use Reliability principles and DevOps ideals to complement and strengthen current or proposed performance initiatives.

These approaches attempt to achieve better business cohesion, reliability and velocity benefits. To do this we can apply various Performance Engineering methodologies, using Shift Left and Move Right approaches that extend traditional Performance Testing techniques.

At Its Core, to Understand an Application's Performance We Need
  • A mechanism to run load against an application or system
  • A way of measuring how it performed
  • A way of comparing the results against what we believe is the ideal state
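
As a rough illustration, the three pieces above can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for a real load tool: `target_operation` is a hypothetical stand-in for a request to the system under test, and the call counts, concurrency and the 0.5 second "ideal" p95 are assumed values chosen only for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def target_operation() -> None:
    """Hypothetical stand-in for a real request to the system under test."""
    time.sleep(0.01)


def timed_call(operation) -> float:
    """Run one operation and return its duration in seconds."""
    start = time.perf_counter()
    operation()
    return time.perf_counter() - start


def run_load(operation, total_calls: int, concurrency: int) -> list[float]:
    """1. Run load: issue total_calls calls across `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, [operation] * total_calls))


def p95(durations: list[float]) -> float:
    """2. Measure: 95th percentile of the recorded durations."""
    return quantiles(durations, n=100)[94]


def meets_ideal(durations: list[float], ideal_p95_seconds: float) -> bool:
    """3. Compare: is the measured p95 within the assumed ideal state?"""
    return p95(durations) <= ideal_p95_seconds


if __name__ == "__main__":
    durations = run_load(target_operation, total_calls=50, concurrency=5)
    print(f"p95 = {p95(durations):.4f}s, within ideal: {meets_ideal(durations, 0.5)}")
```

In practice the load generator would be a dedicated tool and the ideal state would come from an agreed target, but the same three steps still apply.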

Each area of performance within the DevOps model has its part to play. That is, they all relate in some shape or form to the principles around building, defining and maintaining a reliable system

In a Nutshell

Each piece of performance execution and analysis should be guided by the Engineering Efficiency, DevOps and Reliability principles that apply to software development.

The Breakdown

  • Reliability Engineering (RE) attempts to predict and prevent the risk of a failure, whether that be in a single component or an entire system of services

  • Performance Engineering (PE) states we should start earlier in the SDLC to get faster feedback, but also extends into Operations and Support to use real world data to build and update the performance models (scripts and analysis)

  • Performance Testing (PT) is all about determining what the performance of an application is (baselining), or comparing it to how you believe it should be (delta analysis), under various conditions and situations in the 'test' environment

A Look at Performance Engineering

PE looks to incorporate the methodologies of 'Agile' and use them in conjunction with 'DevOps' ideals to provide an improved approach that adds value rather than one that tends to hinder delivery velocity.

We can do this by adopting a shift left / move right approach that incorporates cloud-first performance automation. This can then lead to a reduced feedback cycle (velocity increase) and bottlenecks / bugs being caught early on (reliability increase).

The Performance Engineering Model

PE is all about applying processes and strategies at each step of the SDLC. The following are example actions/options that can be applied within each vertical:

PE Model

The idea is that performance is a consideration at each step in the software lifecycle. Metrics are captured from Dev, Test, Deploy and Operations and used to refine the next performance cycle.

Traditional Performance Testing

Traditional performance testing is quite often done within the test phase and entails a big bang approach that uses many pods/VMs to generate load against an application/system.

PE Traditional

| Pros | Cons |
| --- | --- |
| Simulates real world conditions as closely as possible | Often an integrated (shared) environment, which can affect results |
| Integrated tests execute against multiple components at once | Data is often 'test' data, which could affect behaviour/results |
| Tools can replicate thousands (if not more) of users | Replicating 'Prod' environments can be expensive |
| Extensive metrics/reports from tool | Finding the root cause when diagnosing issues can be complex |
| | Commercial tooling can be expensive to operate |

Performance/Reliability Options to Improve Efficiencies, Engagement and Observability

We can attempt to answer the opening question (the minimum testing that provides the most coverage) using a combination of PE, RE and DevOps principles and methodologies.

Shift Left Approach

Reducing the SDLC feedback loop to uncover and rectify potential system and environment issues early

Shift Left

Shift Left Benefits

| Item | Description |
| --- | --- |
| Team cohesion | Foster developer engagement and contribution |
| Fewer bugs | Reduced development costs |
| Improved performance | Detect and eliminate bottlenecks sooner |
| Reduced risk | Find bugs and performance issues earlier |
| Speed up time-to-market | Having more trust in your applications and infrastructure |

Move Right Approach

A "Move Right" approach extends testing out to include user feedback and metrics from your production environment. This data can then be used to update the performance model.
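
One way to picture feeding production data back into the performance model is to derive the load test's traffic mix from observed requests. The following is a minimal sketch under stated assumptions: the endpoint names and counts are made up, and `traffic_mix` and `planned_calls` are hypothetical helpers, not part of any particular tool.

```python
from collections import Counter


def traffic_mix(production_requests: list[str]) -> dict[str, float]:
    """Return each endpoint's share of the observed production traffic."""
    counts = Counter(production_requests)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {endpoint: count / total for endpoint, count in counts.items()}


def planned_calls(mix: dict[str, float], total_calls: int) -> dict[str, int]:
    """Scale the observed mix to a test run of roughly total_calls requests."""
    return {endpoint: round(share * total_calls) for endpoint, share in mix.items()}


# Hypothetical sample of endpoints seen in production access logs.
observed = ["/search"] * 6 + ["/checkout"] * 1 + ["/login"] * 3

mix = traffic_mix(observed)
print(planned_calls(mix, total_calls=1000))
```

Re-running this against fresh production data each cycle keeps the test scripts weighted towards what users actually do, rather than what was assumed at design time.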

Move Right

Move Right Benefits

| Item | Description |
| --- | --- |
| Improved user experience | Tests closely match the actions expected by your users |
| Responding faster | Teams have more involvement and ownership over the performance information that's presented back |
| Design hypothesis evaluated | Assumptions are reflected upon and adequate action can be taken |
| Various performance management options | Many different tools are available to change traffic flows, which can alter performance |

Measurements and Observability

Performance metrics from each environment (Dev/Test/Prod) are used to determine whether the application is within its SLO limits.

The idea is that we can understand and easily record local (component) and integrated (end-to-end) metrics to provide better performance transparency. These would then be compared to the ideal state.

These SLOs can be measured through the use of SLIs (SLI specifications and SLI implementations) and compared to our error budget to measure tolerance.
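
The error budget arithmetic can be made concrete with a short sketch. It assumes a request-based SLI, where a "bad" event is any request that misses the performance target. The 99% target and the event counts are illustrative values only, and both function names are hypothetical.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of events allowed to miss the SLO (e.g. 1% of events for a 99% target)."""
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, bad_events: int) -> float:
    """Fraction of the error budget left; negative once the budget is overspent."""
    budget = error_budget(slo_target, total_events)
    if budget == 0:
        # A 100% target (or no traffic) leaves no room for bad events at all.
        return 0.0 if bad_events == 0 else float("-inf")
    return 1.0 - bad_events / budget


# Assumed example: 99% of requests should meet the response time target.
remaining = budget_remaining(slo_target=0.99, total_events=10_000, bad_events=25)
print(f"Error budget remaining: {remaining:.0%}")
```

With these assumed numbers the budget is about 100 slow requests, so 25 slow requests leave roughly three quarters of the budget unspent.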

Observability

The aim is to obtain a current-state view of our application's performance in each environment and at each stage of the SDLC. These views are then compared against our business performance expectations, which are defined in the SLO and measured by the SLI.

Performance SLI Implementations Could Include:
  • API / UI response times
  • DB transaction times
  • Pod / VM scaling events
  • CPU use / Network activity / Memory usage

These could all be defined and compared using SLIs.

A subset of the performance suite can be used to poke test (performance smoke test) the application after deployment. A degraded performance run could then trigger a rollback.
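
A rollback gate of this kind could look something like the sketch below. It is illustrative only: the 20% tolerance, the use of the median response time and the sample timings are all assumptions, and the actual rollback step is left to whatever deployment tooling is in use.

```python
from statistics import median


def degradation(baseline_seconds: list[float], smoke_seconds: list[float]) -> float:
    """Relative change in median response time versus the baseline (0.25 means 25% slower)."""
    base = median(baseline_seconds)
    return (median(smoke_seconds) - base) / base


def should_roll_back(
    baseline_seconds: list[float],
    smoke_seconds: list[float],
    tolerance: float = 0.20,  # assumed threshold: more than 20% slower fails the gate
) -> bool:
    """Decide whether the post-deployment smoke run is degraded enough to roll back."""
    return degradation(baseline_seconds, smoke_seconds) > tolerance


# Hypothetical response times (seconds) from the previous release and the new one.
baseline = [0.10, 0.12, 0.11]
after_deploy = [0.20, 0.22, 0.21]

if should_roll_back(baseline, after_deploy):
    print("Smoke test degraded: trigger rollback")
else:
    print("Smoke test within tolerance: keep deployment")
```

Comparing against a stored baseline rather than a fixed number means the gate tracks the application's real behaviour from release to release.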

Summary

A balanced performance strategy that is applied at each stage of the SDLC and guided by RE principles provides a more well-rounded verification process. In turn, this can lead to a culture of empathy, encourage collaboration, reduce delivery cycle duration and mitigate the chance of deploying underperforming software.
