Continuous Automation - The Reliability Engineering Edition
Scott Griffiths / March 19, 2021
Question
What's the minimum amount of testing we can do to provide the most coverage?
This is a look at how a solid Performance Engineering strategy can use Reliability principles and DevOps ideals to complement and strengthen current or proposed performance initiatives
These approaches aim for better business cohesion, reliability and velocity. To get there, we can apply Performance Engineering methodologies through Shift Left and Move Right approaches that extend traditional Performance Testing techniques
At Its Core, to Understand an Application's Performance We Need
- A mechanism to run load against an application or system
- A way of measuring how it performed
- A way of comparing the results against what we believe is the ideal state
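The three ingredients above can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for a real load tool: `target_operation` is a hypothetical stand-in for a request to the system under test, and the latency limits in `ideal` are made-up numbers.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def target_operation() -> None:
    """Hypothetical stand-in for a real request to the system under test."""
    time.sleep(0.001)


def run_load(operation, requests: int, concurrency: int) -> list[float]:
    """Mechanism: call `operation` `requests` times across `concurrency` workers.

    Returns the latency of each call in seconds.
    """
    def timed_call(_: int) -> float:
        start = time.perf_counter()
        operation()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(requests)))


def summarise(latencies: list[float]) -> dict[str, float]:
    """Measurement: reduce raw latencies to a few comparable numbers."""
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {"p50": cuts[49], "p95": cuts[94], "max": max(latencies)}


def compare(summary: dict[str, float], ideal: dict[str, float]) -> dict[str, bool]:
    """Comparison: True where the measured value is within the ideal limit."""
    return {name: summary[name] <= limit for name, limit in ideal.items()}


if __name__ == "__main__":
    latencies = run_load(target_operation, requests=20, concurrency=4)
    summary = summarise(latencies)
    # Assumed ideal state: illustrative limits in seconds, not real targets.
    ideal = {"p95": 0.5, "max": 1.0}
    print(summary)
    print(compare(summary, ideal))
```

Everything else in a performance strategy is, in one form or another, a more capable version of one of these three functions.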
Each area of performance within the DevOps model has its part to play. That is, they all relate in some shape or form to the principles around building, defining and maintaining a reliable system
In a Nutshell#
Each piece of performance execution and analysis should be guided by the Engineering Efficiency, DevOps and Reliability principles that apply to software development
Reliability Engineering (RE) attempts to predict and prevent failures, whether of a single component or of an entire system of services
Performance Engineering (PE) states that we should start earlier in the SDLC to get faster feedback, and also extend into Operations and Support to use real-world data to build and update the performance models (scripts and analysis)
Performance Testing (PT) is all about determining what the performance of an application is (baselining), or comparing it to how you believe it should be (delta analysis), under various conditions and situations in the 'test' environment
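As a rough illustration of baselining versus delta analysis, the sketch below compares a stored baseline against a current run. The metric names, the 10% tolerance and the assumption that a higher value is worse (as with latency) are all illustrative choices, not something prescribed by a particular tool.

```python
def delta_analysis(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.10,
) -> dict[str, dict]:
    """Compare a current run against a baseline, metric by metric.

    Assumes every metric is "higher is worse" (for example latency in ms).
    A metric is flagged as regressed when it grew by more than `tolerance`.
    """
    report = {}
    for metric, base in baseline.items():
        if metric not in current:
            continue  # nothing to compare against in this run
        change = (current[metric] - base) / base
        report[metric] = {
            "baseline": base,
            "current": current[metric],
            "change": change,
            "regressed": change > tolerance,
        }
    return report


if __name__ == "__main__":
    # Hypothetical numbers: the baseline comes from an earlier accepted run.
    baseline = {"p95_ms": 200.0, "p50_ms": 80.0}
    current = {"p95_ms": 250.0, "p50_ms": 82.0}
    for metric, row in delta_analysis(baseline, current).items():
        print(metric, row)
```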
A Look at Performance Engineering#
PE looks to incorporate the methodologies of 'Agile' and use these in conjunction with 'DevOps' ideals to provide an improved approach that adds value, rather than one that tends to hinder delivery velocity
We can do this by adopting a shift left / move right approach that incorporates cloud-first performance automation. This can then lead to a shorter feedback cycle (velocity increase) and to bottlenecks and bugs being caught early on (reliability increase).
The Performance Engineering Model#
PE is all about applying processes and strategies at each step of the SDLC. The following are example actions/options that can be applied within each vertical
The idea is that performance is a consideration at each step in the software lifecycle. Metrics are captured from Dev, Test, Deploy and Operations and used to refine the next cycle of performance work
Traditional Performance Testing#
Traditional performance testing is quite often done within the test phase and entails a big bang approach, in which many pods/VMs generate load against an application/system
Pros | Cons |
---|---|
Simulates real-world conditions as closely as possible | Often an integrated (shared) environment, which can affect results |
Integrated tests execute against multiple components at once | Data is often 'test' data, which could affect behaviour/results |
Tools can replicate thousands (if not more) of users | Replicating 'Prod' environments can be expensive |
Extensive metrics/reports from the tool | Finding the root cause when diagnosing issues can be complex |
| | Commercial tooling can be expensive to operate |
Performance/Reliability Options to Improve Efficiencies, Engagement and Observability#
We can attempt to answer the opening question (the minimum testing for the most coverage) using a combination of PE, RE and DevOps principles and methodologies
Shift Left Approach#
A "Shift Left" approach shortens the SDLC feedback loop to uncover and rectify potential system and environment issues early
Shift Left Benefits#
Item | Description |
---|---|
Team cohesion | Foster developer engagement and contribution |
Fewer bugs | Reduced development costs |
Improved performance | Detect and eliminate bottlenecks sooner |
Reduced risk | Find bugs and performance issues earlier |
Faster time-to-market | More trust in your applications and infrastructure |
Move Right Approach#
A "Move Right" approach extends testing out to include user feedback and metrics from your production environment. This can then be used to update the performance model that's developed as a consequence
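One way to picture updating the performance model from production is to derive the load script's traffic mix from real request data. The sketch below assumes the production access log has already been reduced to a list of request paths; `workload_mix` and `pick_next_request` are hypothetical helper names, and the paths are invented.

```python
import random
from collections import Counter


def workload_mix(request_paths: list[str]) -> dict[str, float]:
    """Turn observed production requests into per-path traffic weights."""
    counts = Counter(request_paths)
    total = sum(counts.values())
    return {path: count / total for path, count in counts.most_common()}


def pick_next_request(mix: dict[str, float], rng: random.Random) -> str:
    """Choose the next path for the load script in proportion to the mix."""
    paths = list(mix)
    weights = [mix[path] for path in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical sample of paths extracted from a production access log.
    observed = ["/search", "/search", "/cart", "/search", "/checkout", "/cart"]
    mix = workload_mix(observed)
    print(mix)

    rng = random.Random(42)  # seeded so a test run can be repeated
    print([pick_next_request(mix, rng) for _ in range(5)])
```

Re-running this against fresh production data each cycle keeps the test traffic close to what users actually do, rather than what was assumed at design time.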
Move Right Benefits#
Item | Description |
---|---|
Improved user experience | Tests closely match the actions expected of your users |
Responding faster | Teams have more involvement in, and ownership of, the performance information that's presented back |
Design hypothesis evaluated | Assumptions are reflected upon and adequate action can be taken |
Various performance management options | Many tools are available to change traffic flows, which can alter performance |
Measurements and Observability#
Performance metrics from each environment (Dev/Test/Prod) are used to determine whether the application is within its SLO limits.
The idea is that we can understand and easily record local (component) and integrated (end-to-end) metrics to provide better performance transparency. These are then compared to the ideal state
These SLOs are tracked through SLIs (SLI specifications and SLI implementations), and the results are compared to our error budget to measure how much tolerance remains
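The error budget arithmetic is simple enough to show directly. In this sketch the budget is the share of events that the SLO allows to be "bad"; the 99% target and the event counts are illustrative assumptions.

```python
def error_budget(slo_target: float, total_events: int, bad_events: int) -> dict[str, float]:
    """Work out how much of the error budget a period has used.

    slo_target   -- required proportion of good events, e.g. 0.99
    total_events -- all valid events seen in the period
    bad_events   -- events that missed the SLI threshold
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_events <= 0:
        raise ValueError("total_events must be positive")

    allowed_bad = (1.0 - slo_target) * total_events  # the budget itself
    consumed = bad_events / allowed_bad              # 1.0 means fully spent
    return {
        "allowed_bad": allowed_bad,
        "consumed": consumed,
        "remaining": 1.0 - consumed,
    }


if __name__ == "__main__":
    # Assumed example: 99% of requests must be good; 25 of 10,000 were not.
    print(error_budget(slo_target=0.99, total_events=10_000, bad_events=25))
```

A negative `remaining` value means the budget is overspent, which is usually the signal to favour reliability work over new releases.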
The aim is a current-state view of our application's performance in each environment and at each stage of the SDLC. That view is then compared against the business performance expectations defined in the SLOs and measured by the SLIs
Performance SLI Implementations Could Include:
- API / UI response times
- DB transaction times
- Pod / VM scaling events
- CPU use / Network activity / Memory usage
These could all be defined and compared using SLIs
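As a sketch of how one of these might be expressed, the class below models a latency SLI as the proportion of requests that finish within a threshold, and checks that proportion against an SLO target. The `LatencySli` name, the 300 ms threshold and the 95% target are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySli:
    """SLI specification: proportion of requests served within a threshold."""

    name: str
    threshold_ms: float  # a request is "good" if it completes within this time
    slo_target: float    # required proportion of good requests, e.g. 0.95

    def value(self, latencies_ms: Sequence[float]) -> float:
        """SLI implementation: good requests divided by all requests."""
        if not latencies_ms:
            raise ValueError("no measurements to evaluate")
        good = sum(1 for latency in latencies_ms if latency <= self.threshold_ms)
        return good / len(latencies_ms)

    def meets_slo(self, latencies_ms: Sequence[float]) -> bool:
        return self.value(latencies_ms) >= self.slo_target


if __name__ == "__main__":
    # Hypothetical SLI: 95% of API responses within 300 ms.
    api_latency = LatencySli("api_response_time", threshold_ms=300, slo_target=0.95)
    measured = [120, 180, 250, 310, 90, 140, 200, 260, 150, 170]
    print(api_latency.value(measured), api_latency.meets_slo(measured))
```

The same shape works for DB transaction times; scaling events and resource usage would need a different definition of a "good" event.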
A subset of the performance suite can be used to poke test (a performance smoke test) the application after deployment. A degraded performance run could then trigger a rollback
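A post-deployment gate along these lines might look like the sketch below. The `run_smoke_test` and `rollback` callables are placeholders for whatever your pipeline provides, and the rule of "roll back if p95 is more than 20% above baseline" is an assumed policy, not a recommendation.

```python
import statistics
from typing import Callable, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """95th percentile of the samples (needs at least two data points)."""
    return statistics.quantiles(samples_ms, n=20)[18]


def post_deploy_gate(
    run_smoke_test: Callable[[], Sequence[float]],
    rollback: Callable[[], None],
    baseline_p95_ms: float,
    max_degradation: float = 0.20,
) -> str:
    """Run the smoke subset and roll back if latency degraded too far.

    Returns "kept" or "rolled_back" so the pipeline can log the outcome.
    """
    observed = p95(run_smoke_test())
    limit = baseline_p95_ms * (1 + max_degradation)
    if observed > limit:
        rollback()
        return "rolled_back"
    return "kept"


if __name__ == "__main__":
    # Hypothetical stand-ins for the real smoke suite and deployment tooling.
    def fake_smoke_test() -> list[float]:
        return [110, 120, 130, 125, 118, 122, 140, 135, 128, 119]

    def fake_rollback() -> None:
        print("rolling back to previous release")

    print(post_deploy_gate(fake_smoke_test, fake_rollback, baseline_p95_ms=130.0))
```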
Summary#
A balanced performance strategy that is applied at each stage of the SDLC and guided by RE principles provides a more well-rounded verification process. In turn, it can lead to a culture of empathy, encourage collaboration, reduce delivery cycle duration and lower the chance of deploying underperforming software