Reliability Engineering - Chaos, Performance & Automation
Scott Griffiths / March 15, 2021
Question#
How do we define what constitutes a reliable system?
Can we combine reliability engineering ideals with chaos, performance and automation execution strategies to build what we believe is a reliable system?
We take a look at how to get easy reliability wins by applying DevOps and SRE methodologies in combination with chaos engineering (CE), performance engineering (PE) and automation strategies.
An Intro to the World of S.R.E#
Reliability Engineering is more commonly known as Site Reliability Engineering (SRE) at companies such as Google, Facebook, Netflix and LinkedIn.
- It is about balancing risk against its effect on teams and business velocity, and about providing the automation strategies that give the business the observability, confidence and insight it needs to make critical decisions.
- Within SRE this is achieved (for the most part) through observability metrics (SLIs), internal and external promises (SLOs and SLAs), and team- and business-level error budgets.
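As a rough illustration of how an SLI, an SLO and an error budget relate, here is a minimal sketch. It assumes a simple request-based availability SLI (successful requests divided by total requests) over one measurement window. The names `BudgetReport` and `error_budget` and the sample numbers are hypothetical and are not taken from any particular SRE tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetReport:
    sli: float               # measured success ratio for the window
    allowed_failures: float  # failures the SLO tolerates in the window
    budget_remaining: float  # fraction of the error budget still unspent


def error_budget(slo_target: float, total: int, failed: int) -> BudgetReport:
    """Compare a measured SLI against an SLO target (hypothetical helper)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("need total > 0 and 0 <= failed <= total")

    sli = (total - failed) / total
    # The error budget is the share of requests the SLO allows to fail.
    allowed = (1.0 - slo_target) * total
    # Goes negative once the budget is overspent.
    remaining = 1.0 - failed / allowed
    return BudgetReport(sli, allowed, remaining)


# Assumed example: a 99.9% SLO, one million requests, 250 failures.
report = error_budget(slo_target=0.999, total=1_000_000, failed=250)
print(report)
```

With these assumed numbers the SLO tolerates roughly 1,000 failed requests, so 250 failures leave about three quarters of the budget unspent. A team could use that remaining fraction to decide whether to keep shipping or to slow down and focus on reliability work.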
The Engineering Efficiency / Effectiveness Verticals (Alignment)#
Efficiency is doing things right, effectiveness is doing the right things, and adaptability is responding quickly to changes in business circumstances.
The idea is to let engineers focus on what's important by automating or eliminating the items that slow down product development.
This incorporates people, process and tech, with the goal of reducing barriers, providing better value and increasing velocity while still promoting a culture of empathy, accountability and transparency.
Using DevOps Practices and Methodologies#
These practices and methodologies are measured, enforced and verified by Reliability, Chaos and Performance Engineering principles.
DevOps emerged as a culture and a set of practices that aim to reduce the gap between software development and operations.
RE defines the overall behavior of the system, with the implementation being left up to the engineer.
The DevOps / Reliability Relationship#
Performance Engineering#
By adopting a cloud-first performance automation approach, we can benefit from a shorter feedback cycle (velocity increase) and from bottlenecks and bugs being caught early (reliability increase).
To get these benefits we need a multi-pronged approach that uses components and methodologies from both Performance Testing and Performance Engineering.
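One way to catch bottlenecks early is an automated latency check that runs on every build instead of only in a late test phase. The sketch below assumes latency samples in milliseconds have already been collected by some load or smoke test. The `percentile` and `within_budget` helpers, the sample data and the 300 ms budget are all hypothetical, and the nearest-rank method is just one simple way to compute a percentile.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def within_budget(samples_ms: Sequence[float], pct: float, budget_ms: float) -> bool:
    """Return True when the chosen percentile is inside the latency budget."""
    return percentile(samples_ms, pct) <= budget_ms


# Assumed sample data: one slow outlier among otherwise fast responses.
latencies_ms = [120, 95, 110, 480, 105, 98, 102, 130, 99, 101]

if not within_budget(latencies_ms, pct=95, budget_ms=300):
    print("p95 latency is over budget; fail the pipeline step")
```

Checking a high percentile rather than the average keeps a single slow outlier visible: here the median is well inside the budget while the 95th percentile is not.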
Traditional Performance Testing, Done in the 'Test' Phase#
Performance Engineering, Utilising a Shift-Left and 'Measure Everything' Approach#
Releases#
Release frequently with small changes, and support these releases with the right amount of automated checks and other automation configurations to provide some level of understanding of environment and application behaviour.
Forget self-managed teams. Aim for a self-managed business, with an approach to development that enables devs to deliver more efficiently and effectively, in an environment that fosters ownership and accountability.