May 27, 2025 5 min read

🧑‍🤝💻 Building with Intent II: CAPS

Luke Curtis

Engineering Leader

What

Software quality isn’t just about shipping features fast; it’s about building trust through reliability. The CAPS framework (Correctness, Availability, Performance, Security) helps engineering leaders define and measure technical health in a way that aligns with business outcomes.

Much of this may feel like common sense, but CAPS gives teams the data and structure to uphold those best practices consistently.

A solid CAPS strategy builds trust and transparency with stakeholders.

The stages below help bring each area of the CAPS strategy into a maintainable state. The Six Sigma DMAIC model is a useful lens for this:

Define - What does success look like for this area of the strategy, and what should we be looking at?

Measure - How are we measuring it? What infrastructure do we need to support the measurement?

Analyse - We now have the data we need. What common themes are we seeing? What have we learnt since we defined our metrics?

Improve - Building on the results of the previous step, we can adjust how we measure.

Control - Finally, we’re in a position to set standards for what the boundaries of a successful CAPS strategy look like for the team.
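As a rough illustration of the Define and Control stages, a metric can be written down as a small record with agreed boundaries and then checked against incoming measurements. This is only a sketch: the `MetricDefinition` class, its field names, and the example metric and values are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical record of one CAPS metric and its agreed boundaries."""
    name: str           # Define: what we are looking at
    unit: str           # Measure: how the value is expressed
    lower_bound: float  # Control: lowest acceptable value
    upper_bound: float  # Control: highest acceptable value

    def in_control(self, value: float) -> bool:
        """Return True when a measurement sits inside the agreed boundaries."""
        return self.lower_bound <= value <= self.upper_bound


def out_of_control(metric: MetricDefinition, measurements: list[float]) -> list[float]:
    """Analyse: collect the measurements that fall outside the boundaries."""
    return [value for value in measurements if not metric.in_control(value)]


# Illustrative values only: a checkout error rate that should stay at or below 1%.
checkout_errors = MetricDefinition(
    name="checkout_error_rate", unit="percent", lower_bound=0.0, upper_bound=1.0
)
violations = out_of_control(checkout_errors, [0.2, 0.4, 1.7, 0.3])
print(violations)  # [1.7]
```

Keeping the boundaries next to the definition means the Improve stage can adjust them in one place as the team learns more.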

How

Each area of CAPS requires an in-depth look at what it means for the business. As a general rule of thumb, however, the themes below have helped me define what that looks like in a technical context.

| Strategy | Themes |
| --- | --- |
| Correctness | Error rates, failed business logic paths, and regressions in core user flows |
| Availability | SLOs, SLAs and SLIs (e.g. uptime monitors, synthetic tests) |
| Performance | P95 and P75 percentiles, throughput, load testing, outages |
| Security | CVEs, incident management and resolution strategy, auditability, security reviews, penetration testing |
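To make two of these themes concrete, here is a minimal sketch of how an error rate (Correctness) and P75/P95 figures (Performance) could be derived from raw samples. It assumes the nearest-rank percentile method, which is one of several common definitions; the function names and the sample latencies are made up for illustration, and a monitoring tool may calculate percentiles differently.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def error_rate(failed: int, total: int) -> float:
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    if total == 0:
        return 0.0
    return failed / total


# Illustrative request latencies in milliseconds.
latencies_ms = [120, 95, 310, 101, 99, 87, 450, 102, 98, 110]

print(percentile(latencies_ms, 75))  # 120
print(percentile(latencies_ms, 95))  # 450
print(error_rate(failed=3, total=200))  # 0.015
```

Percentiles are useful here because a handful of slow requests (310 ms and 450 ms above) barely move an average but show up clearly at P95.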

CAPS isn’t just a checklist; it’s a mindset. By investing in these four pillars early, engineering teams can move faster with confidence, reduce firefighting, and build systems that scale.

Availability Bonus Content

SLOs, SLAs and SLIs

Thinking more deeply about Availability has given me really solid insights into the overall health of a team's engineering efforts, so I've extended this post to go a bit deeper on SLOs, SLAs and SLIs.

Service Level Objectives (SLOs), Service Level Agreements (SLAs) and Service Level Indicators (SLIs) come together to describe one wider theme: availability. By using the nuances of each metric type, a team (and wider observers) can build a fine-tuned understanding of the health of the systems they operate.

There is already plenty of documentation on what these could look like for a business and how to get there. One such resource is SLODLC, which I find a great starting point for “sensible defaults” and for tight feedback loops when defining what success looks like through this lens.

A quick TL;DR of the three definitions and their nuances:

SLI - The measurable indicator (e.g. order processing latency) - A measurement of one part of your system that feeds into the SLO. For example, if you owned an e-commerce store: how long does it take to process an order once payment has been made?

SLO - The internal target (e.g. 99% of orders processed < 1 minute) - An objective (usually a percentage) that the whole team agrees to hit consistently for uptime or delivery of a feature. Taking the e-commerce store one step further, the target might be that 99% of orders are processed within 1 minute. Any order that takes longer lowers the measured percentage: if 99 orders are processed in under 1 minute but one order takes 2 minutes, the service is running at 99% for the period.

SLA - The external commitment, often contractual (e.g. 97% uptime guarantee) - Using the SLOs as a basis, you make an agreement with key stakeholders to guarantee a certain level of service. Breaches usually result in monetary fines, discounts or, in the worst cases, grievance procedures. An example would be a contract stating that 97% of all orders in your e-commerce store will be processed in under a minute.
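The arithmetic in the e-commerce example can be sketched in a few lines. The function name `slo_compliance`, the 60-second threshold, and the sample durations are assumptions chosen to mirror the example; in practice the durations would come from your own order-processing SLI.

```python
def slo_compliance(durations_s: list[float], threshold_s: float = 60.0) -> float:
    """Percentage of orders whose processing time (the SLI) is under the threshold."""
    if not durations_s:
        raise ValueError("at least one measurement is required")
    good = sum(1 for duration in durations_s if duration < threshold_s)
    return 100 * good / len(durations_s)


# 99 orders processed in 30 seconds, one order that took 2 minutes.
durations = [30.0] * 99 + [120.0]

compliance = slo_compliance(durations)
print(compliance)          # 99.0
print(compliance >= 99.0)  # True: a 99% objective is only just met
```

One more slow order in the same period would drop the figure to 98%, which is why the measurement window matters as much as the target.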
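Because the SLA is usually looser than the SLO, the gap between them works as an early-warning zone: the team can miss its internal objective without yet breaching the contract. The sketch below assumes the 99% SLO and 97% SLA used in the examples; `ServiceStatus` and `evaluate` are hypothetical names, not part of any monitoring product.

```python
from enum import Enum


class ServiceStatus(Enum):
    HEALTHY = "healthy"            # internal objective met
    SLO_MISSED = "slo_missed"      # internal target missed, contract still honoured
    SLA_BREACHED = "sla_breached"  # external commitment broken


def evaluate(
    compliance_pct: float,
    slo_target_pct: float = 99.0,
    sla_target_pct: float = 97.0,
) -> ServiceStatus:
    """Classify a measured compliance percentage against both targets."""
    if sla_target_pct > slo_target_pct:
        # Assumption of this sketch: the SLA is never stricter than the SLO.
        raise ValueError("the SLA target should not be stricter than the SLO target")
    if compliance_pct < sla_target_pct:
        return ServiceStatus.SLA_BREACHED
    if compliance_pct < slo_target_pct:
        return ServiceStatus.SLO_MISSED
    return ServiceStatus.HEALTHY


print(evaluate(99.5).value)  # healthy
print(evaluate(98.0).value)  # slo_missed
print(evaluate(96.0).value)  # sla_breached
```

A result of `SLO_MISSED` is the signal to act before the `SLA_BREACHED` state, where fines or discounts would apply.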

What does implementation look like?

There’s no one-size-fits-all implementation. Tools like Datadog, New Relic, or custom dashboards can support CAPS strategies, but the real value comes from teams engaging with the data and adjusting as they grow.

Luke Curtis

Engineering Leader with over 10 years of experience in building and leading high-performing teams. Passionate about transforming organizations through technical excellence and empowered engineering cultures.
