The Stability Index: Focusing on Release Stabilization

While I was recently working with Juni Mukherjee on a team focused on finding ways to extend and increase the value of a large legacy platform, she brought up what I thought was a brilliant idea. We had been working on creating metrics that have tension with each other, in order to drive continuous integration effectiveness from the component level up to the deployed system and to keep generating value for users. Juni’s research and the team discussions on the topic went through multiple scenarios with different metrics to better understand how they balance and/or mislead. As we all know, any metric can be gamed. Not only that, but a metric that is extremely valuable in one context is not always valuable in another. During the conversation Juni blurted out that what she was looking to come up with was a “Stability Index”. This brilliant phrase, along with the outcomes of our team discussions, led me to think that this is a valuable way to look at quality measurements alongside other release constraints to support delivery of continuous value.

This article is a first attempt at putting The Stability Index down on paper as it is already in use, to some degree, in Juni’s organization. In the past, organizations and teams I have worked with have come up with similar approaches that allow us to balance effective quality signals. This should lead to early detection of what are usually weak signals of quality issues, which are often found too late or whose symptoms show up far from the signal's origin.

Goals of The Stability Index

The Stability Index is a function of signals that are observed in pre-production and production environments. The purpose of calculating a Stability Index is two-fold:

The Stability Index is an indicator of how an organization is progressing towards its business goals. For example, if the Stability Index goes up, Cycle Times and the Release Stabilization Period go down and Customer Retention improves.
The Stability Index also reveals any correlation of pre-production signals to signals observed in production. For example, when Code Coverage goes up, the number of defects found in production goes down.
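One way to check the second goal is to compute a correlation coefficient between a pre-production signal and a production signal across releases. The following is a minimal sketch, not the method used in Juni's organization; the `pearson` helper and the per-release numbers are hypothetical and only illustrate the idea.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data, one entry per release (not taken from a real system).
coverage_pct = [55, 60, 68, 72, 80]
production_defects = [21, 18, 15, 11, 7]

r = pearson(coverage_pct, production_defects)
print(f"coverage vs. production defects: r = {r:.2f}")
```

A value of r close to -1 would suggest that releases with higher coverage tended to have fewer production defects. Correlation alone does not show that one caused the other, so it is best read as a prompt for further investigation.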

If proper balance is given to these pre-production and production signals, then this should lead to a stable application platform that continues to deliver value at a sustainable rate as new business needs arise. Of course, this does not mean that business needs will be continuous or steady, so there is still a potential to impact The Stability Index. By keeping balance, however, a team or teams should be able to manage fluctuations in business needs more effectively.

Ultimately, creating an implementation of The Stability Index means deciding on metrics that produce effective early warning signals in pre-production and in production. The following sections go into detail about the metrics that are initially used in this implementation of The Stability Index.

Pre-Production Signals

Pre-production signals focus on the technical craftsmanship of engineering teams.

Percentage of Broken Builds

This metric is used to objectively measure behavior patterns of engineers who may not be using effective gating criteria before checking in code. Builds can break while compiling source code, running unit tests, distributing artifacts or deploying images, or further downstream while testing for functional correctness, integration issues and performance gaps. Irrespective of where the breakage is, code will not be able to flow to production through a fully automated pipeline unless engineers test code changes in their local sandbox environments before checking into the common repository.
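As a rough illustration, the metric can be computed from a list of build outcomes. This is a minimal sketch that assumes each build is recorded as a simple pass/fail flag; the function name and the sample data are hypothetical.

```python
def broken_build_percentage(build_results):
    """Percentage of builds that failed.

    build_results: iterable of booleans, True for a passing build.
    Returns 0.0 when there are no builds to avoid dividing by zero.
    """
    results = list(build_results)
    if not results:
        return 0.0
    broken = sum(1 for passed in results if not passed)
    return 100.0 * broken / len(results)


# Hypothetical history: three green builds and one red build.
history = [True, False, True, True]
print(f"{broken_build_percentage(history):.1f}% of builds were broken")
```

In practice the pass/fail flags would come from the build server, and it may be more telling to break the figure down by pipeline stage so that the source of the breakage is visible.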

Code Duplication

This metric is an early indicator of software debt. It shows that programs contain significant sequences of source code that repeat, which calls for refactoring. Duplication has a high business impact in terms of maintainability.

Cyclomatic Complexity

This is a software metric that indicates the conditional complexity of a program (function, method, class etc.) by measuring the number of linearly independent paths that can be executed. It is also an early indicator of software debt, and high complexity is very expensive to the company when it comes to getting new employees up to speed.
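To make the idea concrete, the sketch below approximates cyclomatic complexity for Python source by counting decision points with the standard `ast` module and adding one. It is a simplified approximation rather than a full implementation (it ignores comprehensions and `match` statements, for example), and the `SAMPLE` function is made up for illustration.

```python
import ast


def cyclomatic_complexity(source):
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    tree = ast.parse(source)
    complexity = 1
    for node in ast.walk(tree):
        # Each branching or looping construct adds one independent path.
        if isinstance(node, (ast.If, ast.For, ast.While,
                             ast.ExceptHandler, ast.IfExp)):
            complexity += 1
        # "a and b or c" style expressions add one path per extra operand.
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1
    return complexity


# Hypothetical function used only to exercise the counter.
SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2 == 0:
            continue
    return "done"
'''

print(cyclomatic_complexity(SAMPLE))
```

For the sample, the count is one for the base path, two for the `if` statements, one for the `and`, and one for the `for` loop.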

Code Coverage

Code Coverage can be measured as Line, Statement, Decision, Branch or Condition coverage and is a measure of how effective test suites are in certifying software programs. Note that 100% Line or Statement Coverage may give a false sense of security, since all lines may have been covered by tests although important decisions, branches and conditions may not have been tested.

The caveat here is that a high code coverage percentage does not guarantee bug-free applications. The primary objective of tests should be to meet customer requirements, and not all customer use cases may be exercised by tests even when Code Coverage is at 100%.
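The gap between line coverage and branch coverage can be seen in a very small example. The function and test below are hypothetical; they are only a sketch of how a single test can execute every line while leaving one branch untested.

```python
def apply_discount(price_cents, is_member):
    # Prices are kept in integer cents to avoid floating point rounding.
    discount_pct = 0
    if is_member:
        discount_pct = 10
    return price_cents * (100 - discount_pct) // 100


def test_member_discount():
    # This one test executes every line of apply_discount, so a line
    # coverage tool would report 100%. The path where is_member is False
    # (the "if" body skipped) is never exercised, so branch coverage
    # would be reported as incomplete.
    assert apply_discount(1000, True) == 900


test_member_discount()
```

A second test that calls `apply_discount(1000, False)` would be needed to cover the remaining branch.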

Test Cycle Time

Test Cycle Time is a measure of the time it takes to execute the entire test suite, after which either bugs are reported that cause another iteration or the software is certified for production. For the same number of tests, the Cycle Time could be high unless tests are designed to be independent of each other and are launched in parallel. As an aside, for tests to be launched in parallel, the test environments can often become a bottleneck in terms of providing the required capacity and reliability. Virtual environments may not offer a guaranteed share of the CPU, whereas shared clusters in distributed environments may queue jobs and hence make the execution time long and unpredictable.
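The effect of parallel execution on cycle time can be estimated with a simple model. The sketch below assumes tests are fully independent and that every worker is always available, which ignores the environment bottlenecks described above; the durations and function names are hypothetical. It assigns the longest tests first to the least loaded worker, a common greedy heuristic rather than an optimal schedule.

```python
import heapq


def sequential_cycle_time(durations):
    """Cycle time when tests run one after another."""
    return sum(durations)


def parallel_cycle_time(durations, workers):
    """Estimated cycle time when independent tests run on several workers.

    Greedy heuristic: take tests longest first and give each one to the
    worker with the smallest load so far. The result is the largest load.
    """
    if workers < 1:
        raise ValueError("at least one worker is required")
    loads = [0] * workers
    heapq.heapify(loads)
    for duration in sorted(durations, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)


# Hypothetical test durations in seconds.
durations = [30, 20, 20, 10, 10, 10]
print(sequential_cycle_time(durations))   # all tests on one worker
print(parallel_cycle_time(durations, 2))  # same tests on two workers
```

With these numbers the sequential run takes 100 seconds and the two-worker estimate is 50 seconds. Real runs will be slower whenever jobs are queued or workers share CPU.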

Production Signals

Production signals focus on the customer.

Customer Delight / Satisfaction

It’s all about the customer. Software is released to meet the customer’s needs and to leave the customer delighted and craving more. Although “Delight” can be subjective at times, surveys are an effective way to measure satisfaction. An example of customer dissatisfaction could be that, although the software behaves correctly, it takes more clicks to perform the same activity than it did in the previous version. Defects reported by the customer are also a good measure of this metric.

Defect Containment

Defect Containment is an important trend to watch, since customers can be inconvenienced if their support tickets are queued up. Moreover, defects reported in production that translate into code and configuration errors should be fixed by the engineering team within acceptable SLAs depending on the severity of the issues. Being able to iterate fast is one of the key factors for customer retention.
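One possible way to track this trend is the share of production defects fixed within the SLA for their severity. The sketch below is an assumption about how that could be measured, not a description of the original implementation; the severity names and SLA hours in `SLA_HOURS` are hypothetical.

```python
# Hypothetical SLA targets, in hours to fix, per severity level.
SLA_HOURS = {"critical": 4, "major": 24, "minor": 72}


def sla_compliance(defects):
    """Percentage of defects fixed within the SLA for their severity.

    defects: list of (severity, hours_to_fix) tuples.
    Returns 100.0 when there are no defects to report on.
    """
    if not defects:
        return 100.0
    within = sum(1 for severity, hours in defects
                 if hours <= SLA_HOURS[severity])
    return 100.0 * within / len(defects)


# Hypothetical defects from one release.
defects = [("critical", 3), ("critical", 6), ("major", 20), ("minor", 100)]
print(f"{sla_compliance(defects):.0f}% of defects were fixed within SLA")
```

Here two of the four defects met their target, so compliance is 50%.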

Uptime

Systems could go down due to hardware incidents like router malfunction and disk crashes or due to software inadequacies like fault tolerance not being built in. Either way, downtimes cause revenue losses and are a critical contributor towards Stability Index.
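Uptime is usually reported as the percentage of a period during which the system was available. A minimal sketch of that arithmetic follows; the function name and the outage figures are hypothetical.

```python
def uptime_percentage(period_minutes, outage_minutes):
    """Percentage of the period during which the system was up.

    period_minutes: length of the reporting period in minutes.
    outage_minutes: list of outage durations in minutes within the period.
    """
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    downtime = sum(outage_minutes)
    if downtime > period_minutes:
        raise ValueError("downtime cannot exceed the period")
    return 100.0 * (period_minutes - downtime) / period_minutes


# Hypothetical 30-day month with two outages (hardware and software).
month = 30 * 24 * 60  # 43,200 minutes
print(f"{uptime_percentage(month, [30, 12]):.2f}% uptime")
```

With 42 minutes of downtime in a 30-day month, uptime comes to roughly 99.90%.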

Relationship of Signals to The Stability Index

Each of the above trends has a bearing on the Stability Index. Some trends are directly proportional to the Stability Index, while others are inversely proportional. For example, when Code Coverage goes up, it has a positive impact on the Stability Index. On the contrary, when Code Duplication goes up, the Stability Index goes down.

The relationships of all the metrics with Stability Index are illustrated below.

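Pre-production Metric	Relationship to Stability Index
Percentage of Broken Builds	Inversely proportional
Code Duplication	Inversely proportional
Cyclomatic Complexity	Inversely proportional
Code Coverage	Directly proportional
Test Cycle Time	Inversely proportional

Production Metric	Relationship to Stability Index
Customer Delight / Satisfaction	Directly proportional
Defect Containment	Directly proportional
Uptime	Directly proportional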
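The article does not give a formula for combining these signals, so the following is only one possible sketch. It assumes each signal has already been normalized to the range 0 to 1, flips the inversely proportional ones, and takes a weighted average. The signal names, the `DIRECTION` table and the equal default weights are all assumptions; only the directions for Code Coverage and Code Duplication are stated explicitly in the text, and the rest are inferred from the metric descriptions.

```python
# +1: directly proportional to the index, -1: inversely proportional.
# Only code_coverage and code_duplication are stated in the text;
# the other directions are assumptions inferred from the descriptions.
DIRECTION = {
    "broken_builds": -1,
    "code_duplication": -1,
    "cyclomatic_complexity": -1,
    "code_coverage": +1,
    "test_cycle_time": -1,
    "customer_satisfaction": +1,
    "defect_containment": +1,
    "uptime": +1,
}


def stability_index(signals, weights=None):
    """Weighted average of normalized signals, in the range 0 to 1.

    signals: dict of signal name -> value already normalized to [0, 1].
    weights: optional dict of signal name -> weight (defaults to equal).
    """
    if not signals:
        raise ValueError("at least one signal is required")
    if weights is None:
        weights = {name: 1.0 for name in signals}
    total_weight = sum(weights[name] for name in signals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    score = 0.0
    for name, value in signals.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
        # Inverse signals contribute more when their value is low.
        contribution = value if DIRECTION[name] > 0 else 1.0 - value
        score += weights[name] * contribution
    return score / total_weight


# Hypothetical normalized readings for one release.
print(stability_index({"code_coverage": 0.75, "code_duplication": 0.25}))
```

With these readings the index is 0.75. Raising coverage or lowering duplication moves it up, which matches the relationships described above. How to normalize each signal and how to weight them are the decisions that need the most care, since they determine how the metrics hold each other in tension.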