Category Archives: Acceptance Testing

Video of QASIG Presentation Focused on Managing Quality Debt

I had a great time speaking at the QASIG group meeting last night. I met some great folks and reconnected with others I haven’t seen in some time. Here is the video of the presentation.

 


And here are the slides:

The Stability Index: Focusing on Release Stabilization

While recently working with Juni Mukherjee on a team focused on finding ways to extend and increase the value of a large legacy platform, she brought up what I thought was a brilliant idea. We had been working on creating metrics that have tension with each other to drive continuous integration effectiveness from the component level up to the deployed system while continuing to generate value for users. Juni’s research and the team discussions on the topic went through multiple scenarios with different metrics to better understand how they balance and/or mislead each other. As we all know, any metric can be gamed. And not only that, but a metric that is extremely valuable in one context is not always valuable in another. During the conversation Juni blurted out that what she was looking to come out with was a “Stability Index”. This brilliant phrase, along with the outcomes of our team discussions, led me to think that this is a valuable way to look at quality measurements alongside other release constraints to support delivery of continuous value.

This article is a first attempt at putting The Stability Index down on paper, as it is already in use, to some degree, in Juni’s organization. In the past, organizations and teams I have worked with have come up with similar approaches that allow us to balance effective quality signals. This should lead to earlier detection of quality issues whose signals are usually weak, are many times found too late, or whose symptoms show up far from the signal’s origin.

Goals of The Stability Index

Stability Index is a function of signals that are observed in pre-production and production environments. The purpose of calculating a Stability Index is two-fold:

  • The Stability Index is an indicator of how an organization is progressing towards its business goals. For example, if Stability Index goes up, Cycle Times reduce, Release Stabilization Period goes down and Customer Retention improves.
  • The Stability Index also reveals any correlation of pre-production signals to signals observed in production. For example, when Code Coverage goes up, the number of defects found in production goes down (a simple version of this kind of check is sketched below).
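To make that correlation check concrete, here is a minimal Python sketch. The weekly numbers are made up for illustration, and the pairing of coverage with production defects is just one example of the pre-production/production combinations worth checking.

# Sketch: checking whether a pre-production signal correlates with a
# production signal, e.g. weekly Code Coverage vs. defects found in
# production that week. The data points are invented for illustration.
from statistics import correlation  # requires Python 3.10+

coverage_pct       = [62, 65, 70, 74, 78, 81]   # pre-production signal
production_defects = [14, 12, 11,  8,  6,  5]   # production signal

r = correlation(coverage_pct, production_defects)
print(f"Pearson r = {r:.2f}")  # strongly negative: more coverage, fewer defects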

If proper balance is given to these pre-production and production signals, then this should lead to a stable application platform that continues to deliver value at a sustainable rate as new business needs arise. Of course, this does not mean that business needs will be continuous or steady, so there is still potential to impact The Stability Index, but by keeping balance, a team or teams should be able to manage fluctuations in business needs more effectively.

Ultimately, creating an implementation of The Stability Index comes down to deciding on metrics that produce effective pre-production and production early warning signals. The following sections go into detail about the metrics initially used in this implementation of The Stability Index.

Pre-Production Signals

Pre-production signals are focused towards technical craftsmanship of engineering teams.

Percentage of Broken Builds

This metric is used to objectively measure behavior patterns of engineers who may not be using effective gating criteria before checking in code. Build breakages can occur while building source code, compiling, unit testing, distributing artifacts, or deploying images, or further downstream while testing for functional correctness, integration issues, and performance gaps. Irrespective of where the breakage happens, code will not be able to flow to production through a fully automated pipeline unless engineers test code changes in their local sandbox environments before checking into the common repository.
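As a rough illustration, the metric itself is simple to compute once build results are pulled from the CI server; the data shape below is an assumption made for this sketch, not any particular CI tool’s API.

# Minimal sketch: computing Percentage of Broken Builds from a list of
# CI build results. The field names are illustrative only.
def percentage_of_broken_builds(build_results):
    """build_results: list of dicts like {"id": 101, "status": "failed"}."""
    if not build_results:
        return 0.0
    broken = sum(1 for build in build_results if build["status"] != "passed")
    return 100.0 * broken / len(build_results)

recent_builds = [
    {"id": 101, "status": "passed"},
    {"id": 102, "status": "failed"},   # broke during unit tests
    {"id": 103, "status": "passed"},
    {"id": 104, "status": "failed"},   # broke during deployment
]
print(f"Broken builds: {percentage_of_broken_builds(recent_builds):.1f}%")  # 50.0%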

Code Duplication

This metric is an early indicator of software debt: it shows that software programs contain significant repeated sequences of source code, which calls for refactoring. Duplication has a high business impact in terms of maintainability.
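For illustration, here is a naive line-window sketch of duplicate detection. Real duplication tools, such as PMD’s CPD, work on token streams and are far more robust; treat this only as a sketch of the idea.

# Rough sketch: finding duplicated blocks of source lines by recording
# sliding windows of stripped lines and reporting any window seen twice.
from collections import defaultdict

def duplicated_blocks(files, window=4):
    """files: {filename: source text}. Returns blocks that appear more than once."""
    seen = defaultdict(list)
    for name, source in files.items():
        lines = [line.strip() for line in source.splitlines()]
        for i in range(len(lines) - window + 1):
            block = "\n".join(lines[i:i + window])
            seen[block].append((name, i + 1))   # (file, starting line)
    return {block: locs for block, locs in seen.items() if len(locs) > 1}

demo = {"a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
        "b.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n"}
for block, locations in duplicated_blocks(demo).items():
    print(f"Duplicated at {locations}:\n{block}\n")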

Cyclomatic Complexity

This is a software metric that indicates the conditional complexity of a program (function, method, class etc.) by measuring the number of linearly independent paths that can be executed. This is also an early indicator of software debt and is very expensive to the company when it comes to getting new employees up to speed.
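As a sketch of how the measurement works, the snippet below approximates cyclomatic complexity for Python code by counting decision points and adding one. Dedicated analyzers handle many more node types and languages, so this is only illustrative.

# Rough sketch: estimating cyclomatic complexity by counting decision
# points (+1) using the standard library ast module.
import ast

DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                  ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1

sample = """
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(sample))  # 3: two decisions plus the default path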

Code Coverage

Code Coverage can be Line, Statement, Decision, Branch, or Condition coverage and is a measure of how effective test suites are in certifying software programs. It is worth noting that 100% Line or Statement Coverage may give a false sense of security, since all lines may have been covered by tests even though important decisions, branches, and conditions have not been tested.

The caveat here is that a high code coverage percentage does not guarantee bug-free applications since the primary objective of tests should be to meet customer requirements and all customer use cases may not be exercised by tests even when Code Coverage is at 100%.
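A tiny, contrived example of that false sense of security: the single test below executes every line of the function, so line coverage reads 100%, yet one of the two branches is never exercised.

# Illustrative only: 100% line coverage while a branch goes untested.
def shipping_cost(weight_kg, express):
    rate = 10 if express else 5   # one line, two branches
    return weight_kg * rate

def test_express_shipping():
    assert shipping_cost(2, True) == 20   # the express=False branch is never hit

test_express_shipping()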

Test Cycle Time

Test Cycle Time is a measure of the time it takes to execute the entire test suite, either before bugs are reported that would cause another iteration or before the software can be certified for production. For the same number of tests, the Cycle Time can be high unless tests are designed to be independent of each other and are launched in parallel. As an aside, for tests to be launched in parallel, the test environments can often become a bottleneck in terms of providing the required capacity and reliability. Virtual environments may not offer a guaranteed share of the CPU, whereas shared clusters in distributed environments may queue jobs and hence make the execution time long and unpredictable.
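Here is a minimal sketch of the cycle-time math when independent tests run in parallel; the sleeps stand in for real test suites, and the worker count is an arbitrary assumption.

# Sketch: measuring Test Cycle Time with independent tests run in parallel.
import time
from concurrent.futures import ThreadPoolExecutor

def run_test(name, duration):
    time.sleep(duration)          # stand-in for a real test suite's runtime
    return name, "passed"

tests = [("checkout", 1.0), ("search", 1.0), ("login", 1.0), ("profile", 1.0)]

start = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda t: run_test(*t), tests))
cycle_time = time.time() - start

print(results)
print(f"Test cycle time: {cycle_time:.1f}s (vs. ~4.0s if run serially)")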

Production Signals

Production signals are focused towards the customer.

Customer Delight / Satisfaction

It’s all about the customer. Software is released to meet the customer’s needs and to leave the customer delighted and craving more. Although “Delight” can be subjective at times, surveys are an effective way to measure satisfaction. An example of customer dissatisfaction could be that, although the software behaves correctly, it takes more clicks to perform the same activity than it did in the previous version. Defects reported by the customer are a good measure of this metric.

Defect Containment

This is an important trend to watch out for since customers can be inconvenienced if their support tickets are queued up. Moreover, defects reported in production that translate into code and configuration errors should be fixed by the engineering team within acceptable SLAs depending on the severity of the issues. Being able to iterate fast is one of the key factors for customer retention.
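A small sketch of an SLA check on production defects follows; the severity levels and SLA windows are illustrative assumptions, not recommendations from this article.

# Sketch: flagging production defects that have exceeded their fix SLA.
from datetime import datetime, timedelta

SLA_BY_SEVERITY = {"critical": timedelta(hours=24),
                   "major": timedelta(days=3),
                   "minor": timedelta(days=14)}

def breached_sla(defects, now=None):
    now = now or datetime.utcnow()
    return [d for d in defects
            if not d["fixed"] and now - d["reported"] > SLA_BY_SEVERITY[d["severity"]]]

defects = [
    {"id": 1, "severity": "critical", "reported": datetime.utcnow() - timedelta(hours=30), "fixed": False},
    {"id": 2, "severity": "minor", "reported": datetime.utcnow() - timedelta(days=2), "fixed": False},
]
print([d["id"] for d in breached_sla(defects)])  # [1]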

Uptime

Systems could go down due to hardware incidents like router malfunction and disk crashes or due to software inadequacies like fault tolerance not being built in. Either way, downtimes cause revenue losses and are a critical contributor towards Stability Index.
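The uptime calculation itself is straightforward; here is a sketch with made-up incident durations.

# Sketch: uptime percentage for a reporting period, given downtime
# incidents in minutes. The numbers are invented for illustration.
def uptime_percentage(period_minutes, downtime_incidents):
    downtime = sum(downtime_incidents)
    return 100.0 * (period_minutes - downtime) / period_minutes

month = 30 * 24 * 60                                  # a 30-day month in minutes
print(f"{uptime_percentage(month, [60, 30]):.3f}%")   # two outages -> 99.792%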

Relationship of Signals to The Stability Index

Each of the above trends has a bearing on the Stability Index. Some trends are directly proportional to the Stability Index (when the metric goes up, the index goes up), while others are inversely proportional (when the metric goes up, the index goes down). For example, when Code Coverage goes up, it has a positive impact on the Stability Index. On the contrary, when Code Duplication goes up, the Stability Index goes down.

The relationships of all the metrics with Stability Index are illustrated below.

Pre-production Metric              Relationship to Stability Index
Percentage of Broken Builds        Inverse
Code Duplication                   Inverse
Cyclomatic Complexity              Inverse
Code Coverage                      Direct
Test Cycle Time                    Inverse

Production Metric                  Relationship to Stability Index
Customer Delight / Satisfaction    Direct
Defect Containment                 Direct
Uptime                             Direct
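One way to turn these relationships into a single number is a weighted sum of normalized signals, inverting the inversely proportional ones. The sketch below is only an illustration of that idea; the weights, the normalization to a 0 to 100 scale, and even the choice of a weighted sum are assumptions of mine, not part of the Stability Index as described above.

# Illustrative sketch: combining normalized signals into one Stability Index.
# +1 = directly proportional to the index, -1 = inversely proportional.
SIGNAL_DIRECTIONS = {
    "broken_builds_pct": -1,
    "code_duplication_pct": -1,
    "cyclomatic_complexity_norm": -1,
    "code_coverage_pct": +1,
    "test_cycle_time_norm": -1,
    "customer_satisfaction_pct": +1,
    "defect_containment_pct": +1,
    "uptime_pct": +1,
}

def stability_index(signals, weights=None):
    """signals: metric name -> value normalized to a 0..100 scale."""
    weights = weights or {name: 1.0 for name in signals}
    total_weight = sum(weights[name] for name in signals)
    score = 0.0
    for name, value in signals.items():
        # Invert inversely proportional signals so higher is always better.
        contribution = value if SIGNAL_DIRECTIONS[name] > 0 else 100.0 - value
        score += weights[name] * contribution
    return score / total_weight

example = {
    "broken_builds_pct": 10, "code_duplication_pct": 5,
    "cyclomatic_complexity_norm": 20, "code_coverage_pct": 80,
    "test_cycle_time_norm": 30, "customer_satisfaction_pct": 85,
    "defect_containment_pct": 90, "uptime_pct": 99.9,
}
print(f"Stability Index: {stability_index(example):.1f} / 100")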

Towards a Push-Button Release

This presentation is being delivered as a 45-minute lecture and discussion for a company-wide tech talk today. It contains two case studies: one revolves around moving to a push-button release that reduced the whole product company’s release cycles from 6 months to every week, and the other covers the effects of a “No Defect” policy on a team’s productivity.

Managing Software Debt in Practice Presentation

Today at the Scrum Gathering in Seattle, I held a session on “Managing Software Debt in Practice”.

The presentation had too much content for the less than 90 minutes we had for the session. I did not get into scaling Scrum team patterns and heuristics for managing software debt at scale, and I covered less about testing than I’d hoped. Hopefully it was useful for the participants and they left the session with at least one new idea. It is difficult to take a 1-day workshop and turn it into a talk of less than 90 minutes, as I learned again.

Interview: Assessing ROI of Addressing Software Debt

This week, SearchSoftwareQuality.com published an interview with me about the book Managing Software Debt: Building for Inevitable Change. The interview has 2 parts: the first is a discussion of software debt and the second focuses on addressing software debt. Here are links to the interview:

Our Book is Available: Managing Software Debt – Building for Inevitable Change


I am quite happy that the book that took much of my time over the past couple of years has finally come out. Thank you, Addison-Wesley, for asking me to write a book. Also, I want to thank Jim Highsmith and Alistair Cockburn for accepting the book into their Agile Software Development Series. Finally, I have to thank all of those who have guided, influenced, and supported me over my career and life, with special thanks to my wife and kids who put up with me during the book’s development. My family is truly amazing and I am very lucky to have them!

Automated Promotion through Server Environments

To get to continuous deployment, or even continuously validated builds, a team must take care of their automated build, test, analysis, and deploy scripts. I always recommend that you have 2 scripts:

  • deploy
  • rollback

These scripts should be used many times per iteration to reduce the risk of surprises when deploying to downstream server environments, including production. The following diagram shows a generic flow that these deploy and rollback scripts might incorporate to increase teams’ confidence in the continuous deployment of their software to downstream environments.
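As a rough sketch of that flow, a deploy script and its rollback counterpart might look something like the following; every step here is a placeholder that a real script would replace with calls to your build, test, and deployment tooling.

# Sketch of a deploy/rollback pair. Each command is a placeholder echo.
import subprocess, sys

def run(step, cmd):
    print(f"==> {step}")
    return subprocess.call(cmd, shell=True) == 0

def deploy(version, env):
    steps = [
        ("unit + integration tests", "echo running tests"),
        (f"deploy {version} to {env}", f"echo deploying {version}"),
        ("smoke tests against the new version", "echo smoke testing"),
    ]
    for name, cmd in steps:
        if not run(name, cmd):
            rollback(env)
            return False
    return True

def rollback(env):
    run(f"roll {env} back to the previous version", "echo rolling back")

if __name__ == "__main__":
    sys.exit(0 if deploy(version="1.4.2", env="staging") else 1)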

The incorporation of continuous integration, automated tests (unit, integration, acceptance, smoke), code analysis, and deployment into the deployment scripts provides increased confidence. Teams that I have worked on, coached, and consulted with have found that static code analysis, combined with automatic build failure when particular metrics trend in a negative direction, enhances their continued confidence in the changes they make to the software. The software changes may pass all of the automated tests, but the build is still not promoted to the next environment because, for instance, code coverage has gone down more than 0.5% in the past week. I am careful to suggest that teams should probably not set a specific metric bar like “90% code coverage or bust” but rather watch that no important metric is trending in the wrong direction.
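A trend-based gate like that coverage example might look something like the sketch below; the weekly window and 0.5% threshold come from the example above, while the data shape and everything else are illustrative assumptions.

# Sketch: fail promotion when code coverage drops more than 0.5 percentage
# points over the trailing week, instead of enforcing a fixed bar.
def coverage_gate(history, max_weekly_drop=0.5):
    """history: list of (date, coverage_pct) for the past week, oldest first."""
    if len(history) < 2:
        return True
    drop = history[0][1] - history[-1][1]
    return drop <= max_weekly_drop

week = [("2011-03-01", 84.2), ("2011-03-04", 84.0), ("2011-03-07", 83.4)]
if not coverage_gate(week):
    raise SystemExit("Gate failed: coverage dropped 0.8% this week; build not promoted")
print("Gate passed: coverage trend within tolerance")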

Please let us know how your teams move towards continuous delivery or at least continuous deployment to downstream environments with confidence in the comments section of this blog entry. Thanks.