Production-Ready Iterations

The first agile principle states: ‘Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.’ Software development teams aspire to be able to produce increments of production-ready code in every iteration, where ‘production-ready’ means: 100% unit tested, 100% feature tested, 100% system/performance/stress tested end-to-end in a production-like environment, and zero open bugs. (That’s a reasonable definition of ‘done’.) In other words, good enough to be in production. And it goes without saying, of course, that the new increment provides ‘value’ to an actual user of the product.

This sounds like a very tall order, requiring a fundamentally different approach from the traditional practice of a series of hand-offs from one stage of development to the next. There is simply not enough time for that approach to work if we accept the above-stated objective.

Furthermore, the Scrum framework by itself does not supply all of the required processes to support this objective.

Figure: Agile vs. Waterfall (The Agile Value Proposition)

This chart underscores the fundamental difference between agile and waterfall in terms of the test and bug-fix approach. In agile there are no hand-offs, no phases and no large backlogs of bugs to fix as a release nears the end of its development life-cycle. Each iteration strives to deliver an increment of bug-free, production-ready code as its output.

Let’s break this down to see what is required for this to work in practice. Agile testing starts as soon as the first user story is ready for validation (not at the end of the sprint!). But for this approach to have any chance of success, re-work caused by poor-quality code must be minimized. By re-work we mean the traditional test and bug-fixing cycle, characteristic of waterfall development, that starts with the hand-off from development to the test organization. Thus, prerequisites for submitting builds for validation must include:

  • Code compiles cleanly, with all static analysis warnings removed
  • Code reviewed, with all review issues resolved
  • Story has been unit tested, and unit tests are automated
  • Code and automated unit test cases checked into build system
  • Build passes all automated unit tests; test coverage is measured and meets a pre-defined threshold (a build that fails to meet the coverage threshold is rejected)
  • Build passes all predefined ‘smoke’ tests – an automated functional regression (a minimal sketch of such a test follows this list)
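
To make the last item concrete, a ‘smoke’ test can be as simple as an automated check that a core page of the application loads and renders its key content. Below is a minimal sketch using Selenium WebDriver with JUnit 4 (both appear later in this chapter’s toolchain); the URL, page title and element class name are hypothetical placeholders, not part of any real system.

    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;
    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.firefox.FirefoxDriver;

    import static org.junit.Assert.assertTrue;

    /** Smoke test: verifies that the catalog page of a hypothetical web front end loads. */
    public class CatalogSmokeTest {

        private WebDriver driver;

        @Before
        public void openBrowser() {
            driver = new FirefoxDriver();   // any WebDriver implementation will do
        }

        @Test
        public void catalogPageLoadsAndShowsTitles() {
            driver.get("http://test-env.example.com/catalog");   // hypothetical test URL
            assertTrue("Catalog page title missing",
                       driver.getTitle().contains("Catalog"));
            assertTrue("No titles rendered on the catalog page",
                       driver.findElements(By.className("title")).size() > 0);
        }

        @After
        public void closeBrowser() {
            driver.quit();
        }
    }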

Next, the test team verifies any new functionality or user stories in the new build based on their defined acceptance criteria. This should be done in an environment as close to production as possible. If the team has adopted some of the practices we have discussed, then the majority of stories should be passing at this point. The manufacturing analogy is the production ‘yield’, and we should be striving for the highest possible yield, say, > 90%. If the yield is low (and the corresponding re-work high), then we need to dig into the reasons for this, identify root causes, and apply corrective actions to drive the yield higher. Clearly, this will not happen overnight, and may require multiple iterations, if not releases, to get there. There are a few additional prerequisites that go along with getting to a high first-pass success rate:

  • A continuous integration environment with a high degree of automation of both the unit tests and the build-level functional regression (‘smoke’) tests.
  • Enforcement of established rules for build quality, ideally via tools and automation.
  • A continuous improvement mindset, where the team routinely dissects test failures and institutes actions to raise the bar for first-pass test success.

Continuous Integration

One of the fundamental goals of agile development is to produce a production-quality increment of valuable user functionality at the end of every iteration. Working backwards from that challenge implies that a number of technical practices need to be in place. These technical practices need to support the organization’s definition of ‘done’ at both the story and sprint level. For example:

User Story Done Criteria:

  • Story designed/coded/unit tested
  • Unit tests automated, using a unit test framework such as JUnit or Jasmine (a minimal JUnit sketch follows this list)
  • Tested code checked in and built without errors:
    • Static analysis tests run and passed
    • Automated unit tests run and passed
    • Unit test coverage measured and meets pre-defined threshold
  • Build passes functional regression test suite
  • User story acceptance criteria met
  • Zero open bugs
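
As an illustration of the ‘unit tests automated’ criterion, here is a minimal JUnit 4 sketch of a story-level unit test. The RentalPrice class and its pricing rules are hypothetical, invented purely to show the shape of an automated test that the build system can execute on every check-in.

    // RentalPrice.java - hypothetical production class under test
    public class RentalPrice {

        private static final int STANDARD_CENTS = 399;
        private static final int HD_SURCHARGE_CENTS = 100;

        /** Returns the rental price, in cents, of a single asset. */
        public static int priceInCents(boolean hd, boolean promotion) {
            int price = STANDARD_CENTS + (hd ? HD_SURCHARGE_CENTS : 0);
            return promotion ? price / 2 : price;
        }
    }

    // RentalPriceTest.java - automated unit tests run by the build on every check-in
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class RentalPriceTest {

        @Test
        public void standardDefinitionRentalCostsTheBasePrice() {
            assertEquals(399, RentalPrice.priceInCents(false, false));
        }

        @Test
        public void hdRentalAddsTheSurcharge() {
            assertEquals(499, RentalPrice.priceInCents(true, false));
        }

        @Test
        public void promotionHalvesThePrice() {
            assertEquals(199, RentalPrice.priceInCents(false, true));
        }
    }

A coverage tool such as Clover or Cobertura can then measure how much of RentalPrice is exercised by these tests, and the build can be rejected if the agreed coverage threshold is not met.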

Sprint Done Criteria:

  • All user stories done
  • All system tests executed and passed
  • All performance/stress tests executed and passed
  • All regression tests executed and passed
  • Zero open bugs

How on earth are we expected to accomplish all of this in an iteration lasting at most two to four weeks? To make all of this happen, a number of practices must be in place:

  • There is no ‘hand-off’ from the developers to the testers. Story acceptance testing runs concurrently with development. The QA team can begin testing as soon as the first user story has been delivered cleanly through the build system.
  • Re-work must be absolutely minimized. There is simply no time for the classical back-and-forth between QA and development. The vast majority of user stories must work the first time. This can only be accomplished by rigorous unit testing.
  • System-level regression and performance testing must be running continuously throughout the iteration.
  • Test cases for new user stories must be automated. This requires resources and planning.
  • All changed code must be checked in, built and tested as frequently as possible. The goal is to re-build the system upon every change.
  • Fixing of broken builds must be given the highest priority.

When all of the above is in place we have something referred to as ‘Continuous Integration’. A typical continuous integration configuration is summarized in the following diagram.

Figure: Continuous Integration

In this configuration we have set up a CI server such as Hudson, an open-source CI tool. Hudson integrates with other CI-related tools from multiple vendors, such as:

  • SCM Systems: Perforce, Git
  • Build Tools: Maven, Ant
  • Unit Testing Frameworks: JUnit, xUnit, Selenium
  • Code Coverage Tools: Clover, Cobertura

Hudson orchestrates all of the individual sub-systems of the CI system, and can run any additional tools that have been integrated. Here is a step-by-step summary of how the system works:

  1. Developers check code changes into the SCM system
  2. Hudson constantly polls the SCM system, and initiates a build when new check-ins are detected. Automated unit tests, static analysis tests and functional regression tests are run on the new build
  3. Successful builds are copied to an internal release server, from where they can be tested further by the development team
  4. Alternatively, builds are loaded into the QA test environment, where independent validation of new functionality can be performed at the system level
  5. Test results are reported back to the team (one way of querying these results is sketched below)
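
Step 5 can be as simple as the team watching the Hudson dashboard, or the results can be pulled into e-mail, chat or wallboard notifications. The sketch below shows one possible way a small reporting utility might query the status of the most recent build over Hudson’s remote access API. The server address and job name are hypothetical, and a real implementation would use a JSON library (and authentication where required) rather than a raw string search.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    /**
     * Queries the CI server's remote access API for the status of the most recent build.
     * The server address and job name below are hypothetical placeholders.
     */
    public class LastBuildStatus {

        public static void main(String[] args) throws Exception {
            // Hudson exposes build metadata as JSON at .../lastBuild/api/json
            URL api = new URL("http://ci.example.com:8080/job/vod-mainline/lastBuild/api/json");
            HttpURLConnection conn = (HttpURLConnection) api.openConnection();

            StringBuilder json = new StringBuilder();
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                json.append(line);
            }
            in.close();

            // Crude check; a real report would use a JSON parser and pull more detail
            // (failed test counts, changesets, culprits, and so on).
            if (json.indexOf("\"result\":\"SUCCESS\"") >= 0) {
                System.out.println("Last build PASSED");
            } else {
                System.out.println("Last build FAILED or is still in progress");
            }
        }
    }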

Knowing that every change made to an evolving code-base resulted in a correctly built and defect-free image is invaluable to a development team. Inevitably, defects do get created from time to time. However, identifying and correcting these early means that the team will not be confronted with the risk of a large defect backlog near the end of a release cycle, and can be confident in delivering a high quality release on-time.

Setting up a continuous integration system is not a huge investment: a rudimentary system can be set up fairly quickly and then enhanced over time. The payback from early detection and elimination of integration problems and software defects dramatically outweighs the costs. Having the confidence that they are building on a solid foundation frees development teams to devote their energies to adding new features rather than debugging and correcting mistakes in new code.

Continuous Testing

So far we have looked at a framework for a continuous integration system that includes executing a suite of automated unit tests on every build and using the results of that testing to determine whether the build is of sufficient quality to proceed with further development and test activities. Ultimately though, we need to have a test environment that assures us that at the end of every iteration we have a production quality product increment.

If we go back to our simple VOD system example from the last chapter, we may realize at this point that we could be facing some significant challenges. The goal is to deliver an increment of defect-free functionality with each iteration. To accomplish this requires:

  • Mature software engineering practices that produce clean, reliable code at the unit and individual user story level.
  • An automated unit test framework that grows with every code check-in and can be run as part of a build validation process.
  • Build metrics such as test coverage that help ensure that the growing code base is comprehensively tested, and as close as possible to production quality at all times.
  • The ability to create and execute a set of new feature test cases at system level within the boundaries of a single iteration.
  • A suite of automated regression test cases that can be run against every new build to ensure that work on new functionality has not broken existing system behavior, performance or stability (one way to organize such a suite is sketched after this list).
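
One common way to make such a suite runnable against every new build is to group the automated tests into a single JUnit 4 suite that the CI server executes as one job. A minimal sketch follows; the member classes are the hypothetical tests from the earlier sketches plus an equally hypothetical end-to-end playback check.

    import org.junit.runner.RunWith;
    import org.junit.runners.Suite;

    /**
     * Regression suite executed by the CI server against every successful build.
     * The member classes are hypothetical; in practice this list grows every sprint
     * as new user stories (and their automated tests) are added.
     */
    @RunWith(Suite.class)
    @Suite.SuiteClasses({
        CatalogSmokeTest.class,       // functional smoke tests (see the earlier sketch)
        RentalPriceTest.class,        // story-level unit tests
        PlaybackRegressionTest.class  // hypothetical end-to-end playback checks
    })
    public class RegressionSuite {
        // Intentionally empty: the annotations tell JUnit which tests to run.
    }

The same suite can be run from a developer’s IDE, from the build script, or from a Hudson job, so a broken regression fails the build immediately rather than surfacing weeks later.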

Where we have multiple teams contributing to a release, an ‘agile release train’ (ART) approach is a good way to synchronize the work, producing production-quality code every sprint:

Figure: Agile Release Train

For systems of even moderate complexity, and for those being engineered by multiple teams, this approach assumes full core regression on every iteration, and it will take time to evolve to that point.

During this transition phase the question becomes what priorities to set and what trade-offs to make in order to maximize the test coverage per iteration and minimize the length of any residual regression or hardening effort. In the VOD system example we might have an independent System Integration and Test (SIT) team which takes the output of each development iteration and subjects it to a combination of new-feature testing, regression testing, interoperability testing with third-party systems, and non-functional testing including performance, capacity and stress. In this structure the output of a development iteration gives us an increment which is:

  • 100% unit tested, unit tests automated
  • 100% user story tested at the subsystem level
  • X% regression test coverage, where X represents the highest priority, highest risk aspects of the system. X will increase over time as the team works to improve their automation coverage.

The increment is then picked up by the SIT team who subjects it to:

  • 100% new feature validation, end-to-end on a fully integrated system
  • 100% performance, capacity, stress and interoperability testing.

Figure: System Integration and Test

The result: A ‘High Quality’ increment as the output of each SIT cycle. If the teams are using, say, a 3-week iteration cycle, this gives us a high-quality increment every 3 weeks. If the team cannot produce something they can call ‘High Quality’ in every iteration, then they should consider adjusting the iteration length accordingly. If these intermediate increments are being delivered to customer labs for trials of new functionality, this can still be done at relatively low risk. The final SIT cycle can be extended to provide 100% regression test coverage, and if a decision is made to deliver one of the intermediate increments before the end of the planned release, the SIT cycle for that increment can likewise be extended to provide full test coverage.

Over time, the SIT team should be working aggressively (backed, of course, by appropriate resourcing and funding from their management teams) to maximize the automation of their regression test suites. Some categories of testing, by their nature, require lengthy test cycles, for example stability testing or testing of high-availability system configurations. Other types of tests, for example those that require direct human involvement such as video quality validation, are not easily or inexpensively automated. Nonetheless, delivering a high-quality product increment per iteration is a goal that should be within reach of most organizations.