Data Warehouse Development With Scrum

Data Warehouses (DW) are developed by many organizations as a resource for Business Intelligence (BI). Business intelligence refers to the set of capabilities for the collection, storage and analysis of data produced by a company’s activities. BI tools and applications provide actionable reporting, performance measures (KPIs), and trends that inform management decision-making. A Data Warehouse is considered a core component of a Business Intelligence system.

Data Warehouse development is fraught with uncertainty and risk – there are too many variables and too much uncertainty to proceed in the traditional phased approach, where all of the requirements are fully understood and defined, followed by all of the data models and so on. These challenges are exactly what Scrum is designed for – a Product Goal that is a moving target, with accompanying high levels of risk and uncertainty. Scrum provides an approach based on proceeding incrementally with frequent ‘Inspect and Adapt’ cycles. Some though have adopted all of the trappings of Scrum with ‘sprints’ and ‘stand-up’s and so on, while delivering no discernible value each sprint. This is almost pointless.

A typical DW/BI system looks like the following:

Data Warehouse Basic Flow — DW/BI Data Flow

The basic flow through the system is as follows:

Source data ingestion to Staging Area
Extraction, cleaning, conforming of data
Transformation of data into DW Data Model
Data Visualization by various BI applications

The traditional development approach is a sequence of waterfall-like phases:

Data Warehouse Development — Traditional DW Development Model

Our challenge is how do we accomplish this in an Agile way? How do you deliver meaningful business value incrementally? What do Backlog Items look like for a Data Warehouse program?

Can we get closer to an approach that produces usable business value as the output of very sprint? What does a ‘vertical slice’ of value look like?

Agile Data Warehouse Development — Agile DW Construction

What is proposed here is that we tackle one business process at a time. Too much work to get done in a single sprint? Perhaps. Depends on the complexity of the business process. Here’s a BI dashboard:

Yes, quite a lot of data, associated dimensional models, ETL and BI development work, could be needed to support everything on this dashboard. Is it possible to slice this further while still getting us a piece of usable functionality in a single sprint (or at least something we can demonstrate to stakeholders to get feedback)? Let’s say the above dashboard requires 20 Dimensional Tables, 5 Fact Tables, and 30 ETL scripts. The above dashboard has 8 individual elements or components. Can we get one specific component done in a single sprint?

Even if it takes more than one sprint, its still more agile to solve for one business process at a time.

Another frequent challenge is that too much foundational infrastructure is required before we can even begin building screens and dashboards. Can we build the foundation incrementally in ‘vertical slices’ where we can demonstrate value, get feedback and iterate? For example can we identify the simplest dashboard component requiring the least amount of source data and the least complex transformation work, and display a set of records on a web page (no BI development – no charts or graphs)? This is still valuable in that it provides the opportunity to confirm with stakeholders that we have the right data and are looking at the right metrics. One could ‘Story Map’ each business process, identifying all of the work items needed for each part of the DW/BI system – source data, data models, ETL scripts – then sprint planning, goal-by-goal, to get to a full end-to-end solution.

Data Warehouse ‘Stories’

Nowhere in the Scrum Guide (the official definition of Scrum by Ken Schwaber & Jeff Sutherland), will you find any reference to the term “User Story”. That’s because User Stories are an example of a domain-specific method or tactic, and the guide states that “Such tactics for using within the Scrum framework vary widely and are described elsewhere”. Scrum is a framework for solving problems incrementally and iteratively based on the 3 pillars of Transparency, Inspection and Adaptation. The Guide deliberately avoids detailed implementation prescriptions and leaves us with a lightweight framework, defined in around 12 pages. The framework is intended to be extended and evolved to meet the specific needs of individual teams, organizations and their business domains.

The “As a User… I want to… So That…” format works great for user-facing software applications, but frequently teams are tasked with building things where the output of the thing being built is consumed by another system – not a human user. One could, in Jeff Patton’s words, “anthropomorphize” the consuming system, for example, As a Power BI Application I need Dimension Tables A, B, C so that I can construct the Conversion Rate KPI trend report. That’s OK, but it might be more straightforward to capture the backlog item as:

Create Dimension Tables A, B, C so that the Conversion Rate KPI Trend Report can be produced.

Here we have simply omitted a “user”, but have made sure that the reason for this particular item is understood.

Whether we call these items “Stories” or “Data Stories” or simply Product Backlog Items (PBIs) is less important than ensuring that they support incremental value delivery within a one-sprint timebox. INVEST still applies.

Data Warehouse Development With Scrum

Data Warehouse ‘Stories’

Maximum Scrum Team Size

Delivery At Scale 5: Improvement

Scrum Of Scrums

Agile Data Science – Part 1

Delivery At Scale 3: Managing the Product Backlog at Scale

Delivery At Scale 1: The Scaling Challenge

Data Warehouse ‘Stories’

Similar Posts