It seems a very long time ago that we put to bed the arguments over the need for data warehouses. I remember the first project I worked on: a regional insurance company struggling to get a comprehensive view of its customer demographics for campaign management. Why were they struggling? Simply put, their data was in silos across a multitude of different operational systems, each of which had its own unique way of storing (or, in some cases, hiding) its data.
“Our challenge was obvious: how do we find all the relevant data items and combine them so that we can answer the questions the organisation has?”
The IT team had already tried to solve this problem in the way you would expect. Yes, that’s right: extracts or dumps from each system, manually pulled into Excel spreadsheets to produce reports.
How could they gain a holistic and comprehensive view and remove the huge effort and risks inherent in managing, combining and reporting all of these separate data sets?
The answer was to build them a Kimball-style, dimensionally modelled data warehouse with a reporting layer on top. Back in early 2004 this meant a Microsoft SQL Server 2000 data warehouse, combined with Crystal Reports and Crystal Info for report creation, management, scheduling and distribution. Sound familiar to some of you?
Except that it wasn’t that simple.
We used a combination of MS SQL DTS (Data Transformation Services) and T-SQL stored procedures as our Extraction, Transformation and Loading (ETL) tooling. This made loading the data into SQL Server easier, but we still had issues connecting to the source systems. Where possible, we would do an incremental load, i.e. load only the records that had changed.
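As a sketch of that incremental-load pattern (all table and column names here are hypothetical, not from the original project), a T-SQL stored procedure of the SQL Server 2000 era, which predates the `MERGE` statement, would typically update changed rows and then insert new ones, driven by a last-modified timestamp:

```sql
-- Hypothetical incremental load; object names are illustrative only.
-- Assumes the source extract carries a LastModified timestamp column.
CREATE PROCEDURE dbo.LoadCustomerIncremental
    @LastLoadDate DATETIME
AS
BEGIN
    -- Update rows that already exist in the warehouse staging table
    UPDATE tgt
    SET    tgt.CustomerName = src.CustomerName,
           tgt.Region       = src.Region
    FROM   dbo.stg_Customer AS tgt
    INNER JOIN dbo.ext_Customer AS src
           ON tgt.CustomerID = src.CustomerID
    WHERE  src.LastModified > @LastLoadDate;

    -- Insert rows that are new since the last successful load
    INSERT INTO dbo.stg_Customer (CustomerID, CustomerName, Region)
    SELECT src.CustomerID, src.CustomerName, src.Region
    FROM   dbo.ext_Customer AS src
    WHERE  src.LastModified > @LastLoadDate
      AND  NOT EXISTS (SELECT 1
                       FROM   dbo.stg_Customer AS t
                       WHERE  t.CustomerID = src.CustomerID);
END
```

The update-then-insert shape is what made these loads cheaper than full refreshes, at the price of depending on a reliable change timestamp in every source.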
Once the datasets were in our database, we passed them through a number of staging areas, which cleansed, related, combined, aggregated and differentiated the data we required for our historical, non-volatile reporting models. Unfortunately, this resulted in us having to store multiple copies of the base data.
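To illustrate the kind of staging transform described above (the schemas shown are invented for illustration, not the original project's), a typical step would aggregate cleansed rows from one staging area into a reporting fact table:

```sql
-- Hypothetical staging step: aggregate cleansed transactions into a
-- daily fact table. All object names are illustrative only.
INSERT INTO dbo.fact_DailySales (DateKey, ProductKey, TotalAmount, TxnCount)
SELECT  CONVERT(INT, CONVERT(CHAR(8), TxnDate, 112)) AS DateKey,  -- yyyymmdd
        ProductKey,
        SUM(Amount) AS TotalAmount,
        COUNT(*)    AS TxnCount
FROM    dbo.stg_CleansedTransactions
GROUP BY CONVERT(INT, CONVERT(CHAR(8), TxnDate, 112)),
        ProductKey;
```

Each such hop produced another persisted copy of the data, which is exactly the duplication problem noted above.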
Once the data model was ready, we applied a combination of indexes and stored procedures to deliver the level of performance required. This performance tweaking was necessary because we had to update the warehouse and distribute the core reports within the customer’s already constrained overnight batch processing window.
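As a hedged sketch of that kind of tuning (again with hypothetical object names), the usual steps were a clustered index on the fact table’s date key, non-clustered indexes on the dimension foreign keys, and a statistics refresh after each nightly load:

```sql
-- Illustrative performance tuning for a dimensional model.
-- Object names are hypothetical.
CREATE CLUSTERED INDEX IX_fact_DailySales_DateKey
    ON dbo.fact_DailySales (DateKey);

CREATE NONCLUSTERED INDEX IX_fact_DailySales_ProductKey
    ON dbo.fact_DailySales (ProductKey);

-- Refresh statistics after the nightly load so the optimiser
-- has accurate row counts for the morning's report queries.
UPDATE STATISTICS dbo.fact_DailySales;
```

The trade-off was that every index added to load time inside the same batch window it was meant to protect, so indexing was a balancing act rather than a free win.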
With the model optimised and populated, the reporting schedule kicked in to run, render and deliver via email whichever daily, weekly, or monthly reports we required.
So far, so good. The biggest challenge, however, came when someone asked for a change to an existing report, or a new report, or, the doomsday scenario, a new report that required a completely new set of data!
Some of our past BI implementations are still in production after more than 15 years, and as business requirements evolve, changes can range from simple report modifications to revamping the semantic layer or the underlying data warehouse design.
Why is this a problem?
I speak to many customers from organisations of all shapes and sizes, from both the public and private sectors, highly regulated or not. I hear time and time again complaints about a lack of access to accurate, timely, relevant, trusted information: information that is required to support or drive decision making. The so-called path from transaction to information, to insight, to action.
“Would it surprise you to know that most of those complaining already have a data warehouse?”
A data warehouse that satisfied their original requirements when it was built, but, as sure as night follows day, their organisation’s information demands have since changed.
This is a fact of life and something that we have to recognise.
Many existing data warehouses, and the teams that maintain them, struggle to keep up with changes to data sources: structural changes to existing sources, or entirely new data sets arriving from newly implemented systems or as a result of M&A activity. These changes need to be addressed whilst also trying to meet the evolving and often challenging analytical needs of the organisation.
Decaying data warehouses are like set concrete: too rigid to change. Many organisations are unable to bring together diverse (structured and unstructured) data types. They suffer from time-consuming reporting, experience data quality issues and, most importantly, can’t answer forward-looking questions. The business will look for more agile ways to obtain the information it needs.
“All this does is reinforce the opinion that the ‘old data warehouse’ no longer meets the needs of its users.”
The four challenges with a traditional data warehouse
We have built many data warehouses and have first-hand experience of helping customers evolve their data warehouse platforms.
Time and again we hear about issues and challenges related to existing data warehouses. These are best summarised as follows:
- COMPLEX ARCHITECTURE
- SLOW PERFORMANCE
- OLD TECHNOLOGY
- LACK OF GOVERNANCE
“The reality is that business needs have changed and technology has moved on. We feel that it is time to adopt the concept of a Modern Data Platform.”
The Modern Data Platform
The rest of this blog series will explore in detail how the Modern Data Platform has evolved to address each of these challenges.
We will also explore best practice development and implementation techniques that have evolved to support their successful deployment.
You can also expect to hear real world, cross industry, practical examples of the Modern Data Platform in action.