In the previous post, I discussed the first challenge of the traditional data warehouse: its inflexibility in accommodating change.
This blog will focus on the second of these issues: COMPLEX ARCHITECTURE.
- COMPLEX ARCHITECTURE
- SLOW PERFORMANCE
- OLD TECHNOLOGY
- LACK OF GOVERNANCE
The following diagram illustrates a typical implementation architecture for a heterogeneously sourced data warehouse designed to support traditional Business Intelligence workloads. It should be all too familiar to Data Warehouse/BI practitioners who have been implementing data warehouse projects for many years.
Source Data – typically refers to SQL-friendly Relational Database Management Systems (RDBMS). If the data is not in a structured format, manipulation and transformation are required to pre-process it.
Extract, Transform and Load (ETL) – over the years there have been many approaches to ETL, ranging from manual coding to the use of an enterprise ETL tool.
Some of the examples are:
- Automated source-system flat-file extracts, loaded and processed via multiple complex SQL statements – these implementations are typically harder to maintain because the scripts end up scattered across multiple locations
- ETL tools bundled with the Data Warehouse RDBMS, such as SQL Server Integration Services and Oracle Warehouse Builder – often optimised for the RDBMS with which they are bundled and/or limited in functionality
- Enterprise ETL tools such as Informatica PowerCenter, IBM DataStage or SAP BusinessObjects Data Services – database-independent integration platforms, often with added functionality for scalability, scheduling, data quality and stewardship
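The first approach above, a flat-file extract pushed through staging SQL, can be sketched in a few lines. This is a minimal, self-contained illustration: the table and column names are invented for the example, and SQLite stands in for the warehouse RDBMS.

```python
# Minimal sketch of the "flat-file extract + SQL transform" ETL pattern.
# Table/column names are illustrative assumptions, not from a real system.
import csv
import io
import sqlite3

# Stand-in for an automated source-system flat-file extract.
extract = io.StringIO("customer_id,amount\n1,100.50\n2,200.25\n1,49.50\n")

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse RDBMS
conn.execute("CREATE TABLE stg_sales (customer_id INTEGER, amount REAL)")

# Load: copy the raw extract into a staging table.
rows = [(int(r["customer_id"]), float(r["amount"]))
        for r in csv.DictReader(extract)]
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)", rows)

# Transform: one of the many SQL statements such pipelines accumulate;
# in real implementations these scripts end up scattered across servers,
# which is what makes this approach hard to maintain.
conn.execute("""
    CREATE TABLE fct_sales AS
    SELECT customer_id, SUM(amount) AS total_amount
    FROM stg_sales
    GROUP BY customer_id
""")

print(conn.execute(
    "SELECT customer_id, total_amount FROM fct_sales ORDER BY customer_id"
).fetchall())
```

In a real deployment each of these steps would be a separate script on a separate schedule, which is precisely where the maintenance burden comes from.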
Store and Optimise – refers to the corporate repository where all the relevant data is structured, stored and optimised for performance reporting. This has traditionally been a relational database management system such as Oracle or SQL Server. Occasionally this can also be a specialised database for analytics such as Sybase IQ or an appliance such as Netezza or Teradata.
Front End Analytics – in the past there were requirements for formatted and paper-based reporting that was mostly historical in nature. In some cases there were also requirements to provide end users with ad-hoc reporting and some level of KPI-based performance dashboards. Typical solutions here include SAP BusinessObjects, Cognos, Microsoft Excel, QlikView, or one or more of the many others available.
For some customers, the traditional implementation described above will still fulfil their requirements. However, as mentioned in the previous post, business users' requirements are evolving, and data warehousing/BI teams are struggling to implement manageable architectures that address these complex and changing needs.
Some examples of those evolving requirements are:
- The Marketing Analytics team working on a customer churn/prioritisation predictive model. They will need to combine customer behaviour data from cloud-based mobile apps with CRM data from the data warehouse. An increasing variety of data sources reside in the cloud and may not be structured/relational.
- The Maintenance and Support team that wants to leverage new sensor technologies to get a real-time view of how customers use their products, perhaps to improve service levels or even offer more efficient preventative maintenance. In such cases the volume of data increases tremendously.
- The Finance team that wants to perform more advanced Planning, Budgeting and Forecasting operations using the data warehouse, ideally without running endless extracts and with access to more up-to-date data.
- Senior Management who would like to use their iPads, creating a need to deliver real-time KPI information that can be drilled down into detail on the fly.
So how would these new requirements impact the traditional implementation architecture? They make it significantly more complex.
Source Data – the ability to ingest new types of data sources (structured or unstructured), which can reside on premise or in the cloud as a service.
Extract, Transform and Load – requires different connectivity for new types of data sources and different consumption models (e.g. JSON over REST APIs). Real-time requirements also mean streaming data, which demands a different approach to monitoring, scheduling and management.
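The JSON consumption model mentioned above differs from flat-file loads mainly in that the payload is nested and must be flattened before a warehouse can use it. A hedged sketch, assuming a hypothetical events payload (the field names and structure are invented for illustration; in practice the string would come from an HTTP call against a cloud service):

```python
# Sketch of consuming a JSON payload such as a REST API might return.
# The payload shape and field names are hypothetical; it is hard-coded
# here (rather than fetched over HTTP) to keep the example self-contained.
import json

payload = json.loads("""
{
  "events": [
    {"user": "u1", "action": "open_app",  "ts": "2015-06-01T10:00:00Z"},
    {"user": "u2", "action": "view_page", "ts": "2015-06-01T10:00:05Z"}
  ]
}
""")

# Flatten the nested JSON into the tabular rows a warehouse load expects.
rows = [(e["user"], e["action"], e["ts"]) for e in payload["events"]]
for row in rows:
    print(row)
```

A traditional ETL tool built around fixed-width or delimited files has no natural home for this flattening step, which is why new connectivity layers end up bolted onto the architecture.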
Store and Optimise – the ability to handle “Big Data” requirements. Simplistically put, a robust repository capable of addressing the 3 V’s (Variety, Velocity and Volume): in layman’s terms, unstructured, fast-moving and high-volume data. This is where we see some customers exploring technologies such as Hadoop. Common questions from these customers are how the newer technologies will co-exist with the traditional SQL-based data warehouse, and whether Hadoop can deliver the required performance.
Front End Analytics – there are many instances where business users have already purchased their own departmental solutions. This is happening because business users are demanding more agility and are tired of waiting for their IT teams to deliver requested functionality. Quite often these solutions duplicate existing data in order to address “advanced analytics” workloads such as predictive and statistical modelling. This is far from ideal, however, as it creates more silos of data that must be managed and updated to remain accurate and consistent.
The evolving requirements and data architecture described above are less than optimal. Buying and adding a multitude of components to address changing requirements results in a number of key issues:
- Integration – how do you get all these new technologies to talk to each other?
- Overlap/redundancy – many tools with similar capabilities
- Risk – who truly understands all of the components?
All these create a COMPLEX ARCHITECTURE resulting in INCREASED COSTS and LOST AGILITY.
So what’s the answer?
Is there an alternative to organically stretching your existing data warehouse architecture, bolting on new capabilities as you go? Can we move to a new architecture that will deliver the kind of experience that our users desire?
Which of these collections of properties looks like it would be the easiest to manage, maintain, extend and ultimately live with?
What if you could implement a data architecture that was fully integrated, including management and monitoring, with the ability to monitor not only the end-to-end process but also the quality and lineage of the data itself?
A data architecture that could seamlessly ingest, store and optimally respond to both traditional transactional BI and newer Big Data workloads based on batch and real-time data.
What if this architecture was open-standards compliant, able to support best-of-breed BI tooling to address advanced use cases such as predictive analytics?
Finally, what if all of this could be delivered either on premise or in the cloud?
Would you be interested in understanding what this data architecture looked like?
The answer: YES! (Of course…).
The Modern Data Platform
The Modern Data Platform delivers on all of the requirements for a next-generation data warehouse, enabling organisations to radically simplify their legacy or overly complex solutions in order to lower running costs, improve agility and deliver real business value.
Remember to follow the rest of the posts in this series, where we will explore the three remaining common challenges of traditional data warehousing in detail.