Do I Need a Data Warehouse?

   —   
This article shares insights on the term "Data Warehouse". Further it states the basic use cases for data warehousing and explains why a company should (and also possibly should not) build and use its own Data Warehouse.

It's all about Business Processes

Any business you can think of can be described as a set of sequencing or intertwined processes that happen within your system. Together, the business processes are being executed to achieve the business goal. For example, an e-shop’s business processes can include Orders, Payments, Deliveries and Returns; whereas business processes for a bank can range from withdrawals or deposits all the way through to loans. Acquiring new clients can be considered a business process as well. Business processes are what your business is all about and every system you use is built upon business processes.

Your company captures info about Business Processes

These systems are called operational systems because they "operate" in order to support the business process. They can also be called transactional systems because they specialize in transactions - the tiniest elements that your business processes consists of from the developer’s point of view - in a manner that supports the fast capturing of transactions.

Customer ordered a smartphone? Boom, that's a transaction. Let's capture it. Customer changed her order? Want's a different color? Boom, another transaction, let's capture that change. Wait, customer changes her billing address? You guessed it - transaction. And, what now? The customer canceled their order! Another transaction.

The following example illustrates what can happen in your operational system: We have witnessed the insertion of a new record (order), the update of another (address), and the deletion of a previous (order) in rapid succession. All would be accompanied by a bunch of “select” statements to display the order, the address, the interface itself and so on. Many things are occurring here, but operational systems are built to handle it.

This article contains "Data Warehouse" in the title. Let’s get on with it, shall we?

Very well, let me start with a brief statement of what a data warehouse is: The essence of a data warehouse lies in being specialized in the analysis of business processes. Unlike operational systems, it is not frequent for the data warehouse to be asked to change the data. Its sole purpose is to view and inspect existing data and then draw conclusions with new information, which in turn means quicker generation of reports.

All reports, dashboards, and data mining tasks are based on the same consistent, reliable, credible and business-oriented (understandable) data source, so that users will never be left wondering if the data is correct or what it even means. Also, it determines if the data is accessible only to those who are authorized to see it.

The data warehouse runs separately from the transactional systems, so their limitations do not apply here. This not only makes it easy to access the information, but also much more feasible to develop reports that present the information.

How does a business benefit from its data warehouse?

The answer is pretty simple: As a business analyst, marketer or a C-level officer, your primary goal is to drive your business forward. Let’s say that, so far, your intuition is correct and that your reports have been produced at the zenith of a top-notch operational system. But perhaps there's time for more, and there is only so much the operational system can handle. As I have mentioned (and this cannot be stressed enough), it is primarily optimized for capturing transactions. Reports usually have to aggregate these data sets and sometimes even perform difficult calculations in order to deliver the insight you have been asking for.

Leads to performance impact on the operational system...

Any business that runs without its operational system working is doomed. Reporting directly on top of the operational system can generate a huge load, and as your inquiries on the system grow more sophisticated and eventually begin to include data mining, it can noticeably slow down the operational system and even make your customer feel highly uncomfortable.

The data warehouse runs as a separate system, and because of that, it mends this kind of problem.

I have more than one operational system...

It is not rare that a company has more transactional systems at its disposal, usually specialized in the different aspects of the business. However, you may still want to report on these specializations together and compare them. Let's say that your employee system and your marketing system are completely separate and you would like to see if the top level (employee system) employee's (employee system) campaigns (marketing system) are actually responsible for the huge amount of conversions (marketing system) you saw last month.

Again, the data warehouse comes to the rescue as it contains data from all your transactional systems. Furthermore, it consolidates the data you would have otherwise had lying around in your drive as excel files.

Our last talk with sales and marketing departments resulted in puzzlement...

This one really depends on the reason for confusion, but if it's a matter of different terminology. Imagine that a Customer in terms of sales is a person who bought your product, whereas marketing begs to differ by saying that they see a Customer as a person that has created an account (without needing to have purchased anything). We call these "homonyms".

If homonyms are your problem, then the data warehouse helps you fix this by stating clear, comprehensive and unambiguous terminology.

I wanted to see how well we were performing two years ago, but my Excel data is gone...

This is a common problem with spreadsheets and operational databases: they maintain only the current version of data. Don't get me wrong, keeping only the current version is fine from a performance perspective, and it’s great for the operational system as well. However, imagine you're a digital agency and you use lead scoring. For each potential customer you track their score. You’re fine if someone asks you about the current status of your leads, but suppose your boss needs to see Year-over-Year analysis on your leads, but the current state of your operational system renders it impossible to answer. Why is this happening?! Because you’ve tracked only the current version of the scores and there is no way of telling if the current lead has been considered a lead the previous year as well. There is no way to go back in time in order to inspect what the lead scores were like a year ago.

Unless you have a data warehouse! Then it is simply a matter of seconds, as it is vital for the data warehouse to keep track of the history of the business processes. You can then also ask questions like, “How long does it usually take for a lead to become hot?” The business user may also reap maximum benefit from data mining by posing the question, “Is there any pattern in the business process of leads becoming hot that is shared across leads themselves?” Such simple yet powerful questions truly leverage the strengths of the data warehouse and data mining (Knowledge Discovery) and allows for further lead scoring or any other business process enhancements.

I need the report now, but my devs are currently busy with other business-critical tasks!

And to make matters seem even worse, the meeting that requires this report takes place later this afternoon!

Luckily, a well-designed data warehouse combined with the right business intelligence tools allows for just in time (ad-hoc) reporting that you can leverage on your own.

Data Warehouse Considerations

So, a data warehouse brings us access to reports and insights about business processes. It consolidates data from multiple sources and serves it in an unambiguous manner, allowing the inspection of data with a historic context, all in a timely and trustworthy manner in order to support business decisions. Information provided is always easily accessible for everyone that has the permission to do so. But it comes with a price.

A data warehouse requires a development team, and the team must have a seasoned data warehousing specialist. Keep in mind that it’s a rather complicated area of expertise, far too much so to have a completely junior team working on it, not to mention the responsibility placed in them; it’s a vital business project after all.

Building a data warehouse, its schema, physical models, and the processes to load it can cost you. The aforementioned “cost” manifests itself in the form of server licenses and consulting, but this always takes time. If you decide that your business needs one then you had better arm yourself with patience. It does not only take up your devs' time, it may occupy yours as well... The DW team needs to interview both, the people who will be using the DW to produce reports and analyses, as well as the people who will be in charge of the transactional systems.

You need to decide if it's worth the cost

Ultimately, there is no short answer to the question "Should my company build a DW?" As a data science and data warehousing enthusiast, I would like to always answer "yes, without a doubt!" However, chances are you are doing just fine with your current state of reporting and are satisfied with the way your business is going. In that case you can still benefit from a data warehouse in the same ways that have been briefly described above. But, you must keep in mind that for serious analyses a data warehouse is an invaluable assistant.

Share this article on   LinkedIn Google+

Pavel Janecka

My passion lies in clean code, data science and guitars. I write about Data, SQL Server, Data Warehousing and Big Data opportunities/research.

Comments