WHAT IS A DATA WAREHOUSE?

To get to the meat of what a Data Warehouse is, there usually are a few concepts that have to be better understood.
WHAT IT’S NOT
A Data Warehouse is not something you will be buying off the shelf of your local cloud vendor or software provider. While there are specialized hosting solutions for Data Warehouses, don’t confuse that with the Data Warehouse itself. A Data Warehouse is a solution. Every solution stems from a problem that needs to be fixed. So what is the problem that Data Warehousing is fixing? Let’s reveal that by going through a case study then piecing it apart.

PROBLEM STATEMENT CASE STUDY
A national grocer brand had built a significant market dominance in their respective locations and was seeking to double in size. The organization was already a substantial size, and one of the glaring challenges they were dealing with was the difficulty of making decisions that impacted the operation of the business. To make these decisions the organization had a horde of data analysts. These analysts worked to gather data from various sources and roll it up for their direct reports to interpret. These silos-of-analysis would ultimately roll up to executive management which were used as a basis for their decision making meetings. The challenge to executive management was that the many sources of analysis made it difficult to determine which version of the “truth” was to be believed. Each source of data seemed to give a different picture of reality. This disparity was particularly stressful when “make or break” decisions were on the table. While these challenges plagued the executive management meetings, the disparity of truth also impacted the process owners. The merchant teams were particularly impaired by this. These merchants were on the front line with their suppliers, negotiating unit pricing, and delivery methods. Their ability to successfully execute their decisions had multi-million dollar repercussions to the grocer’s bottom line. However the merchants were constantly playing catch up trying to prepare analysis for negotiations with their suppliers and determining the right logistics for their products. For them to cobble together analysis, they had to manually reach into various application systems which were the source of data. These were applications like their Point of Sale (POS) system, Accounting systems, Manufacturing systems, Logistics Tracking systems, Human Resource systems, Web Traffic systems, etc. The structure of the data coming out of these systems was clearly not designed to be business friendly, but rather the structure of the data was suited to service the internal functions of the application. Additionally the actual extraction of the data from these source systems was very difficult to obtain as they couldn’t interrupt the applications core function. The grocer began creating copies of application systems which users could access as to not disrupt the users of the application. The number of copies and combinations of copies began to count in the hundreds.
The grocer needed to get out of the data collection business and into the grocery business. In other words, their day-to-day was consisting of data manipulation and munging and not progressing their core business. Ultimately this became a clear barrier in keeping the business from obtaining its goal of doubling in size.

PIECING APART THE PROBLEM
So lets break down the different parts of the problem into smaller chunks that can be identified.
Many Data Sources – The sources of the data are the application systems that are being used by the organization to enter orders, track purchases, etc. The many sources of data being accessed by the hundreds of employees meant that there were thousands of connections that had to be individually maintained.
Slow queries – The structure of how Application data is stored is NOT designed for batch retrieval. Rather it’s designed to be efficient at reading a single record, writing a single record, reading a single record, writing a single record, etc. So when you ask that structure (or Schema) to cough up one million records, you might as well go take the rest of the day off while the query runs.
Complex Logic – Application data is NOT designed for human consumption, but rather it is used to pull the levers running the internal cogs of the application. Yes it does contain the data the business wants, but only after careful assembly of logic can this be revealed. This means that even “simple” queries require a massive effort to piece together.
Decentralized Logic – The logic for each analysis lived in people’s heads or was nested in individual analytics. This meant that little sharing occurred between analysts, discoveries or corrections of logic were rarely leveraged across the organization.
Many Versions of the Truth – The natural outcome of having Decentralized Logic, is too many versions of the truth. Executive meetings on important decisions became arguments on who’s version of the truth was correct.

A DATA WAREHOUSE IS…
A Data Warehouse is the culmination of a solution to solve these 5 problems, but let’s get into what it physically is.

IT IS A DATABASE
Yes, the Data Warehouse is a database. Before you brush it off saying, “I already have a database,” let’s differentiate. A generic database like Oracle or SQL Server can become whatever you want it to be. You can either use it to process and store singular transactions (OLTP) or you can have it be a Data Warehouse for corporate analytics. What differentiates one from another is it’s schema.
An OLTP schema typically positions the tables and their relationships in a very linear path to a singular key. This makes processing individual transactions predictable and auditable.
A Data Warehouse schema on the other hand repositions the tables to optimize the relationship between attributes and measurements. This is done by centralizing all the measures in a center table and all the attributes in surrounding tables. This not only makes the SQL logic for acquiring data much easier to write but it also increases the query execution speed dramatically.
There are database platforms that further optimize this query speed and are built for high output on-demand like Snowflake. Coupling the engineering of a Data Warehouse on top of these specialized platforms makes for a very efficient, high-speed information delivery solution. But how does this equate to solving our 5 problems mentioned above? Well, remember that I was careful to say that it’s the “culmination of a solution to solve these 5 problems”. To get to the point where you have this structure in place, there’s work to do. And we haven’t even scratched the surface yet. So think of the Data Warehouse as one of the last destinations just before it gets consumed by the masses of dashboards, analytics, reports, and data exports. Getting to that point requires quite a bit of build up.

HOW CAN WE SOLVE THIS PROBLEM?
Telligens, among others, can provide answers in the area of Data Management, with its office located in California, USA. Our team of experts has implemented in a variety of Industries including, Healthcare, Insurance, Manufacturing, Financial Services, Media, Pharmaceutical, Retail, and others.
If you need more assistance, contact us at: letstalk@telligens.com

Similar Posts