By Craig Colangelo, Sr Consultant for PerformanceG2
As the downstream beneficiaries of the company’s operational data, those of us who design and build data warehouses see all sorts of bum data: operational systems that don’t behave quite as everyone expects, business results that contradict commonly held assumptions, dirty data that appears to be in some unknown language, and so forth and so on. We can, of course, code around almost any obstruction we encounter in the data, but we also need to make sure that the owners of the source systems know about potential issues in their applications and processes.
A data warehouse is in a unique position because its design depends on the assumptions built into the source systems and business processes that feed it. When those assumptions prove incorrect, we can add significant value to the application development process by calling them out, so that they can be addressed upstream or consciously accepted. Either way, it’s our responsibility to communicate potential issues tactfully to those who need to know. This leads to tighter business processes, stronger source systems, and, ultimately, better data for decision makers.