Data Is Dirty – Get Over It

Much concern and attention is spent on data cleansing, and some of it is justified. If someone’s email address is wrong in the database, attempts to communicate with them will fail: a waste of time and money for the sender, and a lost opportunity for the recipient.

But I would contend that only factually incorrect data should ever be considered to need cleaning.

All other “data imperfections” result from the realities of life.

We have recently been working with data in which transfers of responsibility from one supplier to another, which should be transparent to the customer, clearly affect sales through what can broadly be described as “disruption”.

But is this a data failing or something to be modelled as part of a more general system?

The answer depends on how you are going to use the model.

If the purpose of the analysis is to understand some basic dynamics in past data, such as the impact of a recent marketing programme, then removing such “noise” can be considered part of the “data cleaning” needed to ensure those measurements are sound.

If, however, the purpose is to provide a model that can run in a repeatable, real-time analytics environment, then the event needs to be captured and modelled as part of a more general process.
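To make the contrast concrete, here is a minimal Python sketch using made-up weekly sales figures. The `Week` record, its `marketing` and `disruption` flags, and all function names are hypothetical, and the additive least-squares model is simply a convenient illustration, not necessarily the approach behind the analysis described above. The first function takes the “cleaning” view and drops disrupted weeks before measuring a campaign’s lift; the second takes the “modelling” view and keeps them, estimating a disruption effect that can be reused when forecasting future handovers.

```python
"""Two ways to treat a supplier-handover "disruption" in weekly sales.

All data, names and the additive model are hypothetical; a sketch only.
"""
from dataclasses import dataclass


@dataclass
class Week:
    sales: float
    marketing: bool   # hypothetical flag: campaign running this week
    disruption: bool  # hypothetical flag: supplier handover under way


def lift_excluding_disruption(weeks):
    """Cleaning view: drop disrupted weeks, then compare campaign vs. non-campaign means."""
    clean = [w for w in weeks if not w.disruption]
    on = [w.sales for w in clean if w.marketing]
    off = [w.sales for w in clean if not w.marketing]
    return sum(on) / len(on) - sum(off) / len(off)


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_with_disruption(weeks):
    """Modelling view: least squares on sales ~ 1 + marketing + disruption.

    Returns (baseline, marketing_effect, disruption_effect).
    """
    rows = [[1.0, float(w.marketing), float(w.disruption)] for w in weeks]
    y = [w.sales for w in weeks]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return tuple(solve(xtx, xty))


def forecast(params, marketing, disruption):
    """Predict sales for a future week, including any planned handover."""
    baseline, m_eff, d_eff = params
    return baseline + m_eff * marketing + d_eff * disruption


if __name__ == "__main__":
    # Made-up data: baseline 100, campaign +20, handover -30 (exactly additive).
    weeks = (
        [Week(100, False, False) for _ in range(4)]
        + [Week(120, True, False) for _ in range(3)]
        + [Week(70, False, True) for _ in range(2)]
        + [Week(90, True, True)]
    )
    print(lift_excluding_disruption(weeks))   # 20.0
    params = fit_with_disruption(weeks)
    print([round(p, 6) for p in params])      # [100.0, 20.0, -30.0]
    print(round(forecast(params, 0, 1), 6))   # 70.0
```

Because the sample data are exactly additive, both views recover the same 20-unit campaign lift, but only the second also yields the disruption effect that a repeatable forecasting process needs. Ignoring the flag altogether and just comparing campaign and non-campaign weeks would give 22.5, because disrupted weeks make up a larger share of the non-campaign weeks.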

Knowing which approach to use, and when, is key to providing an effective solution that supports client decision making.

If you would like to find out more about real-time analytics and how to put it into practice, feel free to drop us a line.

Schezzer