First, keep in mind that the details of this scenario are entirely fictitious and exist only to help illuminate the underlying concept.
Let’s assume you are an analyst for a company that litigates consumer debt. You have been asked to identify which accounts will liquidate at the highest rate. After digging into the data, you find that the accounts of consumers employed at Walmart liquidate at a rate 20% higher than all other accounts. Great work! Everyone is happy and you have something to brag about.
Then comes the question: Why? Are Walmart employees more willing to pay prior to litigation? Is it because their employment is already known? Do they have a greater ability to pay?
Before investigating these questions, take a look at the data. The first thing you notice is that the average age of those Walmart employees is considerably lower than that of everyone else. Again, the question must be asked: Why? You discover that your database of Walmart employees includes only people hired by Walmart in the past 10 years. Anyone hired earlier than that is missing.
OK, now you are onto something. You think of all the money your commitment to accuracy has saved your company and press on with your new idea: Do younger people liquidate at a higher rate?
You again ask yourself “why?” and, after more analysis, discover that younger people carry lower account balances than older people. This makes sense because younger people tend to have lower incomes and have had less time to accumulate debt. So you separate the young from the old, plot liquidation rate versus account balance for each group, and discover that the curves look the same. Statistical analysis confirms this as well. In other words, at any given balance, age makes no difference.
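To make the "separate the groups and compare the curves" step concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, in the same fictitious spirit as the scenario: the data is simulated, the function names (`simulate_accounts`, `overall_rates`, `rates_by_balance_band`) are hypothetical, and "liquidation rate" is simplified to the share of accounts that pay. The simulation deliberately lets balance alone drive payment, so it shows how an age gap can appear in the overall numbers and then shrink once you compare accounts of similar balance.

```python
import random
from collections import defaultdict


def simulate_accounts(n=40000, seed=0):
    """Generate fictitious accounts in which balance alone drives payment."""
    rng = random.Random(seed)
    accounts = []
    for _ in range(n):
        young = rng.random() < 0.5
        # Assumption: younger consumers tend to carry lower balances.
        balance = max(100.0, rng.gauss(2000 if young else 6000, 1500))
        # Assumption: the chance of paying falls with balance and ignores age.
        p_pay = max(0.05, 0.6 - balance / 20000)
        accounts.append((young, balance, rng.random() < p_pay))
    return accounts


def liquidation_rate(rows):
    """Share of accounts that paid (a simplified stand-in for liquidation rate)."""
    return sum(paid for _, _, paid in rows) / len(rows)


def overall_rates(accounts):
    """Return (young_rate, old_rate) without controlling for balance."""
    young = [a for a in accounts if a[0]]
    old = [a for a in accounts if not a[0]]
    return liquidation_rate(young), liquidation_rate(old)


def rates_by_balance_band(accounts, band_width=1000, min_count=500):
    """Compare young vs. old inside each balance band that has enough of both."""
    bands = defaultdict(lambda: {True: [], False: []})
    for account in accounts:
        bands[int(account[1] // band_width)][account[0]].append(account)
    result = {}
    for band, groups in sorted(bands.items()):
        if len(groups[True]) >= min_count and len(groups[False]) >= min_count:
            result[band * band_width] = (
                liquidation_rate(groups[True]),
                liquidation_rate(groups[False]),
            )
    return result


accounts = simulate_accounts()
young_rate, old_rate = overall_rates(accounts)
print(f"overall: young={young_rate:.3f} old={old_rate:.3f}")
for band_start, (y, o) in rates_by_balance_band(accounts).items():
    print(f"balance from {band_start}: young={y:.3f} old={o:.3f}")
```

In this simulated setup the overall rates differ noticeably by age, while the rates within a single balance band sit close together. On real data you would swap the simulation for your own accounts and follow up with a proper statistical test rather than eyeballing the bands.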
Now where do you go? Account balance, not employer or age, is the primary driver of liquidation rates. You can ask why and dig deeper, but the point is clear by now. Without investigating the “why?” of your results, you would have been chasing Walmart employees in vain. Your results would have misrepresented the truth, restricted the number of attractive accounts, and cost your company money.
Always remember to ask why and look to your underlying distribution for the answer.