In Data Don’t Trust

Mar 20, 2019–I’ve come to believe one thing: Don’t believe anything.

One recent chilly morning I was trying to determine just how chilly it was so I would have some ammunition to start my breakfast conversation. I happened to be driving past a Time & Temperature sign, and noted it was right at 32 degrees. Perfect!

Then I glanced at my dash temperature, and it read 36 degrees. Hoping for a tiebreaker, I punched up my phone, which told me it was 28 degrees.

Here I hovered at a singularity in the space-time continuum, and my three digital companions disagreed on something as basic as the current temperature. And not by a degree or two: there was a full 8-degree spread, alarming when we are warned of calamities tied to changes of a fraction of a degree. The time displayed on each device differed by minutes, as well.

This thought percolated as I was preparing a presentation on marketing. Tech insiders had just admitted something that all of us who work in social media had long suspected: analytics are false. All of them–number of likes, engagement, visits, followers, views, etc.–are imaginary, as ephemeral and capricious as dandelion tufts on the southern breeze.

This could be categorized as “mildly interesting,” but for the amount of time, energy, and money we spend chasing those magic numbers.

I first suspected the reliability of “data” back in the 1980s when I was the one collecting it.

I was working at a nonprofit-that-shall-not-be-named. As our funds came from government agencies, we were required to provide detailed data on all participants. Every spring a pound of paper (I actually weighed it) would fall on my desk, with reams of forms to fill out.

The one category I–and my professional peers–always struggled with was the “Reason for Separation.” In this section we were required to document what caused a student to leave our program.

This practice seemed futile. Many of our clients were transient or low-income, often without permanent addresses or phones. How, I asked a denizen of the bureaucracy, am I supposed to know why someone stopped coming to class, when they were no longer there and there was no way to contact them? We could not even be sure whether they had stopped attending or were simply taking an extended break.

“Make your best guess,” was his helpful answer.

As I recall, the choices for leaving included No Transportation, Lack of Childcare, Relocation, Change in Employment, and other general categories. We could pick only one. So we guessed.

I faithfully filled out the form and shipped it back to the hive.

Every fall, we received a beautiful report filled with statistics describing the state of our industry. That’s when I first realized that all such reports should be taken with a truckload of salt. Assuming my peers had filled out their forms as I was instructed to–by guessing–and multiplying my guesswork across hundreds of other programs, some of that “data” could have been off by a factor of 10. Yet all of it became the basis for funding and policy decisions in the following years.

Hopefully the statute of limitations has expired. We did nothing illegal or unethical; we truly gave it our best effort with the information we had. But the raw numbers simply could not reflect reality.

Yet the struggle to collect meaningful data might not be futile. Even if the individual numbers are wrong, errors that are random rather than systematic scatter around a true midpoint, and shared biases tend to persist from year to year. That means the conclusions drawn from them can still have merit, at least as trends and patterns compared against the equally faulty numbers in previous reports.
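The intuition can be sketched in a few lines of Python. The numbers here are invented purely for illustration: assume every program inflates its figures by roughly the same biased guesswork (here, a uniform 20%). Each absolute number is wrong, but the direction of the year-over-year change survives intact.

```python
# Hypothetical yearly enrollment figures -- invented for illustration.
true_counts = [100, 110, 125, 120, 140]           # what really happened
reported = [round(c * 1.2) for c in true_counts]  # uniformly inflated "data"

def trend(xs):
    """Sign of each year-over-year change: +1 for up, -1 for down."""
    return [(1 if b > a else -1) for a, b in zip(xs, xs[1:])]

print(trend(true_counts))  # direction of real changes
print(trend(reported))     # same pattern, despite every number being wrong
```

A shared, consistent bias cancels out of comparisons, which is why trend lines drawn from flawed reports can still be worth reading, even when the raw totals are not.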

Don’t believe me?

Good. You shouldn’t.