What is our obligation as data scientists in presenting information?

I just spent three great days at a conference. I hadn’t been to one recently, and many of the ones I have attended disappointed me with their heavy vendor focus on selling products and giving away trinkets. This particular conference was much more intimate, and picking other people’s brains was well worth my time.

In one session about fifteen of us discussed designing effective dashboards. Although I love Edward Tufte, the conversation here never once touched on data-ink ratio or any of his other great ideas. Instead, we spent a large portion of the roundtable debating the obligation of a data scientist to guide the dashboard design process.

For example, not that long ago Stephen Few held a dashboard design competition (here’s his result). The challenge I had with this competition wasn’t in how well the data was presented. Indeed, I learned a great deal by looking at the winning solution and Stephen’s solution. What left me feeling unsatisfied was the missed opportunity to discuss what should be presented versus how it should be presented.

And this, to me, is the central question of information display. There are many valuable rules about presenting information once you decide to present it, but scant advice on how to decide whether to present it at all. Statistical literacy is weak in many organizations, so as a likely stats expert in yours, what do you owe your colleagues?

  • Help them to identify the outcomes that matter first. Absent this, no dashboard can be useful. It will just be a lot of beautifully presented garbage.
  • Help them determine potential leading indicators and help them assess whether they matter. It’s not enough to have good ideas about what might matter. We have an obligation to test the relationships.
  • Help them think about the sometimes subtle and insidious problems of statistics: false causation, mathematical quirks that change apparent relationships (like log transforms on a single axis or sharing a variable between two composite measures), and other things that will mislead.
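
To make the last point concrete, here is a minimal sketch of how sharing a variable between two composite measures can manufacture a relationship. Everything in it is made up for illustration: the metric names (`signups`, `tickets`, `headcount`), the ranges, and the seed are my assumptions, not data from any real dashboard. The three raw series are generated independently, yet dividing two of them by the third produces ratios that correlate.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Three hypothetical metrics, drawn independently of one another.
signups = [rng.uniform(50, 150) for _ in range(n)]
tickets = [rng.uniform(50, 150) for _ in range(n)]
headcount = [rng.uniform(5, 15) for _ in range(n)]

# The raw metrics are unrelated, so this should sit near zero.
raw_r = pearson(signups, tickets)

# Two composite measures that share the same denominator.
signups_per_head = [s / h for s, h in zip(signups, headcount)]
tickets_per_head = [t / h for t, h in zip(tickets, headcount)]

# The shared denominator alone pushes this well above zero.
ratio_r = pearson(signups_per_head, tickets_per_head)

print(f"raw metrics:      r = {raw_r:.2f}")
print(f"per-head metrics: r = {ratio_r:.2f}")
```

Put the two per-head measures side by side on a dashboard and a reader could easily conclude that one drives the other, when the only thing they have in common is the denominator.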

If we fail to do these things for our organizations, then we do a disservice to our science and to those we work with. Dashboards should not be created simply to provide confirming evidence for the worldview we want to hold, but to help us seek out information that disconfirms our beliefs as well.
