Cargo Cult Data
Marton Trencseni - Mon 26 January 2015 - Data
Cargo cult science
R. P. Feynman was a Nobel Prize-winning physicist who coined the term cargo cult science. In Feynman's words:
In the South Seas there is a cargo cult of people. During the [second world] war they saw airplanes land with lots of good materials, and they want the same thing to happen now [after the Americans left]. So they've arranged to imitate things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he's the controller—and they wait for the airplanes to land. They're doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn't work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they're missing something essential, because the planes don't land.
Feynman cautioned that to avoid becoming cargo cult scientists, researchers must avoid fooling themselves, be willing to question and doubt their own theories and their own results, and investigate possible flaws in a theory or an experiment. He recommended that researchers adopt an unusually high level of honesty which is rarely encountered in everyday life, and gave examples from advertising, politics, and behavioral psychology to illustrate the everyday dishonesty which should be unacceptable in science.
Cargo cult data
The same idea applies to data. Cargo cult data is when you're collecting and looking at data when making decisions, but you're only following the forms of scientific investigation and missing the essentials, so it doesn't work. So in the end you're like the natives of the South Seas, and the planes don't land for you either.
Signs that you're doing cargo cult data:
- you don't have standardized logging across your products
- you routinely break your logging and have holes in your dataset
- you don't have standardized KPIs across your products and company
- you're not A/B testing all your releases
- you don't have explicit hypotheses for your experiments
- you don't know what statistical power is
- you confuse statistical significance and magnitude of change
- you stop A/B tests as soon as they're statistically significant (=peeking)
- you're not tracking your experiments and their outcomes historically
- you don't display or think about standard deviation and standard error on diagrams (=confuse signal and noise)
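To make the peeking item concrete, here is a minimal simulation sketch. It runs A/A tests (both arms have the same true conversion rate, so every "significant" result is a false positive) and compares checking the p-value repeatedly against checking it once at the end. Everything here is an assumption for illustration: the function names, the 10% conversion rate, the sample sizes, the peeking interval, and the choice of a pooled two-proportion z-test.

```python
import math
import random
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation observed yet (e.g. zero conversions), so no evidence.
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_test(rng, rate, n_per_arm, peek_every, alpha=0.05):
    """Run one A/A test.

    Returns (significant_at_any_peek, significant_at_end).
    """
    conv_a = conv_b = 0
    significant_at_any_peek = False
    for i in range(1, n_per_arm + 1):
        # Both arms share the same true rate: there is no real effect.
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if i % peek_every == 0:
            if two_proportion_p_value(conv_a, i, conv_b, i) < alpha:
                significant_at_any_peek = True
    significant_at_end = (
        two_proportion_p_value(conv_a, n_per_arm, conv_b, n_per_arm) < alpha
    )
    return significant_at_any_peek, significant_at_end


def false_positive_rates(n_tests=500, rate=0.1, n_per_arm=2000,
                         peek_every=100, seed=0):
    """Fraction of A/A tests declared significant, with and without peeking."""
    rng = random.Random(seed)
    results = [
        simulate_aa_test(rng, rate, n_per_arm, peek_every)
        for _ in range(n_tests)
    ]
    peeking = sum(r[0] for r in results) / n_tests
    fixed = sum(r[1] for r in results) / n_tests
    return peeking, fixed
```

Calling `false_positive_rates()` returns two fractions: how often a test would have been stopped as "significant" at some peek, and how often it is significant at the planned end. The second should land near the nominal 5%, while the first comes out considerably higher, which is the whole problem with stopping as soon as the p-value dips below the threshold.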
There are no easy answers for how to avoid cargo cult data, just as there are no easy answers for how to avoid cargo cult science. If you are thinking about this as a company, your best bet is to hire smart mathematicians or physicists for your data team and listen to what they say. For an individual, it's a matter of understanding statistics and being disciplined in your work. Fortunately there are great courses on Coursera, great books on Amazon and a wealth of information available online.