On Correlation and Causation

Careful study and inference are the bedrock of scientific progress - and so one has to be careful with the concepts of correlation and causation whenever reading about the latest science. We humans are built to see causation everywhere, our hyperactive pattern detectors picking out patterns from nothing - even when no causation in fact exists. The culture and methods of science exist to rein in that excess, to pick out the few right answers from the vast sea of wrong answers.

The scientific method is the cure for problems caused by magical thinking, such as a lack of progress towards better lives, and all the limitations - dramatic or trivial - that stem from an incorrect understanding of the way in which the world works. To make progress happen, you must tackle complex systems in a methodical way: propose, explore, test, verify, record, repeat. But that requires more work than merely guessing, and so there will always be some market for those willing to take the "shortcut" to the wrong answer. When the wrong answer doesn't have clear, obvious and rapid bad consequences attached to it, magical thinking will prosper. Such is the downside of human economic preferences - there is always a market for "incorrect" when "incorrect" is sold more cheaply than "correct."

But back to correlation: I bumped into a rather good post on the topic that airs a number of the same frustrations I feel when reading about science in the popular press. No-one is as careful as they should be, sadly, and so much of what passes for information is in fact misinformation. Correlation does not imply causation, a reported correlation is often not in fact a correlation - and much of the supposed causation isn't much more than wishful thinking either. In any case, the good post is at Good Math, Bad Math:

Correlation is actually a remarkably simple concept, which makes it all the more frustrating to see the nonsense constantly spewed in talking about it. Correlation is a linear relationship between two random variables.


One thing you'll constantly hear in discussions is "correlation does not imply causation". Causation isn't really a mathematical notion - and that's the root of that confusion. Correlation means that as one value changes, another variable changes in the same way. Causation means that when one value changes, it causes the other to change. There is a very big difference between causation and correlation. To give a rather commonly abused example: the majority of children with autism are diagnosed between the ages of 18 months and three years old. That's also the same period of time when children receive a large number of immunizations. So people see the correlation between receiving immunizations and the diagnosis of autism, and assume that that means that the immunizations cause autism. But in fact, there is no causal linkage. The causal factor in both cases is age: there is a particular age when a child's intellectual development reaches a stage when autism becomes obvious; and there is a particular age when certain vaccinations are traditionally given. It just happens that they're roughly the same age.


To show causation, you need to show a mechanism for the cause, and demonstrate that mechanism experimentally. So when someone shows you a correlation, what you should do is look for a plausible causal mechanism, and see if there's any experimental data to support it. Without a demonstrable causal mechanism, you can't be sure that there's a causal relationship - it's just a correlation.
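The age example in the quoted passage can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not anything taken from the Good Math, Bad Math post: the names `simulate`, `pearson`, and `residuals` are hypothetical, the data is synthetic, and the hidden common cause `z` merely stands in for something like age. The two observed variables never influence one another, yet they come out clearly correlated; once the linear effect of the common cause is removed from each, the correlation all but vanishes.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, seed=0):
    """Synthetic data: z drives both x and y; x and y never affect each other."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # hidden common cause (assumed)
    x = [zi + rng.gauss(0, 1) for zi in z]    # depends on z plus its own noise
    y = [zi + rng.gauss(0, 1) for zi in z]    # depends on z plus its own noise
    return z, x, y


def residuals(vs, zs):
    """What is left of vs after subtracting a least-squares line fitted on zs."""
    mv, mz = mean(vs), mean(zs)
    slope = (sum((z - mz) * (v - mv) for z, v in zip(zs, vs))
             / sum((z - mz) ** 2 for z in zs))
    return [v - mv - slope * (z - mz) for v, z in zip(vs, zs)]


z, x, y = simulate()
raw = pearson(x, y)                                    # about 0.5 in expectation
adjusted = pearson(residuals(x, z), residuals(y, z))   # close to zero
print(f"correlation of x and y:          {raw:.3f}")
print(f"after removing the effect of z:  {adjusted:.3f}")
```

Note what this does and does not show. Adjusting for a known common cause can expose a correlation as spurious, but it only works when you have thought to measure that cause in the first place - which is one reason a plausible mechanism and experimental data still matter.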

How do you know that reported science is relevant and useful to healthy life extension and the advance of medicine? It certainly shouldn't be because someone is telling you as much, directly and outright. Always look a little deeper; take a little time to explore the underlying facts and ideas in any new scientific news for yourself to see if they make sense.



There's a wonderful book (pretty heavy math, though) that describes a mathematics for causation. It's called "Causality", and it's by Judea Pearl. The most important contribution of the book is a graphical notation that you can use to carefully analyze data and see when one thing clearly does not cause another.

Posted by: Chris Hibbert at February 6th, 2007 5:19 PM