Decisions

How to determine causation from correlation

Correlation and causation are closely related concepts in statistics, but they are not the same. Correlation is a statistical association between two variables: the values of one tend to change in step with the values of the other. Causation is a stronger claim: a change in one variable produces a change in the other. A correlation can arise without any causal link between the two variables, so observing a correlation does not by itself establish causation. Understanding the difference matters because it leads to better decisions, better-designed experiments, and more accurate conclusions.

There are several ways to distinguish between correlation and causation:

  1. Temporal Order: Start by examining the order of events. A cause must precede its effect, so if changes in one variable consistently occur before changes in the other, the first variable is a candidate cause. Temporal order alone is not proof, since an earlier third factor may drive both.
  1. Control for Confounding Variables: A confounding variable influences both variables of interest and can create a spurious relationship between them. Controlling for confounders, for example by comparing like with like or by including them in a statistical model, helps show whether the relationship persists once their influence is removed.
  1. Experimentation: A controlled experiment is one of the most reliable ways to establish causality. The researcher manipulates one variable (the independent variable), ideally assigning subjects to conditions at random, and observes the effect on another (the dependent variable). If changing the independent variable leads to a change in the dependent variable while other factors are held constant or balanced by randomization, a causal relationship is likely.
  1. Observing Changes Over Time: Tracking the relationship between two variables over time can also help. A relationship that holds consistently across periods and settings is less likely to be a coincidence, although consistency alone does not rule out a persistent confounder.
  1. Mechanism of Action: Understanding the underlying mechanism that links two variables also provides evidence for causality. If there is a clear, plausible explanation for how a change in one variable produces a change in the other, a causal relationship is more likely.
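
The contrast between observational data and a randomized experiment can be illustrated with a small simulation. This is only a sketch on synthetic data: the effect size, the hidden confounder, and every name in it (`TRUE_EFFECT`, `simulate_outcome`, and so on) are assumptions made up for illustration. It builds a known causal effect into the data and then compares a naive difference in means when subjects select their own treatment against the same estimate when treatment is assigned by coin flip.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def difference_in_means(treated_flags, outcomes):
    """Average outcome of treated units minus average outcome of untreated units."""
    treated = [y for t, y in zip(treated_flags, outcomes) if t]
    control = [y for t, y in zip(treated_flags, outcomes) if not t]
    return mean(treated) - mean(control)


TRUE_EFFECT = 2.0  # assumed causal effect, built into the simulation
rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000

# Hidden confounder that raises the outcome regardless of treatment.
confounder = [rng.gauss(0, 1) for _ in range(n)]


def simulate_outcome(treated, c):
    """Outcome = treatment effect + confounder effect + noise."""
    return TRUE_EFFECT * treated + 3.0 * c + rng.gauss(0, 1)


# Observational setting: units with a high confounder value are more
# likely to end up treated, so treatment and confounder are entangled.
observed_treatment = [c + rng.gauss(0, 1) > 0 for c in confounder]
observed_outcome = [simulate_outcome(t, c)
                    for t, c in zip(observed_treatment, confounder)]

# Experimental setting: treatment is assigned by a coin flip, which
# makes it independent of the confounder.
randomized_treatment = [rng.random() < 0.5 for _ in range(n)]
randomized_outcome = [simulate_outcome(t, c)
                      for t, c in zip(randomized_treatment, confounder)]

observational_estimate = difference_in_means(observed_treatment, observed_outcome)
randomized_estimate = difference_in_means(randomized_treatment, randomized_outcome)

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"observational estimate: {observational_estimate:.2f}")
print(f"randomized estimate:    {randomized_estimate:.2f}")
```

Under these assumptions the observational estimate overstates the effect, because treated units also tend to have higher confounder values, while the randomized estimate lands close to the effect that was built in. Real studies rarely have a known true effect to compare against, which is why the design of the study matters so much.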

You also want to avoid acting on a spurious relationship. A spurious correlation is a relationship between two variables that appears to be causal but actually arises from a third variable or from chance. Here is a classic example:

Ice cream sales and crime rates: One might observe a positive correlation between ice cream sales and crime rates in a given city: as ice cream sales go up, crime rates go up as well. At first glance, this might suggest that ice cream consumption causes crime. On closer inspection, the relationship turns out to be spurious. The confounding variable in this case is temperature. As the temperature rises, both ice cream sales and crime rates tend to increase, which produces a correlation between the two even though neither causes the other.

This example shows why it is important to consider potential confounding variables when interpreting correlations. Without accounting for temperature, the relationship between ice cream sales and crime rates is easy to mistake for a causal one. Once temperature is controlled for, for example by comparing days with similar temperatures, the association largely disappears, which shows that it is spurious rather than causal.
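
The ice cream example can be sketched in code to show what controlling for temperature means in practice. The data below are synthetic and every coefficient is invented for illustration. By construction, temperature drives both variables and ice cream sales have no effect on crime. The helper names (`pearson`, `residuals`) are hypothetical, and removing a straight-line temperature trend from each variable is just one simple way to control for a confounder, assuming the relationships are roughly linear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic daily data: temperature influences both series,
# and neither series influences the other.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
crime = [50 + 1.5 * t + rng.gauss(0, 10) for t in temperature]

# Raw association, ignoring temperature.
raw_r = pearson(ice_cream, crime)

# Association after the temperature trend is removed from both series
# (a partial correlation).
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(crime, temperature))

print(f"raw correlation:                        {raw_r:.2f}")
print(f"correlation after removing temperature: {partial_r:.2f}")
```

In this simulated setup the raw correlation is strongly positive, while the correlation that remains after the temperature trend is removed is close to zero. That is the pattern expected when a confounder, not a causal link, explains the association.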

Correlation and causation are distinct concepts, and confusing them leads to poor conclusions. No single check settles the question, but the steps above, taken together, build a much stronger case for or against a causal relationship.