Decisions

How to use statistical tests in decision making with data

Statistics alone cannot prove anything. Instead, we use statistical inference to reject or retain explanations based on how well the data support them. We compare statistical evidence using a methodology called hypothesis testing.

Testing the validity of assumptions and hypotheses and identifying outliers or anomalies in the data are important steps in the process of making informed decisions. One way to do this is by using statistical methods. These methods allow individuals or teams to analyze data and make inferences about a population based on a sample of data.

Hypothesis testing starts with a null hypothesis. It’s a starting point, usually conservative, and often reflects the status quo. Then we propose an alternative hypothesis. If the statistical analysis gives us enough evidence to reject the null hypothesis, we accept the alternative hypothesis in its place. If there is not enough statistical evidence, the null hypothesis stands. Note that failing to reject the null hypothesis is not the same as proving it true.

Let’s look at an example: A/B testing.

A/B testing is a commonly used method in marketing and product development for digital products such as websites and apps. In an A/B test, user responses to two design variations (A and B) are tested to determine which is better. A standard null hypothesis is that both versions have the same effectiveness. The alternative hypothesis would be that design B, which is a new and experimental design, is better than design A.

Better, in this case, means scoring higher on some metric of effectiveness. It may be clicks, conversions, or usage rate, for example. The test is to calculate the probability of observing a click rate for design B at least as far above design A’s as the one measured, assuming that the null hypothesis is true.

Consider a real-world case. Humana is a healthcare insurance provider. Their landing page’s banner displayed a lot of information with weak calls-to-action and no clear messaging (version A). The banner is valuable real estate, so they decided to test it against a version with simpler information (version B). Version B received more than 400% more click-throughs than A.

If you were looking at these results, the question you should ask is this: how large does the difference between the two versions have to be before you can reject the null hypothesis?

This is one of the most important questions in all of statistics! To answer it, you want to know the p-value. This important value is the probability of observing a result at least as extreme as the one in the data if the null hypothesis is true. The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative.

The significance level is a probability threshold, chosen before running the test, that the p-value is compared against. It is the largest p-value at which you are still willing to reject the null hypothesis, and so it is also the risk you accept of rejecting a null hypothesis that is actually true. A common significance level is 5 percent (or 0.05). If the p-value is above 0.05, you can’t reject the null hypothesis. Below this threshold, you can reject the null hypothesis and accept the alternative.
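To make the p-value and the significance level concrete, here is a minimal sketch of one common way to test an A/B result: a one-sided two-proportion z-test. The function name and the click counts are hypothetical and made up for illustration (they are not Humana's data), and the normal approximation it relies on assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """One-sided test of H0: rate_b == rate_a against H1: rate_b > rate_a."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Under H0 both designs share one click rate, so pool the two samples.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # p-value: probability of a z at least this large if H0 were true.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts, chosen only to illustrate the calculation.
z, p_value = two_proportion_z_test(
    clicks_a=100, visitors_a=2000, clicks_b=130, visitors_b=2000
)
alpha = 0.05  # significance level, fixed before looking at the data
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

With these made-up counts the p-value falls below 0.05, so the null hypothesis of equal click rates would be rejected. With smaller samples or a smaller gap between the rates, the same code would report a failure to reject.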

To test the validity of assumptions and hypotheses, individuals or teams can choose from various statistical methods. These methods determine whether there is a statistically significant difference between two or more groups of data, or between the sample data and a population parameter. For example, a t-test can be used to determine whether there is a statistically significant difference between the means of two groups of data, while an Analysis of Variance (ANOVA) test can be used to determine whether there is a statistically significant difference among the means of more than two groups of data.
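As a rough sketch of what these two tests compute, the code below calculates the test statistics by hand: Welch's variant of the t statistic (which does not assume equal variances) and the one-way ANOVA F statistic. The function names and the sample data are hypothetical. Turning a statistic into a p-value requires the t or F distribution, which is not in the Python standard library; in practice a library such as SciPy (scipy.stats.ttest_ind with equal_var=False, or scipy.stats.f_oneway) would typically be used instead.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_1, sample_2):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    n1, n2 = len(sample_1), len(sample_2)
    # Squared standard error of each group mean.
    v1, v2 = variance(sample_1) / n1, variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def anova_f(*groups):
    """One-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical session lengths (minutes) for three design variations.
design_a = [12.1, 11.8, 12.4, 12.0, 11.9]
design_b = [12.9, 13.1, 12.7, 13.0, 12.8]
design_c = [12.2, 12.5, 12.1, 12.4, 12.3]

t_stat, dof = welch_t(design_a, design_b)
f_stat = anova_f(design_a, design_b, design_c)
print(f"t = {t_stat:.2f} (df = {dof:.1f}), F = {f_stat:.2f}")
```

A large absolute t or a large F means the group means differ by more than the spread within the groups would suggest. Whether the difference is statistically significant still depends on comparing the resulting p-value with the chosen significance level.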

Another important step in data analysis is identifying outliers or anomalies in the data. Outliers are data points that lie far from the other data points in the sample. They can be caused by measurement errors, data entry errors, or other factors, including genuine but rare events. Outliers can have a large impact on the results of the analysis and can lead to inaccurate conclusions. To identify them, individuals or teams can use graphical methods such as box plots and scatter plots. These plots make it possible to visualize the data and spot any points that sit well apart from the rest.
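The rule a box plot conventionally uses to draw individual outlier points can also be applied directly in code. The sketch below flags any value more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The function name and the data are hypothetical, and the 1.5 multiplier is a convention rather than a fixed law. Different quartile methods can also shift the fences slightly.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]


# Hypothetical daily click counts with one suspicious spike.
daily_clicks = [52, 48, 50, 47, 53, 49, 51, 250, 50, 46]
print(iqr_outliers(daily_clicks))  # prints [250]
```

A flagged value is a prompt to investigate, not an automatic deletion: the spike could be a logging error, or it could be a real event worth understanding.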