Making sense of popular statistical terms

Guest blog from Iridium Insights, specialists in data integration, analytics and insights for the consumer goods industry

At Iridium Insights, we are all about making sense of data. Because we recognise that many technical statistical terms can be confusing, we’ve created this quick look-up glossary of statistical definitions.

Bayesian modelling

There are two main statistical approaches to gaining insights from data: frequentist (or classical) and Bayesian. The frequentist approach draws conclusions from the observed data alone, while the Bayesian approach combines prior beliefs about the model (expressed as a prior probability distribution, which may be subjective) with the observations to produce updated, or posterior, beliefs.
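As an illustration, here is a minimal sketch of the simplest Bayesian update, the Beta-Binomial model, used to estimate a proportion such as a product's repeat-purchase rate. The prior (roughly a 30% repeat rate, held with the weight of about 10 observations) and the survey counts are hypothetical; the point is to show the prior pulling the estimate away from the data-only (frequentist) figure.

```python
# Beta-Binomial sketch: combine a prior belief about a proportion with new data.
# All numbers below are hypothetical, chosen only for illustration.

def update_beta(prior_a, prior_b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Prior belief: about a 30% repeat rate, worth roughly 10 observations.
prior_a, prior_b = 3.0, 7.0

# New data: 45 of 100 surveyed shoppers bought the product again.
post_a, post_b = update_beta(prior_a, prior_b, successes=45, failures=55)

frequentist_estimate = 45 / 100                 # data only
bayesian_estimate = beta_mean(post_a, post_b)   # prior combined with data

print(f"frequentist estimate: {frequentist_estimate:.3f}")
print(f"bayesian estimate:    {bayesian_estimate:.3f}")
```

With a weak prior the two estimates converge as more data arrives; a stronger prior (larger `prior_a + prior_b`) would pull the Bayesian estimate further towards 30%.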

CHAID analysis (Chi-squared automatic interaction detector)

CHAID is a type of decision tree algorithm that determines relationships between a variable of interest (for example, the number of purchases of a particular product) and independent variables (for example, customer characteristics such as age, gender and socioeconomic status). CHAID builds the decision tree automatically from the trends and patterns within the data: at each step it uses statistical tests (chi-squared tests, for a categorical variable of interest) to choose the independent variable most strongly associated with the variable of interest, and splits the data on it. The resulting tree can help explain how customers respond to a marketing campaign, and CHAID is often used for customer segmentation.
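The split-selection step described above can be sketched as follows. This is not a full CHAID implementation: real CHAID also merges similar categories, converts statistics into Bonferroni-adjusted p-values and recurses into each branch. Here both hypothetical predictors have two categories, so they share the same degrees of freedom and ranking them by the raw chi-squared statistic is equivalent to ranking by p-value. The survey data are invented for illustration.

```python
from collections import Counter

def chi_squared(xs, ys):
    """Pearson chi-squared statistic for the contingency table of xs against ys."""
    n = len(xs)
    observed = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    stat = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n  # count expected if xs and ys were independent
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat

# Hypothetical survey rows: (age group, gender, bought after the campaign?)
rows = [
    ("under 35", "F", "yes"), ("under 35", "M", "yes"),
    ("under 35", "F", "yes"), ("under 35", "M", "yes"),
    ("under 35", "F", "no"),  ("35+", "M", "yes"),
    ("35+", "F", "no"),       ("35+", "M", "no"),
    ("35+", "F", "no"),       ("35+", "M", "no"),
]
target = [r[2] for r in rows]
predictors = {"age": [r[0] for r in rows], "gender": [r[1] for r in rows]}

# Score every candidate predictor and split first on the strongest one.
scores = {name: chi_squared(values, target) for name, values in predictors.items()}
best = max(scores, key=scores.get)

for name, stat in scores.items():
    print(f"{name}: chi-squared = {stat:.2f}")
print("split first on:", best)
```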

Cluster analysis

Cluster analysis is an exploratory data analysis method that helps identify meaningful structures within data. It defines groups (or segments) of data points that are similar across several measures. In the marketing industry, cluster analysis is often used to identify customer segments.
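One widely used clustering method is k-means, which alternates between assigning each point to its nearest group centre and moving each centre to the mean of its points. The sketch below is one possible implementation in plain Python; the two customer measures (visits per month and average basket spend) and their values are hypothetical. In practice the measures would usually be standardised first so that no single measure dominates the distances.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, iterations=100, seed=0):
    """Basic k-means: returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket spend)
low_spenders = [(2, 15), (3, 18), (2, 12), (4, 16)]
high_spenders = [(10, 60), (12, 55), (11, 65), (9, 58)]

centroids, clusters = k_means(low_spenders + high_spenders, k=2)
for centre, members in zip(centroids, clusters):
    print(f"segment centred at {centre}: {members}")
```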

CHAID is also often used for customer segmentation, but it works very differently from cluster analysis. Cluster analysis treats all the variables in the data uniformly, with no single variable of interest, while CHAID distinguishes the variable of interest from the independent variables.

Correlation analysis

Correlation analysis studies the relationship between a variable of interest and an explanatory variable. For example, the variable of interest could be premium juice consumption, while the explanatory variable could be GDP per capita. If the relationship proves to be statistically significant, the explanatory variable is said to be related to, or associated with, the variable of interest. The correlation coefficient (r), or its square (r-squared), measures the strength of the relationship, while the p-value indicates whether the relationship is statistically significant.
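To make the strength measures concrete, the sketch below computes Pearson's correlation coefficient r and r-squared by hand. The GDP and juice figures are hypothetical. The p-value is omitted here; it would normally come from a statistics library (for example, `scipy.stats.pearsonr` returns both r and a p-value).

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical markets: GDP per capita (thousands) and premium juice litres per capita.
gdp_per_capita = [20, 25, 30, 35, 40]
juice_litres = [1.0, 1.4, 1.5, 2.1, 2.3]

r = pearson_r(gdp_per_capita, juice_litres)
r_squared = r ** 2
print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
```

An r close to +1 or -1 indicates a strong linear relationship; r-squared gives the share of the variation in one variable that a straight-line relationship with the other accounts for.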

Decision tree analysis

Decision analysis is a general name for techniques that systematically evaluate the possible outcomes of a decision. A decision tree is a diagram that lays out these outcomes as branches and is easy to interpret. Decision trees can help understand and evaluate risks and uncertainties. They can also help answer questions such as: Which factors affect the sales of a product the most? Can we predict how a consumer group will respond to a marketing campaign?
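A common way to evaluate risk with a decision tree is to "roll it back": average the payoffs at chance nodes (weighted by their probabilities) and pick the best option at decision nodes. The sketch below applies this to a hypothetical choice between running a campaign or not; the probabilities, payoffs and node format are all invented for illustration.

```python
def expected_value(node):
    """Roll back a decision tree: average over chance nodes, take the best option at decision nodes."""
    kind = node["kind"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        return sum(prob * expected_value(child) for prob, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for _, child in node["options"])
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical payoffs in £k, already net of a £50k campaign cost.
tree = {
    "kind": "decision",
    "options": [
        ("run campaign", {
            "kind": "chance",
            "branches": [
                (0.3, {"kind": "outcome", "value": 150.0}),  # strong consumer response
                (0.7, {"kind": "outcome", "value": -20.0}),  # weak consumer response
            ],
        }),
        ("do nothing", {"kind": "outcome", "value": 0.0}),
    ],
}

for name, child in tree["options"]:
    print(f"{name}: expected value {expected_value(child):.1f}k")
best_name, _ = max(tree["options"], key=lambda option: expected_value(option[1]))
print("best option:", best_name)
```

Expected value is only one decision rule; a risk-averse business might also weigh the 70% chance of losing £20k.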

Machine learning

Machine learning is a method of data analysis in which algorithms “learn” patterns from data iteratively, improving as more data arrives, with minimal human intervention. Machine learning can analyse large amounts of data quickly, enabling businesses to make decisions about their marketing campaigns in real time and to gain insights into complex consumer behaviours.
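One simple form of learning from data as it arrives is online gradient descent: each new observation nudges the model's parameters to reduce its prediction error, and the observation can then be discarded. The sketch below learns a one-variable linear model this way. The "promotion intensity" input, the noise-free relationship sales = 2 × intensity + 1 and the learning rate are all hypothetical choices for illustration.

```python
import random

def online_update(w, b, x, y, learning_rate):
    """One stochastic gradient step on squared error for the model y ≈ w*x + b."""
    error = (w * x + b) - y
    return w - learning_rate * error * x, b - learning_rate * error

rng = random.Random(42)
w, b = 0.0, 0.0  # the model starts knowing nothing
for _ in range(5000):
    x = rng.random()      # hypothetical promotion intensity in [0, 1)
    y = 2.0 * x + 1.0     # hypothetical "true" sales response, noise-free for clarity
    w, b = online_update(w, b, x, y, learning_rate=0.1)

print(f"learned w = {w:.3f}, b = {b:.3f}")
```

Real data are noisy and relationships are rarely this simple, so production systems typically use richer models, but the update-per-observation loop is the same idea.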

Marketing mix modelling (MM modelling)

Marketing mix modelling is a method of data analysis used to quantify the impact of marketing activities on product sales. In simplest terms, MM modelling assigns weights to the different factors that affect product sales. The weights can be estimated using, for example, multivariable regression modelling.
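Once the weights are known, they can be used to break predicted sales down into the contribution of each marketing driver. The sketch below assumes a simple additive (linear) model; the baseline, weights and weekly activity levels are hypothetical, and real MM models often add effects such as carry-over between weeks and diminishing returns.

```python
# Hypothetical weights from an already-fitted linear model of weekly unit sales.
base_sales = 1000.0  # units expected with no marketing activity
weights = {"tv_spend_k": 12.0, "online_spend_k": 8.0, "price_discount_pct": 25.0}

# Hypothetical activity levels for one week.
week = {"tv_spend_k": 20.0, "online_spend_k": 15.0, "price_discount_pct": 4.0}

# Each driver's contribution is its weight multiplied by its activity level.
contributions = {name: weights[name] * week[name] for name in weights}
predicted_sales = base_sales + sum(contributions.values())

print(f"predicted weekly sales: {predicted_sales:.0f} units")
for name, units in contributions.items():
    print(f"  {name}: {units:.0f} units ({units / predicted_sales:.1%} of total)")
```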

Multivariate regression

Multivariate regression analysis studies the relationships between several variables of interest and several explanatory variables. For example, the variables of interest could be consumption of beer, cider and wine, while the explanatory variables could be GDP per capita, commodity prices, new product launches, population demographics and so on. Multivariate regression analysis helps to understand how changes in the explanatory variables affect each of the variables of interest differently.
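For classical (ordinary least squares) multivariate regression, the coefficient estimates are the same as fitting each variable of interest separately on the shared explanatory variables; the joint model additionally captures how the outcomes' errors vary together, which matters for joint significance tests. The sketch below fits all three outcomes from one shared set of normal equations. The explanatory data and the "true" coefficients used to generate the outcomes are hypothetical and noise-free, so the fit should recover them.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multivariate(explanatory, outcomes):
    """OLS coefficients [intercept, b1, b2, ...] for each outcome, sharing X'X."""
    X = [[1.0] + list(row) for row in explanatory]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    coefficients = {}
    for name, y in outcomes.items():
        xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
        coefficients[name] = solve(xtx, xty)
    return coefficients

# Hypothetical markets: (GDP per capita in thousands, commodity price index)
explanatory = [(30, 100), (32, 104), (35, 99), (38, 110), (40, 108), (45, 115)]

# Hypothetical (intercept, GDP effect, price effect) used to generate each outcome.
hypothetical = {"beer": (10.0, 0.5, -0.05), "cider": (3.0, 0.2, -0.01), "wine": (1.0, 0.8, -0.03)}
outcomes = {
    name: [b0 + b1 * g + b2 * p for g, p in explanatory]
    for name, (b0, b1, b2) in hypothetical.items()
}

coefs = fit_multivariate(explanatory, outcomes)
for name, (b0, b1, b2) in coefs.items():
    print(f"{name}: intercept {b0:.2f}, GDP {b1:.3f}, price {b2:.3f}")
```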

Prediction interval/confidence interval

A confidence interval is a range of values that is likely to contain an unknown quantity, such as the average value of a variable. A prediction interval is a closely related range that is likely to contain a value that is yet to be observed, such as the next measurement of the variable.

For example, let the variable of interest be the local train’s delay in minutes. If we know from experience that the train is rarely exactly on time, but arrives no more than 15 minutes early or late 95% of the time, then we can say with 95% confidence that the next train will arrive between 15 minutes before and 15 minutes after its scheduled time. Because this range concerns a single future arrival, it is a prediction interval.
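The difference between the two intervals can be seen by computing both from the same historical delays. The sketch below assumes delays are roughly normally distributed and uses the normal value 1.96 for 95%; for small samples a Student's t quantile would be more appropriate. The delay records are hypothetical.

```python
import math
import statistics

# Hypothetical past delays in minutes (negative = early).
delays = [3, -2, 5, 8, -4, 1, 12, -6, 4, 0, 7, -1, 2, 9, -3, 6]

n = len(delays)
mean = statistics.mean(delays)
sd = statistics.stdev(delays)  # sample standard deviation
z = 1.96  # ~95% under a normal approximation (assumption)

# Confidence interval for the AVERAGE delay: shrinks as more trains are observed.
ci_half_width = z * sd / math.sqrt(n)
ci = (mean - ci_half_width, mean + ci_half_width)

# Prediction interval for the NEXT train's delay: includes that train's own variability.
pi_half_width = z * sd * math.sqrt(1 + 1 / n)
pi = (mean - pi_half_width, mean + pi_half_width)

print(f"95% confidence interval for the mean delay: {ci[0]:.1f} to {ci[1]:.1f} min")
print(f"95% prediction interval for the next train: {pi[0]:.1f} to {pi[1]:.1f} min")
```

The prediction interval is always the wider of the two, because a single train varies far more than the long-run average does.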

(Multivariable) Regression analysis

Regression analysis is a more general form of correlation analysis, in which the relationships between one variable of interest and several explanatory variables are measured. For example, the variable of interest could be premium beer consumption, while the explanatory variables could be GDP per capita, commodity prices, new product launches and so on. Regression analysis helps to understand how changes in the explanatory variables affect the variable of interest. It is widely used for predictions and forecasts.
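For the special case of two explanatory variables, the least-squares weights have a closed form in terms of centred sums of squares and cross-products. The sketch below fits such a model and uses it for a forecast. The GDP and product-launch figures are hypothetical, and the consumption values are generated from a known relationship (5 + 0.4 × GDP + 1.5 × launches) without noise, so the recovered weights can be checked.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 using centred sums."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical markets: GDP per capita (thousands) and number of new product launches.
gdp = [30, 32, 35, 38, 40, 45]
launches = [1, 3, 2, 4, 2, 5]
beer = [5 + 0.4 * g + 1.5 * l for g, l in zip(gdp, launches)]  # hypothetical consumption

b0, b1, b2 = fit_two_predictors(gdp, launches, beer)
print(f"consumption ≈ {b0:.2f} + {b1:.2f} * GDP + {b2:.2f} * launches")

# Forecast for a hypothetical future market: GDP per capita 50k, 3 launches.
forecast = b0 + b1 * 50 + b2 * 3
print(f"forecast consumption: {forecast:.2f}")
```

With more than two explanatory variables the same idea is usually expressed in matrix form and solved with a linear-algebra library.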

If you are interested in what Iridium Insights could do with your data, please get in contact: info@iridium-insights.com