AB- and Multivariate Tests
Do you have different layouts, text placements, colors, or even larger workflows and designs that you want to compare, but don't know how? Read on...
It is often useful to compare two or more versions of your design or prototype.
When you have two versions, we recommend conducting an AB-test. When you want to test more than two versions, you should go with a multivariate test instead.
There are a few things to consider before conducting an AB- or multivariate test:
– Define which variables you are testing in your design – the difference between A and B (C, D, E, etc.)
– Identify which metrics you want to focus on – are they performance metrics (which ones?) or self-reported metrics (which ones?)
– Consider if you need a within- or between-subject design for your test
– For proper statistical analysis and statistical significance, you need at least 15 participants per version
How to set up AB- and multivariate tests in Preely
In Preely you’ll set up these tests as you set up a regular test (see Create test – step by step and Test: Usability test). You will need to set up one test for each version. How you run the tests depends on whether the design is within- or between-subject: in a between-subject design each participant sees only one version, while in a within-subject design each participant sees all versions.
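How you divide participants between the tests is not something Preely does for you in this description, so the following is only a minimal sketch of one way to plan it outside the tool. The function names, participant IDs, and the seed are hypothetical. For a between-subject design it splits participants randomly into one group per version; for a within-subject design it rotates the order of the versions so that not everyone starts with the same one.

```python
import random

def assign_between_subject(participants, versions, seed=None):
    """Randomly split participants into one group per version.

    Group sizes differ by at most one participant.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {version: [] for version in versions}
    for index, participant in enumerate(shuffled):
        groups[versions[index % len(versions)]].append(participant)
    return groups

def assign_within_subject(participants, versions):
    """Give every participant all versions, rotating the starting version."""
    orders = {}
    for index, participant in enumerate(participants):
        shift = index % len(versions)
        orders[participant] = versions[shift:] + versions[:shift]
    return orders

# Hypothetical participant IDs: 30 people, i.e. 15 per version in an AB-test
participants = [f"P{i:02d}" for i in range(1, 31)]
versions = ["A", "B"]

groups = assign_between_subject(participants, versions, seed=42)
orders = assign_within_subject(participants, versions)

for version, members in groups.items():
    print(f"Version {version}: {len(members)} participants")
```

Each group then gets the link to the test for its version, or each participant gets the tests in the listed order.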
For the analysis, you should focus on the metrics you defined before conducting the test and compare them to each other.
When working with statistics in this space, we work with two different kinds:
– Descriptive statistics, which is a way to summarize the dependent variable for the different conditions.
– Inferential statistics, which tells us about the likelihood that any differences between our experimental groups are “real” and not just random fluctuations due to chance.
For most purposes, descriptive statistics will be enough.
How to perform descriptive statistics
The most common way to describe the differences between experimental groups is by describing the mean scores on the dependent variable for each group of participants. This is a simple way of conveying the effects of the independent variable on the dependent variable.
Other tools you can use include, for example:
– Bar or pie charts
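As a minimal sketch of such a summary, the snippet below computes the number of participants, the mean, and the standard deviation per version using only Python's standard library. The completion times are made-up illustration data, and the names `completion_times` and `describe` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical task completion times in seconds, 15 participants per version
completion_times = {
    "A": [42, 38, 51, 45, 40, 47, 39, 44, 50, 43, 41, 46, 48, 37, 49],
    "B": [35, 31, 40, 36, 33, 38, 30, 37, 42, 34, 32, 39, 41, 44, 43],
}

def describe(samples):
    """Summarize one version: sample size, mean, and sample standard deviation."""
    return {"n": len(samples), "mean": mean(samples), "sd": stdev(samples)}

summary = {version: describe(times) for version, times in completion_times.items()}

for version, stats in summary.items():
    print(f"Version {version}: n={stats['n']}, "
          f"mean={stats['mean']:.1f} s, SD={stats['sd']:.1f} s")
```

Reporting the standard deviation next to the mean shows how much the participants within one version differ from each other, which matters when you judge how large a difference between versions really is.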
Want more advanced statistics?
How to perform inferential statistics
Even if the mean scores of the experimental groups show a difference, this can be due to chance. So the question is: Is the difference big enough that we can rule out chance and assume the independent variable had an effect? Inferential statistics gives us the probability of seeing a difference this large if only chance were at work. If that probability is small enough to rule out the chance explanation, we conclude that the difference was due to the experimental manipulation.
Two versions (AB-test): If you are comparing two versions, you should use a Student’s t-test. Here we get the value t, and we identify the probability (p) of finding a t-value at least this extreme for that particular set of data if there was no effect or difference. You can find different tools on the Internet, and Excel can also be used, to calculate both the t- and the p-value.
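If you prefer a script over Excel, the calculation can be sketched as follows. This is a minimal illustration, assuming a between-subject design (independent groups) and roughly equal variances; for a within-subject design a paired t-test would be the counterpart. The function name `student_t_test` and the data are hypothetical, and the p-value is obtained by numerically integrating the t-distribution so that no extra libraries are needed.

```python
import math
from statistics import mean, variance

def student_t_test(sample_a, sample_b, steps=10_000):
    """Independent two-sample Student's t-test (equal variances assumed).

    Returns (t, degrees_of_freedom, two_sided_p).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(
        pooled_var * (1 / n_a + 1 / n_b))

    # Density of the t-distribution with df degrees of freedom
    norm = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return norm * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule for the area between 0 and |t| (steps must be even)
    upper = abs(t)
    h = upper / steps
    area = pdf(0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3

    # Two-sided p-value: probability mass outside [-|t|, |t|]
    p = max(0.0, 1 - 2 * area)
    return t, df, p

# Hypothetical task completion times in seconds, 15 participants per version
times_a = [42, 38, 51, 45, 40, 47, 39, 44, 50, 43, 41, 46, 48, 37, 49]
times_b = [35, 31, 40, 36, 33, 38, 30, 37, 42, 34, 32, 39, 41, 44, 43]

t, df, p = student_t_test(times_a, times_b)
print(f"t({df}) = {t:.2f}, p = {p:.4f}")
```

If SciPy is installed, `scipy.stats.ttest_ind(times_a, times_b)` performs the same kind of test and returns the statistic and the p-value directly.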
More than two versions (multivariate test): If you are comparing more than two versions, you should use an ANOVA. Here we get the value F, and we identify the probability (p) of finding an F-value at least this large for that particular set of data if there was no effect or difference. You can find different tools on the Internet, and Excel can also be used, to calculate both the F- and the p-value.
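To show where the F-value comes from, here is a minimal sketch of a one-way ANOVA for a between-subject design, using only the standard library. It computes F and the two degrees of freedom but not the p-value; the function name `one_way_anova_f` and the data are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """One-way between-subject ANOVA.

    Returns (F, df_between, df_within).
    """
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical task completion times in seconds, 15 participants per version
times_a = [42, 38, 51, 45, 40, 47, 39, 44, 50, 43, 41, 46, 48, 37, 49]
times_b = [35, 31, 40, 36, 33, 38, 30, 37, 42, 34, 32, 39, 41, 44, 43]
times_c = [40, 36, 45, 41, 38, 43, 35, 42, 47, 39, 37, 44, 46, 33, 34]

f_value, df_between, df_within = one_way_anova_f([times_a, times_b, times_c])
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

With SciPy installed, `scipy.stats.f_oneway(times_a, times_b, times_c)` returns both the F-value and the p-value.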
The smaller the p-value is, the more significant our result becomes and the more confident we are that our independent variable really did cause the difference. The p-value becomes smaller as the difference between the means gets larger, as the variability between the observations within a condition (the standard deviation) gets smaller, and as the sample size of the experiment increases (more participants or more measurements per participant). A greater sample size gives our experiment greater statistical power to find significant differences.
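These three influences can be seen directly in the t-value, since a larger t-value corresponds to a smaller p-value. The sketch below assumes two groups of equal size with the same standard deviation; the function name and all numbers are hypothetical.

```python
import math

def t_for_equal_groups(mean_difference, sd, n_per_group):
    """t-value for two equally sized groups that share one standard deviation."""
    return mean_difference / (sd * math.sqrt(2 / n_per_group))

baseline = t_for_equal_groups(mean_difference=5, sd=10, n_per_group=15)
bigger_difference = t_for_equal_groups(mean_difference=10, sd=10, n_per_group=15)
less_variability = t_for_equal_groups(mean_difference=5, sd=5, n_per_group=15)
more_participants = t_for_equal_groups(mean_difference=5, sd=10, n_per_group=60)

print(f"Baseline:           t = {baseline:.2f}")
print(f"Bigger difference:  t = {bigger_difference:.2f}")
print(f"Less variability:   t = {less_variability:.2f}")
print(f"More participants:  t = {more_participants:.2f}")
```

Doubling the difference or halving the standard deviation doubles the t-value, whereas the number of participants per group has to be quadrupled for the same gain.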
Within your organization, you need to decide when the p-value is significant. Often we operate with a p-value that has to be less than 0.05 to be significant. Then, if the p-value is less than 0.05, we conclude that the results are not due to chance, but an effect of the independent variable.
This might sound very complicated in writing, but start by testing two versions and let it unfold from there.
Faulkner, L. 2003. Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments, & Computers, pp. 379-383. Psychonomic Society, Inc.
Rubin, J. and Chisnell, D. 2008. Handbook of Usability Testing – How to plan, design, and conduct effective tests (second edition). Wiley Publishing.
Wickens, C. D.; Lee, J.; Liu, Y.; Becker, S. G. 2004. An Introduction to Human Factors Engineering (second edition). Pearson Prentice Hall.