Analyze Your MaxDiff Experiment
Preference Likelihood (#/screen)
With Preference Likelihood selected, the baseline is set at the appropriate percentage for the number of items per screen programmed in the MaxDiff. The baseline represents the chance that an item would be selected from a random set of items, where the set size matches the number of items shown in the MaxDiff tasks respondents completed.
In the example below, respondents were shown 5 alternatives per screen, so the baseline is the black line set at 20%.
Average-based PL (50% baseline)
For Average-based PL (50% baseline), the baseline is set at 50%: the probability that an item would be chosen from a set of two, no matter how many items per screen respondents interacted with.
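One way to see where both baselines come from is a logit-style transformation of a utility score. The sketch below is an illustration under stated assumptions, not the platform's documented formula: the function name `preference_likelihood` and the form e^u / (e^u + n - 1) are hypothetical, chosen because they reproduce the baselines described here (20% for 5 items per screen, 50% for a set of two) when an item's utility is exactly average (0).

```python
import math

def preference_likelihood(utility, set_size):
    """Assumed transform (hypothetical, for illustration only):
    chance an item with this utility is picked from a set of
    `set_size` items whose other members are all average (utility 0)."""
    exp_u = math.exp(utility)
    return exp_u / (exp_u + set_size - 1)

# An average item (utility 0) lands exactly on the baseline.
baseline_per_screen = preference_likelihood(0.0, 5)    # 5 items per screen
baseline_head_to_head = preference_likelihood(0.0, 2)  # set of two

print(f"Baseline, 5 items per screen: {baseline_per_screen:.0%}")
print(f"Baseline, head-to-head:       {baseline_head_to_head:.0%}")
```

Under this assumption, the only difference between the two metrics is the set size used in the denominator.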
With Utility Scores visualized, values are shown with a zero-centered average to show how items perform in relation to one another. Since zero represents the average performance, the more positive an item's utility, the more it is preferred by respondents, and the more negative an item's utility, the less it is preferred.
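A minimal sketch of how zero-centered scores can be read, assuming that zero-centering simply means subtracting the mean utility from each item. The item names and raw values are hypothetical.

```python
# Hypothetical raw utilities for four items (illustrative values only).
raw_utilities = {"Item A": 2.1, "Item B": 0.4, "Item C": -0.3, "Item D": -1.0}

def zero_center(utilities):
    """Shift utilities so their average is zero (assumed centering method)."""
    mean = sum(utilities.values()) / len(utilities)
    return {item: value - mean for item, value in utilities.items()}

centered = zero_center(raw_utilities)

# Rank items from most to least preferred and label each against the average.
ranked = sorted(centered, key=centered.get, reverse=True)
for item in ranked:
    label = "above average" if centered[item] > 0 else "below average"
    print(f"{item}: {centered[item]:+.2f} ({label})")
```

Centering shifts every item by the same amount, so the rank order is unchanged; only the reference point moves to zero.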
What is the best metric/output to use in my analysis?
There is no single best metric; the choice is often a matter of personal preference.
- Preference Likelihood scores are more easily interpreted than utility scores because the values have a direct meaning: each percentage represents the probability that an item would be most preferred out of a given set.
- If you prefer the given set to reflect the task respondents completed, use Preference Likelihood (#/screen).
- If you prefer the given set to reflect a head-to-head comparison of one item versus another, use Average-based PL (50% baseline).
- Utility Scores can provide an easy high-level view of what performs above average (positive value), what performs below average (negative value), and the overall rank order. For significance testing between options, we recommend using Utility Scores.
The rank order of items varies across different metrics. Which metric should I use to report rank order?
The short answer is that we recommend using Utility Scores to look at the overall rank order of items. Without diving too far into the math, Utility Scores are preferred because they are the rawest form of the analysis and the data are normalized.
The baseline value of Preference Likelihood (#/screen) does not match the average of all PL values. Why is this, and what does this mean?
This comes from the mathematical transformation applied to the raw utility scores to produce preference likelihood based on the number of items per screen. The short answer is that it’s easier to move upward than downward in these calculations. When there is a clearer rank order and preference among the items, the average of these values creeps above the baseline value, which is the theoretical preference likelihood (simply thought of as chance). If all items performed equally, the average of the Preference Likelihood scores would more closely align with the baseline value.
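The upward drift can be illustrated numerically. This is a sketch under an assumed logit-style transform, e^u / (e^u + n - 1), which is not confirmed as the platform's formula; the function names and utility values are hypothetical. With a 20% baseline, a score has 80 points of room above the baseline but only 20 below it, so spreading utilities apart pulls the average up.

```python
import math

def preference_likelihood(utility, set_size):
    """Assumed transform (hypothetical): e^u / (e^u + set_size - 1)."""
    exp_u = math.exp(utility)
    return exp_u / (exp_u + set_size - 1)

def mean_pl(utilities, set_size):
    """Average preference likelihood across a list of item utilities."""
    scores = [preference_likelihood(u, set_size) for u in utilities]
    return sum(scores) / len(scores)

# Both sets of utilities are zero-centered (illustrative values).
equal_items = [0.0, 0.0, 0.0, 0.0, 0.0]     # all items perform equally
spread_items = [-2.0, -1.0, 0.0, 1.0, 2.0]  # clear rank order

mean_equal = mean_pl(equal_items, 5)
mean_spread = mean_pl(spread_items, 5)

print(f"Equal items:  average PL = {mean_equal:.1%}")
print(f"Spread items: average PL = {mean_spread:.1%}")
```

In this sketch the equal items average exactly to the 20% baseline, while the spread items average to roughly 27%.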
Some items that perform below the baseline with Average-based PL (50% baseline) perform above the baseline with Preference Likelihood (#/screen). Why is this, and what does this mean?
As mentioned above, Preference Likelihood values based on the number of items per screen in the MaxDiff exercise can, and likely will, creep above the theoretical average, so each item can move up a little. This effect isn’t observed with Average-based PL (50% baseline), owing to the mathematical simplicity of that metric. Because the two metrics behave slightly differently with regard to the baseline, it isn’t fair to compare how items perform against the benchmark from one metric to the other. For those wanting to understand pure above-average and below-average performers, we recommend looking at the Utility Scores (positive values = above average, negative values = below average, values tightly around 0 = about average).
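The following sketch shows one way a single item could land on opposite sides of the two baselines. It rests on two assumptions that are not stated in this article: that scores use a logit-style transform, e^u / (e^u + n - 1), and that each score is the average of respondent-level values computed after the transform. The respondent utilities are hypothetical.

```python
import math

def preference_likelihood(utility, set_size):
    """Assumed transform (hypothetical): e^u / (e^u + set_size - 1)."""
    exp_u = math.exp(utility)
    return exp_u / (exp_u + set_size - 1)

def average_pl(utilities, set_size):
    """Transform each respondent's utility, then average (assumed aggregation)."""
    scores = [preference_likelihood(u, set_size) for u in utilities]
    return sum(scores) / len(scores)

# Hypothetical respondent-level utilities for one polarizing item:
# two respondents dislike it strongly, two like it.
respondent_utilities = [-3.0, -3.0, 2.0, 2.0]

mean_utility = sum(respondent_utilities) / len(respondent_utilities)
pl_head_to_head = average_pl(respondent_utilities, 2)  # 50% baseline
pl_per_screen = average_pl(respondent_utilities, 5)    # 20% baseline

print(f"Mean utility:              {mean_utility:+.2f}")
print(f"Average-based PL:          {pl_head_to_head:.1%} (baseline 50%)")
print(f"Preference Likelihood (5): {pl_per_screen:.1%} (baseline 20%)")
```

Here the item has a negative mean utility and falls below the 50% baseline, yet sits above the 20% baseline, which is why the utility score is the safer guide to above-average and below-average performance.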