Analyzing an Advanced MaxDiff

Analyze Your MaxDiff Experiment

On the Results page, click the Preference Likelihood drop-down to toggle between Preference Likelihood (#/screen), Average-based PL (50% baseline), and Utility Scores.
Hover over the bars to see further statistical analysis.
Click the hamburger menu to download a PNG, JPEG, PDF, or SVG vector image of the current data visualization.

Preference Likelihood (#/screen)

With Preference Likelihood (#/screen) selected, the baseline is set at the percentage that corresponds to the number of items per screen programmed in the MaxDiff. It represents the chance that an item would be selected at random from a set of items, where the set size matches the number of items shown in each MaxDiff task respondents completed.

- 33% with 3 items per screen
- 25% with 4 items per screen
- 20% with 5 items per screen
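
The baselines listed above are one divided by the number of items per screen. A minimal sketch of that arithmetic follows; the function name `chance_baseline` is hypothetical and is not part of the platform.

```python
def chance_baseline(items_per_screen: int) -> float:
    """Chance that any one item is picked at random from a screen of this size."""
    if items_per_screen < 2:
        raise ValueError("a MaxDiff screen needs at least 2 items")
    return 1.0 / items_per_screen


# Print the baseline for the screen sizes listed above.
for k in (3, 4, 5):
    print(f"{k}/screen -> {chance_baseline(k):.0%}")
```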

For example, if respondents were shown 5 alternatives per screen, the baseline is the black line set at 20%.

Average-based PL (50% baseline)

For Average-based PL (50% baseline), the baseline is set at 50%: the probability that an item would be chosen from a set of two, no matter how many items per screen respondents saw.

Utility Scores

With Utility Scores selected, values are zero-centered so that items can be compared relative to one another. Since zero represents average performance, the more positive an item's utility, the more it is preferred by respondents, and the more negative an item's utility, the less it is preferred.
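
As an illustration of zero-centering, the sketch below subtracts the mean from a set of raw utilities and then ranks the items. The item names and values are made up for the example and are not platform output; the platform's own estimation and scaling steps are not described here.

```python
# Made-up raw utilities for four hypothetical items.
raw_utilities = {"Item A": 2.1, "Item B": 0.4, "Item C": -0.3, "Item D": -1.0}


def zero_center(utilities):
    """Shift utilities so that their average is zero."""
    mean = sum(utilities.values()) / len(utilities)
    return {item: value - mean for item, value in utilities.items()}


centered = zero_center(raw_utilities)

# Rank from most preferred (highest utility) to least preferred.
ranked = sorted(centered.items(), key=lambda pair: pair[1], reverse=True)

for item, score in ranked:
    if score > 0:
        label = "above average"
    elif score < 0:
        label = "below average"
    else:
        label = "average"
    print(f"{item}: {score:+.2f} ({label})")
```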

MaxDiff FAQs

What is the best metric/output to use in my analysis?

There is no single best metric; the choice is often a matter of personal preference.

Preference Likelihood scores are more easily interpreted than Utility Scores because each value has a direct meaning: each percentage represents the probability that an item would be most preferred out of a given set.
- If you prefer the given set to reflect the task respondents completed, use Preference Likelihood (#/screen).
- If you prefer the given set to reflect a head-to-head comparison of one item versus another, use Average-based PL (50% baseline).
Utility Scores give a quick, high-level view of performance:
- Positive values = above average
- Negative values = below average
- They also provide the overall rank order of items.
For significance testing between options, use Utility Scores with a t-test, rather than confidence intervals.
- Why not confidence intervals? Confidence intervals show the range where the true population parameter is likely to fall. Shading in the chart, however, indicates pairwise statistical significance: which groups differ significantly from each other. These two measures don't always align because they convey different information.
  Rule of thumb: If two 95% confidence intervals just touch or slightly overlap, the difference is often still significant at the 5% level.
  - Confidence intervals = one-sample population estimates
  - Shading = tests whether the difference between two estimates is not zero
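
To see why overlapping confidence intervals and a significant difference can coexist, here is a small sketch using made-up means and standard errors. It assumes the two estimates are independent and uses a normal approximation rather than the t-test recommended above, so treat it as an illustration of the idea and not as the platform's actual test.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def ci95(mean, se):
    """95% confidence interval for a single estimate (normal approximation)."""
    z = STANDARD_NORMAL.inv_cdf(0.975)
    return (mean - z * se, mean + z * se)


def difference_p_value(mean_a, se_a, mean_b, se_b):
    """Two-sided p-value for a nonzero difference, assuming independent estimates."""
    z = (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


# Hypothetical (mean utility, standard error) pairs for two items.
item_a = (3.2, 1.0)
item_b = (0.0, 1.0)

ci_a = ci95(*item_a)
ci_b = ci95(*item_b)
intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
p_value = difference_p_value(*item_a, *item_b)

print(f"CI for A: {ci_a[0]:.2f} to {ci_a[1]:.2f}")
print(f"CI for B: {ci_b[0]:.2f} to {ci_b[1]:.2f}")
print(f"Intervals overlap: {intervals_overlap}")
print(f"Difference significant at 5%: {p_value < 0.05}")
```

The intervals overlap because each one reflects the uncertainty of a single estimate, while the test on the difference combines the two standard errors into one that is smaller than their sum.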

The rank order of items varies across different metrics. Which metric should I use to report rank order?

In short, we recommend using Utility Scores to report the overall rank order of items. Without diving too far into the math, Utility Scores are preferred because they are the rawest form of the analysis and the data is normalized.

The baseline value of Preference Likelihood (#/screen) does not match the average of all PL values. Why is this, and what does this mean?

This comes from the mathematical transformation that converts the raw utility scores into preference likelihoods based on the number of items per screen. In short, it is easier for values to move upward than downward in these calculations. When there is a clearer rank order and preference among the items, the average of these values creeps above the baseline value, which is the theoretical preference likelihood (simply put, chance). If all items performed equally, the average of the Preference Likelihood scores would align more closely with the baseline value.
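
The exact transformation is not given here. As a sketch only, assume a commonly used logit-style rescaling, exp(u) / (exp(u) + k - 1), where u is a zero-centered utility and k is the number of items per screen. Under that assumption, the example below shows the average staying at the baseline when all items are equal and rising above it when the items are spread out. The utilities are made up.

```python
from math import exp


def preference_likelihood(utility, items_per_screen):
    # ASSUMED transform for illustration; not confirmed as the platform's formula.
    return exp(utility) / (exp(utility) + items_per_screen - 1)


def mean(values):
    return sum(values) / len(values)


k = 5
baseline = 1 / k  # 20% chance level for 5 items per screen

# Two made-up sets of zero-centered utilities.
equal_items = [0.0, 0.0, 0.0, 0.0]     # no preference among items
spread_items = [1.0, 0.5, -0.5, -1.0]  # clear rank order

avg_equal = mean([preference_likelihood(u, k) for u in equal_items])
avg_spread = mean([preference_likelihood(u, k) for u in spread_items])

print(f"Baseline: {baseline:.3f}")
print(f"Average PL, equal items: {avg_equal:.3f}")
print(f"Average PL, spread items: {avg_spread:.3f}")
```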

Some items that perform below the baseline with Average-based PL (50% baseline) perform above the baseline with Preference Likelihood (#/screen). Why is this, and what does this mean?

As mentioned above, it is possible, and even likely, for Preference Likelihood values based on the number of items per screen in the MaxDiff exercise to creep above the theoretical average, so each item can move up a little. This effect isn't observed with Average-based PL (50% baseline) because of the mathematical simplicity of that metric. Since the two metrics behave slightly differently with regard to the baseline, it isn't fair to compare how items perform against the baseline across them. For those wanting to understand pure above-average and below-average performers, we recommend looking at the Utility Scores (positive values = above average, negative values = below average, values tightly around 0 are about average).
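
As a sketch of how one item can sit on opposite sides of the two baselines, assume that preference likelihood is computed per respondent with the logit-style form exp(u) / (exp(u) + k - 1) and then averaged, with k = 2 for the head-to-head metric. Both the formula and the averaging step are assumptions for illustration, and the respondent utilities are made up.

```python
from math import exp


def preference_likelihood(utility, set_size):
    # ASSUMED transform for illustration; not confirmed as the platform's formula.
    return exp(utility) / (exp(utility) + set_size - 1)


def mean(values):
    return sum(values) / len(values)


# Made-up utilities for ONE item across four respondents.
# The item is slightly below average overall (mean utility is -0.25),
# but respondents disagree strongly about it.
respondent_utilities = [-2.0, -2.0, 1.5, 1.5]

# Set of 5 items (baseline 20%) versus a set of 2 items (baseline 50%).
pl_per_screen = mean([preference_likelihood(u, 5) for u in respondent_utilities])
pl_head_to_head = mean([preference_likelihood(u, 2) for u in respondent_utilities])

print(f"PL with 5 per screen: {pl_per_screen:.3f} (baseline 0.200)")
print(f"PL head-to-head: {pl_head_to_head:.3f} (baseline 0.500)")
```

In this made-up case the item lands above the 20% baseline but below the 50% baseline, even though its average utility is negative.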
