The Advanced MaxDiff HB test compares many alternatives without overwhelming respondents by asking them to read and consider all items at once. It takes your list of items and shows each respondent a few of them at a time, in a balanced order.
The method focuses on collecting high-resolution individual-level data to be analyzed by the Hierarchical Bayesian model. In typical settings, respondents would see 10–20 screens.
How It Works
aytm determines the number of screens to show respondents automatically depending on the complexity of the study (i.e., number of items and number of items per task). For small and medium-sized studies (up to 40 items), an exposure frequency of approximately 2.5 times per item can be expected. In studies containing 41 items or more, exposure frequency drops to approximately 1 time per item, allowing more items to be evaluated by a respondent.
⚠️ Note: In no configuration would respondents be exposed to more than 20 screens total.
In large studies (up to 200 items) where the number of items exceeds 20 × the number of items per screen (3, 4, or 5), each respondent is shown a random subset of items at an exposure frequency of 1 per item. For instance, in a 120-item study with 4 alternatives per screen, each respondent will see 20 screens of 4 items each, covering a balanced subset of 80 items. This keeps respondent burden in check and protects data quality regardless of study size.
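The screen-count rules above can be sketched in a few lines. This is an illustration built only from the figures quoted in this article (about 2.5 exposures per item up to 40 items, about 1 beyond that, and a 20-screen cap). The function name `plan_screens` and the rounding are assumptions, not aytm's actual implementation.

```python
import math

MAX_SCREENS = 20  # hard cap on screens per respondent, as stated above


def plan_screens(n_items: int, items_per_screen: int) -> dict:
    """Estimate screens and items seen per respondent (illustrative only)."""
    # Target exposure: ~2.5 views per item up to 40 items, ~1 view beyond that.
    target_exposure = 2.5 if n_items <= 40 else 1.0
    wanted = math.ceil(target_exposure * n_items / items_per_screen)
    screens = min(wanted, MAX_SCREENS)
    slots = screens * items_per_screen
    # When the cap leaves fewer slots than items, only a subset is shown.
    items_seen = min(n_items, slots)
    return {
        "screens": screens,
        "items_seen": items_seen,
        "exposure_per_seen_item": slots / items_seen,
    }


print(plan_screens(120, 4))
# {'screens': 20, 'items_seen': 80, 'exposure_per_seen_item': 1.0}
```

For the 120-item example this reproduces 20 screens covering 80 items. Under these assumptions the cap can also pull the exposure frequency of a mid-sized study below 2.5, which is consistent with the word "approximately" above.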
In deciding which items to show on the next screen, the system focuses on ensuring equal coverage of pairs of items among all respondents. Items are chosen randomly, with bias toward pairs of items that were seen less frequently overall across respondents.
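One way to picture this selection rule is a weighted random draw in which a candidate item is more likely to be picked the less often it has already appeared alongside the items on the current screen. The sketch below is hypothetical: the weighting formula, the function name `pick_screen`, and the shared `pair_counts` tally are assumptions chosen for illustration, not the system's actual algorithm.

```python
import itertools
import random
from collections import Counter


def pick_screen(items, items_per_screen, pair_counts, rng):
    """Pick one screen, favoring items whose pairs have been seen least."""
    screen = [rng.choice(items)]
    while len(screen) < items_per_screen:
        candidates = [i for i in items if i not in screen]
        # Weight each candidate by how rarely it has co-occurred with the
        # items already on the screen (fewer past pairings -> larger weight).
        weights = [
            1.0 / (1 + sum(pair_counts[frozenset((c, s))] for s in screen))
            for c in candidates
        ]
        screen.append(rng.choices(candidates, weights=weights, k=1)[0])
    # Record every pair shown so that later screens can compensate.
    for a, b in itertools.combinations(screen, 2):
        pair_counts[frozenset((a, b))] += 1
    return screen


rng = random.Random(0)
items = list(range(12))   # 12 hypothetical item IDs
pair_counts = Counter()   # pair tally shared across all respondents
screens = [pick_screen(items, 4, pair_counts, rng) for _ in range(200)]
```

Because the tally is shared across respondents, pairs that fall behind become more likely on later screens, which pushes pair coverage toward balance without making the design deterministic.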
The core analysis of respondents' preferences is performed with a Hierarchical Bayesian Multinomial Logit model. The model is estimated by MCMC, using a hybrid Gibbs sampler with a random-walk Metropolis step. Burn-in ends automatically once there is enough evidence of convergence.
The model takes into account the other items presented in a task when a respondent makes a choice. The best/worst probabilities correspond to the logit transformation of the linear combination of utility scores of the items in the task. Each respondent is analyzed individually: their preference scores are treated as a realization of the pooled "average" opinion, which follows a Normal distribution, while still reflecting their individual preferences. As a result, raw logit coefficients are available for every respondent.
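To make the best/worst probabilities concrete, here is a minimal sketch of one common multinomial logit formulation: the chance of picking an item as best is proportional to exp(utility), and the chance of picking it as worst is proportional to exp(−utility). The article does not say whether the worst choice is modeled independently or conditionally on the best choice, so treating the two separately here is an assumption, as are the function name and the example utilities.

```python
import math


def best_worst_probabilities(utilities):
    """Return (p_best, p_worst) for the items shown on one screen."""
    # Best choice: standard multinomial logit over the utilities shown.
    exp_best = [math.exp(u) for u in utilities]
    p_best = [e / sum(exp_best) for e in exp_best]
    # Worst choice: the same logit form applied to negated utilities.
    exp_worst = [math.exp(-u) for u in utilities]
    p_worst = [e / sum(exp_worst) for e in exp_worst]
    return p_best, p_worst


# Hypothetical utilities for the four items on a single screen.
p_best, p_worst = best_worst_probabilities([1.2, 0.3, -0.4, -1.1])
```

The probabilities depend on every item on the screen, which is what it means for the model to take the other items in a task into account: the same item is more likely to be chosen as best when it appears next to weaker alternatives.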
The Statistics Page
The statistics page has three display modes:
- Preference Likelihood (X/screen) — represents the likelihood that an item would be preferred over (X−1) other randomly selected items in the set. This score is appropriate when the MaxDiff exercise shows X items per exposure.
- Utility Scores — the raw regression coefficients estimated at the aggregate level. They are zero-centered so that 0 represents average performance. The more positive an item's utility, the more it is preferred; the more negative, the less it is preferred.
- Average-based PL (50% baseline) — represents the Preference Likelihood that an item would be preferred over one other randomly selected item. A score above 50% indicates a better-than-average performer.
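The relationship between utility scores and the two Preference Likelihood modes can be illustrated with a widely used probability rescaling for zero-centered logit utilities, exp(u) / (exp(u) + X − 1), where X is the number of items compared. Whether the statistics page uses exactly this formula is an assumption. The sketch is included because it reproduces the baselines described above: an average item (utility 0) scores 1/X, and 50% when X = 2.

```python
import math


def preference_likelihood(utility: float, items_compared: int) -> float:
    """Chance that an item beats (items_compared - 1) average items.

    Assumes zero-centered utilities, so an "average" competitor has
    utility 0 and contributes exp(0) = 1 to the denominator.
    """
    e = math.exp(utility)
    return e / (e + items_compared - 1)


# An average item on a 4-item screen sits at the 25% baseline...
print(preference_likelihood(0.0, 4))  # 0.25
# ...and at the 50% baseline when compared with one other item.
print(preference_likelihood(0.0, 2))  # 0.5
```

Under this rescaling, any item with a positive utility lands above the baseline for its mode and any item with a negative utility lands below it, so the three display modes rank items in the same order.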
When applying filters to a survey, Advanced MaxDiff HB questions will show aggregate statistics for the current subset of respondents.
Export includes:
- Raw coefficients export — all three reported statistics on an individual level (for each respondent).
- Raw data export — data on what each respondent saw and what decision was made on each task.