The Advanced MaxDiff HB test is a great way to compare many alternatives without overwhelming respondents, who would otherwise have to read and consider all items at once. It takes your list of items to be compared and shows them to each respondent a few at a time, in a balanced order.
The method focuses on collecting high-resolution individual-level data to be analyzed with a Hierarchical Bayesian model. In typical settings, respondents see 10-20 screens.
aytm determines the number of screens automatically based on the complexity of the study (i.e., the number of items and the number of items per task). For small and medium-sized studies of up to 40 items, an exposure frequency of approximately 2.5 times per item can be expected. In studies with 41 items or more, the exposure frequency drops to approximately once per item, allowing a respondent to evaluate more items. Note: in no configuration are respondents exposed to more than 20 screens total. In large studies testing up to 200 items, where the number of items exceeds 20 * the number of items per screen (i.e., 3, 4, or 5), each respondent is shown a random subset of items at an exposure frequency of once per item. For instance, in a 120-item study with 4 alternatives per screen, respondents can be expected to see 20 screens of balanced quadruplets covering 80 of the 120 items. Keeping respondents' burden in check this way ensures good data quality regardless of the size of the study.
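The screen-count rules above can be sketched as a simple heuristic. This is an illustrative reconstruction, not aytm's actual algorithm; the function name and rounding are assumptions.

```python
# Illustrative sketch (not aytm's actual algorithm): derive a per-respondent
# screen count from the item count and items per screen, following the
# exposure-frequency rules described above.
def screens_for_study(n_items, items_per_screen, max_screens=20):
    if n_items <= 40:
        target_exposure = 2.5   # small/medium studies: each item seen ~2.5 times
    else:
        target_exposure = 1.0   # larger studies: each item seen ~once
    screens = round(n_items * target_exposure / items_per_screen)
    return min(screens, max_screens)  # hard cap of 20 screens

# A 120-item study with 4 alternatives per screen caps at 20 screens,
# so each respondent sees a balanced subset of 80 items.
print(screens_for_study(120, 4))  # → 20
print(screens_for_study(30, 4))
```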
In deciding which items to show on the next MaxDiff screen, the system focuses on ensuring equal coverage of item pairs across all respondents. In other words, items are chosen randomly, with a bias toward pairs of items that have been seen less frequently overall across respondents.
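A minimal sketch of this pair-balanced selection, assuming a shared counter of pair exposures; the weighting scheme and function names are illustrative, not aytm's implementation.

```python
import itertools
import random
from collections import Counter

# Hedged sketch of pair-balanced item selection: the next screen's items
# are drawn at random, but candidates are weighted toward pairs that have
# been shown least often so far across respondents.
def next_screen(items, pair_counts, k, rng=random):
    chosen = [rng.choice(items)]
    while len(chosen) < k:
        remaining = [i for i in items if i not in chosen]
        weights = []
        for cand in remaining:
            # Total prior exposures of this candidate paired with items
            # already on the screen; fewer exposures => higher weight.
            seen = sum(pair_counts[frozenset((cand, c))] for c in chosen)
            weights.append(1.0 / (1.0 + seen))
        chosen.append(rng.choices(remaining, weights=weights)[0])
    # Record the pairs shown on this screen.
    for a, b in itertools.combinations(chosen, 2):
        pair_counts[frozenset((a, b))] += 1
    return chosen

pair_counts = Counter()
screen = next_screen(list(range(12)), pair_counts, k=4)
print(screen)
```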
The core analysis of respondents' preferences is performed with a Hierarchical Bayesian Multinomial Logit model, estimated via MCMC using a hybrid Gibbs sampler with a random-walk Metropolis step. The number of burn-in iterations is determined automatically once there is enough evidence of convergence.
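To make the Metropolis step concrete, here is a deliberately simplified sketch of one random-walk update for a single respondent's utility vector. It is an assumption-laden illustration: it models only "best" picks with an MNL likelihood and a fixed Normal prior, whereas full HB estimation also handles "worst" picks and updates the population mean and covariance in separate Gibbs steps.

```python
import numpy as np

def log_posterior(beta, tasks, mu, tau):
    # tasks: list of (item_indices, index_of_chosen_item_within_task)
    ll = 0.0
    for items, choice in tasks:
        u = beta[items]
        ll += u[choice] - np.log(np.exp(u).sum())  # MNL log-likelihood
    prior = -0.5 * np.sum((beta - mu) ** 2) / tau**2  # Normal prior
    return ll + prior

def metropolis_step(beta, tasks, mu, tau=1.0, step=0.1, rng=np.random):
    # Random-walk proposal around the current utilities.
    proposal = beta + step * rng.standard_normal(beta.size)
    log_accept = (log_posterior(proposal, tasks, mu, tau)
                  - log_posterior(beta, tasks, mu, tau))
    if np.log(rng.uniform()) < log_accept:
        return proposal, True
    return beta, False
```

Repeating this step (after burn-in) yields draws from each respondent's posterior over utilities.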
The model takes into account the properties of the other items presented in a task when a respondent makes a choice. The best/worst probabilities correspond to the logit transformation of the utility scores of the items in the task. Respondents are analyzed individually: each respondent's preference scores are a realization of a pooled "average" opinion that follows a Normal distribution, while still reflecting that respondent's individual preferences. As a result, raw logit coefficients are available for every respondent.
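The logit transformation above can be sketched as follows. This assumes the common MaxDiff convention that "worst" probabilities use negated utilities; the exact specification aytm uses is not stated in the source.

```python
import numpy as np

# Sketch of the choice probabilities implied by an MNL model for one task:
# "best" probabilities are the softmax of the utilities of the items shown,
# and (under a common maxdiff assumption) "worst" probabilities are the
# softmax of the negated utilities.
def best_worst_probs(utilities):
    u = np.asarray(utilities, dtype=float)
    best = np.exp(u) / np.exp(u).sum()
    worst = np.exp(-u) / np.exp(-u).sum()
    return best, worst

best, worst = best_worst_probs([0.8, 0.1, -0.4, -0.5])
print(best.round(3), worst.round(3))
```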
The statistics page has three display modes:
- Preference Likelihood (X/screen) represents the likelihood that an item would be preferred over (X-1) other randomly selected items in the set. This score is appropriate when the MaxDiff exercise shows X items per exposure.
- Utility Scores are the raw regression coefficients estimated at the aggregate level. They are zero-centered, so 0 represents average performance. The more positive an item's utility, the more it is preferred by respondents; the more negative, the less it is preferred.
- Average-based PL (50% baseline) represents the Preference Likelihood (PL) that an item would be preferred over one other randomly selected item in the set. A score above 50% indicates that an item is a better-than-average performer.
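The relationship between these display modes can be sketched with a common utility-to-PL rescaling. This is a hedged illustration (a widely used MaxDiff rescaling, not necessarily aytm's exact formula): an item with zero-centered utility u competing against X-1 "average" items (utility 0) wins with probability exp(u) / (exp(u) + X - 1).

```python
import numpy as np

# Hedged sketch: rescale zero-centered utilities into Preference Likelihoods.
# With items_per_screen = X, this is the probability of beating X-1 items
# of average (zero) utility; X = 2 gives the "average-based PL" where an
# average item scores exactly 50%.
def preference_likelihood(utilities, items_per_screen):
    u = np.asarray(utilities, dtype=float)
    return np.exp(u) / (np.exp(u) + items_per_screen - 1)

utilities = np.array([1.2, 0.0, -1.2])
print(preference_likelihood(utilities, 4).round(3))  # PL (4/screen)
print(preference_likelihood(utilities, 2).round(3))  # average-based PL
```

Note that the zero-utility item lands at exactly 50% in the two-item case, matching the 50% baseline interpretation above.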
When applying filters to a survey, Advanced MaxDiff HB questions will show aggregate statistics for the current subset of respondents.
Two data exports are available:
- Raw coefficients export: all three reported statistics at the individual level (for each respondent)
- Raw data export: what each respondent saw and what decision was made on each task