4  Background

4.1 Main research question

The overarching question we want to answer is:

Does the choice of specialization affect your M+ score? And in the case it does, how much does it affect it?

Generally, people underestimate the importance of effect size. Increase \(n\) enough and you can squeeze out significance if any is present. But if you don’t take effect size into account, you don’t know whether that significance corresponds to a small or a large effect. The variance of individual player skill might dwarf the effect of specialization choice.

Part one of the question is significance: some specs might be significantly stronger than others for a particular role. Part two concerns the effect size. You may have significance, but if the change in score is small (say, less than 10 points), it shouldn’t guide your choice of specialization. A slightly joking way of explaining a low effect size: if you play a significantly better spec but sneeze during a dungeon run, the better-performing spec becomes equal to the worse options.

Secondarily, we want to answer a different question:

How open is the balance window? Wide open, or almost closed?

People often say there will always be a spec which outperforms everything else. This statement fails to account for effect size. Yes, a spec might be better, but if the difference is close to the noise floor of skill variance, it doesn’t matter. The lower you go in difficulty levels, the less the balance window matters in the grand scheme of things, because player skill becomes the deciding factor for most of what is going on.

The balance window is the distance in predicted power from the worst to the best spec. If this window is wide open, the game is fairly unbalanced. If it is almost closed, the balance is good.
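As a concrete sketch, the balance window is just the spread of predicted power across specs. The spec names and scores below are made up for illustration:

```go
package main

import "fmt"

// balanceWindow returns the distance from the worst to the best
// predicted spec power.
func balanceWindow(power map[string]float64) float64 {
	first := true
	var min, max float64
	for _, p := range power {
		if first {
			min, max = p, p
			first = false
			continue
		}
		if p < min {
			min = p
		}
		if p > max {
			max = p
		}
	}
	return max - min
}

func main() {
	// Hypothetical predicted power per spec.
	power := map[string]float64{
		"Fire":   2710.0,
		"Frost":  2695.0,
		"Arcane": 2684.0,
	}
	fmt.Printf("balance window: %.1f points\n", balanceWindow(power))
}
```

A window of a few points is closed for practical purposes; a window of hundreds of points is wide open.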

4.1.1 Data collection

To answer the research question, we’ll need some data. raider.io (RIO) provides an API for querying M+ runs, so this will be our so-called sampling frame from which we draw data. There may be more runs out there than RIO’s database contains, but we’ll restrict ourselves to that data for now.

4.2 Tools used & Setup

This site uses a number of tools in order to make data exploration easy. Data collection is done by means of zugzug, a small program written in Go1. It fetches data from raider.io and also does some initial pre-processing of said data. The most important job of zugzug is to flatten the data into tabular formats, stored as CSV files2. This makes the data easily digestible later in the analysis chain.

1 Go isn’t a particularly interesting language, but what it gets right is the ecosystem around programming. The developer experience of Go is far better than in many other languages, whose ecosystems and developer experience grew by accident rather than by design.

2 At some point, this might change. CSV is an abysmally bad data interchange format between systems because it lacks structure. But so is JSON, the lingua franca of data interchange. An obvious replacement would be something based on Protobuf or Parquet. An alternative is to dump the data into a local SQLite database and read the data from there.

3 It’s only “almost” because in true literate programming, the order in which code blocks occur in the text is independent of how they are assembled into a final program. Furthermore, code blocks can embed other code blocks, so that the program can be split into small sections which are easy to explain. None of these features are present in Jupyter notebooks or Quarto, which is a shame.

The main processing is done in R, including this website. We use knitr and Quarto to produce the website. This is “almost literate programming3”, where chunks of computation are embedded within the text.

The following code chunk loads the libraries we use within R:

Code
#install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/"))
library(cmdstanr)
#cmdstanr::install_cmdstan()
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
library(posterior)
library(bayesplot)
color_scheme_set("teal")

library(MASS)
library(tidyverse)

library(broom)
library(GGally)
library(ggstatsplot)
library(ggpubr)

library(easystats)
library(emmeans)
library(tidybayes)

library(lme4)
library(sjPlot)
library(PMCMRplus)
library(performance)
library(boot)
library(Ckmeans.1d.dp)

library(FactoMineR)
library(factoextra)
library(corrplot)

library(rstan)

library(kableExtra)
theme_set(theme_abyss(base_size = 12, base_family = 'Open Sans'))

rstan::rstan_options("auto_write" = TRUE)

The most important library is tidyverse, a set of packages which enables data manipulation in a functional programming style4. Many of our other library needs build on tidyverse in one way or another. We also depend on libraries for generalized linear mixed-effects models and Bayesian statistics. And we need the Ckmeans library because Subcreation uses it.

4 Tidyverse changes R into a completely different style of programming, which is far more declarative. I tend to prefer this style of programming because it removes many errors and bugs which could otherwise occur. It’s also a more succinct style in many cases.

We also developed a set of small helper functions for the World of Warcraft data we need. In particular, we have a class color scheme which matches the scheme used in-game. And we have ways to determine whether a player is melee or ranged, which comes in handy when filtering the data.

4.2.1 Stan

The probabilistic programming language Stan needs particular mention5. Stan lets us define a statistical model as a program. This program is compiled and then run on the data we have observed. It derives a probability density function for our (Bayesian) posterior distribution. Because these posterior distributions rarely have nice analytic solutions6, Stan samples from the distribution in order to approximate it. This lets us carry out the usual Bayesian analyses on the data.

5 Stan is a statically typed programming language, which gives it a lot of advantages. Many bugs were squashed because of this.

6 You need to solve some nasty integrals, and the more complex your program, the harder the integrals get. Hence we sample from the distribution by means of Markov chain Monte Carlo strategies.

4.3 Description of the data sets used

4.3.1 Raider.IO leaderboards

For overall specialization data, we poll the leaderboards for each specialization from raider.io. There are two major datasets. One is sampled randomly over the full leaderboard; this captures the casual playerbase. In addition, we sample from a mythic+ score of 2700 and up7. That data comprises the top players, and avoids the plateau scores of 2000 and 2500, which need special treatment because they are the thresholds for Keystone Master and Keystone Hero.

7 This usually corresponds to the 95th percentile of players once we have a full season.

To find the cutoff point of 2700 score, we use binary search to quickly locate the subset of the player base with a greater score. The size of this subset varies among specializations, because some specializations have three to four times as many players as others. The idea is that the cutoff eliminates much of the popularity contest which would otherwise permeate the data.
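The cutoff search can be sketched as a standard binary search over a leaderboard sorted by descending score. The scores below are made up; zugzug's actual API paging is more involved:

```go
package main

import "fmt"

// firstBelow returns the index of the first entry in a
// descending-sorted score list that falls below the cutoff.
// Everything before that index is the 2700+ subset we sample from.
func firstBelow(scores []float64, cutoff float64) int {
	lo, hi := 0, len(scores)
	for lo < hi {
		mid := (lo + hi) / 2
		if scores[mid] >= cutoff {
			lo = mid + 1
		} else {
			hi = mid
		}
	}
	return lo
}

func main() {
	// Hypothetical leaderboard scores, sorted descending.
	scores := []float64{3012, 2901, 2855, 2700, 2698, 2500, 2000}
	n := firstBelow(scores, 2700)
	fmt.Printf("%d players at 2700 or above\n", n)
}
```

Because each probe is a single leaderboard page fetch, the search finds the cutoff in logarithmically many requests rather than a full scan.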

4.3.2 Raider.IO dungeon data

We also pull the top dungeon runs. There are two sets of data. One follows Subcreation as closely as possible. This dataset pulls via a stratification over region/dungeon/top runs. That is, for each of the 4 regions, you pull from each of the 8 dungeons. From each of those 32 pulls, you take the top-k best-scored dungeon runs.

The other dataset pulls worldwide. It pulls the top-k dungeon runs from each dungeon, but if one region has better runs overall, more runs will be picked from that region. This dataset eliminates variation among regions.
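The stratified pull can be sketched as follows. The `Run` record and its fields are hypothetical simplifications of what zugzug stores:

```go
package main

import (
	"fmt"
	"sort"
)

// Run is a simplified dungeon run record.
type Run struct {
	Region  string
	Dungeon string
	Score   float64
}

// topK returns the k best-scoring runs from rs.
func topK(rs []Run, k int) []Run {
	sorted := append([]Run(nil), rs...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	if len(sorted) < k {
		k = len(sorted)
	}
	return sorted[:k]
}

// stratifiedTopK mimics the Subcreation-style pull: the top-k runs
// from each region/dungeon stratum. The worldwide pull is simply
// topK over all runs of a dungeon, ignoring region.
func stratifiedTopK(rs []Run, k int) []Run {
	byStratum := map[string][]Run{}
	for _, r := range rs {
		key := r.Region + "/" + r.Dungeon
		byStratum[key] = append(byStratum[key], r)
	}
	var out []Run
	for _, group := range byStratum {
		out = append(out, topK(group, k)...)
	}
	return out
}

func main() {
	runs := []Run{
		{"eu", "AA", 190}, {"eu", "AA", 188}, {"us", "AA", 185},
	}
	fmt.Println(len(stratifiedTopK(runs, 1))) // one run per stratum: 2
}
```

The difference matters: stratification gives every region equal weight, while the worldwide pull lets a stronger region dominate the sample.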

4.3.2.1 How groups are identified

In order to compute a unique group identity, we take the roster that ran the dungeon and sort the player IDs, so the order is canonical. Each ID is then written as a big-endian 64-bit number into 8 bytes. These are concatenated, yielding a byte array of size 40, and the SHA-256 checksum over this array becomes the group ID.

We also do this for the 3 DPS in a group.

The idea is to give a unique identity to a group, or to its DPS subgroup. Together with the identities of healers and tanks, this enables models in which we capture the individual contribution of a healer, a tank, or the 3 DPS as a whole.
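The hashing scheme above can be sketched in Go. The player IDs are made up, and zugzug's actual code may differ in details:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
)

// groupID implements the identity scheme: sort the player IDs so the
// order is canonical, encode each as a big-endian 64-bit integer
// (5 players * 8 bytes = 40 bytes), and take SHA-256 of the result.
func groupID(playerIDs []uint64) [32]byte {
	ids := append([]uint64(nil), playerIDs...)
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	buf := make([]byte, 0, 8*len(ids))
	for _, id := range ids {
		var b [8]byte
		binary.BigEndian.PutUint64(b[:], id)
		buf = append(buf, b[:]...)
	}
	return sha256.Sum256(buf)
}

func main() {
	// Hypothetical player IDs; roster order does not matter.
	a := groupID([]uint64{42, 7, 99, 13, 58})
	b := groupID([]uint64{99, 58, 42, 13, 7})
	fmt.Println(a == b) // same roster, same ID: true
}
```

Sorting before hashing is what makes the identity canonical: the same five players always produce the same 32-byte ID, regardless of the order the API returns them in. The same function applied to the 3 DPS IDs yields the DPS subgroup identity.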