From the feedback I got after posting my previous results, I started to wonder whether stalliness was underperforming simply because of an outlier problem. Even full stall teams usually have one offensive member, and offensive teams will often have some utility Pokemon. Do these “outliers” throw off the combined stalliness? Easy enough to check.

Instead of just averaging the stalliness of the entire team, I now calculate three averages: the average stalliness of the full team, the average stalliness of everyone sans the Pokemon with the most positive stalliness value, and the average stalliness of everyone sans the Pokemon with the most negative stalliness value. I then check which of the two five-Pokemon averages lies farther from the full-team average (in other words, whether it makes more of a difference to throw out the most stally or the most offensive Pokemon), and I use that average as my new stalliness metric, which I call $stall_5$.
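Here is a minimal sketch of that calculation, assuming a team is just a list of per-Pokemon stalliness values (positive = stally, negative = offensive). The function name `stall_5` and the tie-breaking rule (prefer dropping the most stally Pokemon when both averages move equally) are my own illustrative choices, not part of the original method.

```python
from statistics import mean


def stall_5(stalliness: list[float]) -> float:
    """Hypothetical helper: the five-Pokemon average that differs most from the full-team average."""
    if len(stalliness) < 2:
        raise ValueError("need at least two Pokemon to drop an outlier")
    full = mean(stalliness)
    ranked = sorted(stalliness)
    # Average with the most stally (most positive) Pokemon removed.
    without_most_stally = mean(ranked[:-1])
    # Average with the most offensive (most negative) Pokemon removed.
    without_most_offensive = mean(ranked[1:])
    # Keep whichever average lies farther from the full-team average.
    # Assumed tie-break: on equal distance, drop the most stally Pokemon.
    if abs(without_most_stally - full) >= abs(without_most_offensive - full):
        return without_most_stally
    return without_most_offensive


# Made-up stall team with one offensive outlier.
team = [2.0, 1.5, 1.0, 1.25, 0.75, -3.0]
print(stall_5(team))  # → 1.3
```

Removing the maximum shifts the mean by (max − mean)/(n − 1) and removing the minimum shifts it by (mean − min)/(n − 1), so this comparison amounts to asking which extreme value sits farther from the team's mean.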

Below is the graph from yesterday showing the distribution of stall scores grouped by team type, but now I also graph $stall_5$ in red.

As you can see, throwing out the outlier doesn’t exactly make things better: with the exception of the offense group, the distributions become more spread out, not less.

Ah well, so much for an easy fix.