After some careful thought and a LOT of testing and re-testing, I made some revisions to my stalliness metric (namely adjusting some key moveset modifications), and the end result is something that I’m pretty happy with.

Before I get into the nitty-gritty of exactly what I changed, I’d like to show off the results:

The points in black correspond to the stall score as originally defined, and the points in blue correspond to the new stall scores after the revisions I’ll be outlining in this post. Note that I’ve taken the liberty of including dividing lines for my proposed cutoffs between the various playstyles: Hyper-Offense corresponds to stall scores less than -1 (fewer than 1.5 turns/KO), Offense (including Bulky Offense) to stall scores between -1 and 0 (between 1.5 and 3 turns/KO), Balance to stall scores between 0 and 1 (3-6 turns/KO), Semi-Stall to stall scores between 1 and $\log_2 3$ (~1.58) (6-9 turns/KO), and Full Stall (a.k.a. Stall) to stall scores above $\log_2 3$ (more than 9 turns/KO).
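
The cutoffs above can be sketched in a few lines of Python. This is only an illustration: the formula `3 * 2**score` is inferred from the paired numbers quoted above (-1 with 1.5 turns/KO, 0 with 3, 1 with 6, $\log_2 3$ with 9), the function names are hypothetical, and the choice of which playstyle owns a score that lands exactly on a boundary is an assumption, since the cutoffs don't say.

```python
import math

# Boundary between Semi-Stall and Full Stall, as a stall score (~1.58).
SEMI_STALL_CUTOFF = math.log2(3)


def turns_per_ko(stall_score: float) -> float:
    """Turns/KO implied by a stall score.

    ASSUMPTION: inferred from the quoted cutoffs, where a score of 0 pairs
    with 3 turns/KO and each +1 in score doubles the turns/KO.
    """
    return 3.0 * 2.0 ** stall_score


def playstyle(stall_score: float) -> str:
    """Map a stall score to a playstyle using the proposed cutoffs.

    ASSUMPTION: a score exactly on a boundary goes to the stallier class.
    """
    if stall_score < -1:
        return "Hyper-Offense"
    if stall_score < 0:
        return "Offense"
    if stall_score < 1:
        return "Balance"
    if stall_score < SEMI_STALL_CUTOFF:
        return "Semi-Stall"
    return "Full Stall"
```

Under those assumptions, `turns_per_ko(-1)` gives 1.5 and `playstyle(0.5)` gives `"Balance"`.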

As you can see, it’s not perfect, but a lot of the teams that are incorrectly classified are pathological cases (for example, the Balance team that almost scores high enough to be Full Stall was designed by Molk and is built around a Scraggy). And frankly, no one has adequately explained to me the difference between Offense and Hyper Offense: Hawaiian Air is the Offense team with the lowest stall score, yet it features two exploders and two more Pokemon that set up, and it seems much more offensive than the “Hyper Offense” team Reflections.

So yeah, I’m satisfied. Now let’s talk about what I changed:

• Will-o-Wisp only adds 0.5 to the metric rather than 1.0
• Whereas trapping abilities subtract 1.0 from the metric, trapping moves (Block, Mean Look, Spider Web and Pursuit) subtract 0.5 from the metric. As a minor note, if a Pokemon has a trapping ability and a trapping move (presumably because the user is an idiot), it doesn’t get any further modification from the trapping move.
• This is the big one: acknowledging that not all boosting moves are created equal, I now sort them into tiers, covered in the next five bullets:
• Belly Drum subtracts 2.0 from the metric. Technically it should be 3.0 (4x attack divided by halved HP = 8x as offensive), but I make some concessions for the fact that, realistically, this would skew the metric far more than is appropriate.
• Shell Smash subtracts 1.5 from the metric (2x offense divided by defenses cut to 2/3 = 3x the offensiveness, and $\log_2 3$ is about 1.58, or roughly 1.5). How can I say that Belly Drum shouldn’t get the full -3, but Shell Smash should get the full -1.5? Speed and repeated use. Few Belly Drum Pokemon have the speed or access to priority needed to sweep, and pulling off a Belly Drum is pretty difficult, since it fails if the user’s health is less than half. But if your Pokemon happens to be in that sweet spot where it’s slower than its opponent before the Shell Smash but faster afterwards, a Shell Smash is as easy to pull off as a Dragon Dance. This justification is rather long and convoluted, I’ll admit, but the bottom line is that it seems to work.
• The following boosting moves subtract 1.0 from the metric: Curse, Dragon Dance, Growth, Shift Gear, Swords Dance, Fiery Dance, Nasty Plot, Tail Glow, Quiver Dance. The theme here is that these moves either boost attack and speed at the same time, boost attack by more than one stage (Growth is more often used in the sun, where it provides a +2 boost), or are powerful attacking moves on their own (Fiery Dance). Curse is here because of Trick Room and Gyro Ball.
• The following boosting moves subtract 0.5 from the metric, simply by virtue of being less effective, or from boosting speed alone: Acupressure, Bulk Up, Coil, Howl, Work Up, Meditate, Sharpen, Calm Mind, Charge Beam, Agility, Autotomize, Flame Charge, Rock Polish, Double Team, Minimize, Tailwind
• These modifications do not stack: Double Dancing sets do not get twice the boost. If a set, for some reason, has multiple boosting moves that have different modifiers, the larger modifier takes precedence (Agility+Swords Dance counts as -1.0)
• The rule that says that Phazing moves, Paralysis moves and Confusion moves add 0.5 to the metric is now four separate rules that stack (Roar and variants, Haze and variants, Paralysis, Confusion). As an example, T-Wave+Roar would add 1.0 combined to the metric. Also, Yawn is grouped in with confusion moves.
• The list of moves which have “negative additional effects” on the user, which subtract 0.5 from the metric, is modified to remove offense-stat dropping moves. You can’t sweep with Overheat.
• The abilities Sand Stream and Snow Warning add 0.5 to the metric, simply due to the passive damage. This decision has been rather controversial so far, but I stand by it, as it seemed to really help differentiate between Semi- and Full-Stall. Note that most of the Tyranitar sets I’ve analyzed still come out offensive.
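
To make the stacking and precedence rules in this list concrete, here is a rough Python sketch. It covers only the rules revised above, not the base stall score or the "negative additional effects" list (neither is spelled out here). All function and variable names are hypothetical. The sets for "Roar and variants", "Haze and variants", paralysis and confusion are deliberately incomplete and hold only the moves named in this post. Whether a Pokemon has a trapping ability is passed in as a flag because the list of such abilities isn't given. Summing the separate rules into one number is also an assumption.

```python
# Amount subtracted for boosting moves, by tier (move names as listed above).
BOOST_TIERS = {
    2.0: {"Belly Drum"},
    1.5: {"Shell Smash"},
    1.0: {"Curse", "Dragon Dance", "Growth", "Shift Gear", "Swords Dance",
          "Fiery Dance", "Nasty Plot", "Tail Glow", "Quiver Dance"},
    0.5: {"Acupressure", "Bulk Up", "Coil", "Howl", "Work Up", "Meditate",
          "Sharpen", "Calm Mind", "Charge Beam", "Agility", "Autotomize",
          "Flame Charge", "Rock Polish", "Double Team", "Minimize", "Tailwind"},
}
TRAPPING_MOVES = {"Block", "Mean Look", "Spider Web", "Pursuit"}
WEATHER_ABILITIES = {"Sand Stream", "Snow Warning"}

# Four groups that each add 0.5 and stack with one another.
# ASSUMPTION: incomplete placeholder sets, limited to moves named in the post.
STACKING_GROUPS = [
    {"Roar"},          # Roar and variants
    {"Haze"},          # Haze and variants
    {"Thunder Wave"},  # paralysis moves
    {"Yawn"},          # confusion moves, with Yawn grouped in
]


def boost_penalty(moves):
    """Largest boosting modifier on the set; boosting moves do not stack."""
    moves = set(moves)
    return max((penalty for penalty, names in BOOST_TIERS.items()
                if names & moves), default=0.0)


def trapping_penalty(moves, has_trapping_ability=False):
    """1.0 for a trapping ability, else 0.5 for a trapping move, else 0."""
    if has_trapping_ability:
        return 1.0  # a trapping move adds nothing on top of the ability
    return 0.5 if TRAPPING_MOVES & set(moves) else 0.0


def revised_adjustment(moves, ability=None, has_trapping_ability=False):
    """Net change to the stall score from the revised rules only.

    ASSUMPTION: the separate rules simply add together.
    """
    moves = set(moves)
    bonus = 0.5 * sum(1 for group in STACKING_GROUPS if group & moves)
    if "Will-o-Wisp" in moves:
        bonus += 0.5
    if ability in WEATHER_ABILITIES:
        bonus += 0.5
    return (bonus - boost_penalty(moves)
            - trapping_penalty(moves, has_trapping_ability))
```

Under these assumptions, `boost_penalty(["Agility", "Swords Dance"])` gives 1.0 and `revised_adjustment(["Thunder Wave", "Roar"])` gives 1.0, matching the two examples in the list.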

So those are the changes that I made, and as I said, I’m pretty happy with them. One final thing to check was that these modifications didn’t screw up the correlation between turn length and stalliness, which wasn’t that strong to begin with:

Compared to the graph from this post, the revised score seems to do no worse. I’m not at all surprised or disappointed that it doesn’t do better: player skill is a far more important factor in battle length than playstyle, at least in my experience.