# Testing the metric

As nice as it was to define a metric for stall that made physical sense (at least to me), what would be even NICER would be to see that this metric actually *predicts* something.

So what should my stall score predict? How about the length of a battle?

Giving it some thought, it occurred to me that straight-up length-of-battle wasn’t going to cut it for simulator statistics, since plenty of people forfeit halfway through (or even sooner). So I’ve got two options: either I throw out all forfeits (something I can actually do, since PS logs keep track of the battle’s “endType”), or I can normalize by dividing by the number of KOs in a battle.
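Both options are easy to sketch. This is only an illustration: the post mentions `endType`, but the `turns` and `kos` field names, the `"forfeit"` and `"normal"` values, and the sample battles are my own assumptions, not the real PS log format.

```python
def turns_per_ko(battle):
    """Return turns/KO for one battle, or None if nobody was KOed."""
    kos = battle["kos"]
    if kos == 0:
        return None  # nothing to normalize by
    return battle["turns"] / kos


def drop_forfeits(battles):
    """Option 1: keep only battles that did not end in a forfeit."""
    # "forfeit" is an assumed endType value, not taken from real logs.
    return [b for b in battles if b["endType"] != "forfeit"]


# Made-up battles, purely for illustration.
battles = [
    {"turns": 42, "kos": 11, "endType": "normal"},
    {"turns": 9, "kos": 2, "endType": "forfeit"},
    {"turns": 3, "kos": 0, "endType": "forfeit"},
]

# Option 2: normalize every battle, skipping the ones with no KOs at all.
ratios = [r for r in map(turns_per_ko, battles) if r is not None]
```

Note that normalizing by KOs still leaves battles where someone quits before anything faints, so those have to be skipped either way.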

Okay, so I have a way to test my metric: if “stalliness” means anything in the real world, the stalliness of a team should correlate to “turns/ko.” But here’s a problem: there are two teams in every battle. How do I combine the two stall scores? The answer is boring: you add them up, though the reasons behind this answer are a bit more interesting (basic one: you don’t want to multiply, since stall scores can be negative).
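To see why multiplying goes wrong with negative scores, here is a tiny sketch. The scores are invented for illustration and the function names are hypothetical.

```python
def combine_add(a, b):
    """Combine two teams' stall scores by summing them."""
    return a + b


def combine_multiply(a, b):
    """The tempting alternative: multiply the two scores."""
    return a * b


# Invented scores: two very offensive teams vs. two very stally teams.
offense_vs_offense = (-1.5, -1.5)
stall_vs_stall = (1.5, 1.5)

added = (combine_add(*offense_vs_offense), combine_add(*stall_vs_stall))
multiplied = (combine_multiply(*offense_vs_offense),
              combine_multiply(*stall_vs_stall))
```

With multiplication the two negatives cancel, so an offense-vs-offense battle gets the same combined score as a stall-vs-stall one. The sum keeps them at opposite ends.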

But anyway, so we add the stalliness scores. And plot turns/ko against that measure. What do we get?

That’s a pretty nice correlation! Now why the log-linear scale? Recall that the “stalliness” of an individual Pokemon was defined to roughly correlate to the log (base-2) of how many turns it takes a Pokemon to KO itself. So it makes sense you’d want to log turns/ko as well (we could also go the other way and look at 2^stalliness, but it’s not as pretty of a graph).
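For anyone who wants to run the same kind of check, here is a minimal sketch of correlating summed stalliness against log2(turns/ko). The sample points are made up and are not my actual data, and the hand-rolled `pearson` helper is just one way to measure the correlation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up (summed stalliness, turns/KO) pairs, purely illustrative.
samples = [(-2.0, 2.5), (-1.0, 3.0), (0.0, 4.0), (1.0, 5.5), (2.0, 8.0)]

summed_stalliness = [s for s, _ in samples]
# Stalliness is roughly a base-2 log, so turns/KO gets logged to match.
log_turns_per_ko = [math.log2(t) for _, t in samples]

r = pearson(summed_stalliness, log_turns_per_ko)
```

Logging the y-values here is the same thing as plotting turns/ko on a log axis.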

Oh, I should mention that my sample data set is PS logs for OU battles collected between Aug. 1 and Aug. 30 (I didn’t feel like waiting for the end of the month).

This result made me pretty damn ecstatic. I believe my exact words were, “Damn, it feels good to be a physicist.”

So this made me happy. For kicks, I decided to *also* look at how bias (again, computed as the sum of the biases of the two teams) correlated to turns/ko. Looking at that plot made me less happy.

The correlation for stalliness is a *bit* stronger, but not significantly. And consider how much more work is involved with computing stalliness vs. computing bias! So, in the end, Innocent Criminal pretty much had the right idea. I’m still going to claim stalliness as an improvement, but it really seems like bias is good enough for everyday use.

There’s a *bit* of a positive correlation, but nowhere near as strong as for OU. It’s looking a bit amorphous.
