Testing the metric

September 1, 2012

As nice as it was to define a metric for stall that made physical sense (at least to me), what would be even NICER would be to see that this metric actually *predicts* something.

So what should my stall score predict? How about the length of a battle?

Giving it some thought, it occurred to me that straight-up length-of-battle wasn’t going to cut it for simulator statistics, since plenty of people forfeit halfway through (or even sooner). So I’ve got two options: either I throw out all forfeits (something I can actually do, since PS logs keep track of the battle’s “endType”), or I can normalize by dividing by the number of KOs in a battle.

Okay, so I have a way to test my metric: if “stalliness” means anything in the real world, the combined stalliness of the teams in a battle should correlate with “turns/ko.” But here’s a problem: there are two teams in every battle. How do I combine the two stall scores? The answer is boring: you add them up, though the reasons behind this answer are a bit more interesting (the basic one: you don’t want to multiply, since stall scores can be negative).
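
To make the bookkeeping concrete, here is a minimal sketch of how one battle could be turned into one data point. It is not the script actually used for this post. The `Battle` record, its field names, and the "forfeit" value for the end type are assumptions made for illustration; the only thing taken from the PS logs is that an end type exists. The sketch supports both options (dropping forfeits, or normalizing by KOs), and it skips battles with zero KOs because turns/ko is undefined for them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Battle:
    # All field names here are hypothetical stand-ins for whatever the logs provide.
    turns: int            # total turns played
    kos: int              # total KOs scored by both sides
    end_type: str         # assumed values: "normal" or "forfeit"
    stalliness_p1: float  # stall score of player 1's team
    stalliness_p2: float  # stall score of player 2's team


def turns_per_ko(battle: Battle) -> Optional[float]:
    """Return turns/ko, or None when no KOs happened (ratio undefined)."""
    if battle.kos == 0:
        return None
    return battle.turns / battle.kos


def combined_stalliness(battle: Battle) -> float:
    """Combine the two team scores by adding them.

    Multiplying would misbehave with negative scores: two offensive teams
    at -1 each would multiply to +1 and look like a stall matchup.
    """
    return battle.stalliness_p1 + battle.stalliness_p2


def data_points(battles: List[Battle],
                drop_forfeits: bool = False) -> List[Tuple[float, float]]:
    """Build (combined stalliness, turns/ko) pairs, one per usable battle."""
    points = []
    for battle in battles:
        if drop_forfeits and battle.end_type == "forfeit":
            continue
        ratio = turns_per_ko(battle)
        if ratio is None:
            continue
        points.append((combined_stalliness(battle), ratio))
    return points
```

Each resulting pair is one dot on a turns/ko vs. stalliness scatter plot.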

But anyway, so we add the stalliness scores. And plot turns/ko against that measure. What do we get?


That’s a pretty nice correlation! Now why the log-linear scale? Recall that the “stalliness” of an individual Pokemon was defined to roughly correlate with the log (base-2) of how many turns it takes a Pokemon to KO itself. So it makes sense you’d want to take the log of turns/ko as well (we could also go the other way and look at 2^stalliness, but it’s not as pretty of a graph).
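
The log-linear idea can be sketched in a few lines: take log base 2 of turns/ko and fit a straight line against combined stalliness. This is an illustration under assumptions, not the analysis code behind the plot. The function name `fit_log2_line` and the input format (a list of (stalliness, turns/ko) pairs) are made up for the example, and ordinary least squares is simply the most obvious choice of fit.

```python
import math
from typing import List, Tuple


def fit_log2_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log2(turns/ko) = slope * stalliness + intercept.

    `points` holds (combined stalliness, turns/ko) pairs; turns/ko must be > 0.
    Needs at least two distinct stalliness values, otherwise the slope is undefined.
    """
    xs = [x for x, _ in points]
    ys = [math.log2(y) for _, y in points]  # the log-linear transform
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

If stalliness really tracked log2 of turns/ko one-for-one, the slope would come out near 1; on real data there is no reason to expect it to be exact.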

Oh, I should mention that my sample data set is PS logs for OU battles collected between Aug. 1 and Aug. 30 (I didn’t feel like waiting for the end of the month).

This result made me pretty damn ecstatic. I believe my exact words were, “Damn, it feels good to be a physicist.”

So this made me happy. For kicks, I decided to also look at how bias (again, computed as the sum of the biases of the two teams) correlated to turns/ko. Looking at that plot made me less happy.


There’s really not a huge amount of difference (note that it’s negative bias, since Innocent Criminal defined positive to mean offensive). I did a linear regression on both plots and found that the stalliness correlation is a bit stronger, but not significantly so. And consider how much more work is involved in computing stalliness vs. computing bias! So, in the end, Innocent Criminal pretty much had the right idea. I’m still going to claim stalliness as an improvement, but it really seems like bias is good enough for everyday use.
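
One way to put a number on “a bit stronger” is to compare correlation coefficients. The sketch below uses Pearson’s r against log2(turns/ko) for both metrics. That choice is an assumption: the post does not say which statistic came out of the regression, or whether the bias plot was logged the same way. The function names and the parallel-list input format are also made up for illustration.

```python
import math
from typing import Dict, List


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def compare_metrics(stalliness: List[float],
                    negative_bias: List[float],
                    turns_per_ko: List[float]) -> Dict[str, float]:
    """Correlate each per-battle metric with log2(turns/ko).

    All three lists are per-battle values in the same order. Bias is passed
    in already negated, so that larger means more defensive for both metrics.
    """
    log_ratio = [math.log2(t) for t in turns_per_ko]
    return {
        "stalliness": pearson_r(stalliness, log_ratio),
        "negative_bias": pearson_r(negative_bias, log_ratio),
    }
```

Whichever metric gives the larger r is the better linear predictor on that sample, though a small gap between the two may not mean much.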

Oh, wanna see a really strong correlation? Take a look at stalliness plotted vs. bias:

One area where I was really hoping stalliness would come out ahead would be Little Cup, where EV spreads tend to be a bit more “even” (252 EVs are rarely needed to max out any stat, so even purely offensive sets will often see more than a few EVs thrown to the defenses). Also, bias doesn’t take into account Eviolite, while stalliness does.

So let’s start by looking at bias:

There’s a bit of a positive correlation, but nowhere near as strong as for OU. It’s looking a bit amorphous.

Now let’s see stalliness:

Yeah, I think it’s pretty clear that stalliness does better. So yay! I’m declaring victory. Stalliness is a better metric because it applies to Little Cup! Now enough with this–I have a moveset analyzer to finish writing.
