Last time I gave a high-level overview of how this new usage stats system will be structured. One key difference between Onix and the current usage stats system is that Onix will be processing the whole battle log, not just the parts that are going to immediately lead to reports. Onix will represent battles in a way that the entire battle can be re-created, turn-by-turn, hit-by-hit. The hope is that this will lead to a lot of flexibility in generating novel reports and analyses and give researchers the opportunity to play with well-structured, sanitized, anonymized, and complete data.
This is a non-trivial challenge—it means that Onix will have to contain, if not a full battle engine, then at least one that’s de facto capable of constructing Pokemon Showdown replays. Of course, the alternative would be to store the PS battle logs as they are and work with its existing protocol. And that’s tempting. But at the end of the day, the aim of PS logs is to re-create a battle visually, while Onix’s goal is to facilitate deep Pokemon analyses, which is a markedly different use-case.
So the goal of Onix is to be able to re-create the exact state of a battle at any given time, and the core of that is how we represent a battle’s “state.” Think of everything that goes on in a Pokemon battle: it’s far more than just the HP of the active Pokemon, it’s the weather, it’s the entry hazards, it’s each Pokemon’s boosts and the status of all of the benched Pokemon. Below is a diagram of a logical structural representation of a battle state.
This particular structure represents a two-player singles match where each player is allowed three Pokemon, but it’s easy to see how one might add more Pokemon, more active slots, or even more players. In this structure Pokemon are represented positionally, so if a Pokemon were to switch out, we’d move all its information to the new slot. Note also that there’s a separation between non-volatile statuses (current HP, item, conditions like paralysis…), which are “permanent” changes, and volatile statuses (boosts, ability changes, conditions like taunt), which are lost on switch-out.
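To make the structure a little more concrete, here is a minimal sketch of how this positional battle state might look in Python. None of this is Onix's actual code: the class and field names (`BattleState`, `Side`, `PokemonSlot`, `switch` and so on) are my own placeholders, and I'm assuming that slot 0 on each side is the active Pokemon.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NonVolatile:
    """'Permanent' changes that survive a switch-out."""
    hp: int
    item: Optional[str] = None
    condition: Optional[str] = None  # e.g. "paralysis"


@dataclass
class Volatile:
    """Changes that are lost on switch-out."""
    boosts: Dict[str, int] = field(default_factory=dict)  # e.g. {"atk": -1}
    conditions: List[str] = field(default_factory=list)   # e.g. ["taunt"]


@dataclass
class PokemonSlot:
    species: str
    non_volatile: NonVolatile
    volatile: Volatile = field(default_factory=Volatile)


@dataclass
class Side:
    # Positional model: slot 0 is assumed to be the active Pokemon.
    slots: List[PokemonSlot]
    field_conditions: List[str] = field(default_factory=list)  # e.g. ["Spikes"]


@dataclass
class BattleState:
    turn: int
    sides: List[Side]
    weather: Optional[str] = None


def switch(side: Side, bench_index: int) -> None:
    """Swap the active Pokemon with a benched one.

    The outgoing Pokemon loses its volatile statuses; everything else
    moves with it to its new slot.
    """
    side.slots[0].volatile = Volatile()
    side.slots[0], side.slots[bench_index] = side.slots[bench_index], side.slots[0]
```

Adding more Pokemon is just a longer `slots` list, and more players is a longer `sides` list. More active slots would need a different convention than "slot 0 is active," which this sketch doesn't attempt.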
As we read through a battle log, this battle state will change. The simplest example is when a turn ends, the “Turn #” will be updated, but pretty much everything that happens in a battle, from switches to moves to ability activation to weather ending, will update the battle state in some way. In the language of Pokemon Showdown, these are all “actions”, but going through the list of major and minor actions, it’s clear that not all actions change a battle state (e.g. “-hint”).
Onix’s battle representations will be limited strictly to changes in battle state, and instead of calling these “actions,” we’ll call them “effects.”
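One way to picture an effect is as a small function that takes a battle state and changes it. The sketch below is purely illustrative (the dictionary-based state and the names `advance_turn`, `set_weather` and `state_after` are assumptions of mine, not Onix's design), but it shows why a list of effects is enough to re-create the state at any point in the battle.

```python
import copy
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]             # simplified stand-in for a full battle state
Effect = Callable[[State], None]   # an effect mutates the state in place


def advance_turn(state: State) -> None:
    """Effect: the turn counter moves forward by one."""
    state["turn"] += 1


def set_weather(weather: Optional[str]) -> Effect:
    """Build an effect that sets (or, with None, ends) the weather."""
    def apply(state: State) -> None:
        state["weather"] = weather
    return apply


def state_after(initial: State, effects: List[Effect], count: int) -> State:
    """Re-create the battle state after the first `count` effects."""
    state = copy.deepcopy(initial)  # never modify the initial state
    for effect in effects[:count]:
        effect(state)
    return state


initial = {"turn": 0, "weather": None}
effects = [set_weather("Rain"), advance_turn, advance_turn, set_weather(None)]
print(state_after(initial, effects, 3))  # {'turn': 2, 'weather': 'Rain'}
```

Something like "-hint" would simply never show up in the list, since it has no effect on the state.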
Consider the following battle:
Player A’s team consists of a Qwilfish and an Audino.
Player B’s team consists of a Sandslash and an Emboar.
Start: Player A sends out Qwilfish, Player B sends out Emboar. Emboar’s attack is lowered thanks to Intimidate.
Turn 1: Emboar switches into Sandslash as Qwilfish lays down a layer of Spikes.
Turn 2: Sandslash KOs Qwilfish with an Earthquake. Player A sends out Audino to replace it.
Turn 3: Sandslash switches out into Emboar (Emboar takes Spikes damage) while Audino uses Wish.
Turn 4: Emboar KOs Audino with Close Combat. Player B wins the match!
Now let’s structure this log in terms of effects:
Initial state:
- Turn #: 0
- Side 0, Slot 0 (active): Qwilfish
- Side 0, Slot 1: Audino
- Side 1, Slot 0 (active): Emboar
- Side 1, Slot 1: Sandslash

Effects:
- Emboar Atk stage drops to -1
- Turn # advances to 1
- Emboar moves to Side 1, Slot 1 (loses volatile conditions)
- Sandslash moves to Side 1, Slot 0
- Spikes added to Field Conditions for Side 1
- Turn # advances to 2
- Qwilfish's HP drops to 0
- Qwilfish gains condition: Faint
- Qwilfish moves to Side 0, Slot 1
- Audino moves to Side 0, Slot 0
- Turn # advances to 3
- Sandslash moves to Side 1, Slot 1
- Emboar moves to Side 1, Slot 0
- Emboar's HP drops by 1/8 of Max HP
- Turn # advances to 4
- Audino's HP drops to 0
- Audino gains condition: Faint

End of Match
There’s some ambiguity here (should we be tracking PP? should “Pending Wish” be a field effect?), but you get the general idea—by advancing through this list of effects you can completely recreate the state of the battle at any given point in the match. But iterating through the above log doesn’t feel complete. Sandslash appears on the field, and Qwilfish mysteriously faints. One can infer what has happened (Sandslash KOed Qwilfish with a powerful move), but not saying it explicitly leaves out information important not just to our understanding of the progression of a battle, but potentially for performing analyses. Say I want to know the total damage done by each move over the course of a battle. The above has no record of actual move usage, so that would be impossible!
The solution is to give our effects “causes.” Explicitly, the model that Onix uses is to group effects into “events,” which are pairings of cause with effect. Rather than further describe what I mean, let me just show you with the above example:
Initial state:
- Turn #: 0
- Side 0, Slot 0 (active): Qwilfish
- Side 0, Slot 1: Audino
- Side 1, Slot 0 (active): Emboar
- Side 1, Slot 1: Sandslash

Events:
- Qwilfish's ability, Intimidate, activates
  - Emboar Atk stage drops to -1
- Turn ends
  - Turn # advances to 1
- Player 1 switches out
  - Emboar moves to Side 1, Slot 1 (loses volatile conditions)
  - Sandslash moves to Side 1, Slot 0
- Qwilfish uses Spikes
  - Spikes added to Field Conditions for Side 1
- Turn ends
  - Turn # advances to 2
- Sandslash uses Earthquake
  - Qwilfish's HP drops to 0
  - Qwilfish gains condition: Faint
- Player 0 sends out replacement
  - Qwilfish moves to Side 0, Slot 1
  - Audino moves to Side 0, Slot 0
- Turn ends
  - Turn # advances to 3
- Player 1 switches out
  - Sandslash moves to Side 1, Slot 1
  - Emboar moves to Side 1, Slot 0
- Entry Hazards, Spikes, activate
  - Emboar's HP drops to 322
- Audino uses Wish
- Turn ends
  - Turn # advances to 4
- Emboar uses Close Combat
  - Audino's HP drops to 0
  - Audino gains condition: Faint

End of Match
The basic idea is that every event has a “cause” (even if it’s just “turn ends”) and 0, 1 or many “effects.” Note that there’s some ambiguity here that I haven’t quite worked out: is the cause for Emboar’s Spikes damage that Player 1 switched it in? Or that there were Spikes on the field? Here, there’s no real harm in “giving it” to the entry hazards, but what if we had a scenario where a Pokemon used Whirlwind to drag another one onto the field, and that Pokemon died from hazards damage? Who would “get the kill,” the hazard-layer or the Whirlwind-user? I haven’t fully worked all this out yet, and it might be that I change my model to allow an event to have multiple causes, but we’ll see. In the meantime, the important takeaway is that cause + effect(s) = event, and now we have a much clearer picture of what happened in a match.
In the end, Onix “events” are not quite so different from PS “actions” (though Showdown’s decisions regarding cause and effect may differ from mine), but there’s another key difference in Onix’s battle representation that’s a bit more significant.
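Here is a rough sketch of how pairing causes with effects makes the "total damage done by each move" question answerable. Everything in it is an assumption for illustration: the `Event` class, the decision to model only HP-changing effects, and the HP numbers themselves (I've just started every Pokemon at 100% and taken 1/8 off for Spikes).

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Event:
    cause: str  # e.g. "Sandslash uses Earthquake"
    # Each effect here is (target, %HP before, %HP after). Other kinds of
    # effect (boosts, switches, hazards) are left out of this sketch.
    hp_effects: List[Tuple[str, float, float]] = field(default_factory=list)


def damage_by_cause(events: List[Event]) -> Dict[str, float]:
    """Total %HP lost, attributed to whatever caused each event."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        for _target, before, after in event.hp_effects:
            totals[event.cause] += max(0.0, before - after)
    return dict(totals)


# Illustrative numbers only: the log above does not record starting HP.
events = [
    Event("Qwilfish uses Spikes"),  # a cause with no HP effects
    Event("Sandslash uses Earthquake", [("Qwilfish", 100.0, 0.0)]),
    Event("Entry Hazards, Spikes, activate", [("Emboar", 100.0, 87.5)]),
    Event("Emboar uses Close Combat", [("Audino", 100.0, 0.0)]),
]

for cause, damage in damage_by_cause(events).items():
    print(f"{cause}: {damage}")
```

With a single `cause` per event, the Whirlwind question has to be settled when the event is written. Allowing multiple causes would mean turning `cause` into a list and deciding how to split the credit.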
Referring to the diagrammed structure of a battle’s state that I laid out earlier, you might have noticed a grouping labelled “static data.” That’s metadata about the battle that won’t change as we iterate through the battle. In a lot of ways, it makes sense to not even include that information in the battle state, since it has little relevance (besides labelling) to the actual evolution of the battle. But in addition to player ratings and format information, there’s some other static data that’s included in the battle state’s structure: Pokemon movesets.
By “movesets” I mean anything about a Pokemon that’s set in the teambuilder: so, species, moves, item, ability, stats spread, etc. While it’s true that a lot of this information can change during a battle—items can be knocked off, moves can be disabled, even species can change (Ditto)—movesets as I’ve defined them represent the sum of all decisions made before the start of the match. Success at Pokémon battling can be considered to be the marriage of teambuilding skill with battling skill, and by separating out movesets, I’ve isolated the teambuilding aspects of the match. So in that context, consider the above diagram, where movesets, non-volatile statuses and volatile statuses have been “refactored” apart. This gives us some interesting flexibility: statuses are now no longer tied to movesets, giving them a more “universal” flavor (so, for example, we’ll consider % HP rather than HP value) and battle events no longer reference specific Pokemon. Also note that whereas before, Pokémon were represented positionally, moving from slot to slot, in this model a Pokemon’s “slot” is fixed, and what changes is a reference to which slot is active. We lose some information doing this (namely the ordering of the “benched” Pokemon), but this information isn’t needed for constructing “replays” (that is, while it might matter for Circle Throw or Illusion, we can represent these changes in other ways). Let’s rewrite that battle log one more time:
Static state:
- Side 0, Slot 0: Audino
- Side 0, Slot 1: Qwilfish
- Side 1, Slot 0: Emboar
- Side 1, Slot 1: Sandslash

Initial state:
- Turn #: 0
- Side 0 Active: Slot 1
- Side 1 Active: Slot 0

Events:
- Side 0 Active's ability activates
  - Side 1 Active's Atk stage drops to -1
- Turn ends
  - Turn # advances to 1
- Player 1 switches out
  - Side 1 Active set to: Slot 1
  - Side 1 Active volatile status reset
- Side 0 Active uses Spikes
  - Spikes added to Field Conditions for Side 1
- Turn ends
  - Turn # advances to 2
- Side 1 Active uses Earthquake
  - Side 0 Active's %HP set to: 0
  - Side 0, Slot 1's non-volatile status set to: Faint
- Player 0 sends out replacement
  - Side 0 Active set to: Slot 0
- Turn ends
  - Turn # advances to 3
- Player 1 switches out
  - Side 1 Active set to: Slot 0
- Entry Hazards, Spikes, activate
  - Side 1, Slot 0's %HP set to: 7/8
- Side 0 Active uses Wish
- Turn ends
  - Turn # advances to 4
- Side 1 Active uses Close Combat
  - Side 0, Slot 0's %HP set to: 0
  - Side 0, Slot 0's non-volatile status set to: Faint

End of Match
Now I’ll grant you: this isn’t as human-readable, but humans aren’t the ones processing these logs, are they? A computer has no problem looking up the reference each time, and addressing by reference removes ambiguity for metagames without species clause in addition to allowing greater flexibility for data analysis: say today you want to group events by species. Maybe tomorrow you want to group by full moveset. And maybe the next you want to group by the Pokemon’s type. In all of these cases, rather than having to evaluate each and every event to determine whether it matches one’s criteria, now all one has to do is evaluate the static moveset, and then look up events by slot number.
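As a sketch of what that lookup might look like: the criteria get evaluated once per slot against the static movesets, and the matching slots are then used as keys into the events. The names here (`STATIC_MOVESETS`, `EVENTS_BY_SLOT`, `causes_for`) are placeholders of mine, and I'm assuming the events have already been indexed by the slot that caused them, which the real log would need an extra pass to do (resolving each "Side N Active" reference).

```python
from typing import Any, Callable, Dict, List, Tuple

Slot = Tuple[int, int]    # (side, slot)
Moveset = Dict[str, Any]

# Static data: everything decided in the teambuilder (trimmed to two fields).
STATIC_MOVESETS: Dict[Slot, Moveset] = {
    (0, 0): {"species": "Audino", "types": ["Normal"]},
    (0, 1): {"species": "Qwilfish", "types": ["Water", "Poison"]},
    (1, 0): {"species": "Emboar", "types": ["Fire", "Fighting"]},
    (1, 1): {"species": "Sandslash", "types": ["Ground"]},
}

# Assumed pre-built index: event causes keyed by the slot responsible.
EVENTS_BY_SLOT: Dict[Slot, List[str]] = {
    (0, 0): ["uses Wish"],
    (0, 1): ["ability activates", "uses Spikes"],
    (1, 0): ["uses Close Combat"],
    (1, 1): ["uses Earthquake"],
}


def causes_for(predicate: Callable[[Moveset], bool]) -> List[str]:
    """Evaluate the criteria once per slot, then look events up by slot."""
    causes: List[str] = []
    for slot, moveset in STATIC_MOVESETS.items():
        if predicate(moveset):
            causes.extend(EVENTS_BY_SLOT.get(slot, []))
    return causes


# Today: group by species. Tomorrow: group by type.
print(causes_for(lambda m: m["species"] == "Qwilfish"))
print(causes_for(lambda m: "Ground" in m["types"]))
```

Swapping the grouping criterion only changes the predicate; the events themselves are never re-examined.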
I know this topic was pretty dense, so if you made it all the way through, I salute you. For those that just skipped here, here’s a tl;dr:
- A Battle State encapsulates all the battle-relevant information about the conditions of a battle.
- Changes to Battle States are called Effects. The progression of a battle can be charted as a list of effects.
- Effects can be grouped by Cause into Events.
- Onix’s Battle States are structured so as to isolate teambuilding intent and so that Effects are as universal as possible.
Next time I’ll be talking about databases! Yay!
So in my last post I announced plans to rewrite my Smogon usage stats scripts as “Project Onix,” a robust, performant and extensible platform for performing Pokemon analyses. Today I’d like to go into a little more detail about what “Project Onix” is actually going to look like.
At its core, Onix’s goal is the same as for the original Smogon Usage Stats project: take logs from Pokemon Showdown (or Pokemon Online or NetBattle or whatever simulator we’re using on a given day) and process them into monthly reports. The old Smogon Usage Stats project did this in two stages:
- Read in logs, pull out relevant information, calculate derived quantities (stalliness and team-tags), structure it and dump it into a collection of processed files
- Read in those processed files and count stuff up to produce the monthly usage stats (including moveset statistics, metagame analyses and checks/counters analyses).
This wasn’t a bad design, but working with files meant that there was a tradeoff between performance and flexibility—anything that got pulled from the logs, or any quantities that were derived would slow down the stat-counting. Consequently, I ended up only pulling information that was going to find its way into a report. That meant, for example, no record of actual move usage or when a Pokemon mega-evolved. And doing pre-processing the way I did meant that if I wanted to change the way something was calculated (say, change the threshold for what constitutes a “baton pass” team), the only way to generate an updated report would be to go back to the logs and start from scratch.
Moving forward, my plan is to segment the workflow more cleanly and, in doing so, add significant flexibility while (hopefully) improving performance. Onix will consist of three subsystems:
The role of the collection subsystem is solely to read in data, at this point from simulator logs, but one could imagine alternative data sources (other sims, battle videos…). The aim is to perform little-to-no analysis. There will be some data cleansing here (combining appearance-only formes and equivalent nature/IV/EV spreads), but primarily the goal is just to transform the data into structures that will be easier to process later on.
The collection system will output to a set of “databases,” though I use that term loosely. It could be SQLite tables, it could be MongoDB collections, or I could still be doing file I/O, just in a more segmented way. The goal is to keep the data segregated, so individual analyses can be performed by accessing only the data they need.
The collection system has another focus: completeness. Instead of just pulling the information from the logs that I know I’ll need later on, the goal here is to pull all the battle-relevant data (read: not nicknames, not cosmetic choices, and not chat logs) to structure and process, whether I think I’ll need it or not. Basically, one should, in principle, be able to re-create a Pokemon Showdown battle log (or replay) from the data in the databases (up to nicknames and chat logs).
Why am I doing this? Why take up the CPU cycles and the disk space generating data that I don’t have any plans to use? If I come up with an analysis later on, why not just worry about it then and generate it from the logs? The answer is this: hopefully, this system will be not just for me. Every so often, I get a request from a university researcher or a hobbyist programmer wanting to do some sort of analysis. For the most part, the reports I generate are not sufficient (nor are the intermediate processed files). So right now, if I want to support their projects, I have to give them the PS logs, which are not optimally structured, and which are not anonymized, meaning I have to worry about privacy concerns each and every time I share the data. With this new system, I could give researchers controlled access to the database, letting them only access what they need to while ensuring anonymity by design.
The collection system will do a little bit of data cleansing (mainly in the name of anonymizing and normalizing), the idea being that the steps the collection system performs are steps that will need to be done regardless of use-case. The enrichment subsystem, on the other hand, is geared specifically towards supporting reports. This is where stuff like stalliness and team tags will get computed. It’s also where megas would be combined with base formes, if we went back to counting the old way, and where “matchups” will get parsed from the structured battle logs. Note that the original databases created by the Collection system are left alone: any new information will go in a new table (or collection or file…).
There’s a very real question with enrichment, and that’s: when do you do it? The old way was to do it at the collection step, but you could just as easily do it at report-time. There are definite advantages to performing enrichment as late as possible, namely that it gives you longer to change anything, but the trade-off is that waiting until reporting-time means that it takes longer to generate the reports (no one likes it when the stats go up over a week after the month ends). It’s possible that with efficient DB structures (and by leveraging parallel or cluster computing, more on this another day), report-generation might not be very time-consuming, but we’ll have to see, and so it makes sense to keep this subsystem independent.
The final step is actually generating the reports. Currently, this is done by reading gigantic processed intermediate files that contain not just the data needed for a specific report, but the data that will go into all the reports (though the intermediate files for the detailed moveset reports are housed separately). To avoid iterating through the files multiple times, all the reports for a given metagame have to be generated together, resulting in a much-larger-than-necessary memory footprint. Under the Onix architecture, each report will only access the resources it needs. Ideally, report-generation will also be fast, thanks to optimizations done at the Collection and Enrichment steps. If we go the database route, then an entire usage ranking report could be generated from a single, fast-running SQL query.
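To give a feel for the "single SQL query" idea, here is a small sketch using Python's built-in `sqlite3` module. The `team_members` table and its columns are entirely hypothetical (I haven't settled on a schema, or even on SQL), the rows are made up, and the query computes a plain unweighted usage fraction rather than the rating-weighted numbers the real reports use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per Pokemon per team per battle.
conn.execute(
    "CREATE TABLE team_members (battle_id INTEGER, side INTEGER, species TEXT)"
)
conn.executemany(
    "INSERT INTO team_members VALUES (?, ?, ?)",
    [
        (1, 0, "Qwilfish"), (1, 0, "Audino"),
        (1, 1, "Sandslash"), (1, 1, "Emboar"),
        (2, 0, "Emboar"), (2, 0, "Audino"),
        (2, 1, "Sandslash"), (2, 1, "Emboar"),
    ],
)


def usage_ranking(connection: sqlite3.Connection):
    """Fraction of teams carrying each species, highest first."""
    query = """
        SELECT species,
               COUNT(*) * 1.0 / (
                   SELECT COUNT(*)
                   FROM (SELECT DISTINCT battle_id, side FROM team_members)
               ) AS usage
        FROM team_members
        GROUP BY species
        ORDER BY usage DESC, species ASC
    """
    return connection.execute(query).fetchall()


for species, usage in usage_ranking(conn):
    print(f"{species}: {usage:.2%}")
```

The point is less the query itself than the fact that the report touches one narrow table instead of a giant intermediate file.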
There’s one other addition to the reporting subsystem that I’m really excited about (assuming I can pull it off): rather than simply rely on static reports like I do now, what I’d really like to do is expose a public API (and maybe set up a simple web app) to provide much more specific usage stats than I currently offer. Imagine an interface, for example, where you can ask, “What percentage of Sceptiles that have the ability Contrary carry the move Leaf Storm?” or “What percentage of Heatrans that are on the same team as a Landorus-Therian carry Stealth Rock? Oh, and use a baseline of 1760 instead of 1695,” or even, “What percentage of Latios switch out against a Ferrothorn?” This kind of tool could be incredibly powerful and would encourage exactly the sort of analytical thinking I’d love to see more of in the Pokemon community. Plus, it sounds like a really fun project.
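Under the hood, most of those questions boil down to a conditional percentage: of the movesets matching some condition, what share also match a second one? A minimal sketch of that core, with a made-up function name (`conditional_percentage`) and made-up sample data, and ignoring rating baselines and weighting entirely, might look like this:

```python
from typing import Any, Callable, Dict, Iterable

Moveset = Dict[str, Any]
Predicate = Callable[[Moveset], bool]


def conditional_percentage(
    movesets: Iterable[Moveset], given: Predicate, target: Predicate
) -> float:
    """Percentage of movesets satisfying `given` that also satisfy `target`."""
    matching = [m for m in movesets if given(m)]
    if not matching:
        return 0.0  # nothing matched the condition, so report 0 rather than divide by zero
    hits = sum(1 for m in matching if target(m))
    return 100.0 * hits / len(matching)


# Made-up sample data, purely for illustration.
sample = [
    {"species": "Sceptile", "ability": "Contrary", "moves": ["Leaf Storm", "Dragon Pulse"]},
    {"species": "Sceptile", "ability": "Contrary", "moves": ["Giga Drain", "Focus Blast"]},
    {"species": "Sceptile", "ability": "Overgrow", "moves": ["Leaf Storm", "Substitute"]},
    {"species": "Heatran", "ability": "Flash Fire", "moves": ["Stealth Rock", "Lava Plume"]},
]

answer = conditional_percentage(
    sample,
    given=lambda m: m["species"] == "Sceptile" and m["ability"] == "Contrary",
    target=lambda m: "Leaf Storm" in m["moves"],
)
print(f"{answer:.1f}%")  # 50.0%
```

The teammate and switch-out questions need more than a single moveset to answer, so they would need richer inputs than this sketch takes.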
Next time I’ll dig into data types and talk specifically about how Onix will represent battles.
Phew! It’s been a while.
I’m resurrecting this blog to document an exciting new project I’m undertaking: the complete rewrite of my usage stats scripts (gasp!).
When I took on the role of statistician for Smogon, I was a grad student who’d only ever written academic code (where if it runs, no one really cares what it looks like or how it’s written). I used the project to teach myself Python, and I knew not one iota of the principles of good software development. Five years later, I have grown so much as a programmer, a software developer and a data scientist, and frankly, I’m unhappy with the status of the codebase. Yes, the scripts work (for the most part), and keeping everything running has so far not been too onerous, but when I look at my code, I just feel embarrassed. I can do better. And I feel that I owe it to myself to do better.
But it’s really more than that: more and more I’m finding myself hampered by my design decisions: the scripts are slow, any slight hiccup means I have to rerun everything from the beginning, and, most significantly, they’re not easily extensible: people ask me all the time whether I can perform a certain analysis, and my answer is almost invariably “no.” And I really dislike that. I really wish I could do more to foster analytical thinking with regards to Smogon and Pokémon in general, and that’s not really possible with the current set-up.
Burrows at high speed in search of food. The tunnels it leaves are used as homes by Diglett.
Onix has a magnet in its brain. It acts as a compass so that this Pokémon does not lose direction while it is tunneling. As it grows older, its body becomes increasingly rounder and smoother.
I’ve decided to call this rewrite “Project Onix,” after one of my favorite Pokemon, who also happens to be a favorite Pokemon of Roark, miner and gym leader from Diamond-Pearl-Platinum. Over the next few posts, I’ll talk about the design for Onix and more about my plans moving forward. So stay tuned!
I’m keeping it simple, updating based on the new items, moves and abilities and not doing anything groundbreaking.
Several Pokemon have had their base stats changed; my stalliness implementation pulls in those changes for free.
Changes in move power have no bearing on the stalliness metric.
- The abilities Dark Aura, Fairy Aura, Infiltrator,* Parental Bond, Protean, Strong Jaws, Sweet Veil and Tough Claws subtract 0.5 from the metric.
- The abilities Aroma Veil, Bulletproof, Cheek Pouch and Gooey add 0.5 to the metric.
- The ability Fur Coat adds 1.0 to the metric.
- The move Crafty Shield does not affect the metric (as it does not prevent damaging moves).
- The moves King’s Shield, Mat Block and Spiky Shield get added to Protect and Detect in the list of moves that, if present on a moveset, add 1.0 to the metric.
- The move Nuzzle gets added to the other paralysis moves for adding 0.5 to the metric.
- The moves Power-Up Punch and Rototiller get added to the list of setup moves that subtract 0.5 from the metric (recall that multiple setup moves do not stack).
- The move Geomancy gets added to the list of setup moves that subtract 1.0 from the metric.
- The move Sticky Web subtracts 0.5 from the metric (since stall teams really won’t benefit from having the opponents’ speed lowered).
- The item Assault Vest does not change the metric.
- The items Kee Berry, Maranga Berry, Roseli Berry and Snowball get added to the list of “consumables” which subtract 0.5 from the metric.
- The item Pixie Plate subtracts 0.25 from the metric.
- The item Weakness Policy subtracts 1.0 from the metric.
- The item Safety Goggles does not change the metric (“powder” moves are few and far between, and neutralizing weather is better accomplished with Leftovers).
- Mega Stones, if held by the corresponding Pokemon, will result in stalliness being calculated as the AVERAGE of the metric under each form. That is, for Aerodactyl holding Aerodactylite, calculate stalliness once assuming it stays an Aerodactyl (old stats, old ability), then calculate again assuming Aerodactylite is used and it has the Mega forme’s stats and ability. Take those two values and average them (this is because Mega Evolution is not guaranteed and is in fact limited to one-per-team, even though a team may contain multiple Pokemon that can Mega evolve).
*Infiltrator now bypasses substitute
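For the curious, here is a rough sketch of how the new ability and item entries and the Mega Stone averaging might be expressed. This is not the code from my team analyzer: the names are made up for illustration, it covers only the abilities and items listed above (the move rules have stacking behavior this sketch doesn't try to capture), and the `metric` function passed to `mega_stalliness` stands in for the full stalliness calculation.

```python
from typing import Any, Callable, Dict, Optional

# Only the new entries listed above; values are added to the metric.
ABILITY_MODIFIERS: Dict[str, float] = {
    **{name: -0.5 for name in (
        "Dark Aura", "Fairy Aura", "Infiltrator", "Parental Bond",
        "Protean", "Strong Jaws", "Sweet Veil", "Tough Claws",
    )},
    **{name: 0.5 for name in ("Aroma Veil", "Bulletproof", "Cheek Pouch", "Gooey")},
    "Fur Coat": 1.0,
}

ITEM_MODIFIERS: Dict[str, float] = {
    **{name: -0.5 for name in ("Kee Berry", "Maranga Berry", "Roseli Berry", "Snowball")},
    "Pixie Plate": -0.25,
    "Weakness Policy": -1.0,
    # Assault Vest and Safety Goggles are deliberately absent: no change.
}


def new_entry_modifier(ability: str, item: Optional[str]) -> float:
    """Sum of the ability and item adjustments; unlisted entries count as 0."""
    return ABILITY_MODIFIERS.get(ability, 0.0) + ITEM_MODIFIERS.get(item, 0.0)


def mega_stalliness(
    metric: Callable[[Dict[str, Any]], float],
    base_forme: Dict[str, Any],
    mega_forme: Dict[str, Any],
) -> float:
    """Average of the metric computed once per forme."""
    return (metric(base_forme) + metric(mega_forme)) / 2.0


print(new_entry_modifier("Protean", "Pixie Plate"))  # -0.75
```

For Aerodactyl holding Aerodactylite, `base_forme` would carry the old stats and ability and `mega_forme` the Mega forme's.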
In the end, I revised the metric a bit further, but before I get into that, I want to point your attention towards my github repository, where I now host my team analyzer (which contains the stalliness algorithm) as a separate file. If you navigate your way over to this folder, you can find an example of how to use the team analyzer script. Feel free to fork my repository, modify my team analyzer, and tell me if you come up with better results. If you ask me nicely, I’ll even provide you with importables of the RMT archive.
After some careful thought and a LOT of testing and re-testing, I made some revisions to my stalliness metric (namely adjusting some key moveset modifications), and the end result is something that I’m pretty happy with.
Before I get into the nitty-gritty of exactly what I changed, I’d like to show off the results:
From the feedback I got after posting my previous results, I started to wonder if stalliness wasn’t working better simply because of an outlier problem. Even full stall teams usually have one offensive member, and offensive teams will often have some utility Pokemon. Do these “outliers” throw off the combined stalliness? Easy enough to check.