[Competitive] Tournament Usage Stats [Update Post #30]

i'd like to point out here that this claim is false. i'd like to think of myself as a "good player" and yet i always build new teams for official tournaments and then recycle them for suspect laddering. your blanket statements don't apply to everyone, so don't base arguments off them.

the way i see it, if kd24 and his crew are willing to go out of their way to compile some interesting tournament data for us, why the hell are we complaining about it?
Again, I believe these stats do have uses, but if what you said were true (players using the same teams for both ladder and tourney), then what would these stats offer that the 1337 stats (or whatever it's called these days) don't?
 

Lavos

Banned deucer.
because not every 1800+ ranked user on ps makes top 16 of tour? seriously 1800 on showdown is a joke, whereas top 16 in tour is actually significant. 1337 stats are fine and all but these stats essentially give average players insight as to what the cream of the crop is using, and thus allow those players to create better teams because of this data. that's a highly simplified explanation but you get my point.

if you want proof 1337 stats kinda suck, just compare them to these ones.
 
because not every 1800+ ranked user on ps makes top 16 of tour? seriously 1800 on showdown is a joke, whereas top 16 in tour is actually significant. 1337 stats are fine and all but these stats essentially give average players insight as to what the cream of the crop is using, and thus allow those players to create better teams because of this data. that's a highly simplified explanation but you get my point.

if you want proof 1337 stats kinda suck, just compare them to these ones.
Then make it top 50, top 30, whatever. If the same Pokemon are used by both, at some point you will have equal sample sizes with identical (obviously ideal) data. Unless you believe ladder ranking is in no way correlated with skill.
 

Chou Toshio

Over9000
is an Artist Alumnus; is a Forum Moderator Alumnus; is a Community Contributor Alumnus; is a Contributor Alumnus; is a Top Smogon Media Contributor Alumnus; is a Battle Simulator Moderator Alumnus
I thought the proposition was that this data be applied to the metagame as a whole? Maybe I'm misunderstanding you but you make it sound like the ultimate goal is to be successful on the ladder.
That is not at all what I was suggesting.

What I was saying is that these stats cannot at all be used to predict what a very strong player will use against you on the ladder or in a tournament. Furthermore: The stats are not strong enough to describe what Pokemon Top Players consistently use.

Unlike Showdown stats which can give you a very strong idea of the frequency of running into certain enemy Pokemon, these stats have basically zero predictive power.


Now, what people are trying to imply (though they are wrong to do so) is that there is some objective "true strength" or "true utility" of each Pokemon, and that these stats--because they come from a higher level of competition--provide a more accurate measure of "true utility." This of course is shot down by too small a sample size.

Of course, that assumption is absurd since there is so little data that no matter how high the level of competition, you couldn't hope to accurately measure said "true utility" with just these numbers.

Of course, this is absurd since there is no unmoving "true utility," but a shifting metagame. If applied to "the entire metagame," one has to consider "population." This is clearly seen in many players changing team styles as they climb up the ladder-- certain team styles seem to work better or worse against different average skill levels-- in other words, different sub-populations within the entire population of the whole metagame.

EXCEPT that you're only sampling from the top players of a tournament-- so you're measuring from only one small sect of the whole meta-- top players (though defining your data as coming from this population is also put into question since in Pokemon the best players don't always win-- with such a small sample size, you can more easily be thrown off by a random bad player managing to make top 16).

Finally, the nail in the coffin is that even if you define the population as Pokemon used by top players, you STILL come back to the problem of not having enough data to make any type of accurate predictions, assumptions, or conclusions. The data is so sparse and limited that it tells you nothing about what the top players will consistently use.

So basically, what you have is:

(I'm not sure what you mean about the Lando-T bit. It's an excellent Pokemon and the posts don't seem to claim that it is much more than that.)
You have people making the flawed assumption that:

A very small sample with no descriptive power from top battlers happening to have a high Landorus-T use = Landorus-T is a top-utility Pokemon of the whole metagame

^which is clearly a ridiculous assumption, one that is laughable considering the strength of the data's ability to describe (let alone predict)... anything.

Not only are you making flawed logical jumps (like assuming high use by top players in one tournament = high objective "true utility" at all levels of play), but you are also making flawed statistical assumptions (these stats aren't even strong enough to tell you what top players will use with any sort of consistency)

if you want proof 1337 stats kinda suck, just compare them to these ones.
^case in point about people clearly not understanding how statistics work, and jumping to laughable assumptions.
 
Okay, I understand you better now. I'll admit I just about never use usage statistics to build teams (probably because I prefer Ubers and they are junk there) so I'm not the best one to defend this project.

However, I still want to nitpick the Landorus-T bit. All of those who applauded Lando-T's high usage didn't do so to conclude that it is "a top-utility Pokemon of the whole metagame" but simply because the amount of use reflected their personal convictions that they held prior to seeing the statistics (assuming none of them are liars). Everybody's comments more or less looked like "I'm shocked/surprised/pleased/etc. at Lando-T's usage although this was expected/understandable due to Lando-T's qualities as X/Y/Z". Which is quite different as their opinions aren't being drawn from the statistics but are being reflected in them.
 

Chou Toshio

Over9000
^the flaw is in acting like these statistics actually validate anything-- there is no reason to be shocked/surprised/pleased/etc., because these stats don't actually say anything about how good Landorus-T is.
 
It says that it was a good pick for the expected metagame. Seeing as tournaments are considered the highest level this shouldn't be completely dismissed. They are just saying it reflects expectations not validates them.

Edit: Again, they didn't say anything like "oh Lando has great usage so I'm right that it's amazing" they just said that the usage reflected (I think there is a nuance between this and validate) what they felt about him. In your example, the results of your poll reflected your personal conviction that blue was the most popular color. Obviously, if you wanted to validate it you would need better statistics etc.
 
It says that it was a good pick for the expected metagame. Seeing as tournaments are considered the highest level this shouldn't be completely dismissed. They are just saying it reflects expectations not validates them.
Imagine a scenario. I believe most people like the colour blue. I survey 10 people with the question "what is your favourite colour?" 6/10 say blue. Have their answers validated my feelings about blue being the most liked colour? No! The sample size is too small to make any statistical conclusions. Obviously simplified but it reflects what is happening here.
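
To put a rough number on the survey example above, here is a minimal sketch that computes an approximate 95% Wilson score interval for "6 out of 10 said blue". The function name `wilson_interval` and the comparison case of 600 out of 1000 are illustrative assumptions, not something from the thread.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 6 of 10 people say blue (the survey from the post)
low_small, high_small = wilson_interval(6, 10)
# Hypothetical larger survey with the same 60% share, for comparison
low_large, high_large = wilson_interval(600, 1000)

print(f"n=10:   {low_small:.2f} to {high_small:.2f}")
print(f"n=1000: {low_large:.2f} to {high_large:.2f}")
```

With only 10 answers the interval still includes 50%, so "most people like blue" is not supported yet; with the same share over 1000 answers the interval sits clearly above 50%.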

Edit: I'll give another simple (and plausible) scenario. I flip a coin 10 times. It is entirely possible I get 8 heads (or tails). Can I say that a coin lands on heads 80% of the time? No; if I kept adding trials (say 100), I would regress to the mean (of 50%). The problem with tournament stats (though, for the 3rd time, I see uses) is that by the time you get enough stats to have power (statistically), the metagame will have changed and you must exclude earlier data (and you're back to low power).
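
The coin example can be checked exactly rather than by feel. This is a small sketch (the helper name `prob_at_least_same_side` is made up for illustration) that computes the chance a fair coin shows at least 80% heads or at least 80% tails, for 10 flips and for 100 flips.

```python
from math import comb

def prob_at_least_same_side(n, k):
    """Chance a fair coin gives at least k heads OR at least k tails in n flips.

    Assumes k > n / 2, so the two outcomes cannot both happen.
    """
    one_side = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return 2 * one_side

p_10 = prob_at_least_same_side(10, 8)     # 8+ of one side in 10 flips
p_100 = prob_at_least_same_side(100, 80)  # 80+ of one side in 100 flips

print(f"10 flips:  {p_10:.4f}")
print(f"100 flips: {p_100:.2e}")
```

For 10 flips this comes out to roughly 11%, so an 80/20 split is nothing unusual; for 100 flips the same lopsided split is vanishingly unlikely, which is the "regress to the mean" point.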
 

Ojama

Banned deucer.
Why are we talking about liking the most used Pokemon, by the way? Jirachi being #1 doesn't mean everyone likes it, so I really don't understand your example here (the second one is wrong since you're talking about luck...). We need to base the usage stats on tournaments because the Ladder is just too bad at the moment and is absolutely irrelevant to the usage stats. Is anyone surprised by the tournament stats so far? I don't think so. Maybe we could base the usage stats on the top 32, but I'm not sure it would change anything.

How do these stats not show how good Landorus-T is? Everyone uses it but it doesn't show how good it is? Because Ladder stats show it? It's extremely easy to see how good Landorus-T and Jirachi are from their usage stats, as I said in my previous post, for your information. I would have accepted the same sentence about Ladder stats, Chou Toshio, but with Landorus-I, since according to those stats Landorus-I is #30 iirc and so isn't that good, which is obviously not the case...

As Melee Mewtwo said, tournaments are considered the highest level, so I really don't see why you guys disagree with these stats. They really represent what people use in the current metagame, which Ladder stats don't.

Darknut07 => The Metagame isn't going to change, and even if it does I don't see where the problem is. The Metagame can change, but does it change much? Not really; Tyranitar will always be a top usage Pokemon, same for Landorus-T, Latios, etc.

What I was saying is that these stats cannot at all be used to predict what a very strong player will use against you on the ladder or in a tournament. Furthermore: The stats are not strong enough to describe what Pokemon Top Players consistently use.

Unlike Showdown stats which can give you a very strong idea of the frequency of running into certain enemy Pokemon, these stats have basically zero predictive power.
?_? How can these stats not be used to predict what a very strong player will use in tournaments since they're based on... tournaments? Also, I didn't know that it was important to predict what a very strong player will use on the Ladder, I'm sorry. Stop with Showdown please, this is not Shoddy; Showdown's Ladder is bad and doesn't represent what a very strong player uses, whether it is on the Ladder or in tournaments (how can Ladder stats be representative of what people use in tournaments?? I really don't understand why you're talking about Ladder when we're talking about tournament stats).

Why do these stats not describe what Pokemon top players consistently use? These stats are based on top players' tournament teams but they aren't strong enough to describe what they will consistently use? Hmm. Actually, at this point, nothing can describe it, I guess.
 

Conflict

is the 9th Smogon Classic Winner; is a Three-Time Past SPL Champion; is the defending GSC Circuit Champion
World Defender
Uh Ojama, you didn't get the memo: the sample size is too small to make any educated guesses about why something is used and how often.
Sure, you can draw conclusions from these stats, but they are honestly nothing more than a sample which skews the whole picture. -> standard statistics stuff

It's a cool idea for sure, but we can't determine facts from these stats because the method used is flawed. First off, the sample size is too small, and secondly, if a player uses the same team for all of his battles (and goes on to win it all), this kinda skews the statistics (because 1 battler has a HUGE impact on the distribution).
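
The "1 battler has a HUGE impact" point can be illustrated with a toy count. Everything below is made up for illustration: the player names, the three-Pokemon teams, and the number of games are assumptions, not real tournament data. The sketch compares usage counted per game with usage counted once per distinct player/team.

```python
from collections import Counter

# Hypothetical bracket: player_a reuses one team for 4 games, the rest play 1 game each.
team_a = ("Landorus-T", "Jirachi", "Tyranitar")
games = [
    ("player_a", team_a),
    ("player_a", team_a),
    ("player_a", team_a),
    ("player_a", team_a),
    ("player_b", ("Latios", "Tyranitar", "Jirachi")),
    ("player_c", ("Latios", "Politoed", "Ferrothorn")),
    ("player_d", ("Latios", "Tyranitar", "Keldeo")),
]

def usage(teams):
    """Share of the given teams that include each Pokemon."""
    counts = Counter(mon for team in teams for mon in team)
    return {mon: counts[mon] / len(teams) for mon in counts}

# Counting every game: the repeated team is counted 4 times.
per_game = usage([team for _, team in games])
# Counting each distinct (player, team) pair once.
per_team = usage([team for _, team in set(games)])

print(f"Landorus-T per game: {per_game['Landorus-T']:.0%}, per team: {per_team['Landorus-T']:.0%}")
print(f"Latios     per game: {per_game['Latios']:.0%}, per team: {per_team['Latios']:.0%}")
```

In this toy bracket Landorus-T looks like the most used Pokemon when counted per game, yet only one of four players brought it. Which counting method the actual project uses is not stated in the thread.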

Just take these stats with a grain of salt and don't make them out to be the new usage stats. The only thing we can gather from this is what's popular right now in this particular instance.
 

Chou Toshio

Over9000
Darknut-- sorry, I think you get it, but your examples are a bit too removed for people to see them as relevant.

Let me try to illustrate in terms of Pokemon:

Desired Goal of Usage stats: Capture the "true utility" (strength/usefulness) of a Pokemon in this metagame

Assumption in this Thread: "true utility" is best captured by usage of high-level players, since it's assumed that better players are better at assessing this "true utility"-- ie. better players win because they are better at assessing this "true utility."

(Faults of this argument:)
-All players (even good ones) have their own favorites/habits/preferences, which may prevent them from choosing Pokemon purely based on objective/calculated reasons.
-Top players win through superior skill, but skill is not purely based on assessing Pokemon. Prediction ability, logical skills, bluffing skills, risk assessment, personality profiling ability-- there are any number of skill facets a player can excel at (and succeed from) that are not related to assessing a given Pokemon's ability

Proposed Goal in this Thread: Accurately capture (consistently and accurately describe, and hopefully predict) Pokemon usage by high-level players. (Because, assumption: Better players better assess a Pokemon's utility)

Problem with Ladder stats: Many argue that the ladder stats are not high-level enough-- you are not collecting data from the target population (high level players).

note: Ladder stats do accurately describe (and can be used to predict) usage by ALL players.

Problem with Tournament stats: Low sample size means you have no accuracy-- you are collecting data from high level players, but the quantity of data is insufficient, and sampling methodology of your data is flawed.

^What does this mean?


Pokemon based example/explanation:

It means that you have the usage stats from high level players from 1 tournament--

But, what if you told these same (good) players to make/share 20 teams each, that they believe to be powerful/effective in BW2 metagame.

You would almost be guaranteed to get wildly different usage stats


Why would the stats become wildly different from those of this tournament? Because the data is so limited that it has no durability or descriptive strength-- you don't have enough data to have any idea what these players would use consistently over the course of more battles. Ie. You don't have an idea what long-term usage (nor "true utility") of a Pokemon should be.


Furthermore, you're HOPING that these 16 players are representative of the population of "excellent Pokemon players." You have the risk of randomly getting 5-6 players who all like Landorus-T, when maybe in the entire population of "excellent Pokemon players," there are far fewer frequent users of Landorus-T proportionately-- your perception of Landorus-T's abilities would become badly skewed.
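
A quick sketch of how easily that can happen by chance. The 20% figure below is purely an assumption for illustration (nobody in the thread knows the real share of top players who favor Landorus-T), and treating the 16 players as independent random draws is also an assumption.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of k or more 'successes' in n independent draws."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

TRUE_SHARE = 0.20  # assumed share of all top players who favor Landorus-T
TOP_CUT = 16       # players in the sample

p_five_plus = prob_at_least(5, TOP_CUT, TRUE_SHARE)
print(f"Chance that 5+ of {TOP_CUT} players favor it: {p_five_plus:.1%}")
```

Under those assumptions there is roughly a 20% chance of seeing 5 or more Landorus-T users in a top 16, which would read as 31% or higher usage even though the underlying share is 20%.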

Without random and sufficient sampling, you can't think of your data as being representative of the whole population (these guys potentially don't accurately represent the views of all good players).

Plus, as mentioned, the "use same team over multiple rounds" compounds the problems associated with faulty sampling methodology (not random sampling).


Is this data interesting and potentially useful? Yes

Does this data tell us what the good players are using consistently? NO

Is this data accurate or reliable when applied to usage of all good players? No

Could we consider using these stats for tiering? No

Can we get an idea of comparative utility or metagame trends from this data? No

Can we compile this data with other tournament data to get a good picture of the metagame? Not advisable-- metagame changes over time, different tournament structures cause different sampling results, doesn't remove problems with lack of random sampling

Can we use this data to inspire our own team building ideas? Yes
 
I think the best indicator of what is going on as far as usage goes is a separate suspect ladder. A separate ladder automatically weeds out people just coming on the server to play with their favorite mons. If you take the stats starting from a decent rating (let's say 1650) and deviation, then that should be an accurate measure of what the serious battlers are using. The tournament stats are interesting though and should mean something down the line, but I don't think it would be good to use these stats for tiers.
 
I'm sorry to not have been as clear as I should have been. I'm not trying to argue against these stats being useless when it comes to tiering or deciding what is good in the metagame. I was just trying to nitpick the bit concerning Landorus-T as I feel that those who spoke about him weren't saying that at all.

Nobody had ever claimed that the results of these stats indicate that Lando-T is a good Pokemon (at least when you first mentioned it). They all either commented on how it was a strong pick in the metagame of that tournament or just said that it reflected their personal beliefs and went on to explain why they think Lando-T is good. The key bit I think we are disagreeing on is reflected vs validated. If they felt that the stats validated this belief then, yes, you would be right in saying that these statistics misled them, as they were basing a conclusion on flawed data. However, having their opinions reflected implies that they already had a conclusion and that the Tourney Usage had the same one. This doesn't mean that they both had it for the same reasons (just about everybody gave good reasons why they like Lando-T); they just happened to think alike, whether it's random or not. (Yeah, usage stats don't think or draw conclusions, but you get my point I hope.) If you wanted to convince them that Landorus-T isn't one of the best utility mons in the metagame, then you would have to do it based upon whatever reasons they used to draw their own conclusions; the usage stats are unrelated. (Again, I'm sorry I didn't try to go more in-depth earlier and left you with two-liners.)
 

Ojama

Banned deucer.
I'm wondering if you guys really think that Ladder Stats tell us what good players consistently use and if it's a good idea to use them for tiering because, as you may know, I hope, there are no good players on Showdown's Ladder unless it's during Suspect Tests. I obviously understand why you guys disagree with these stats (only based on a few teams/players etc) but we really should stop using Ladder Stats for tiering (unless there is a better Ladder like Shoddy's one). Maybe we could enlarge our sample base (other tournaments, all players of the current tour etc) but it seems just too hard to collect all these teams/ask 128+ players for their teams.
 

Reymedy

ne craint personne
is a Top Tutor Alumnus; is a Tournament Director Alumnus; is a Top Team Rater Alumnus; is a Forum Moderator Alumnus; is a Community Contributor Alumnus; is a Tiering Contributor Alumnus
Or, we could run the tournaments on PS! and collect the information directly?
I don't know if it's possible or what, but on another subject, for those people saying that the sample size isn't big enough, that's just totally wrong. From what I see, the sample is big enough; we need to do some work on it. The issue isn't the sample, it is the population we're studying.
I'm not able to explain it in English, but it's about "averaging" the statistics that we're collecting.

Finally, don't forget that those stats are about top players in theory, that is to say, not made to reflect the whole metagame.
 

Stratos

Banned deucer.
I'm wondering if you guys really think that Ladder Stats tell us what good players consistently use and if it's a good idea to use them for tiering because, as you may know, I hope, there are no good players on Showdown's Ladder unless it's during Suspect Tests. I obviously understand why you guys disagree with these stats (only based on a few teams/players etc) but we really should stop using Ladder Stats for tiering (unless there is a better Ladder like Shoddy's one). Maybe we could enlarge our sample base (other tournaments, all players of the current tour etc) but it seems just too hard to collect all these teams/ask 128+ players for their teams.

just curious as to where tournament players test/develop teams? Besides, i know there are a lot of good players who play almost exclusively or at least often on ladder.

and lest you forget, we've implemented weighted stats.

really unless you're seriously itching to see something like UU metagross i don't see why ladder stats over tourney stats should get your panties in such a bundle, they're nearly identical when it comes to who goes in what tier.
 
I don't know what the benefit of basing the tiers on tournament stats would be. As the name of the tier, OverUsed, says, it is based on usage and not quality. Thus the only thing that matters in determining a Pokemon's tier is whether it reaches 3.4% usage among all players on the standard ladder. Most players know that Metagross and Infernape don't match the quality of other OU Pokemon, but enough people out there think they do, which makes them overused, because they are literally overused.
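
As a minimal sketch of the rule described above, assuming tiering is nothing more than a usage cutoff: the 3.4% threshold comes from the post, while the Pokemon names and usage shares below are placeholders invented for illustration, not real ladder numbers.

```python
OU_CUTOFF = 0.034  # 3.4% usage threshold, as stated in the post

# Placeholder usage shares (made up, not real ladder data)
ladder_usage = {
    "popular_mon": 0.210,
    "borderline_mon": 0.034,
    "niche_mon": 0.020,
}

def tier_by_usage(usage, cutoff=OU_CUTOFF):
    """Label each Pokemon purely by usage; quality plays no part."""
    return {mon: ("OU" if share >= cutoff else "below OU") for mon, share in usage.items()}

tiers = tier_by_usage(ladder_usage)
for mon, tier in tiers.items():
    print(f"{mon}: {tier}")
```

Whether a Pokemon sitting exactly on the cutoff counts as OU is an assumption here (the sketch uses "at least 3.4%").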

Besides that, I still think this is a cool project, and I like seeing what an extremely limited pool of top players thinks is good in today's metagame.
 
