I played 15 games with a team generated from my most recent script. My record is 14-1, but a) I played far from optimally and b) my opponents probably did too, because I was on a newly created account and their ratings were all rather low.
My name is "data scientist".
The learning curve for a computer-generated team is going to be steeper. When you create a team yourself, you do so with an idea of how it is supposed to play. Here, I was clueless (though general unfamiliarity with Gen VI contributed).
I changed how the script functions from the original posting. I am no longer using scores, but a single likelihood value.
What this means is that it is easy to update the probability that a team is optimal using Bayes' theorem:
probability a team is optimal given the evidence is proportional to e^(ln(prior probability the team is optimal) + ln(how likely the evidence is to show up, given the team))
(derived from p(A|B) = p(A) * p(B|A) / p(B); p(B) does not depend on the team, so for our purposes here it can be dropped)
This means that what I want help with here is:
-suggestions on what kinds of evidence I can pull from, among the available data
-help defining how likely that evidence is
To avoid underflow, I have been using natural log probabilities, rather than the more typical simple probabilities ranging from 0 to 1.
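A minimal sketch of the log-space update described above, assuming each team's evidence has already been turned into a probability. The team names, the numbers and the function names (update_log_posteriors, normalise) are made up for illustration and are not taken from the attached script.

```python
import math

def update_log_posteriors(log_priors, log_likelihoods):
    """Unnormalised Bayes update: ln(prior) + ln(likelihood) for every team."""
    return {team: log_priors[team] + log_likelihoods[team] for team in log_priors}

def normalise(log_scores):
    """Convert unnormalised log scores into probabilities that sum to 1.

    Subtracting the largest log score first (log-sum-exp) keeps exp() from
    underflowing when all scores are very negative.
    """
    peak = max(log_scores.values())
    total = sum(math.exp(value - peak) for value in log_scores.values())
    log_norm = peak + math.log(total)
    return {team: math.exp(value - log_norm) for team, value in log_scores.items()}

# Hypothetical example: three teams, equal priors, made-up evidence.
teams = ["team_a", "team_b", "team_c"]
log_priors = {team: math.log(1 / len(teams)) for team in teams}
log_evidence = {
    "team_a": math.log(0.9),
    "team_b": math.log(0.6),
    "team_c": math.log(0.3),
}

log_post = update_log_posteriors(log_priors, log_evidence)
posteriors = normalise(log_post)

# Scores this negative would underflow to 0.0 as plain probabilities.
tiny = normalise({"x": -2000.0, "y": -2000.0 + math.log(2)})
```

Because only differences between log scores matter, the normalisation step can be skipped entirely when the goal is just to rank teams.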
We initially assign equal probabilities to all teams. Currently, my script then performs the following updates:
1) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing a single pokemon. An imaginary ideal team wouldn't lose pokemon, so the likelihood of losing 0 is natural to use.
2) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing more than 1 pokemon. An imaginary ideal team wouldn't lose many pokemon, so the likelihood of losing <= 1 is natural to use.
3) For each of the 50 most used pokemon, it updates based on the probability the team can check/counter it without losing more than 2 pokemon. An imaginary ideal team wouldn't lose many pokemon, so the likelihood of losing <= 2 is natural to use.
4) For each of the 50 most used pokemon, it updates based on the probability that the most vulnerable pokemon on our team can avoid being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so the likelihood that 0 are is natural to use.
5) For each of the 50 most used pokemon, it updates based on the probability that the two most vulnerable pokemon on our team can avoid both being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so the likelihood that <= 1 are is natural to use.
6) For each of the 50 most used pokemon, it updates based on the probability that the three most vulnerable pokemon on our team can avoid all being checked/countered by it. An imaginary ideal team wouldn't too easily let its pokemon get checked/countered, so the likelihood that <= 2 are is natural to use.
7) Good teams don't have more than 1 mega, so the likelihood that either 0 or 1 of the pokemon on the team are mega is natural to use.
8) Good teams have stealth rock, so the likelihood that at least 1 pokemon on the team uses it is natural to use.
(Deciding how much weight to give each update is also important. I weight each opponent's pokemon based on the relative frequency with which they're encountered at higher levels of play, and also give more weight to being able to deal with opposing pokemon sooner rather than later; I will likely update to stretch through the entire team, declining exponentially in weight for each extra pokemon, like Tsaeb XIII suggested.)
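One possible way to apply such weights, sketched under assumptions: each opposing pokemon's log probability is scaled by its share of usage, and by a decay factor raised to the number of extra pokemon lost. The function name weighted_log_update, the decay value of 0.5 and all numbers are hypothetical; the attached script may weight things differently.

```python
import math

def weighted_log_update(log_score, p_by_opponent, usage, decay=0.5, depth=0):
    """Add usage-weighted log probabilities to a team's running log score.

    p_by_opponent: opposing pokemon -> probability the team handles it
    usage:         opposing pokemon -> usage count (assumed input)
    decay, depth:  the weight shrinks by `decay` for each extra pokemon lost
    """
    total_usage = sum(usage[name] for name in p_by_opponent)
    for name, p in p_by_opponent.items():
        weight = (usage[name] / total_usage) * decay ** depth
        # Floor p so a zero probability cannot crash math.log.
        log_score += weight * math.log(max(p, 1e-300))
    return log_score

# Made-up numbers for two opposing pokemon.
usage = {"Aegislash": 30, "Talonflame": 10}
p_lose_none = {"Aegislash": 0.8, "Talonflame": 0.5}

score_depth0 = weighted_log_update(0.0, p_lose_none, usage, depth=0)
score_depth1 = weighted_log_update(0.0, p_lose_none, usage, depth=1)
```

Scaling a log probability by a weight is the same as raising the probability to that power, so a weight below 1 softens an update rather than removing it.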
Pokemon don't have to be mega, and lots of pokemon that can learn stealth rock don't necessarily want to. "7)" and "8)" thus naturally punish forcing pokemon to run sub-optimal sets; e.g. a team of charizard and mawile is very unlikely to have 1 or fewer mega pokemon.
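A rough sketch of how "7)" and "8)" could be computed, assuming each team member's set is chosen independently and that we know the fraction of the time each pokemon runs a mega stone or stealth rock. The fractions below are invented for illustration, not taken from usage stats.

```python
def prob_at_most_one(probs):
    """P(0 or 1 of the independent events happen)."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    p_exactly_one = 0.0
    for i, p in enumerate(probs):
        others_absent = 1.0
        for j, q in enumerate(probs):
            if j != i:
                others_absent *= 1.0 - q
        p_exactly_one += p * others_absent
    return p_none + p_exactly_one

def prob_at_least_one(probs):
    """P(at least 1 of the independent events happens)."""
    p_none = 1.0
    for p in probs:
        p_none *= 1.0 - p
    return 1.0 - p_none

# Hypothetical fractions: charizard and mawile are mega 90% of the time,
# the other four team members never are.
mega_fractions = [0.9, 0.9, 0.0, 0.0, 0.0, 0.0]
p_legal_megas = prob_at_most_one(mega_fractions)

# Hypothetical fractions of sets carrying stealth rock.
rock_fractions = [0.5, 0.2, 0.0, 0.0, 0.0, 0.0]
p_has_rocks = prob_at_least_one(rock_fractions)
```

With these made-up numbers the charizard + mawile team has only a 0.19 chance of carrying at most one mega, which is the kind of penalty described above.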
It is extremely easy for me to add extra evidence to update these hypotheses. The only requirement is that it is in the form of probabilities.
I am not sure how to do this with a team-mate grade. An example of something that would work: if we can define a role based on certain parameters, we can then assign probabilities to pokemon for their ability to successfully fit that role.
On this front, I first performed a KMeans cluster analysis on each of the top 50 pokemon simply using:
-Mean of (interactions between that pokemon and pokemon_other / (use of pokemon * use of pokemon_other) )
-Standard deviation of the above
-Mean interaction win
-Standard deviation of the above
Calling for 5 groups, I get:
Group 1 : ['Greninja', 'Aegislash', 'Rotom-Wash', 'Ferrothorn', 'Scizor', 'Tyranitar', 'Breloom', 'Skarmory', 'Bisharp', 'Latios', 'Thundurus', 'Gardevoir', 'Mamoswine', 'Infernape', 'Mandibuzz', 'Volcarona', 'Landorus', 'Goodra']
Group 2 : ['Talonflame', 'Azumarill', 'Excadrill', 'Espeon', 'Landorus-Therian', 'Clefable', 'Alakazam', 'Medicham', 'Cloyster', 'Deoxys-Speed', 'Vaporeon']
Group 3 : ['Scolipede', 'Smeargle', 'Deoxys-Defense']
Group 4 : ['Gliscor', 'Venusaur', 'Heatran', 'Conkeldurr', 'Sylveon', 'Keldeo', 'Manectric', 'Chansey', 'Sableye', 'Blissey']
Group 5 : ['Charizard', 'Garchomp', 'Dragonite', 'Gengar', 'Gyarados', 'Togekiss', 'Mawile', 'Pinsir']
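For anyone who wants to reproduce the clustering input, the four features listed above could be computed per pokemon roughly like this. The data layout (nested dictionaries keyed by pokemon name), the function name cluster_features and the choice of population standard deviation are assumptions for the sketch, and the numbers are made up.

```python
from statistics import mean, pstdev

def cluster_features(pokemon, interactions, win_rates, usage):
    """Return [mean rate, sd rate, mean win, sd win] for one pokemon.

    interactions[a][b]: number of interactions between a and b (assumed layout)
    win_rates[a][b]:    how often a wins its interactions with b (assumed layout)
    usage[a]:           usage count of a
    """
    others = [name for name in usage if name != pokemon]
    rates = [
        interactions[pokemon][other] / (usage[pokemon] * usage[other])
        for other in others
    ]
    wins = [win_rates[pokemon][other] for other in others]
    return [mean(rates), pstdev(rates), mean(wins), pstdev(wins)]

# Tiny made-up data set with three pokemon.
usage = {"A": 10, "B": 5, "C": 2}
interactions = {"A": {"B": 100, "C": 40}}
win_rates = {"A": {"B": 0.6, "C": 0.4}}

features = cluster_features("A", interactions, win_rates, usage)
```

One such row per pokemon gives a 50 x 4 table that can be handed to any k-means implementation (for example scikit-learn's KMeans with n_clusters=5).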
We can vastly expand and alter the types of data we use for this, but...
...a discriminant function analysis would be far more useful.
In a discriminant function analysis (DFA), you give the program both the data and the group membership.
As in, with a list of 50 pokemon and the groups they belong to, I can feed it that and a fat pile of data tied to the pokemon.
It can then be fed data on a pokemon, and spit out the probabilities that it belongs to each of the groups.
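To make the idea concrete, here is a deliberately simplified stand-in for a DFA, not a full implementation: it assumes every group shares the same per-feature variance and ignores correlations between features. The function names (fit, membership_probabilities), the group labels and the one-feature data are all hypothetical.

```python
import math
from collections import defaultdict

def fit(rows, labels):
    """Learn group means, pooled per-feature variances and group priors."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    n_features = len(rows[0])
    means = {
        g: [sum(r[i] for r in members) / len(members) for i in range(n_features)]
        for g, members in groups.items()
    }
    pooled = []
    for i in range(n_features):
        squares = sum(
            (row[i] - means[label][i]) ** 2 for row, label in zip(rows, labels)
        )
        # Pooled within-group variance, floored to avoid division by zero.
        pooled.append(max(squares / max(len(rows) - len(groups), 1), 1e-9))
    priors = {g: len(members) / len(rows) for g, members in groups.items()}
    return means, pooled, priors

def membership_probabilities(row, model):
    """Return the probability that `row` belongs to each group."""
    means, pooled, priors = model
    log_scores = {}
    for g in means:
        distance = sum((x - m) ** 2 / v for x, m, v in zip(row, means[g], pooled))
        log_scores[g] = math.log(priors[g]) - 0.5 * distance
    peak = max(log_scores.values())
    total = sum(math.exp(s - peak) for s in log_scores.values())
    return {g: math.exp(s - peak) / total for g, s in log_scores.items()}

# Made-up training data: one feature, two hand-labelled groups.
rows = [[0.0], [1.0], [4.0], [5.0]]
labels = ["wall", "wall", "sweeper", "sweeper"]
model = fit(rows, labels)

probs_mid = membership_probabilities([2.5], model)
probs_near = membership_probabilities([0.5], model)
```

A pokemon halfway between the two group means comes out at 0.5 for each group, while one sitting on the "wall" mean is assigned to "wall" almost with certainty.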
Of course, this leads to the question - how do we apply probabilities of group membership to team performance?
Are good teams likely to have certain patterns? If so, how do we define that?
We could do this for any sort of groups you define. Glue vs nonglue, sweeper vs tank vs wall vs supporter vs revenge killer, etc.
I think it is in this way that we may be able to give Deo-D some credit.
Hmmm.
OU Teambuilding
OU Pokemon Categorization Tread
Antar,
Thanks.
Here's one way I think we can evaluate how successful your work is: we should take a bunch of real teams from the ladder, evaluate their "answering ability" and "vulnerability" scores, and see how these metrics correlate to, say, Elo/Glicko rating.
The function "likelihood" from my script will produce a likelihood rating for any given team.
It simply needs to be given the following argument:
argument = [list_of_team_members, dictionary_of_pokemon_usages, unweighted_ou_pokemon_data, weighed_by_1825_ou_pokemon_data, total_uses_of_top_50_pokemon]
and it will return a likelihood.
Any two likelihoods can easily be compared; the higher the likelihood, the more likely the team is better (or so the script predicts).
e^(likelihood_of_team2 - likelihood_of_team1) = probability that team2 is optimal relative to team1 (a ratio, since the likelihoods are natural logs; it exceeds 1 when team2 scores higher).
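As a small illustration of that comparison, the sketch below turns a set of log likelihoods into probabilities relative to the best-scoring team, so the top team reads 1.0. The function name relative_probabilities, the team labels and the scores are hypothetical placeholders, not output from the script.

```python
import math

def relative_probabilities(log_likelihoods):
    """Rank teams and express each one relative to the best-scoring team."""
    best = max(log_likelihoods.values())
    ranked = sorted(log_likelihoods.items(), key=lambda item: item[1], reverse=True)
    return [(team, math.exp(score - best)) for team, score in ranked]

# Made-up log likelihoods for three teams.
log_likelihoods = {
    "team1": -120.0,
    "team2": -120.0 + math.log(0.5),
    "team3": -120.0 + math.log(0.25),
}

for team, rel in relative_probabilities(log_likelihoods):
    print("Team:", team, "; Relative Prob:", rel)
```

Subtracting the best score before exponentiating keeps the numbers in a printable range even when the raw log likelihoods are very negative.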
I would be very interested in knowing how this correlates with actual Elo or Glicko ratings.
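If someone has ladder teams and their ratings, one simple check would be a rank correlation between the script's likelihoods and the ratings. The sketch below computes Spearman's rho in plain Python; the choice of Spearman is an assumption on my part, and the likelihoods and ratings are invented for illustration.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up data: script likelihoods and ladder ratings for four teams.
team_likelihoods = [-130.0, -125.0, -121.0, -120.0]
team_ratings = [1400, 1550, 1500, 1700]

rho = spearman(team_likelihoods, team_ratings)
```

A rho near 1 would mean teams the script ranks higher also tend to sit higher on the ladder; a rho near 0 would mean the likelihood says little about rating.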
EDIT:
Stryke, encounter matrices would of course be interesting.
Overall however, your method of looking at the pokemon appears quite different from mine. How effective did you find it at producing suggestions/making improvements?
EDIT:
I attached the most recent update to my script. Here are the top few results it generated:
Team: ('Aegislash', 'Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 1.0
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Bisharp', 'Mamoswine') ; Relative Prob: 0.983756510516
Team: ('Aegislash', 'Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.966421036151
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Conkeldurr', 'Latios', 'Mamoswine') ; Relative Prob: 0.956584751979
Team: ('Aegislash', 'Dragonite', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.955332494453
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Bisharp', 'Mamoswine') ; Relative Prob: 0.949232968402
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.94527844914
Team: ('Talonflame', 'Excadrill', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.940396927782
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Conkeldurr', 'Thundurus', 'Mamoswine') ; Relative Prob: 0.936177628366
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.935832019792
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.935421186685
Team: ('Greninja', 'Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.933568583059
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Thundurus', 'Mamoswine') ; Relative Prob: 0.926194616035
Team: ('Aegislash', 'Talonflame', 'Gliscor', 'Venusaur', 'Heatran', 'Mamoswine') ; Relative Prob: 0.925466288184
Team: ('Talonflame', 'Dragonite', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.921427082901
Team: ('Talonflame', 'Gliscor', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.920732565763
Team: ('Aegislash', 'Venusaur', 'Heatran', 'Breloom', 'Latios', 'Mamoswine') ; Relative Prob: 0.920452416057
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Gyarados', 'Mamoswine') ; Relative Prob: 0.919947410468
Team: ('Aegislash', 'Dragonite', 'Venusaur', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.919696042633
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.916807857623
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Venusaur', 'Heatran', 'Mamoswine') ; Relative Prob: 0.916292115793
Team: ('Talonflame', 'Heatran', 'Breloom', 'Conkeldurr', 'Mawile', 'Mamoswine') ; Relative Prob: 0.915735603015
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Breloom', 'Sylveon', 'Mamoswine') ; Relative Prob: 0.915730367723
Team: ('Venusaur', 'Heatran', 'Conkeldurr', 'Bisharp', 'Latios', 'Mamoswine') ; Relative Prob: 0.913421932261
Team: ('Talonflame', 'Azumarill', 'Venusaur', 'Heatran', 'Conkeldurr', 'Mamoswine') ; Relative Prob: 0.912294518987
Team: ('Talonflame', 'Venusaur', 'Heatran', 'Conkeldurr', 'Latios', 'Mamoswine') ; Relative Prob: 0.91217057109
Team: ('Aegislash', 'Talonflame', 'Azumarill', 'Heatran', 'Breloom', 'Mamoswine') ; Relative Prob: 0.909314541675
Team: ('Aegislash', 'Talonflame', 'Heatran', 'Breloom', 'Gardevoir', 'Mamoswine') ; Relative Prob: 0.909300729789
Note: these pokemon on the teams are simply in order of usage; if you're going to try a team, pick the most effective lead out of the six to fill that role.