Extracting Pokémon data using R scripts

2018-01-14

The data are supplied in a data frame, pkmn, with one row per currently released Pokémon. It is a decoded version of the GAME_MASTER file included in the game package, with a few columns added at the end.

library(DT)
library(pkmngor)
datatable(pkmn, extensions='FixedColumns', options=list(fixedColumns=list(leftColumns=2), scrollX = TRUE))

We can use tools from the dplyr package to explore and extract subsets of the data in a “tidy” way.

library(dplyr)

For example, we can explore the probability that each Pokémon will attack at a given point during the catch encounter. Slakoth is among the lowest, as might be expected of such an idle creature.

pkmn %>% select(pokemonId, encounter.attackProbability) %>% arrange(encounter.attackProbability) %>% head
## # A tibble: 6 x 2
##   pokemonId encounter.attackProbability
##   <chr>                           <dbl>
## 1 SLAKOTH                        0.0100
## 2 SPINDA                         0.0100
## 3 ABRA                           0.0500
## 4 SLOWPOKE                       0.0500
## 5 SLOWBRO                        0.0500
## 6 SUDOWOODO                      0.0500

But what about the highest? I knew Tyranitar was pretty aggressive, but I was surprised to see that the top spot goes to Vigoroth, the first evolution of Slakoth. I look forward to encountering one of these in the wild!

pkmn %>% select(pokemonId, encounter.attackProbability) %>% arrange(desc(encounter.attackProbability)) %>% head
## # A tibble: 6 x 2
##   pokemonId encounter.attackProbability
##   <chr>                           <dbl>
## 1 VIGOROTH                        0.700
## 2 SHARPEDO                        0.500
## 3 BEEDRILL                        0.400
## 4 GYARADOS                        0.400
## 5 EXPLOUD                         0.400
## 6 PRIMEAPE                        0.300
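
The two orderings above can also be written without arrange() and head(). The following is a minimal sketch, assuming dplyr 1.0.0 or later (for slice_min() and slice_max()). It uses pkmn_toy, a hypothetical stand-in built from a few rows of the output printed above, so that it runs without the pkmngor package.

```r
library(dplyr)

# pkmn_toy is a hypothetical stand-in for pkmn; the values are copied
# from the output printed above.
pkmn_toy <- tibble(
  pokemonId = c("SLAKOTH", "SPINDA", "ABRA", "PRIMEAPE", "SHARPEDO", "VIGOROTH"),
  encounter.attackProbability = c(0.01, 0.01, 0.05, 0.30, 0.50, 0.70)
)

# Two least aggressive rows; with_ties = FALSE caps the result at n rows
# even when several Pokémon share the same probability.
calmest <- pkmn_toy %>%
  slice_min(encounter.attackProbability, n = 2, with_ties = FALSE)

# Two most aggressive rows.
fiercest <- pkmn_toy %>%
  slice_max(encounter.attackProbability, n = 2, with_ties = FALSE)

print(calmest)
print(fiercest)
```

With the real data, replacing pkmn_toy with pkmn and n = 2 with n = 6 should give the same rows as the head() calls, although the order among tied rows may differ.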

There are ten variables in this data file that appear to govern how Pokémon behave during the encounter. The Silph Road have studied some of these in detail.

#[15] "encounter.movementType"              
#[16] "encounter.movementTimerS"            
#[17] "encounter.jumpTimeS"                 
#[18] "encounter.attackTimerS"              
#[19] "encounter.attackProbability"         
#[20] "encounter.dodgeProbability"          
#[21] "encounter.dodgeDurationS"            
#[22] "encounter.dodgeDistance"             
#[24] "encounter.minPokemonActionFrequencyS"
#[25] "encounter.maxPokemonActionFrequencyS"
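
Because these columns share the "encounter." prefix, they can be pulled out together rather than named one by one. This is a sketch only: pkmn_toy is a hypothetical one-row stand-in for pkmn, using column names and values that appear elsewhere in this post.

```r
library(dplyr)

# pkmn_toy is a hypothetical stand-in for pkmn with a subset of its columns.
pkmn_toy <- tibble(
  pokemonId = "SLAKOTH",
  encounter.attackProbability = 0.01,
  encounter.baseCaptureRate = 0.40
)

# starts_with() matches the prefix literally, so the dot is not a wildcard.
encounter_cols <- pkmn_toy %>%
  select(starts_with("encounter.")) %>%
  names()

print(encounter_cols)
```

Dropping the names() step keeps the selected columns as a data frame, which can be handy for browsing all the encounter settings side by side.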

What about the lowest base capture rates?

pkmn %>% select(pokemonId, encounter.baseCaptureRate) %>% arrange(encounter.baseCaptureRate) %>% head(20)
## # A tibble: 20 x 2
##    pokemonId encounter.baseCaptureRate
##    <chr>                         <dbl>
##  1 RAIKOU                       0.0200
##  2 ENTEI                        0.0200
##  3 SUICUNE                      0.0200
##  4 LUGIA                        0.0200
##  5 HO_OH                        0.0200
##  6 REGIROCK                     0.0200
##  7 REGICE                       0.0200
##  8 REGISTEEL                    0.0200
##  9 LATIAS                       0.0200
## 10 LATIOS                       0.0200
## 11 KYOGRE                       0.0200
## 12 GROUDON                      0.0200
## 13 RAYQUAZA                     0.0200
## 14 JIRACHI                      0.0200
## 15 DEOXYS                       0.0200
## 16 ARTICUNO                     0.0300
## 17 ZAPDOS                       0.0300
## 18 MOLTRES                      0.0300
## 19 VENUSAUR                     0.0500
## 20 CHARIZARD                    0.0500

Or the highest base capture rates?

pkmn %>% select(pokemonId, encounter.baseCaptureRate) %>% 
  filter(!is.na(encounter.baseCaptureRate)) %>% arrange(desc(encounter.baseCaptureRate)) %>% head(20)
## # A tibble: 20 x 2
##    pokemonId      encounter.baseCaptureRate
##    <chr>                              <dbl>
##  1 RELICANTH                          0.900
##  2 MAGIKARP                           0.700
##  3 FEEBAS                             0.700
##  4 ODDISH                             0.600
##  5 CATERPIE                           0.500
##  6 WEEDLE                             0.500
##  7 PIDGEY                             0.500
##  8 RATTATA                            0.500
##  9 SPEAROW                            0.500
## 10 EKANS                              0.500
## 11 SANDSHREW                          0.500
## 12 NIDORAN_FEMALE                     0.500
## 13 NIDORAN_MALE                       0.500
## 14 JIGGLYPUFF                         0.500
## 15 ZUBAT                              0.500
## 16 VENONAT                            0.500
## 17 DIGLETT                            0.500
## 18 MEOWTH                             0.500
## 19 PSYDUCK                            0.500
## 20 MANKEY                             0.500

Find the base capture rate for a particular Pokémon. I'm surprised Slakoth's isn't higher; it will be a shock the first time I see one jump out of a Pokéball.

pkmn %>% filter(pokemonId == "SLAKOTH") %>% select(encounter.baseCaptureRate) # 0.4
## # A tibble: 1 x 1
##   encounter.baseCaptureRate
##                       <dbl>
## 1                     0.400
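
If the same lookup is needed for several Pokémon, it can be wrapped in a small function. The sketch below is one possible way to do it: base_capture_rate() is a hypothetical helper, not part of pkmngor, and pkmn_toy is a stand-in holding three values copied from the tables above.

```r
library(dplyr)

# pkmn_toy is a hypothetical stand-in for pkmn; the rates are taken
# from the output printed above.
pkmn_toy <- tibble(
  pokemonId = c("RAIKOU", "SLAKOTH", "MAGIKARP"),
  encounter.baseCaptureRate = c(0.02, 0.40, 0.70)
)

# Hypothetical helper: returns the base capture rate as a plain number.
# pull() extracts the column as a vector instead of a one-column tibble.
base_capture_rate <- function(data, target_id) {
  data %>%
    filter(pokemonId == target_id) %>%
    pull(encounter.baseCaptureRate)
}

slakoth_rate <- base_capture_rate(pkmn_toy, "SLAKOTH")
print(slakoth_rate)
```

An ID that is not in the data gives a zero-length numeric vector rather than an error, so callers may want to check length() on the result.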

Fury Cutter is a great fast move. Which Pokémon can learn it?

pkmn %>% filter(quickMoves.V1 == "FURY_CUTTER_FAST" | quickMoves.V2 == "FURY_CUTTER_FAST") %>% select(pokemonId)
## # A tibble: 9 x 1
##   pokemonId
##   <chr>    
## 1 PARASECT 
## 2 FARFETCHD
## 3 SCYTHER  
## 4 GLIGAR   
## 5 SCIZOR   
## 6 SCEPTILE 
## 7 NINJASK  
## 8 ZANGOOSE 
## 9 ARMALDO
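
The filter above names quickMoves.V1 and quickMoves.V2 explicitly. A sketch of a more general version follows, assuming dplyr 1.0.4 or later (for if_any()); it checks every column whose name starts with "quickMoves.". The data here are a hypothetical stand-in: the two Fury Cutter entries reflect the result above, while the PLACEHOLDER moves and the third row are invented purely for illustration.

```r
library(dplyr)

# pkmn_toy is a hypothetical stand-in for pkmn. Only the FURY_CUTTER_FAST
# entries come from the result above; everything else is a placeholder.
pkmn_toy <- tibble(
  pokemonId = c("SCYTHER", "PARASECT", "HYPOTHETICAL_MON"),
  quickMoves.V1 = c("FURY_CUTTER_FAST", "PLACEHOLDER_A_FAST", "PLACEHOLDER_A_FAST"),
  quickMoves.V2 = c("PLACEHOLDER_B_FAST", "FURY_CUTTER_FAST", NA)
)

# Keep rows where any quickMoves.* column holds the move.
# %in% returns FALSE for NA, so missing move slots never drop a match.
learners <- pkmn_toy %>%
  filter(if_any(starts_with("quickMoves."), ~ .x %in% "FURY_CUTTER_FAST")) %>%
  pull(pokemonId)

print(learners)
```

Written this way, the query would keep working if the data frame gained further quick move columns with the same prefix.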