Visualizing Chess Data With ggplot

A collection of chess visualizations using only… yeah, you know! Only ggplot2.

Joshua Kunst http://jkunst.com/
10-30-2015


There are nice visualizations made from chess data: piece movement, piece survivability, square usage by player, etc. Sadly, the authors do not always share the code or data needed to replicate the final result. So I wrote some code to show how to make some of these great visualizations entirely in R. Just for fun.

The Data

The original data come from here; they were parsed and stored in the rchess package.


library(tidyverse)
library(ggplot2)
library(rchess)

theme_set(
  theme_void(base_family = "Segoe UI") + 
    theme(legend.position = "none")
  )

data(chesswc)

chesswc

# A tibble: 1,266 x 11
   event site  date       round white black result whiteelo blackelo
   <chr> <chr> <date>     <dbl> <chr> <chr> <chr>     <int>    <int>
 1 FIDE~ Khan~ 2011-08-28   1.1 Kaab~ Karj~ 0-1        2344     2788
 2 FIDE~ Khan~ 2011-08-28   1.1 Ivan~ Stee~ 1-0        2768     2362
 3 FIDE~ Khan~ 2011-08-28   1.1 Ibra~ Mame~ 0-1        2402     2765
 4 FIDE~ Khan~ 2011-08-28   1.1 Pono~ Gwaz~ 1-0        2764     2434
 5 FIDE~ Khan~ 2011-08-28   1.1 Hans~ Gash~ 0-1        2449     2760
 6 FIDE~ Khan~ 2011-08-28   1.1 Gris~ Genb~ 1-0        2746     2452
 7 FIDE~ Khan~ 2011-08-28   1.1 De L~ Radj~ 0-1        2477     2744
 8 FIDE~ Khan~ 2011-08-28   1.1 Kams~ Di B~ 1-0        2741     2480
 9 FIDE~ Khan~ 2011-08-28   1.1 Lima~ Svid~ 1/2-1~     2493     2739
10 FIDE~ Khan~ 2011-08-28   1.1 Jako~ Sale~ 1-0        2736     2493
# ... with 1,256 more rows, and 2 more variables: eco <chr>,
#   pgn <chr>

chesswc %>% 
  count(event)

# A tibble: 3 x 2
  event                   n
  <chr>               <int>
1 FIDE World Cup 2011   398
2 FIDE World Cup 2013   435
3 FIDE World Cup 2015   433

chesswc <- chesswc %>% 
  filter(event == "FIDE World Cup 2015")

The most important variable here is pgn. It is a long string that represents the whole game. However, this format is not very visualization friendly. That’s why I implemented the history_detail() method for a Chess object. Let’s check.


set.seed(123)
pgn <- sample(chesswc$pgn, size = 1)
str_sub(pgn, 1, 50)

[1] "1. d4 Nf6 2. Nf3 d5 3. c4 e6 4. e3 Be7 5. Nbd2 O-O"
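To make the problem concrete, here is a minimal base R sketch (not part of rchess) that splits the string above into individual moves. The names pgn_head, san and to are only illustrative. Notice that the notation gives just the destination square, and castling (O-O) names no square at all, so the origin of each move can only be recovered by replaying the game.

```r
# The first 50 characters of the sampled game, copied from the output above.
pgn_head <- "1. d4 Nf6 2. Nf3 d5 3. c4 e6 4. e3 Be7 5. Nbd2 O-O"

# Split on spaces and drop the move counters ("1.", "2.", ...).
tokens <- strsplit(pgn_head, " ", fixed = TRUE)[[1]]
san <- tokens[!grepl("^[0-9]+\\.$", tokens)]

# Destination square when the move text ends with one; NA otherwise (e.g. castling).
to <- ifelse(
  grepl("[a-h][1-8]$", san),
  sub(".*([a-h][1-8])$", "\\1", san),
  NA_character_
)

data.frame(number_move = seq_along(san), san = san, to = to)
```

This naive split only works for a clean fragment like this one; it is a sketch of the idea, not a PGN parser.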

Compare the previous string with the first 10 rows of history_detail():


chss <- Chess$new()
chss$load_pgn(pgn)

[1] TRUE

chss$history_detail() %>%
  arrange(number_move)

# A tibble: 39 x 8
   piece from  to    number_move piece_number_mo~ status
   <chr> <chr> <chr>       <int>            <int> <chr> 
 1 d2 P~ d2    d4              1                1 <NA>  
 2 g8 K~ g8    f6              2                1 game ~
 3 g1 K~ g1    f3              3                1 <NA>  
 4 d7 P~ d7    d5              4                1 <NA>  
 5 c2 P~ c2    c4              5                1 captu~
 6 e7 P~ e7    e6              6                1 game ~
 7 e2 P~ e2    e3              7                1 game ~
 8 f8 B~ f8    e7              8                1 <NA>  
 9 b1 K~ b1    d2              9                1 <NA>  
10 Blac~ e8    g8             10                1 game ~
# ... with 29 more rows, and 2 more variables:
#   number_move_capture <int>, captured_by <chr>

The result is a data frame where each row is one piece’s movement, showing explicitly the cells it travels between on a particular move number. Now we apply this function over the 433 games in the FIDE World Cup 2015.


chesswc <- chesswc %>%
  mutate(game_id = row_number())

library(furrr)
plan(multisession)

dfmoves <- chesswc %>% 
  select(game_id, pgn) %>% 
  mutate(
    data = future_map(pgn, function(p) {
      chss <- Chess$new()
      chss$load_pgn(p)
      chss$history_detail()
    })
  ) %>% select(-pgn) %>% 
  unnest(data)

# library(doParallel)
# workers <- makeCluster(parallel::detectCores())
# registerDoParallel(workers)
# moves <- plyr::llply(chesswc %>% pull(pgn), function(p) {
#   chss <- Chess$new()
#   chss$load_pgn(p)
#   chss$history_detail()
#   },  .parallel = TRUE, .paropts = list(.packages = c("rchess")))

dfmoves

# A tibble: 41,731 x 9
   game_id piece from  to    number_move piece_number_mo~ status
     <int> <chr> <chr> <chr>       <int>            <int> <chr> 
 1       1 a1 R~ a1    c1             57                1 <NA>  
 2       1 a1 R~ c1    h1             65                2 <NA>  
 3       1 a1 R~ h1    h5             73                3 <NA>  
 4       1 a1 R~ h5    h7             89                4 <NA>  
 5       1 a1 R~ h7    e7             95                5 captu~
 6       1 b1 K~ b1    d2             13                1 <NA>  
 7       1 b1 K~ d2    f1             69                2 <NA>  
 8       1 b1 K~ f1    e3             75                3 captu~
 9       1 c1 B~ c1    b2             31                1 <NA>  
10       1 c1 B~ b2    c1             87                2 game ~
# ... with 41,721 more rows, and 2 more variables:
#   number_move_capture <int>, captured_by <chr>

The dfmoves data frame will be the heart of all these plots because it holds a lot of information and is easy to consume.
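As a small illustration of how easy this shape is to consume, the following base R sketch counts moves per piece and arrivals per square. It uses a made-up toy_moves data frame with a few of the same columns as dfmoves; the rows are invented for the example and are not taken from the tournament.

```r
# Hypothetical rows shaped like dfmoves (game_id, piece, from, to).
toy_moves <- data.frame(
  game_id = c(1, 1, 1, 2, 2),
  piece   = c("f1 Bishop", "f1 Bishop", "h1 Rook", "f1 Bishop", "h1 Rook"),
  from    = c("f1", "c4", "h1", "f1", "h1"),
  to      = c("c4", "b3", "e1", "c4", "f1"),
  stringsAsFactors = FALSE
)

# How many times did each piece move?
moves_per_piece <- table(toy_moves$piece)

# Which squares were landed on most often?
square_usage <- sort(table(toy_moves$to), decreasing = TRUE)

moves_per_piece
square_usage
```

With the real data the same questions would be one-liners such as table(dfmoves$to), or dfmoves %>% count(to, sort = TRUE) in the tidyverse style used in this post.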

Piece Movements

To replicate the result we need data to represent (and then plot) the board. The rchess package has some helper functions for this, like the internal .chessboarddata().


dfboard <- rchess:::.chessboarddata() %>%
  select(cell, col, row, x, y, cc)

dfboard

# A tibble: 64 x 6
   cell  col     row     x     y cc   
   <chr> <chr> <int> <int> <int> <chr>
 1 a1    a         1     1     1 b    
 2 b1    b         1     2     1 w    
 3 c1    c         1     3     1 b    
 4 d1    d         1     4     1 w    
 5 e1    e         1     5     1 b    
 6 f1    f         1     6     1 w    
 7 g1    g         1     7     1 b    
 8 h1    h         1     8     1 w    
 9 a2    a         2     1     2 w    
10 b2    b         2     2     2 b    
# ... with 54 more rows
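Since .chessboarddata() is an internal function (hence the :::), it may help to see that an equivalent table is easy to build by hand. The sketch below uses only base R and is an assumption about the layout based on the output above, not the package's actual implementation; board_sketch is a hypothetical name.

```r
# One row per square; x (the file) varies fastest, matching the order a1, b1, ..., h1, a2, ...
board_sketch <- expand.grid(x = 1:8, y = 1:8)

board_sketch$col  <- letters[board_sketch$x]            # files a..h
board_sketch$row  <- board_sketch$y                     # ranks 1..8
board_sketch$cell <- paste0(board_sketch$col, board_sketch$row)

# a1 is a dark square ("b"); colours alternate with the parity of x + y.
board_sketch$cc <- ifelse((board_sketch$x + board_sketch$y) %% 2 == 0, "b", "w")

board_sketch <- board_sketch[, c("cell", "col", "row", "x", "y", "cc")]
head(board_sketch, 10)
```

The first ten rows should agree with the dfboard output shown above.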

Now we add this information to the dfmoves data frame and calculate some fields to know how to draw the curves (see here for more details).


dfpaths <- dfmoves %>%
  left_join(
    dfboard %>% rename(from = cell, x.from = x, y.from = y),
    by = "from"
    ) %>%
  left_join(
    dfboard %>% rename(to = cell, x.to = x, y.to = y) %>% select(-cc, -col, -row),
    by = "to"
    ) %>%
  mutate(
    x_gt_y = abs(x.to - x.from) > abs(y.to - y.from),
    xy_sign = sign((x.to - x.from)*(y.to - y.from)) == 1,
    x_gt_y_equal_xy_sign = x_gt_y == xy_sign)
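To see what the three helper fields do, here is a tiny worked example in base R on four hand-picked moves (the toy data frame and its move labels are made up for illustration). The last column is the one used later to decide which way each curve bends.

```r
# Four illustrative moves with board coordinates (a = 1, ..., h = 8).
toy <- data.frame(
  move   = c("f1-c4", "b1-c3", "a1-d1", "g8-e7"),
  x.from = c(6, 2, 1, 7), y.from = c(1, 1, 1, 8),
  x.to   = c(3, 3, 4, 5), y.to   = c(4, 3, 1, 7)
)

dx <- toy$x.to - toy$x.from
dy <- toy$y.to - toy$y.from

toy$x_gt_y  <- abs(dx) > abs(dy)        # is the move more horizontal than vertical?
toy$xy_sign <- sign(dx * dy) == 1       # do x and y change in the same direction?
toy$x_gt_y_equal_xy_sign <- toy$x_gt_y == toy$xy_sign

toy[, c("move", "x_gt_y", "xy_sign", "x_gt_y_equal_xy_sign")]
```

Here f1-c4 and g8-e7 end up TRUE, while b1-c3 and a1-d1 end up FALSE. A purely horizontal or vertical move has sign 0, so xy_sign is FALSE for it.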

The data is ready! Now we need some ggplot: geom_tile for the board, the new geom_curve to represent the piece’s path, and some jitter to make this more artistic. Let’s plot the f1 Bishop’s movements.


pieces <- "f1 Bishop"

dfpaths_piece <- dfpaths %>% 
  filter(piece == pieces)

ggplot() +
  geom_tile(data = dfboard, aes(x, y, fill = cc)) +
  geom_curve(
    data = dfpaths_piece %>% filter(x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = 0.50,
    angle = -45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  geom_curve(
    data = dfpaths_piece %>% filter(!x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = -0.50,
    angle = 45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  scale_fill_manual(values =  c("gray10", "gray20")) +
  ggtitle("f1 Bishop") +
  coord_equal()

In the same way we can plot any piece, or several at once with facets.


pieces <- c("White Queen",
            "h1 Rook",
            "b8 Knight",
            "g2 Pawn",
            "c1 Bishop",
            "f7 Pawn")

dfpaths_pieces <- dfpaths %>% 
  filter(piece %in% pieces)

ggplot() +
  geom_tile(data = dfboard, aes(x, y, fill = cc)) +
  geom_curve(
    data = dfpaths_pieces %>% filter(x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = 0.50,
    angle = -45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  geom_curve(
    data = dfpaths_pieces %>% filter(!x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = -0.50,
    angle = 45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  scale_fill_manual(values =  c("gray10", "gray20")) +
  coord_equal() +
  facet_wrap(vars(piece), ncol = 3)
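The two plots above repeat the same layers, so one possible refactoring is to wrap them in a small function. The sketch below is only a suggestion: curve_layer() and plot_piece_paths() are hypothetical helpers, not part of rchess or ggplot2, and the toy data exists only so that the block runs on its own.

```r
library(ggplot2)

# Hypothetical helper: one geom_curve layer with the settings used in this post.
# On ggplot2 >= 3.4, `linewidth` is the preferred name for `size` on lines.
curve_layer <- function(data, curvature, angle, alpha = 0.02) {
  geom_curve(
    data = data,
    aes(x = x.from, y = y.from, xend = x.to, yend = y.to),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = curvature, angle = angle,
    alpha = alpha, color = "white", size = 1.05
  )
}

# Hypothetical helper: board tiles plus the two mirrored curve layers.
plot_piece_paths <- function(paths, board, alpha = 0.02) {
  flag <- paths$x_gt_y_equal_xy_sign
  ggplot() +
    geom_tile(data = board, aes(x, y, fill = cc)) +
    curve_layer(paths[flag, ], curvature = 0.50, angle = -45, alpha = alpha) +
    curve_layer(paths[!flag, ], curvature = -0.50, angle = 45, alpha = alpha) +
    scale_fill_manual(values = c("gray10", "gray20")) +
    coord_equal()
}

# Toy stand-ins for dfboard and dfpaths (made-up moves, same column names).
toy_board <- expand.grid(x = 1:8, y = 1:8)
toy_board$cc <- ifelse((toy_board$x + toy_board$y) %% 2 == 0, "b", "w")

toy_paths <- data.frame(
  piece  = c("f1 Bishop", "f1 Bishop", "b8 Knight", "b8 Knight"),
  x.from = c(6, 3, 2, 3), y.from = c(1, 4, 8, 6),
  x.to   = c(3, 2, 3, 5), y.to   = c(4, 3, 6, 5)
)
# Same rule as in dfpaths.
toy_paths$x_gt_y_equal_xy_sign <- with(
  toy_paths,
  (abs(x.to - x.from) > abs(y.to - y.from)) ==
    (sign((x.to - x.from) * (y.to - y.from)) == 1)
)

# alpha is raised here because four toy moves would be invisible at 0.02.
p <- plot_piece_paths(toy_paths, toy_board, alpha = 0.8) +
  facet_wrap(vars(piece), ncol = 3)
```

With the real data, the faceted plot above would then be roughly plot_piece_paths(dfpaths_pieces, dfboard) + facet_wrap(vars(piece), ncol = 3).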