Visualizing Chess Data With ggplot – Data, Code and Visualization

There are nice visualizations made from chess data: piece movement, piece survivability, square usage by player, etc. Sadly, the authors do not always share the code and data needed to replicate the final result. So I wrote some code to show how to make some of these great visualizations entirely in R. Just for fun.

The Data

The original data come from here and were parsed and stored in the rchess package.

Code

library(tidyverse)
library(ggplot2)
library(rchess)

theme_set(
  theme_void(base_family = "Segoe UI") + 
    theme(legend.position = "none")
  )

data(chesswc)

chesswc

# A tibble: 1,266 × 11
   event site  date       round white black result whiteelo blackelo eco   pgn  
   <chr> <chr> <date>     <dbl> <chr> <chr> <chr>     <int>    <int> <chr> <chr>
 1 FIDE… Khan… 2011-08-28   1.1 Kaab… Karj… 0-1        2344     2788 D15   1. d…
 2 FIDE… Khan… 2011-08-28   1.1 Ivan… Stee… 1-0        2768     2362 E68   1. c…
 3 FIDE… Khan… 2011-08-28   1.1 Ibra… Mame… 0-1        2402     2765 E67   1. N…
 4 FIDE… Khan… 2011-08-28   1.1 Pono… Gwaz… 1-0        2764     2434 B40   1. e…
 5 FIDE… Khan… 2011-08-28   1.1 Hans… Gash… 0-1        2449     2760 A61   1. d…
 6 FIDE… Khan… 2011-08-28   1.1 Gris… Genb… 1-0        2746     2452 D37   1. d…
 7 FIDE… Khan… 2011-08-28   1.1 De L… Radj… 0-1        2477     2744 B30   1. e…
 8 FIDE… Khan… 2011-08-28   1.1 Kams… Di B… 1-0        2741     2480 B90   1. e…
 9 FIDE… Khan… 2011-08-28   1.1 Lima… Svid… 1/2-1…     2493     2739 D85   1. d…
10 FIDE… Khan… 2011-08-28   1.1 Jako… Sale… 1-0        2736     2493 B69   1. e…
# ℹ 1,256 more rows

Code

chesswc %>% 
  count(event)

# A tibble: 3 × 2
  event                   n
  <chr>               <int>
1 FIDE World Cup 2011   398
2 FIDE World Cup 2013   435
3 FIDE World Cup 2015   433

Code

chesswc <- chesswc %>% 
  filter(event == "FIDE World Cup 2015")

The most important variable here is pgn. The pgn is a long string that represents the whole game. However, this format is not very visualization friendly. That’s why I implemented the history_detail() method for a Chess object. Let’s check.

Code

set.seed(123)
pgn <- sample(chesswc$pgn, size = 1)
str_sub(pgn, 0, 50)

[1] "1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Nf3 b6 5. g3 Bb7 "
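To see why the raw string is awkward to plot, here is a minimal base R sketch that splits the first moves of the sample above into one token per half-move. The variable names (pgn_head, san, tokens) are only illustrative, and the string is copied from the output above. Each token names at most the destination square, never the full path of the piece, which is the gap history_detail() fills.

```r
# Movetext copied from the sample output above (first five moves only).
pgn_head <- "1. d4 Nf6 2. c4 e6 3. Nc3 Bb4 4. Nf3 b6 5. g3 Bb7"

# Drop the move numbers ("1.", "2.", ...) and split on whitespace.
san <- strsplit(trimws(gsub("[0-9]+\\.", "", pgn_head)), "\\s+")[[1]]

# Odd positions are White's half-moves, even positions are Black's.
tokens <- data.frame(
  number_move = seq_along(san),
  color = ifelse(seq_along(san) %% 2 == 1, "white", "black"),
  san = san
)

tokens
```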

Compare the previous string with the first 10 rows of history_detail():

Code

chss <- Chess$new()
chss$load_pgn(pgn)

[1] TRUE

Code

chss$history_detail() %>%
  arrange(number_move)

# A tibble: 178 × 8
   piece    from  to    number_move piece_number_move status number_move_capture
   <chr>    <chr> <chr>       <int>             <int> <chr>                <int>
 1 d2 Pawn  d2    d4              1                 1 <NA>                    NA
 2 g8 Knig… g8    f6              2                 1 <NA>                    NA
 3 c2 Pawn  c2    c4              3                 1 game …                  NA
 4 e7 Pawn  e7    e6              4                 1 <NA>                    NA
 5 b1 Knig… b1    c3              5                 1 captu…                  12
 6 f8 Bish… f8    b4              6                 1 <NA>                    NA
 7 g1 Knig… g1    f3              7                 1 <NA>                    NA
 8 b7 Pawn  b7    b6              8                 1 game …                  NA
 9 g2 Pawn  g2    g3              9                 1 <NA>                    NA
10 c8 Bish… c8    b7             10                 1 <NA>                    NA
# ℹ 168 more rows
# ℹ 1 more variable: captured_by <chr>

The result is a data frame where each row is one piece’s movement, showing explicitly the cells it travels between at a particular move number. Now we apply this function to the 433 games of the FIDE World Cup 2015.

Code

chesswc <- chesswc %>%
  mutate(game_id = row_number())

library(furrr)
plan(multisession, workers = 2)

dfmoves <- chesswc %>% 
  select(game_id, pgn) %>% 
  mutate(
    data = future_map(pgn, function(p) {
      chss <- Chess$new()
      chss$load_pgn(p)
      chss$history_detail()
    })
  ) %>% select(-pgn) %>% 
  unnest(data)

# library(doParallel)
# workers <- makeCluster(parallel::detectCores())
# registerDoParallel(workers)
# moves <- plyr::llply(chesswc %>% pull(pgn), function(p) {
#   chss <- Chess$new()
#   chss$load_pgn(p)
#   chss$history_detail()
#   },  .parallel = TRUE, .paropts = list(.packages = c("rchess")))

dfmoves

# A tibble: 41,616 × 9
   game_id piece     from  to    number_move piece_number_move status   
     <int> <chr>     <chr> <chr>       <int>             <int> <chr>    
 1       1 a1 Rook   a1    c1             57                 1 <NA>     
 2       1 a1 Rook   c1    h1             65                 2 <NA>     
 3       1 a1 Rook   h1    h5             73                 3 <NA>     
 4       1 a1 Rook   h5    h7             89                 4 <NA>     
 5       1 a1 Rook   h7    e7             95                 5 captured 
 6       1 b1 Knight b1    d2             13                 1 <NA>     
 7       1 b1 Knight d2    f1             69                 2 <NA>     
 8       1 b1 Knight f1    e3             75                 3 captured 
 9       1 c1 Bishop c1    b2             31                 1 <NA>     
10       1 c1 Bishop b2    c1             87                 2 game over
# ℹ 41,606 more rows
# ℹ 2 more variables: number_move_capture <int>, captured_by <chr>
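The "map over games, then unnest" pattern used to build dfmoves can be tried on toy data without rchess. This is only a sketch: toy_history() is a hypothetical stand-in for the Chess$new() / load_pgn() / history_detail() calls, and the two short games are made up for illustration.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for the rchess parsing step: one row per half-move.
toy_history <- function(pgn) {
  san <- strsplit(trimws(gsub("[0-9]+\\.", "", pgn)), "\\s+")[[1]]
  tibble(number_move = seq_along(san), san = san)
}

# Two made-up games, one row per game.
games <- tibble(
  game_id = 1:2,
  pgn = c("1. e4 e5 2. Nf3", "1. d4 d5")
)

# Parse each game into a list-column, then expand it to one row per move.
moves <- games %>%
  mutate(data = map(pgn, toy_history)) %>%
  select(-pgn) %>%
  unnest(data)

moves
```

Swapping map() for furrr::future_map() gives the parallel version used above; the rest of the pipeline stays the same.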

The dfmoves data frame will be the heart of all these plots because it holds a lot of information and is easy to consume.

Piece Movements

To replicate the result we first need data that represents the board, so we can plot it. The rchess package has some internal helper functions for this, like rchess:::.chessboarddata().

Code

dfboard <- rchess:::.chessboarddata() %>%
  select(cell, col, row, x, y, cc)

dfboard

# A tibble: 64 × 6
   cell  col     row     x     y cc   
   <chr> <chr> <int> <int> <int> <chr>
 1 a1    a         1     1     1 b    
 2 b1    b         1     2     1 w    
 3 c1    c         1     3     1 b    
 4 d1    d         1     4     1 w    
 5 e1    e         1     5     1 b    
 6 f1    f         1     6     1 w    
 7 g1    g         1     7     1 b    
 8 h1    h         1     8     1 w    
 9 a2    a         2     1     2 w    
10 b2    b         2     2     2 b    
# ℹ 54 more rows
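If you would rather not rely on an internal (triple-colon) function, a table with the same columns can be rebuilt in a few lines of base R. This is a sketch that assumes the layout shown in the output above (a1 is a "b" cell, x follows the column letter, y follows the row); the name board is only illustrative.

```r
# One row per square, with x (column index) varying fastest, as in the output above.
board <- expand.grid(x = 1:8, y = 1:8)

board$col  <- letters[board$x]            # "a" to "h"
board$row  <- board$y                     # 1 to 8
board$cell <- paste0(board$col, board$row)

# a1 (x = 1, y = 1) is a dark square, so an even x + y means "b".
board$cc <- ifelse((board$x + board$y) %% 2 == 0, "b", "w")

head(board[, c("cell", "col", "row", "x", "y", "cc")], 10)
```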

Now we add this information to the dfmoves data frame and calculate some fields that tell us how to draw the curves (see here for more details).

Code

dfpaths <- dfmoves %>%
  left_join(
    dfboard %>% rename(from = cell, x.from = x, y.from = y),
    by = "from"
    ) %>%
  left_join(
    dfboard %>% rename(to = cell, x.to = x, y.to = y) %>% select(-cc, -col, -row),
    by = "to"
    ) %>%
  mutate(
    x_gt_y = abs(x.to - x.from) > abs(y.to - y.from),
    xy_sign = sign((x.to - x.from)*(y.to - y.from)) == 1,
    x_gt_y_equal_xy_sign = x_gt_y == xy_sign)
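To make the three flags concrete, here is a small base R sketch that computes them for a handful of moves. The helpers cell_xy() and curve_flags() are hypothetical names written for this illustration; the formulas are the same as in the mutate() call above. The last flag only decides which of two curve layers (with opposite curvature) a move ends up in.

```r
# Convert a cell name such as "f1" into board coordinates (a = 1, ..., h = 8).
cell_xy <- function(cell) {
  list(
    x = match(substr(cell, 1, 1), letters),
    y = as.integer(substr(cell, 2, 2))
  )
}

# Same three flags as in the pipeline above, for vectors of from/to cells.
curve_flags <- function(from, to) {
  orig <- cell_xy(from)
  dest <- cell_xy(to)
  dx <- dest$x - orig$x
  dy <- dest$y - orig$y
  x_gt_y  <- abs(dx) > abs(dy)     # move is wider than it is tall
  xy_sign <- sign(dx * dy) == 1    # dx and dy point the same way
  data.frame(from, to, x_gt_y, xy_sign,
             x_gt_y_equal_xy_sign = x_gt_y == xy_sign)
}

flags <- curve_flags(
  from = c("f1", "g1", "a1", "b1"),
  to   = c("c4", "f3", "d1", "c3")
)

flags
```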

The data is ready! Now we need some ggplot: geom_tile for the board, the new geom_curve to represent each piece’s path, and some jitter to make the result more artistic. Let’s plot the f1 Bishop’s movements.

Code

pieces <- "f1 Bishop"

dfpaths_piece <- dfpaths %>% 
  filter(piece == pieces)

ggplot() +
  geom_tile(data = dfboard, aes(x, y, fill = cc)) +
  geom_curve(
    data = dfpaths_piece %>% filter(x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = 0.50,
    angle = -45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  geom_curve(
    data = dfpaths_piece %>% filter(!x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = -0.50,
    angle = 45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  scale_fill_manual(values =  c("gray10", "gray20")) +
  ggtitle("f1 Bishop") +
  coord_equal()

In the same way we can plot any other piece, for example a selection of six pieces in separate facets.

Code

pieces <- c("White Queen",
            "h1 Rook",
            "b8 Knight",
            "g2 Pawn",
            "c1 Bishop",
            "f7 Pawn")

dfpaths_pieces <- dfpaths %>% 
  filter(piece %in% pieces)

ggplot() +
  geom_tile(data = dfboard, aes(x, y, fill = cc)) +
  geom_curve(
    data = dfpaths_pieces %>% filter(x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = 0.50,
    angle = -45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  geom_curve(
    data = dfpaths_pieces %>% filter(!x_gt_y_equal_xy_sign),
    aes(
      x = x.from,
      y = y.from,
      xend = x.to,
      yend = y.to
    ),
    position = position_jitter(width = 0.2, height = 0.2),
    curvature = -0.50,
    angle = 45,
    alpha = 0.02,
    color = "white",
    size = 1.05
  ) +
  scale_fill_manual(values =  c("gray10", "gray20")) +
  coord_equal() +
  facet_wrap(vars(piece), ncol = 3)

I think it looks very nice and is similar to the original work made by Steve Tung.

Survival Rates

For this plot we need to filter dfmoves by !is.na(status) to find out what happened to every piece by the end of the game: whether it was captured or not. Then we summarize across all the games.

Code

dfsurvrates <- dfmoves %>%
  filter(!is.na(status)) %>%
  group_by(piece) %>%
  summarize(
    games = n(),
    was_captured = sum(status == "captured")
    ) %>%
  mutate(surv_rate = 1 - was_captured/games)

dfsurvrates %>%
  arrange(desc(surv_rate))

# A tibble: 32 × 4
   piece      games was_captured surv_rate
   <chr>      <int>        <int>     <dbl>
 1 Black King   433            0     1    
 2 White King   433            0     1    
 3 h2 Pawn      433          121     0.721
 4 h7 Pawn      433          148     0.658
 5 g2 Pawn      433          150     0.654
 6 g7 Pawn      433          160     0.630
 7 f2 Pawn      433          178     0.589
 8 a2 Pawn      433          183     0.577
 9 a7 Pawn      433          185     0.573
10 f7 Pawn      433          185     0.573
# ℹ 22 more rows
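The same summary can be checked by hand on toy data. The rows below are made up for illustration (three games, two pieces); only each piece's last move in a game carries a non-missing status, which is what the filter relies on.

```r
library(dplyr)

# Made-up moves: status is NA except on a piece's final move in a game.
toy_moves <- tibble(
  game_id = c(1, 1, 1, 2, 2, 3, 3),
  piece   = c("h2 Pawn", "h2 Pawn", "White King",
              "h2 Pawn", "White King",
              "h2 Pawn", "White King"),
  status  = c(NA, "captured", "game over",
              "game over", "game over",
              "captured", "game over")
)

toy_rates <- toy_moves %>%
  filter(!is.na(status)) %>%          # keep one row per piece per game
  group_by(piece) %>%
  summarize(
    games = n(),
    was_captured = sum(status == "captured")
  ) %>%
  mutate(surv_rate = 1 - was_captured / games)

toy_rates
```

Here the h2 Pawn is captured in two of its three games, so its survival rate is 1/3, while the king survives every game.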

This works as a sanity check, because the kings are never captured. Now we use another internal helper from the rchess package, rchess:::.chesspiecedata(), to get the start position of every piece, and then plot each survival rate on the cell where that piece starts the game.

Code

dfsurvrates <- dfsurvrates %>%
  left_join(rchess:::.chesspiecedata() %>% select(start_position, piece = name, color, unicode),
            by = "piece") %>%
  full_join(dfboard %>% rename(start_position = cell),
            by = "start_position")

# Auxiliary data to plot the board
dfboard2 <- tibble(x = 0:8 + 0.5, y = 0 + 0.5, xend = 0:8 + 0.5, yend = 8 + 0.5)

ggplot(dfsurvrates) +
  geom_tile(data = dfsurvrates %>% filter(!is.na(surv_rate)),
            aes(x, y, fill = surv_rate)) +
  scale_fill_gradient(low = "darkred",  high = "white") +
  geom_text(data = dfsurvrates %>% filter(!is.na(surv_rate)),
            aes(x, y, label = scales::percent(surv_rate)),
            color = "gray70", size = 3) +
  scale_x_continuous(breaks = 1:8, labels = letters[1:8]) +
  scale_y_continuous(breaks = 1:8, labels = 1:8)  +
  geom_segment(data = dfboard2, aes(x, y, xend = xend, yend = yend), color = "gray70") +
  geom_segment(data = dfboard2, aes(y, x, xend = yend, yend = xend), color = "gray70") +
  ggtitle("Survival Rates for each piece") + 
  coord_equal() + 
  theme(legend.position = "none")

Obviously the plot shows the same data in both text and color, and there is a lot of space without information, but the idea is to use the chess board to represent the initial position of a chess game.

We can replace the texts with the piece’s icons:

Code

ggplot(dfsurvrates) +
  geom_tile(data = dfsurvrates %>% filter(!is.na(surv_rate)),
            aes(x, y, fill = 100*surv_rate)) +
  scale_fill_gradient(NULL, low = "darkred",  high = "white") +
  geom_text(data = dfsurvrates %>% filter(!is.na(surv_rate)),
            aes(x, y, label = unicode), size = 11, color = "gray20", alpha = 0.7) +
  scale_x_continuous(breaks = 1:8, labels = letters[1:8]) +
  scale_y_continuous(breaks = 1:8, labels = 1:8)  +
  geom_segment(data = dfboard2, aes(x, y, xend = xend, yend = yend), color = "gray70") +
  geom_segment(data = dfboard2, aes(y, x, xend = yend, yend = xend), color = "gray70") +
  ggtitle("Survival Rates for each piece") + 
  coord_equal() +
  theme(legend.position = "bottom")

Square Usage By Player

For this visualization we will use the to variable. First we select the players who have the most games (as White) in the chesswc table. Then, for each of them, we count how often each to cell appears.

Code

players <- chesswc %>% 
  count(white) %>% 
  arrange(desc(n)) %>%
  pull(white) %>% 
  head(4)

players

[1] "Karjakin, Sergey" "Svidler, Peter"   "Wei, Yi"          "Adams, Michael"

Code

dfmov_players <- map_df(players, function(p){ # p <- sample(players, size = 1)
  games <- chesswc %>% filter(white == p) %>% .$game_id
  dfres <- dfmoves %>%
    filter(game_id %in% games, !is.na(to)) %>%
    count(to) %>%
    mutate(player = p,
           p = n/length(games))
  dfres
})

dfmov_players <- dfmov_players %>%
  rename(cell = to) %>%
  left_join(dfboard, by = "cell")

ggplot(dfmov_players) +
  geom_tile(aes(x, row, fill = p)) +
  scale_fill_gradient("Movements to every cell\n(normalized by number of games)",
                      low = "white",  high = "darkblue") +
  geom_text(aes(x, row, label = round(p, 1)), size = 2, color = "white", alpha = 0.5) +
  facet_wrap(~player) +
  scale_x_continuous(breaks = 1:8, labels = letters[1:8]) +
  scale_y_continuous(breaks = 1:8, labels = 1:8)  +
  geom_segment(data = dfboard2, aes(x, y, xend = xend, yend = yend), color = "gray70") +
  geom_segment(data = dfboard2, aes(y, x, xend = yend, yend = xend), color = "gray70") +
  coord_equal() +
  theme(legend.position = "bottom")
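The value mapped to the fill is simply the number of moves landing on a cell divided by the number of games. A tiny base R sketch with made-up data (two games, five moves, one of them with a missing destination) shows the arithmetic; the names toy_moves, n_games and usage are illustrative only.

```r
# Made-up destination cells for two games; NA stands for a move with no target cell.
toy_moves <- data.frame(
  game_id = c(1, 1, 1, 2, 2),
  to      = c("e4", "f3", "e4", "e4", NA)
)

n_games <- length(unique(toy_moves$game_id))

# table() drops NA by default, mirroring the !is.na(to) filter above.
usage <- as.data.frame(table(to = toy_moves$to),
                       responseName = "n",
                       stringsAsFactors = FALSE)

# Moves per game landing on each cell.
usage$p <- usage$n / n_games

usage
```

A value above 1, as for e4 here, just means the cell was visited more than once per game on average.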

Distributions For The First Movement

Now, with the same data and using piece_number_move and number_move, we can obtain the distribution of the move at which each piece moves for the first time.

Code

piece_lvls <- rchess:::.chesspiecedata() %>%
  mutate(col = str_extract(start_position, "\\w{1}"),
         row = str_extract(start_position, "\\d{1}")) %>%
  arrange(desc(row), col) %>%
  pull(name)

dfmoves_first_mvm <- dfmoves %>%
  mutate(piece = factor(piece, levels = piece_lvls),
         number_move_2 = ifelse(number_move %% 2 == 0, number_move/2, (number_move + 1)/2 )) %>%
  filter(piece_number_move == 1)

ggplot(dfmoves_first_mvm) +
  geom_density(aes(number_move_2), fill = "#B71C1C", alpha = 0.8, color = NA) +
  scale_y_continuous(breaks = NULL) +
  scale_x_continuous(breaks = c(0, 20, 40), limits = c(0, 40)) + 
  facet_wrap(~piece, nrow = 4, ncol = 8, scales = "free_y")  +
  labs(x = "Number Move", y = "Density") +
  theme_minimal(base_size = 7) +
  theme(strip.text = element_text(hjust = 0))
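The number_move_2 column converts half-moves (White's and Black's moves counted separately) into full move numbers, so that White's first move and Black's first move both count as move 1. As a quick check in base R, the ifelse() expression used above is equivalent to rounding half the value up:

```r
# Half-move counter: 1 = White's first move, 2 = Black's first move, ...
number_move <- 1:6

# Expression used in the pipeline above.
full_move <- ifelse(number_move %% 2 == 0,
                    number_move / 2,
                    (number_move + 1) / 2)

# Shorter equivalent: half the counter, rounded up.
full_move_alt <- ceiling(number_move / 2)

data.frame(number_move, full_move, full_move_alt)
```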

Notice the similarity between the White King and the h1 Rook, which is due to castling; the same effect is present between the Black King and the h8 Rook.

Who Captures Whom

For this plot we’ll use the igraph package and the ForceAtlas2 package, an R implementation by Adolfo Alvarez of the Force Atlas 2 graph layout designed for Gephi.

We take the rows with status == "captured" and summarize by the piece and captured_by variables. The resulting data frame provides the edges of our igraph object, which we build with the graph.data.frame function.

Code

library(igraph)

# devtools::install_github("analyxcompany/ForceAtlas2")
library(ForceAtlas2)

dfcaputures <- dfmoves %>%
  filter(status == "captured") %>%
  count(captured_by, piece) %>%
  ungroup() %>% 
  arrange(desc(n)) %>% 
  filter(!is.na(captured_by))

dfvertices <- rchess:::.chesspiecedata() %>%
  select(-fen, -start_position) %>%
  mutate(name2 = str_replace(name, " \\w+$", unicode),
         name2 = str_replace(name2, "White|Black", ""))

g <- graph.data.frame(dfcaputures %>% select(captured_by, piece, weight = n),
                      directed = TRUE,
                      vertices = dfvertices)

set.seed(123)
# lout <- layout.kamada.kawai(g)
lout <- layout.forceatlas2(g, iterations = 10000, plotstep = 0)

dfvertices <- dfvertices %>%
  mutate(x = lout[, 1], y = lout[, 2])

dfedges <- igraph::as_data_frame(g, what = "edges") %>%
  tibble::as_tibble() %>%
  left_join(dfvertices %>% select(from = name, x, y), by = "from") %>%
  left_join(dfvertices %>% select(to = name, xend = x, yend = y), by = "to")
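Since ForceAtlas2 is only installable from GitHub, the edges-to-coordinates step can be tried with a layout that ships with igraph. The sketch below swaps in the Fruchterman-Reingold layout (layout_with_fr), which is not the layout used in this post, and runs on three pieces with made-up capture counts; it only illustrates how layout coordinates get attached to vertices and edges.

```r
library(igraph)

# Made-up capture counts between three pieces (illustration only).
toy_edges <- data.frame(
  captured_by = c("g1 Knight", "c7 Pawn"),
  piece       = c("c7 Pawn", "d2 Pawn"),
  weight      = c(30, 25)
)
toy_vertices <- data.frame(name = c("g1 Knight", "c7 Pawn", "d2 Pawn"))

# The first two columns of the edge table are read as from/to.
g_toy <- graph_from_data_frame(toy_edges, directed = TRUE,
                               vertices = toy_vertices)

set.seed(123)
lout_toy <- layout_with_fr(g_toy)   # one row of (x, y) per vertex

toy_vertices$x <- lout_toy[, 1]
toy_vertices$y <- lout_toy[, 2]

# Attach start and end coordinates to each edge by matching vertex names.
e <- igraph::as_data_frame(g_toy, what = "edges")
e$x    <- toy_vertices$x[match(e$from, toy_vertices$name)]
e$y    <- toy_vertices$y[match(e$from, toy_vertices$name)]
e$xend <- toy_vertices$x[match(e$to, toy_vertices$name)]
e$yend <- toy_vertices$y[match(e$to, toy_vertices$name)]

e
```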

To plot the network I prefer to use ggplot2 instead of igraph, just because you get more control over the style and colors.

Code

ggplot() +
  geom_curve(data = dfedges %>%
               filter((str_extract(from, "\\d+") %in% c(1, 2) |
                         str_detect(from, "White"))),
             aes(x, y, xend = xend, yend = yend, alpha = weight, size = weight),
             curvature = 0.1, color = "red") +
  geom_curve(data = dfedges %>%
               filter(!(str_extract(from, "\\d+") %in% c(1, 2) |
                          str_detect(from, "White"))),
             aes(x, y, xend = xend, yend = yend, alpha = weight, size = weight),
             curvature = 0.1, color = "blue") +
  scale_alpha(range = c(0.01, 0.5)) +
  scale_size(range = c(0.01, 2)) +
  geom_point(data = dfvertices, aes(x, y, color = color), size = 13, alpha = 0.9) +
  scale_color_manual(values = c("gray10", "gray90")) +
  geom_text(data = dfvertices %>% filter(str_length(name2) != 1),
            aes(x, y, label = name2), size = 5, color = "gray50") +
  geom_text(data = dfvertices %>% filter(str_length(name2) == 1),
            aes(x, y, label = name2), size = 5, color = "gray50") +
  ggtitle("Red: white captures black | Blue: black captures white") +
  theme(legend.position = "none")

It’s well known that we usually exchange pieces of the same value: queen for queen, knight for bishop, etc. The interesting fact we see here is the d2 pawn/c7 pawn/g1 knight relationship: the d2 pawn/c7 pawn exchange is not so symmetrical, which is explained by the popularity of the Sicilian Defense at master level (1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4).

I hope you enjoyed this post as much as I enjoyed writing it :D. If you notice a mistake please let me know.

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{kunst_fuentes2015,
  author = {Kunst Fuentes, Joshua},
  title = {Visualizing {Chess} {Data} {With} Ggplot},
  date = {2015-10-30},
  url = {https://jkunst.com/blog/posts/2015-10-30-visualizing-chess-data-with-ggplot/},
  langid = {en}
}

For attribution, please cite this work as:

Kunst Fuentes, Joshua. 2015. “Visualizing Chess Data With Ggplot.” October 30, 2015. https://jkunst.com/blog/posts/2015-10-30-visualizing-chess-data-with-ggplot/.