The tale of two charts combined

…Or when two half charts are more than one, or maybe ‘The love between the streamgraph and the stacked column chart’… In this post we’ll review some ideas behind the chart used by the Financial Times in its Coronavirus tracker, and how we can replicate it in R using the highcharter package.

data-visualization
highcharts
Author

Joshua Kunst Fuentes

Published

June 24, 2020

Post updated on Dec 23, 2022

Introduction

A week ago I saw a tweet from Steven Bernard (@sdbernard) of the Financial Times showing a streamgraph on top of a stacked column chart. I looked at it for a few seconds and then boom: what a combination! Why?

One of them is the complement of the other.

The link for the original source is here.

I like the streamgraph, but it is hard to see how the distribution between categories changes when the total changes suddenly. So having this auxiliary chart is a nice addition that keeps the distribution in sight.

Data

In this post we will use the Our World in Data Covid deaths data (link here) because I’m not sure which data the Financial Times team used.

We’ll load the data, check its structure, and keep only what we need to replicate the chart:

# A tibble: 7 × 2
  continent         n
  <chr>         <int>
1 Africa        55423
2 Asia          51877
3 Europe        55959
4 North America 36735
5 Oceania       17339
6 South America 13301
7 <NA>          13739
Rows: 230,634
Columns: 5
$ continent  <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "As…
$ iso_code   <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AF…
$ date       <date> 2020-02-24, 2020-02-25, 2020-02-26, 2020-02-27, 2020-02-28…
$ location   <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
$ new_deaths <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

FT used the 7-day rolling average in the chart, so we’ll use the {RcppRoll} package to get that series for each continent. Check the next code to see how the function roll_meanr works.

 [1] NA NA  2  3  4  5  6  7  8  9
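The output above is what a right-aligned rolling mean with a window of 3 over 1:10 would give. The exact call used in the post is not shown, so the following is only a sketch under that assumption: it computes the same thing in base R and, if {RcppRoll} is installed, compares it against roll_meanr(x, n = 3).

```r
x <- 1:10

# Right-aligned rolling mean in base R: each value is the mean of the
# current observation and the two before it, so the first two are NA.
base_roll <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))
print(base_roll)

# Assumed equivalent call with the package used in the post.
if (requireNamespace("RcppRoll", quietly = TRUE)) {
  pkg_roll <- RcppRoll::roll_meanr(x, n = 3)
  print(all.equal(base_roll, pkg_roll))
}
```

For the 7-day average used by FT the window would be n = 7, which leaves the first six values of each series as NA.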

Now we need to group the data to calculate the rolling mean for every country/location and then filter to reduce some noise.

The chart shows continents, so we’ll group by date and continent.

Rows: 6,076
Columns: 3
Groups: date [1,013]
$ date       <date> 2020-03-15, 2020-03-15, 2020-03-15, 2020-03-15, 2020-03-15…
$ continent  <chr> "Africa", "Asia", "Europe", "North America", "Oceania", "So…
$ new_deaths <dbl> 0, 99, 277, 7, 0, 0, 0, 111, 335, 11, 0, 0, 0, 123, 400, 16…
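The two steps just described (rolling mean per location, then totals per date and continent) could look roughly like the sketch below. The data here is a small made-up table with the same column names as the real one, the window is 3 instead of 7 so the toy series is long enough, and treating missing values as zero is an assumption, not necessarily what the post does.

```r
library(dplyr)

# Toy data, made up for illustration; columns mirror the real extract.
toy <- tibble(
  continent  = rep(c("Europe", "Europe", "Asia"), each = 4),
  location   = rep(c("A", "B", "C"), each = 4),
  date       = rep(as.Date("2020-03-01") + 0:3, times = 3),
  new_deaths = c(1, 2, 3, 4, 10, 20, 30, 40, NA, 0, 5, 10)
)

# Right-aligned rolling mean (base R stand-in for RcppRoll::roll_meanr).
roll_right <- function(x, n) {
  as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))
}

by_continent <- toy %>%
  mutate(new_deaths = coalesce(new_deaths, 0)) %>%   # assumption: NA means 0
  group_by(location) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(new_deaths = roll_right(new_deaths, 3)) %>% # smooth each location
  ungroup() %>%
  filter(!is.na(new_deaths)) %>%                     # drop the warm-up rows
  group_by(date, continent) %>%
  summarise(new_deaths = sum(new_deaths), .groups = "drop_last")

print(by_continent)
```

Keeping the result grouped by date (via .groups = "drop_last") matches the "Groups: date" line in the printed structure above.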

The streamgraph

Before combining the two charts we need to know how to build each chart independently. Let’s start with the main one:

A good start, but we can do better. So, some considerations:

  • The yAxis doesn’t have a meaning in the streamgraph, so we’ll remove it.
  • We can set endOnTick and startOnTick to FALSE in yAxis to gain some extra vertical space.
  • Remove the vertical lines to get a cleaner chart.
  • Get a better tooltip (table = TRUE).
  • In this case we can try adding labels to each series instead of using a legend, same as the FT chart.
  • This is not about the chart itself but about what it represents: in the original FT chart some countries, like the UK and US, are separated from their continent because they are relevant, and the color used is similar to their continent’s to keep the visual association.
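As a rough sketch of how the chart-related points in this list map to highcharter options (not necessarily the exact code behind the post), the example below builds a small streamgraph and applies them. The three dates of Asia and Europe values are taken from the printed structure above; everything else is an assumption. Depending on the highcharter version, the series labels may also need the Highcharts series-label module to be loaded.

```r
library(highcharter)

# Small sample: first three dates for Asia and Europe from the output above.
toy <- data.frame(
  date       = rep(as.Date("2020-03-15") + 0:2, times = 2),
  continent  = rep(c("Asia", "Europe"), each = 3),
  new_deaths = c(99, 111, 123, 277, 335, 400)
)

hc <- hchart(toy, "streamgraph", hcaes(date, new_deaths, group = continent))

# Hide the y axis and stop it from padding out to the nearest tick.
hc <- hc_yAxis(hc, visible = FALSE, startOnTick = FALSE, endOnTick = FALSE)
# Remove the vertical grid lines.
hc <- hc_xAxis(hc, gridLineWidth = 0)
# Tooltip rendered as a table, shared across series.
hc <- hc_tooltip(hc, table = TRUE, shared = TRUE)
# Label the series directly instead of using a legend.
hc <- hc_legend(hc, enabled = FALSE)
hc <- hc_plotOptions(hc, series = list(label = list(enabled = TRUE)))

hc
```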

To separate the information for some countries from their continent we’ll create a grp variable:

# A tibble: 13 × 3
   continent     grp                n
   <chr>         <chr>          <int>
 1 Africa        Africa         55192
 2 Asia          Asia           49667
 3 Asia          India           1012
 4 Europe        Europe         52914
 5 Europe        Russia          1013
 6 Europe        United Kingdom  1012
 7 North America Mexico          1012
 8 North America North America  34441
 9 North America United States   1012
10 Oceania       Oceania        17224
11 South America Brazil          1012
12 South America Chile           1013
13 South America South America  11112
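A minimal sketch of how such a grp variable could be derived, assuming the separated countries are the seven that show up in the table above: keep the country name for those locations and fall back to the continent for everything else. The toy rows are made up for illustration.

```r
library(dplyr)

# Countries shown separately in the table above.
countries_sep <- c(
  "India", "Russia", "United Kingdom", "Mexico",
  "United States", "Brazil", "Chile"
)

# Toy rows, made up for illustration.
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Africa"),
  location  = c("India", "Japan", "Russia", "Kenya")
)

# Use the country name for the separated ones, the continent otherwise.
toy <- toy %>%
  mutate(grp = if_else(location %in% countries_sep, location, continent))

print(count(toy, continent, grp))
```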

Fun part #1: For the continents that have separated countries, we’ll add the "Rest of " prefix to make clear this is not the whole continent.

# A tibble: 13 × 3
   continent     grp                       n
   <chr>         <chr>                 <int>
 1 Africa        Africa                55192
 2 Asia          India                  1012
 3 Asia          Rest of Asia          49667
 4 Europe        Rest of Europe        52914
 5 Europe        Russia                 1013
 6 Europe        United Kingdom         1012
 7 North America Mexico                 1012
 8 North America Rest of North America 34441
 9 North America United States          1012
10 Oceania       Oceania               17224
11 South America Brazil                 1012
12 South America Chile                  1013
13 South America Rest of South America 11112
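One way to get this relabelling, sketched on a made-up table: within each continent, flag the row whose grp is the continent itself, and prefix it only when that continent also has other groups. The helper column has_split is hypothetical; aux mirrors the column of the same name that appears in the color table further down.

```r
library(dplyr)

# Toy table, made up for illustration.
toy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Europe", "Africa"),
  grp       = c("India", "Asia", "Russia", "Europe", "Africa")
)

toy <- toy %>%
  group_by(continent) %>%
  mutate(
    aux       = grp == continent,       # TRUE for the continent-level row
    has_split = n_distinct(grp) > 1,    # does this continent have separated countries?
    grp       = if_else(aux & has_split, paste("Rest of", grp), grp)
  ) %>%
  ungroup() %>%
  select(-has_split)

print(toy)
```

Africa keeps its name because no African country is separated, which matches the table above.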

Fun part #2: We’ll use a specific color for each continent, and a brightened variation for the separated countries. For this task the {shades} package offers the brightness function.

continent     grp                         n aux   continent_color  fct grp_color continent_cln
Africa        Africa                 253732 TRUE  #f1c40f         0.00 #F1C40F   africa
Asia          India                  506072 FALSE #d35400         0.10 #EC5E00   asia
Asia          Rest of Asia           971635 TRUE  #d35400         0.00 #D35400   asia
Europe        Russia                 385089 FALSE #2980b9         0.05 #2C89C6   europe
Europe        United Kingdom         209957 FALSE #2980b9         0.10 #2F92D2   europe
Europe        Rest of Europe        1378193 TRUE  #2980b9         0.00 #2980B9   europe
North America United States         1082770 FALSE #2c3e50         0.05 #33485D   north_america
North America Mexico                 320004 FALSE #2c3e50         0.10 #3A526A   north_america
North America Rest of North America  126349 TRUE  #2c3e50         0.00 #2C3E50   north_america
Oceania       Oceania                 20627 TRUE  #7f8c8d         0.00 #7F8C8D   oceania
South America Brazil                 690756 FALSE #2ecc71         0.05 #31D978   south_america
South America Chile                   49357 FALSE #2ecc71         0.10 #34E67F   south_america
South America Rest of South America  569100 TRUE  #2ecc71         0.00 #2ECC71   south_america
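To give an idea of what the fct column does, here is a base R stand-in for the brightening step (the post itself uses the {shades} brightness function; this sketch is not that function). It assumes fct is added to the value channel of the color in HSV space, which is consistent with the grp_color values in the table but is still an assumption.

```r
# Brighten hex colors by adding `delta` to their HSV value channel,
# clamped to the [0, 1] range.
brighten <- function(color, delta) {
  hsv_m <- grDevices::rgb2hsv(grDevices::col2rgb(color))
  v <- pmin(1, pmax(0, hsv_m["v", ] + delta))
  grDevices::hsv(h = hsv_m["h", ], s = hsv_m["s", ], v = v)
}

# Continent colors taken from the table above.
continent_color <- c(asia = "#d35400", europe = "#2980b9")

# A delta of 0.10 corresponds to the second separated country in a continent.
brightened <- brighten(continent_color, 0.10)
print(brightened)
```

Results may differ from the table in the last digit of a channel because of rounding.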

Then extract some vectors:

Before continuing, let’s compare the original colors with the variations obtained with the {shades} package.
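Which vectors the post extracts is not shown, so this is only a guess at the idea: pull the group names (to use as factor levels, in table order) and a named vector of colors that a chart can look up by group. The rows are a subset copied from the color table above; the object names are hypothetical.

```r
# Subset of the color table above.
dcolors <- data.frame(
  grp       = c("India", "Rest of Asia", "Russia", "United Kingdom", "Rest of Europe"),
  grp_color = c("#EC5E00", "#D35400", "#2C89C6", "#2F92D2", "#2980B9"),
  stringsAsFactors = FALSE
)

# Order of the groups, kept as in the table so related colors sit together.
grp_levels <- dcolors$grp

# Named vector: look up a color by group name.
grp_colors <- setNames(dcolors$grp_color, dcolors$grp)

print(grp_colors[["United Kingdom"]])
```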

Original palette

Colors considering variations

The colors and levels are ready so let’s regroup the data using this new grp variable:

Then we plot the previous chart again, now applying all the considerations made before.