Recreating the Datasaurus Dozen Using tweenr and ggplot2

If you haven’t seen it yet, a great example of why you should always visualize your data has been making its way around the Twitter-verse.

Despite looking very different, all of these datasets have the same summary statistics to two decimal places. You can download the datasets, get details about the project, and read the whole paper by Justin Matejka and George Fitzmaurice here. In this post, I’ll show how we can recreate the GIF from the above tweet using tweenr and gganimate.

Creating the plots

The first step is to read in the data. The data has three variables: the dataset name, x, and y. I’ll define dataset as a factor so that the datasets will appear in the correct order in the animation.

library(tidyverse)
library(forcats)

datasaurus <- read_table2("datafiles/DatasaurusDozen.tsv",
  col_names = TRUE, col_types = "cnn") %>%
  mutate(dataset = as_factor(dataset))
datasaurus
#> # A tibble: 1,846 x 3
#>    dataset       x       y
#>     <fctr>   <dbl>   <dbl>
#>  1    dino 55.3846 97.1795
#>  2    dino 51.5385 96.0256
#>  3    dino 46.1538 94.4872
#>  4    dino 42.8205 91.4103
#>  5    dino 40.7692 88.3333
#>  6    dino 38.7179 84.8718
#>  7    dino 35.6410 79.8718
#>  8    dino 33.0769 77.5641
#>  9    dino 28.9744 74.4872
#> 10    dino 26.1538 71.4103
#> # ... with 1,836 more rows

We can view all of the datasets at once using facet_wrap in ggplot2.

ggplot(datasaurus, aes(x = x, y = y)) +
  facet_wrap(~ dataset, nrow = 3) +
  geom_point()

Hard to believe all of these datasets have the same summary statistics!
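To check that claim rather than take it on faith, the statistics can be computed per dataset. The sketch below is only an illustration: summary_stats and trunc2 are hypothetical helpers that are not part of the original analysis, the toy data frame is a stand-in so the block runs on its own, and truncating (rather than rounding) is my assumption about how "to two decimal places" should be read.

```r
library(dplyr)

# Hypothetical helper: truncate to two decimals (an assumption about what
# "the same to two decimal places" means; rounding could differ slightly).
trunc2 <- function(v) trunc(v * 100) / 100

# Hypothetical helper: one row of summary statistics per dataset.
summary_stats <- function(df) {
  df %>%
    group_by(dataset) %>%
    summarise(
      mean_x = trunc2(mean(x)),
      mean_y = trunc2(mean(y)),
      sd_x   = trunc2(sd(x)),
      sd_y   = trunc2(sd(y)),
      cor_xy = trunc2(cor(x, y))
    )
}

# Toy stand-in data with the same three columns as datasaurus.
toy <- data.frame(
  dataset = rep(c("a", "b"), each = 4),
  x = c(1, 2, 3, 4, 4, 3, 2, 1),
  y = c(2, 3, 7, 8, 8, 7, 3, 2),
  stringsAsFactors = FALSE
)

summary_stats(toy)
```

With the real data loaded, summary_stats(datasaurus) should give one row per dataset, which makes it easy to compare the columns side by side.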

Animating the plots

For a first pass at animating these datasets, I’ll use the gganimate package. This works just like the ggplot2 code above, but with an added frame aesthetic and no facet_wrap.

library(gganimate)

p <- ggplot(datasaurus, aes(x = x, y = y)) +
  geom_point(aes(frame = dataset))

animation::ani.options(interval = 1)
gganimate(p, title_frame = FALSE)

This is close, but not quite what I was looking for. This does indeed animate all of the datasets, but in order to duplicate the GIF above, I really want to see the points moving into their new positions for each dataset. To get this effect, I’ll use the tweenr package. tweenr takes in a list of dataframes, and then interpolates the transitions between the states.
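To make the idea of interpolating between states concrete, here is a minimal base R sketch. It is not tweenr's implementation: tween_pair is a hypothetical helper, the two tiny data frames are made up, and sine_out is the usual textbook form of a sine-out easing curve, which I am assuming is close to what tweenr means by "sine-out".

```r
# Textbook sine-out easing: maps progress t in [0, 1] to [0, 1],
# moving quickly at first and slowing down near the end.
sine_out <- function(t) sin(t * pi / 2)

# Hypothetical helper: interpolate every point from `from` to `to`
# over n_frames frames, tagging each row with its frame number.
tween_pair <- function(from, to, n_frames) {
  t <- seq(0, 1, length.out = n_frames)
  frames <- lapply(seq_along(t), function(i) {
    w <- sine_out(t[i])
    data.frame(
      x = from$x + (to$x - from$x) * w,
      y = from$y + (to$y - from$y) * w,
      .frame = i
    )
  })
  do.call(rbind, frames)
}

# Two made-up states with two points each.
start <- data.frame(x = c(0, 10), y = c(0, 10))
end   <- data.frame(x = c(10, 0), y = c(5, 5))

tweened <- tween_pair(start, end, n_frames = 5)
tweened
```

Each point keeps its row position across frames, which is why the states being tweened need the same number of rows, as the 142-point datasets here do.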

First, I’ll create a list of the datasets.

n_datasaurus <- datasaurus %>%
  group_by(dataset) %>%
  nest() %>%
  add_row(dataset = "dino", data = list(.$data[[1]]))
n_datasaurus
#> # A tibble: 14 x 2
#>       dataset               data
#>        <fctr>             <list>
#>  1       dino <tibble [142 x 2]>
#>  2       away <tibble [142 x 2]>
#>  3    h_lines <tibble [142 x 2]>
#>  4    v_lines <tibble [142 x 2]>
#>  5    x_shape <tibble [142 x 2]>
#>  6       star <tibble [142 x 2]>
#>  7 high_lines <tibble [142 x 2]>
#>  8       dots <tibble [142 x 2]>
#>  9     circle <tibble [142 x 2]>
#> 10   bullseye <tibble [142 x 2]>
#> 11   slant_up <tibble [142 x 2]>
#> 12 slant_down <tibble [142 x 2]>
#> 13 wide_lines <tibble [142 x 2]>
#> 14       dino <tibble [142 x 2]>

I’ve also added the dino dataset again at the bottom so that the GIF will start and end with that dataset, making it seamless. I’ll then use tween_states, sending it the list of dataframes and specifying the length of each state and transition (I had to play around a bit with the numbers until I was happy with the final animation).

# nframes is the total number of frames across all states and transitions
tween_datasaurus <- tween_states(n_datasaurus$data, tweenlength = 1,
  statelength = 0.5, ease = "sine-out", nframes = 200) %>%
  as_tibble()
tween_datasaurus
#> # A tibble: 28,400 x 3
#>          x       y .frame
#>      <dbl>   <dbl>  <int>
#>  1 55.3846 97.1795      1
#>  2 51.5385 96.0256      1
#>  3 46.1538 94.4872      1
#>  4 42.8205 91.4103      1
#>  5 40.7692 88.3333      1
#>  6 38.7179 84.8718      1
#>  7 35.6410 79.8718      1
#>  8 33.0769 77.5641      1
#>  9 28.9744 74.4872      1
#> 10 26.1538 71.4103      1
#> # ... with 28,390 more rows

This creates a new dataframe with the added .frame variable. I can then use the same gganimate code from above, just specifying .frame as the frame aesthetic instead of dataset.

p <- ggplot(tween_datasaurus, aes(x = x, y = y)) +
  geom_point(aes(frame = .frame))

animation::ani.options(interval = 1 / 15)
gganimate(p, title_frame = FALSE)
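As a back-of-the-envelope check on the timing numbers used above, the arithmetic can be written out. This assumes that tween_states spreads the total frame count across states and transitions in proportion to their lengths, which is my reading of its behavior rather than something verified here; all variable names are just for illustration.

```r
# Values taken from the calls above.
n_states     <- 14      # 13 datasets plus the repeated dino
state_length <- 0.5     # statelength
tween_length <- 1       # tweenlength
n_frames     <- 200     # total frames
interval     <- 1 / 15  # seconds per frame
n_points     <- 142     # rows per dataset

# Assumed proportional split of frames (not checked against tweenr's source).
total_units      <- n_states * state_length + (n_states - 1) * tween_length
frames_per_unit  <- n_frames / total_units
frames_per_state <- frames_per_unit * state_length
frames_per_tween <- frames_per_unit * tween_length

# Facts that do not depend on that assumption.
duration_seconds <- n_frames * interval
total_rows       <- n_frames * n_points

c(frames_per_state = frames_per_state,
  frames_per_tween = frames_per_tween,
  duration_seconds = duration_seconds,
  total_rows       = total_rows)
```

The row count of 28,400 matches the tibble printed earlier, and the whole loop lasts a little over 13 seconds.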

And there you have it! Now we can see all of the points moving between each dataset!
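One caveat for anyone following along later: the frame aesthetic and the gganimate() function belong to the development version of gganimate listed in the session info below. Later releases of gganimate use transition functions instead and handle the tweening themselves. A rough equivalent might look like the sketch below, which uses a made-up toy data frame in place of datasaurus so it stands alone; the parameter values simply mirror the ones used with tween_states and have not been tuned.

```r
library(ggplot2)
library(gganimate)

# Toy stand-in for datasaurus: three "datasets" of two points each.
toy <- data.frame(
  dataset = factor(rep(c("a", "b", "c"), each = 2), levels = c("a", "b", "c")),
  x = c(1, 2, 2, 3, 3, 1),
  y = c(1, 2, 3, 1, 2, 2)
)

# transition_states() moves between the levels of `dataset`;
# ease_aes() sets the easing used for the movement.
p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point() +
  transition_states(dataset, transition_length = 1, state_length = 0.5) +
  ease_aes("sine-out")

# Rendering needs a renderer package such as gifski, so it is left commented out:
# animate(p, nframes = 200, fps = 15)
```

Because transition_states wraps from the last state back to the first by default, the extra dino row added for tween_states should not be needed with this approach.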

Session info

devtools::session_info()
#>  setting  value                       
#>  version  R version 3.4.2 (2017-01-27)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2017-11-03                  
#> 
#>  package    * version    date       source                            
#>  animation  * 2.5        2017-10-27 Github (yihui/animation@8c76538)  
#>  assertthat   0.2.0      2017-04-11 cran (@0.2.0)                     
#>  backports    1.1.1      2017-09-25 cran (@1.1.1)                     
#>  base       * 3.4.2      2017-10-12 local                             
#>  bindr        0.1        2016-11-13 cran (@0.1)                       
#>  bindrcpp   * 0.2        2017-06-17 cran (@0.2)                       
#>  blogdown     0.2        2017-11-03 Github (rstudio/blogdown@3355851) 
#>  bookdown     0.5        2017-08-20 cran (@0.5)                       
#>  broom        0.4.2      2017-02-13 cran (@0.4.2)                     
#>  cellranger   1.1.0      2016-07-27 cran (@1.1.0)                     
#>  colorspace   1.3-2      2016-12-14 cran (@1.3-2)                     
#>  compiler     3.4.2      2017-10-12 local                             
#>  datasets   * 3.4.2      2017-10-12 local                             
#>  devtools     1.13.3     2017-08-02 CRAN (R 3.4.2)                    
#>  digest       0.6.12     2017-01-27 CRAN (R 3.4.2)                    
#>  dplyr      * 0.7.4      2017-09-28 cran (@0.7.4)                     
#>  evaluate     0.10.1     2017-06-24 cran (@0.10.1)                    
#>  forcats    * 0.2.0      2017-01-23 cran (@0.2.0)                     
#>  foreign      0.8-69     2017-06-22 CRAN (R 3.4.2)                    
#>  gganimate  * 0.1.0.9000 2017-10-27 Github (dgrtwo/gganimate@bf82002) 
#>  ggplot2    * 2.2.1      2016-12-30 cran (@2.2.1)                     
#>  glue         1.1.1      2017-06-21 cran (@1.1.1)                     
#>  graphics   * 3.4.2      2017-10-12 local                             
#>  grDevices  * 3.4.2      2017-10-12 local                             
#>  grid         3.4.2      2017-10-12 local                             
#>  gtable       0.2.0      2016-02-26 cran (@0.2.0)                     
#>  haven        1.1.0      2017-07-09 cran (@1.1.0)                     
#>  hms          0.3        2016-11-22 cran (@0.3)                       
#>  htmltools    0.3.6      2017-04-28 cran (@0.3.6)                     
#>  httr         1.3.1      2017-08-20 CRAN (R 3.4.2)                    
#>  jsonlite     1.5        2017-06-01 CRAN (R 3.4.2)                    
#>  knitr      * 1.17       2017-08-10 cran (@1.17)                      
#>  labeling     0.3        2014-08-23 cran (@0.3)                       
#>  lattice      0.20-35    2017-03-25 CRAN (R 3.4.2)                    
#>  lazyeval     0.2.0      2016-06-12 cran (@0.2.0)                     
#>  lubridate    1.6.0      2016-09-13 cran (@1.6.0)                     
#>  magrittr     1.5        2014-11-22 cran (@1.5)                       
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.4.2)                    
#>  methods    * 3.4.2      2017-10-12 local                             
#>  mnormt       1.5-5      2016-10-15 cran (@1.5-5)                     
#>  modelr       0.1.1      2017-07-24 cran (@0.1.1)                     
#>  munsell      0.4.3      2016-02-13 cran (@0.4.3)                     
#>  nlme         3.1-131    2017-02-06 CRAN (R 3.4.2)                    
#>  parallel     3.4.2      2017-10-12 local                             
#>  pkgconfig    2.0.1      2017-03-21 cran (@2.0.1)                     
#>  plyr         1.8.4      2016-06-08 cran (@1.8.4)                     
#>  psych        1.7.8      2017-09-09 cran (@1.7.8)                     
#>  purrr      * 0.2.4      2017-10-18 cran (@0.2.4)                     
#>  R6           2.2.2      2017-06-17 CRAN (R 3.4.2)                    
#>  Rcpp         0.12.13    2017-09-28 cran (@0.12.13)                   
#>  readr      * 1.1.1      2017-05-16 cran (@1.1.1)                     
#>  readxl       1.0.0      2017-04-18 cran (@1.0.0)                     
#>  reshape2     1.4.2      2016-10-22 cran (@1.4.2)                     
#>  rlang        0.1.2      2017-08-09 cran (@0.1.2)                     
#>  rmarkdown    1.6.0.9009 2017-11-03 Github (rstudio/rmarkdown@6e68143)
#>  rprojroot    1.2        2017-01-16 cran (@1.2)                       
#>  rvest        0.3.2      2016-06-17 cran (@0.3.2)                     
#>  scales       0.5.0      2017-08-24 cran (@0.5.0)                     
#>  stats      * 3.4.2      2017-10-12 local                             
#>  stringi      1.1.5      2017-04-07 cran (@1.1.5)                     
#>  stringr      1.2.0      2017-02-18 cran (@1.2.0)                     
#>  tibble     * 1.3.4      2017-08-22 cran (@1.3.4)                     
#>  tidyr      * 0.7.2      2017-10-16 cran (@0.7.2)                     
#>  tidyselect   0.2.2      2017-10-10 cran (@0.2.2)                     
#>  tidyverse  * 1.1.1      2017-01-27 cran (@1.1.1)                     
#>  tools        3.4.2      2017-10-12 local                             
#>  tweenr     * 0.1.5      2016-10-10 cran (@0.1.5)                     
#>  utils      * 3.4.2      2017-10-12 local                             
#>  withr        2.0.0      2017-07-28 CRAN (R 3.4.2)                    
#>  xml2         1.1.1      2017-01-24 cran (@1.1.1)                     
#>  yaml         2.1.14     2016-11-12 cran (@2.1.14)