What is ggplot2?

ggplot2 is an R data visualization package written by Hadley Wickham that implements the “grammar of graphics.” The grammar of graphics provides a consistent way to describe the components of a graph. This lets us move beyond specific types of plots (e.g., boxplots, scatterplots) to the individual elements that compose them. As the name implies, the grammar of graphics is a language we can use to describe and build visualizations.
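To make the idea concrete, here is a small sketch (not part of the original walkthrough) that spells out several grammar components as separate pieces of one plot specification. It uses R's built-in mtcars data as a stand-in, since the brewery data is introduced later. The specific scale, coordinate system, and facet choices are only illustrative.

```r
library(ggplot2)

# mtcars ships with R and stands in for the brewery data here.
# Each piece of the specification below is one grammar component.
p <- ggplot(data = mtcars,                          # data
            mapping = aes(x = wt, y = mpg)) +       # aesthetic mappings
  geom_point() +                                    # geometric object (one layer)
  scale_x_continuous(name = "Weight (1000 lbs)") +  # scale for the x aesthetic
  coord_cartesian() +                               # coordinate system
  facet_wrap(~ cyl)                                 # facets: one panel per cylinder count

p
```

Swapping any one piece, for example `facet_wrap()` for `facet_grid()` or `geom_point()` for another geom, changes that component while leaving the rest of the description intact.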

Today we’ll use data on US breweries (yay beer!) to explore some of ggplot2’s capabilities. First, we install (if necessary) and load the packages we need.

# Run these once to install the packages, then leave them commented out.
# install.packages("devtools")
# devtools::install_github("tidyverse/ggplot2")
# devtools::install_github("tidyverse/dplyr")
# devtools::install_github("tidyverse/purrr")
# devtools::install_github("tidyverse/tidyr")
# devtools::install_github("tidyverse/forcats")
# devtools::install_github("tidyverse/readr")
# devtools::install_github("dgrtwo/gganimate")
# install.packages("maps")

library(ggplot2)
library(dplyr)
library(purrr)
library(tidyr)
library(forcats)
library(readr)
library(gganimate)
library(maps)

The dataset

The dataset contains information on breweries across the United States, scraped from BeerAdvocate. For each brewery it includes the name, the brewery’s rating, the number of reviews, the average rating of its beers, the number of beers it serves, the brewery type, and location information.

# col_types gives one letter per column: c = character, n = number
all_breweries <- read_csv("all_breweries.csv", col_types = "cnnnnccccnncnn")
all_breweries
#> # A tibble: 5,686 × 14
#>                  brewery_name brewery_rating num_reviews beer_avg
#>                         <chr>          <dbl>       <dbl>    <dbl>
#> 1                 603 Brewery             NA          NA     3.75
#> 2      7th Settlement Brewery           4.23          24     3.61
#> 3  Agner & Wolf Brewery Corp.             NA          NA     3.61
#> 4    Ashuelot Brewing Company             NA          NA       NA
#> 5            Bad Lab Beer Co.             NA          NA       NA
#> 6     Beara Irish Brewing Co.           4.12           5     3.60
#> 7        Belgian Mare Brewery             NA          NA     3.39
#> 8           Big Water Brewery             NA          NA     3.17
#> 9  Blackstone Brewing Company             NA          NA       NA
#> 10         Border Brew Supply           3.85          10     3.39
#> # ... with 5,676 more rows, and 10 more variables: num_beers <dbl>,
#> #   address <chr>, type <chr>, city <chr>, state <chr>, lon <dbl>,
#> #   lat <dbl>, full_city <chr>, city_lon <dbl>, city_lat <dbl>
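The printout already shows many missing (NA) ratings, which is why the plotting code in the next section filters them out. A quick way to count missing values per column is sketched below. Because all_breweries.csv is not bundled here, the sketch rebuilds the ten printed rows (first four columns only, using the rounded values shown) as a small stand-in; on the full data you would call the same expression on `all_breweries`.

```r
library(tibble)

# The ten rows printed above (first four columns, rounded values as shown),
# used as a small stand-in for all_breweries
first_rows <- tribble(
  ~brewery_name,                ~brewery_rating, ~num_reviews, ~beer_avg,
  "603 Brewery",                NA,              NA,           3.75,
  "7th Settlement Brewery",     4.23,            24,           3.61,
  "Agner & Wolf Brewery Corp.", NA,              NA,           3.61,
  "Ashuelot Brewing Company",   NA,              NA,           NA,
  "Bad Lab Beer Co.",           NA,              NA,           NA,
  "Beara Irish Brewing Co.",    4.12,            5,            3.60,
  "Belgian Mare Brewery",       NA,              NA,           3.39,
  "Big Water Brewery",          NA,              NA,           3.17,
  "Blackstone Brewing Company", NA,              NA,           NA,
  "Border Brew Supply",         3.85,            10,           3.39
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the missing entries in each column
missing_counts <- colSums(is.na(first_rows))
missing_counts
# brewery_name 0, brewery_rating 7, num_reviews 7, beer_avg 3
```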

Making a plot

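Let’s look at the relationship between a brewery’s overall rating and the number of beers it serves. First we keep only breweries that have both values recorded, are located in one of six central states, and serve fewer than 400 beers.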

brew_plot <- all_breweries %>%
  filter(!is.na(brewery_rating), !is.na(num_beers),        # drop missing values
    state %in% c("Kansas", "Oklahoma", "Missouri", "Iowa",  # keep six states
      "Nebraska", "Colorado"),
    num_beers < 400)                                        # fewer than 400 beers
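In `filter()`, conditions separated by commas are combined with AND, so a row survives only if every condition is TRUE (rows where a condition is NA are dropped too). The sketch below checks this on a few hypothetical toy rows, invented purely for illustration and not taken from the brewery data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows, for illustration only
toy <- tibble(
  state          = c("Kansas", "Texas", "Missouri", "Iowa"),
  brewery_rating = c(4.1, 3.9, NA, 4.0),
  num_beers      = c(120, 80, 60, 450)
)

# Comma-separated conditions: each one must hold for a row to be kept
kept_commas <- toy %>%
  filter(!is.na(brewery_rating),
         state %in% c("Kansas", "Missouri", "Iowa"),
         num_beers < 400)

# The same filter written with an explicit &
kept_and <- toy %>%
  filter(!is.na(brewery_rating) &
           state %in% c("Kansas", "Missouri", "Iowa") &
           num_beers < 400)

kept_commas$state
# "Kansas": Texas is not in the list, Missouri has no rating,
# and Iowa serves 400 or more beers
```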

ggplot(data = brew_plot) +
  geom_point(mapping = aes(x = num_beers, y = brewery_rating))

ggplot() initializes a blank plot, and layers (geoms) are then added with + to complete it. For example, geom_point() adds a layer of points, producing a scatterplot. Inside the geom call, aes() specifies which variables map to the x- and y-axes. This gives us a general template for all ggplot2 graphics:

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
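Filling in the template with concrete names shows how layering works: each + stacks another layer on the same plot. This sketch uses R's built-in mtcars data as a stand-in for brew_plot; the choice of geom_smooth() with a linear fit is just one illustrative second layer.

```r
library(ggplot2)

# One layer: the template with DATA = mtcars and GEOM_FUNCTION = geom_point
one_layer <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg))

# A second layer drawn on top: a linear fit through the same x and y
two_layers <- one_layer +
  geom_smooth(mapping = aes(x = wt, y = mpg), method = "lm", formula = y ~ x)

two_layers
```

Because a plot is an ordinary R object, it can be saved (here as `one_layer`) and extended later without repeating the earlier layers.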

Aesthetic mappings

ggplot(data = brew_plot) +
  geom_point(mapping = aes(x = num_beers, y = brewery_rating, color = type))
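Putting color inside aes() maps it to a variable, so ggplot2 picks one color per level and adds a legend. Passing a fixed value outside aes() instead sets the same color for every point. The sketch below contrasts the two, again using the built-in mtcars data as a stand-in (with the number of cylinders playing the role of type), and inspects the colors ggplot2 computes with ggplot_build().

```r
library(ggplot2)

# Mapping: color depends on a variable, one color per cylinder count
p_mapped <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg, color = factor(cyl)))

# Setting: a constant color outside aes() applies to every point
p_set <- ggplot(data = mtcars) +
  geom_point(mapping = aes(x = wt, y = mpg), color = "blue")

# The distinct colors ggplot2 assigned to the points in each plot
mapped_colors <- unique(ggplot_build(p_mapped)$data[[1]]$colour)
set_colors    <- unique(ggplot_build(p_set)$data[[1]]$colour)
```

A common slip is writing aes(color = "blue"): that maps a constant string as if it were a variable, producing a legend entry labeled "blue" in the default palette color rather than blue points.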