
Very basic task that I’m tired of looking up how to perform, so I’m posting this for personal reference.
Task: create a ggplot geom_bar plot with bars ordered by value, x axis labels rotated, and y axis formatted as percent.
As an example, look at the 20 most common female names from 2000 onward.
library(tidyverse)
library(babynames)
df <- babynames %>% 
  filter(
    year >= 2000,
    sex == "F"
  ) %>% 
  group_by(name) %>% 
  summarize(prop = mean(prop), .groups = "drop") %>% 
  slice_max(prop, n = 20)
(In the past, I would have arranged by prop and then filtered for row_number() <= 20, but slice_max seems to be a more concise way to do it.)
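For comparison, here is a small sketch of the two approaches side by side. The `toy` table is made up for illustration and just stands in for the summarized babynames data. One thing worth knowing: slice_max keeps ties by default, so it can return more than n rows; `with_ties = FALSE` caps it at exactly n.

```r
library(dplyr)

# Hypothetical toy data standing in for the summarized babynames table
toy <- tibble(
  name = c("Ava", "Emma", "Mia", "Zoe"),
  prop = c(0.002, 0.009, 0.005, 0.001)
)

# Older idiom: sort descending, then keep the first n rows
old_way <- toy %>%
  arrange(desc(prop)) %>%
  filter(row_number() <= 2)

# slice_max in one step; with_ties = FALSE returns exactly n rows
new_way <- toy %>%
  slice_max(prop, n = 2, with_ties = FALSE)
```

Both versions should keep the same two names here, since the toy data has no ties.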
When using geom_bar, the default behavior is to count the number of rows in each category. You have to set stat = "identity" to plot the value itself.
A default geom_bar plot alphabetizes the categories, and the x axis labels often overlap.
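As a side note, ggplot2 also provides geom_col, which plots the values as given without needing stat = "identity". A minimal sketch on a made-up `toy` data frame (not the babynames data) showing that the two layers draw the same bar heights:

```r
library(ggplot2)

# Hypothetical toy data: one row per category, value already computed
toy <- data.frame(
  name = c("Ava", "Emma", "Mia"),
  prop = c(0.002, 0.009, 0.005)
)

# geom_bar needs stat = "identity" to use prop as the bar height
p_bar <- ggplot(toy, aes(name, prop)) +
  geom_bar(stat = "identity")

# geom_col uses the y value directly
p_col <- ggplot(toy, aes(name, prop)) +
  geom_col()

# layer_data() exposes the computed data behind each layer
heights_bar <- layer_data(p_bar)$y
heights_col <- layer_data(p_col)$y
```

Which one to use is mostly a matter of taste; the rest of this post sticks with geom_bar.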
df %>% 
  ggplot(aes(name, prop)) +
  geom_bar(stat = "identity")
The following code changes:
reorder: reorders the factor levels by another value; use the negative to order from highest to lowest
scale_y_continuous: displays the proportion as a percentage
axis.text.x: adjusts the x axis labels
df %>% 
  ggplot(aes(x = reorder(name, -prop), y = prop)) +
  geom_bar(stat = "identity", color = "black", fill = "grey") +
  labs(title = "Popular names", x = "", y = "") +
  scale_y_continuous(labels = scales::percent) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
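To see what reorder and the percent labeller do on their own, outside of a plot, here is a small sketch. The `toy_names` and `toy_prop` vectors are made up for illustration, and `label_percent(accuracy = 0.1)` is an optional variation on the `scales::percent` call above for when you want to control the number of decimals.

```r
# Hypothetical toy vectors, not taken from babynames
toy_names <- c("Ava", "Emma", "Mia")
toy_prop  <- c(0.002, 0.009, 0.005)

# Default factor levels are alphabetical
alphabetical <- levels(factor(toy_names))

# reorder sorts levels by the second argument in ascending order,
# so negating the value puts the largest first
by_value <- levels(reorder(toy_names, -toy_prop))  # "Emma" "Mia" "Ava"

# label_percent returns a function that formats proportions as percent strings
pct <- scales::label_percent(accuracy = 0.1)
pct_labels <- pct(toy_prop)
```

In the plot, the same labeller could be passed as `scale_y_continuous(labels = pct)`.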