Very basic task that I’m tired of looking up how to perform, so I’m posting this for personal reference.
Task: create a ggplot geom_bar plot with bars ordered by value, x axis labels rotated, and y axis formatted as percent.
As an example, look at the 20 most common female names from 2000 onward.
library(tidyverse)
library(babynames)
df <- babynames %>%
  filter(
    year >= 2000,
    sex == "F"
  ) %>%
  group_by(name) %>%
  summarize(prop = mean(prop), .groups = "drop") %>%
  slice_max(prop, n = 20)
(In the past, I would have arranged by prop and then filtered for row_number() <= 20, but slice_max seems to be a more concise way to do it.)
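For comparison, here is a minimal sketch of the two approaches side by side. The toy data frames and their values are made up for illustration and are not from babynames. One thing to watch for, as far as I can tell from the dplyr docs: slice_max keeps ties by default, so it can return more than n rows unless with_ties = FALSE is set.

```r
library(dplyr)

# Toy data (hypothetical values, not from babynames)
toy <- tibble(
  name = c("a", "b", "c", "d", "e"),
  prop = c(0.05, 0.01, 0.04, 0.02, 0.03)
)

# Older approach: sort descending, then keep the first n rows
top_old <- toy %>%
  arrange(desc(prop)) %>%
  filter(row_number() <= 3)

# slice_max() selects the n rows with the largest prop in one call
top_new <- toy %>%
  slice_max(prop, n = 3)

# Ties at the cutoff: slice_max() keeps every tied row by default
tied <- tibble(name = c("x", "y", "z"), prop = c(0.2, 0.1, 0.1))
n_with_ties <- nrow(slice_max(tied, prop, n = 2))
n_without_ties <- nrow(slice_max(tied, prop, n = 2, with_ties = FALSE))
```

Here top_old and top_new contain the same three names, while n_with_ties is larger than the requested n because two rows share the cutoff value.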
When using geom_bar, the default behavior is to count the number of rows in each category. You have to set stat = "identity" to use the value itself.
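A small sketch of the difference, using a made-up three-row data frame (the names and values are placeholders). It also shows geom_col, which ggplot2 provides as a shorthand for geom_bar(stat = "identity").

```r
library(ggplot2)

# Toy data (hypothetical values)
toy <- data.frame(
  name = c("a", "b", "c"),
  prop = c(0.05, 0.01, 0.04)
)

# Default stat = "count": bar height is the number of rows per name
p_count <- ggplot(toy, aes(name)) +
  geom_bar()

# stat = "identity": bar height is the prop value itself
p_identity <- ggplot(toy, aes(name, prop)) +
  geom_bar(stat = "identity")

# geom_col() draws the same bars without needing the stat argument
p_col <- ggplot(toy, aes(name, prop)) +
  geom_col()

# layer_data() exposes the values ggplot2 computed for each bar
count_heights <- layer_data(p_count)$y
identity_heights <- layer_data(p_identity)$y
col_heights <- layer_data(p_col)$y
```

With one row per name, every bar in p_count has height 1, which is why the default is rarely what you want for pre-summarized data.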
A default geom_bar plot alphabetizes the categories, and the x axis labels often overlap.
df %>%
  ggplot(aes(name, prop)) +
  geom_bar(stat = "identity")
The following code makes these changes:
reorder: reorders the factor levels by another value; use the negative to order from highest to lowest
scale_y_continuous: displays the proportion as a percentage
axis.text.x: adjusts the x axis labels
df %>%
  ggplot(aes(x = reorder(name, -prop), prop)) +
  geom_bar(stat = "identity", color = "black", fill = "grey") +
  labs(title = "Popular names", x = "", y = "") +
  scale_y_continuous(labels = scales::percent) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
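To see what reorder and the percent labels do outside of a plot, here is a quick sketch on a few made-up values (the props are placeholders, not real babynames figures). It also uses scales::label_percent, which I believe is the function-factory form of scales::percent and takes an accuracy argument to control rounding.

```r
# Hypothetical values for illustration
name <- c("Ava", "Emma", "Mia")
prop <- c(0.004, 0.009, 0.006)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating prop puts the name with the largest prop first
ordered_levels <- levels(reorder(name, -prop))
ordered_levels
# "Emma" "Mia" "Ava"

# label_percent() builds a labelling function; accuracy = 0.1 keeps one decimal
pct <- scales::label_percent(accuracy = 0.1)
pct_labels <- pct(prop)
pct_labels
```

The same pct function could be passed to scale_y_continuous(labels = pct) if the default rounding on the axis is too coarse for small proportions.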