METAR analysis

Analyzing historical METAR weather data for flight conditions by month and time of day, including calculation of crosswinds for available runways.
Flight
R
Published

July 11, 2024

Overview1

After a Discovery Flight in September 2023, I decided to pursue flight training. My Class 3 medical was granted in March 2024 (initially deferred for a medical issue from 20+ years ago) and I’ve logged around 17 hours of dual instruction so far in a Cessna 172S.

One of the things about flight training is that you realize how very important weather is, especially when flying under VFR (visual flight rules). After having a few lessons cancelled for weather, I wondered whether I could apply data geekery to analyzing historical METAR (Meteorological Aerodrome Report) data around a given airport, by time of day and month?

  • looked at 20+ years of roughly hourly data, from 2000 to present, around 280,000 measurements
  • initially focused on visibility and cloud cover
  • after a few preflights in July under the blistering sun, added temperatures
  • after working on landings, calculated crosswinds on the the most favorable runway
  • now that the eventuality of soloing is being mentioned, added more stringent solo student pilot conditions (crosswind and gusts)

I wrapped it all up in my first ever Quarto dashboard (I’ve previously used flexdashboard with R markdown). Screenshot of the dashboard’s first page above.

The following are notes and some bits of code describing how I obtained, processed, and analyzed the data.

METAR data

METAR data was obtained from the Iowa Environmental Mesonet (IEM) archive of Automated Surface Observing System (ASOS) Network data. Apologies that I will not be providing full details on how to obtain the METAR data. 20+ years of data is 60+ MB for each airport, and I don’t want to contribute to hammering the source site with too many data requests. It should be pretty easy to figure out how to do it on your own though.

I initially considered writing regular expressions to parse the raw METAR text, but it’s only semi-structured, and the IEM had very nice tabular data already available which required much less preprocessing.

There’s a very nice data dictionary, with full details here. The most relevant data fields included:

  • time of reporting
  • wind direction, wind speed, wind gusts
  • visibility
  • cloud coverage + heights
  • air temperature

For each hourly report, determined the classification category:

Categories: ceilings and visibility

  • LIFR (low IFR): < 500 ft AGL and/or Vis < 1 statute mile
  • IFR: < 1,000 ft AGL and/or Vis < 3 miles
  • MVFR (marginal VFR): <= 3,000 ft AGL and/or Vis <= 5 miles
  • VFR: > 3,000 ft AGL and Vis > 5 miles

Some details:

  • obtained times in UTC (coordinated universal / GMT / Zulu) to avoid daylight savings time issues –> need time zone for each airport to convert to local time
  • ASOS wind directions are reported relative to true north while runway headings are relative to magnetic north –> need runway headings relative to true north

Airport information

Airport names, time zones, and runway information were obtained from AirNav.com.

The web page for each airport has all the information needed. I used the R tidyverse rvest package and regular expressions to scrape out the desired airport metadata, including:

  • airport name and identifier
  • available runways, including both true and magnetic headings
  • timezone text converted to tz usable in R datetime conversions
library(tidyverse)
library(rvest) # https://rvest.tidyverse.org/articles/rvest.html

airport <- "kowd"

page <- read_html(paste0("https://airnav.com/airport/", airport))
text <- page %>% html_text2 # extract entire page as just text

airport_name <- sub("AirNav: ", "", page %>% html_elements("title") %>% html_text2())

timezone <- str_extract(text, regex("(?<=Time zone: \t).*", ignore_case = TRUE))
tz <- case_when(
  timezone == "UTC -4 (UTC -5 during Standard Time)" ~ "US/Eastern",
  timezone == "UTC -5 (UTC -6 during Standard Time)" ~ "US/Central",
  timezone == "UTC -6 (UTC -7 during Standard Time)" ~ "US/Mountain",
  timezone == "UTC -7 (UTC -8 during Standard Time)" ~ "US/Pacific",
  timezone == "UTC -8 (UTC -9 during Standard Time)" ~ "US/Alaska",
  timezone == "UTC -10 (year round; does not observe DST)" ~ "US/Hawaii",
  .default = NA # Other US: Aleutian, Arizona, East-Indiana, Indiana-Starke, Michigan, Samoa
)

# Runways: match words following "runway" that include a forward slash
runways <- sort(unlist(
  str_extract_all(
    text,
    regex("(?<=\\brunway\\s)\\S*\\b/\\S*\\b", ignore_case = TRUE
  ))))

# Headings: match 3 digits followed by either magnetic or true
headings_df <- data.frame(
    source = unique((unlist(
      str_extract_all(
        text,
        regex("\\b\\d{3} (magnetic|true)\\b", ignore_case = TRUE
      )))))
  ) %>%
  separate(source, into = c("heading", "type"), sep = " ") %>%
  arrange(type, heading)

runway_headings_true <- headings_df %>%
  filter(type == "true") %>%
  pull(heading) %>%
  as.numeric

runway_headings_magnetic <- headings_df %>%
  filter(type == "magnetic") %>%
  pull(heading) %>%
  as.numeric

metadata <- list(
  airport = toupper(airport),
  airport_name = airport_name,
  runways = runways,
  runway_headings_true = runway_headings_true,
  runway_headings_magnetic = runway_headings_magnetic,
  timezone = timezone,
  tz = tz)

metadata
#> $airport
#> [1] "KOWD"
#> 
#> $airport_name
#> [1] "KOWD - Norwood Memorial Airport"
#> 
#> $runways
#> [1] "10/28" "17/35"
#> 
#> $runway_headings_true
#> [1]  89 155 269 335
#> 
#> $runway_headings_magnetic
#> [1] 104 170 284 350
#> 
#> $timezone
#> [1] "UTC -4 (UTC -5 during Standard Time)"
#> 
#> $tz
#> [1] "US/Eastern"

Lineplot of category by month and time

For the dashboard, I used plotly’s ggplotly to add hover functionality.

plot_category <- function(df, category_min = "VFR") {
  # categories: VFR, MVFR, IFR, LIFR, Training
  df %>%
    filter(!is.na(category)) %>% 
    mutate(acceptable = (category >= category_min)) %>% 
    group_by(month, hour) %>%
    summarize(percent = mean(acceptable), .groups = "drop") %>%
    ggplot(aes(hour, percent, color = month)) +
    geom_point() +
    geom_line() +
    coord_cartesian(ylim = c(0,1)) +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "Percent of time VFR", x = "Local time", y = "") +
    guides(color = guide_legend(title = "")) +
    scale_color_viridis_d(option = "D", direction = -1) +
    scale_x_continuous(breaks = seq(0, 23, 2)) +
    theme_bw()
}

plot_category(df)

Crosswind calculations

Using the (true) wind directions from the METARs and the (true) runway headings from AirNav, assume all runways are available and do a little trigonometry to calculate the crosswind components for each runway, and keep the lowest one.

The assumption that all runways are available is not always valid. For example, a hotel was built near Boston Logan to eliminate the possibility of landings on Runway 14 and takeoffs on Runway 32. But good enough for rough analysis.

crosswind_fraction <- function(drct, headings) {
  # drct is a scalar direction in degrees, representing wind direction in TRUE degrees
  # headings is a vector of numeric headings in degrees
  # winds (drct) are given in TRUE heading but only to nearest 10 degrees
  # could round drct + heading DIFFERENCE to nearest 10
  return (min(abs(sin(2 * pi * (drct - headings)/360)))) # no rounding
}

df <- df %>% 
  # Do crosswind calculations
  mutate(
    drct = drct %% 360, # sets 360 back to 0
    drct2 = ifelse(sknt == 0, 0, drct) # temporary variable, if no wind, set drct to 0
  ) %>%
  rowwise() %>% # each row is its own group, so the mutate acts PER ROW
  mutate(
    crosswind = sknt * crosswind_fraction(drct2, runway_headings),
    crosswind = ifelse(!is.na(sknt) && is.na(drct2), sknt, crosswind)
    # if windspeed known but drct is NA, sknt is <= 6, and will pessimistically be considered the crosswind
  ) %>% 
  ungroup() %>% 
  select(-drct2) %>% 

Heatmap of category by month and time

plot_category_heatmap <- function(df, limits = NULL, category_min = "VFR") {
  # categories: VFR, MVFR, IFR, LIFR, Training
  df %>% 
    filter(!is.na(category)) %>% 
    mutate(acceptable = (category >= category_min)) %>% 
    group_by(month, hour) %>%
    summarize(percent = 100 * mean(acceptable), .groups = "drop") %>%
    ggplot(aes(hour, month, fill = percent)) + 
    geom_tile() +
    scale_fill_viridis_c(option = "H", limits = limits) +
    labs(title = "Percent of time VFR", x = "Local time", y = "") +
    guides(fill = guide_legend(title = "Percent")) +
    scale_x_continuous(breaks = seq(0, 23, 2))
}

plot_category_heatmap(df)

Radial plot of frequency of wind directions

plot_wind <- function(df) {
  df %>%
    mutate(drct = drct %% 360) %>%
    filter(
      sknt != 0,
      !is.na(drct)
    ) %>%
    ggplot(aes(drct)) +
    geom_histogram(bins = 36, color = "black", fill = "lightgrey") +
    coord_polar(start = 0) +
    theme_bw() +
    scale_x_continuous(breaks = seq(0, 360, by = 20)) +
    labs(x = "Wind direction frequency (true)", y = "") +
    theme(axis.text.y = element_blank())
}

plot_wind(df)

Grid plot of student solo conditions

The interactive HTML dashboard is nice, but a panel of plots is also easy to generate.

Local student solo requirements

  • Wind 17 Kts
  • Crosswind 7 Kts
  • Gusts 5 Kts
  • Ceilings 3,500 ft AGL
  • Vis 6 miles)
g_cat_training <- plot_category(df, category_min = "Training") + ggtitle("Student solo conditions (%)")
g_cat_training_heatmap <- plot_category_heatmap(df, category_min = "Training") + ggtitle("Student solo conditions (%)")
g_wind_heatmap <- plot_wind_heatmap(df %>% filter(category == "Training"))
g_temp_heatmap <- plot_temp_heatmap(df %>% filter(category == "Training"))

g_grid <- cowplot::plot_grid(
  plot_grid(g_cat_training, g_cat_training_heatmap),
  plot_grid(g_wind_heatmap, g_temp_heatmap),
  nrow = 2
)

g_grid

Footnotes

  1. Thumbnail image: “Night Thunderstorm” by C. Clark(NOAA)↩︎