library(tidyverse)
library(rvest) # https://rvest.tidyverse.org/articles/rvest.html
<- "kowd"
airport
<- read_html(paste0("https://airnav.com/airport/", airport))
page <- page %>% html_text2 # extract entire page as just text
text
<- sub("AirNav: ", "", page %>% html_elements("title") %>% html_text2())
airport_name
<- str_extract(text, regex("(?<=Time zone: \t).*", ignore_case = TRUE))
timezone <- case_when(
tz == "UTC -4 (UTC -5 during Standard Time)" ~ "US/Eastern",
timezone == "UTC -5 (UTC -6 during Standard Time)" ~ "US/Central",
timezone == "UTC -6 (UTC -7 during Standard Time)" ~ "US/Mountain",
timezone == "UTC -7 (UTC -8 during Standard Time)" ~ "US/Pacific",
timezone == "UTC -8 (UTC -9 during Standard Time)" ~ "US/Alaska",
timezone == "UTC -10 (year round; does not observe DST)" ~ "US/Hawaii",
timezone .default = NA # Other US: Aleutian, Arizona, East-Indiana, Indiana-Starke, Michigan, Samoa
)
# Runways: match words following "runway" that include a forward slash
<- sort(unlist(
runways str_extract_all(
text,regex("(?<=\\brunway\\s)\\S*\\b/\\S*\\b", ignore_case = TRUE
))))
# Headings: match 3 digits followed by either magnetic or true
<- data.frame(
headings_df source = unique((unlist(
str_extract_all(
text,regex("\\b\\d{3} (magnetic|true)\\b", ignore_case = TRUE
)))))%>%
) separate(source, into = c("heading", "type"), sep = " ") %>%
arrange(type, heading)
<- headings_df %>%
runway_headings_true filter(type == "true") %>%
pull(heading) %>%
as.numeric
<- headings_df %>%
runway_headings_magnetic filter(type == "magnetic") %>%
pull(heading) %>%
as.numeric
<- list(
metadata airport = toupper(airport),
airport_name = airport_name,
runways = runways,
runway_headings_true = runway_headings_true,
runway_headings_magnetic = runway_headings_magnetic,
timezone = timezone,
tz = tz)
metadata#> $airport
#> [1] "KOWD"
#>
#> $airport_name
#> [1] "KOWD - Norwood Memorial Airport"
#>
#> $runways
#> [1] "10/28" "17/35"
#>
#> $runway_headings_true
#> [1] 89 155 269 335
#>
#> $runway_headings_magnetic
#> [1] 104 170 284 350
#>
#> $timezone
#> [1] "UTC -4 (UTC -5 during Standard Time)"
#>
#> $tz
#> [1] "US/Eastern"
Overview1
After a Discovery Flight in September 2023, I decided to pursue flight training. My Class 3 medical was granted in March 2024 (initially deferred for a medical issue from 20+ years ago) and I’ve logged around 17 hours of dual instruction so far in a Cessna 172S.
One of the things about flight training is that you realize how very important weather is, especially when flying under VFR (visual flight rules). After having a few lessons cancelled for weather, I wondered whether I could apply data geekery to analyzing historical METAR (Meteorological Aerodrome Report) data around a given airport, by time of day and month?
- looked at 20+ years of roughly hourly data, from 2000 to present, around 280,000 measurements
- initially focused on visibility and cloud cover
- after a few preflights in July under the blistering sun, added temperatures
- after working on landings, calculated crosswinds on the the most favorable runway
- now that the eventuality of soloing is being mentioned, added more stringent solo student pilot conditions (crosswind and gusts)
I wrapped it all up in my first ever Quarto dashboard (I’ve previously used flexdashboard
with R markdown). Screenshot of the dashboard’s first page above.
The following are notes and some bits of code describing how I obtained, processed, and analyzed the data.
METAR data
METAR data was obtained from the Iowa Environmental Mesonet (IEM) archive of Automated Surface Observing System (ASOS) Network data. Apologies that I will not be providing full details on how to obtain the METAR data. 20+ years of data is 60+ MB for each airport, and I don’t want to contribute to hammering the source site with too many data requests. It should be pretty easy to figure out how to do it on your own though.
I initially considered writing regular expressions to parse the raw METAR text, but it’s only semi-structured, and the IEM had very nice tabular data already available which required much less preprocessing.
There’s a very nice data dictionary, with full details here. The most relevant data fields included:
- time of reporting
- wind direction, wind speed, wind gusts
- visibility
- cloud coverage + heights
- air temperature
For each hourly report, determined the classification category:
Categories: ceilings and visibility
- LIFR (low IFR): < 500 ft AGL and/or Vis < 1 statute mile
- IFR: < 1,000 ft AGL and/or Vis < 3 miles
- MVFR (marginal VFR): <= 3,000 ft AGL and/or Vis <= 5 miles
- VFR: > 3,000 ft AGL and Vis > 5 miles
Some details:
- obtained times in UTC (coordinated universal / GMT / Zulu) to avoid daylight savings time issues –> need time zone for each airport to convert to local time
- ASOS wind directions are reported relative to true north while runway headings are relative to magnetic north –> need runway headings relative to true north
Airport information
Airport names, time zones, and runway information were obtained from AirNav.com.
The web page for each airport has all the information needed. I used the R tidyverse rvest
package and regular expressions to scrape out the desired airport metadata, including:
- airport name and identifier
- available runways, including both true and magnetic headings
- timezone text converted to tz usable in R datetime conversions
Lineplot of category by month and time
For the dashboard, I used plotly
’s ggplotly
to add hover functionality.
<- function(df, category_min = "VFR") {
plot_category # categories: VFR, MVFR, IFR, LIFR, Training
%>%
df filter(!is.na(category)) %>%
mutate(acceptable = (category >= category_min)) %>%
group_by(month, hour) %>%
summarize(percent = mean(acceptable), .groups = "drop") %>%
ggplot(aes(hour, percent, color = month)) +
geom_point() +
geom_line() +
coord_cartesian(ylim = c(0,1)) +
scale_y_continuous(labels = scales::percent) +
labs(title = "Percent of time VFR", x = "Local time", y = "") +
guides(color = guide_legend(title = "")) +
scale_color_viridis_d(option = "D", direction = -1) +
scale_x_continuous(breaks = seq(0, 23, 2)) +
theme_bw()
}
plot_category(df)
Crosswind calculations
Using the (true) wind directions from the METARs and the (true) runway headings from AirNav, assume all runways are available and do a little trigonometry to calculate the crosswind components for each runway, and keep the lowest one.
The assumption that all runways are available is not always valid. For example, a hotel was built near Boston Logan to eliminate the possibility of landings on Runway 14 and takeoffs on Runway 32. But good enough for rough analysis.
<- function(drct, headings) {
crosswind_fraction # drct is a scalar direction in degrees, representing wind direction in TRUE degrees
# headings is a vector of numeric headings in degrees
# winds (drct) are given in TRUE heading but only to nearest 10 degrees
# could round drct + heading DIFFERENCE to nearest 10
return (min(abs(sin(2 * pi * (drct - headings)/360)))) # no rounding
}
<- df %>%
df # Do crosswind calculations
mutate(
drct = drct %% 360, # sets 360 back to 0
drct2 = ifelse(sknt == 0, 0, drct) # temporary variable, if no wind, set drct to 0
%>%
) rowwise() %>% # each row is its own group, so the mutate acts PER ROW
mutate(
crosswind = sknt * crosswind_fraction(drct2, runway_headings),
crosswind = ifelse(!is.na(sknt) && is.na(drct2), sknt, crosswind)
# if windspeed known but drct is NA, sknt is <= 6, and will pessimistically be considered the crosswind
%>%
) ungroup() %>%
select(-drct2) %>%
Heatmap of category by month and time
<- function(df, limits = NULL, category_min = "VFR") {
plot_category_heatmap # categories: VFR, MVFR, IFR, LIFR, Training
%>%
df filter(!is.na(category)) %>%
mutate(acceptable = (category >= category_min)) %>%
group_by(month, hour) %>%
summarize(percent = 100 * mean(acceptable), .groups = "drop") %>%
ggplot(aes(hour, month, fill = percent)) +
geom_tile() +
scale_fill_viridis_c(option = "H", limits = limits) +
labs(title = "Percent of time VFR", x = "Local time", y = "") +
guides(fill = guide_legend(title = "Percent")) +
scale_x_continuous(breaks = seq(0, 23, 2))
}
plot_category_heatmap(df)
Radial plot of frequency of wind directions
<- function(df) {
plot_wind %>%
df mutate(drct = drct %% 360) %>%
filter(
!= 0,
sknt !is.na(drct)
%>%
) ggplot(aes(drct)) +
geom_histogram(bins = 36, color = "black", fill = "lightgrey") +
coord_polar(start = 0) +
theme_bw() +
scale_x_continuous(breaks = seq(0, 360, by = 20)) +
labs(x = "Wind direction frequency (true)", y = "") +
theme(axis.text.y = element_blank())
}
plot_wind(df)
Grid plot of student solo conditions
The interactive HTML dashboard is nice, but a panel of plots is also easy to generate.
Local student solo requirements
- Wind 17 Kts
- Crosswind 7 Kts
- Gusts 5 Kts
- Ceilings 3,500 ft AGL
- Vis 6 miles)
<- plot_category(df, category_min = "Training") + ggtitle("Student solo conditions (%)")
g_cat_training <- plot_category_heatmap(df, category_min = "Training") + ggtitle("Student solo conditions (%)")
g_cat_training_heatmap <- plot_wind_heatmap(df %>% filter(category == "Training"))
g_wind_heatmap <- plot_temp_heatmap(df %>% filter(category == "Training"))
g_temp_heatmap
<- cowplot::plot_grid(
g_grid plot_grid(g_cat_training, g_cat_training_heatmap),
plot_grid(g_wind_heatmap, g_temp_heatmap),
nrow = 2
)
g_grid
Footnotes
Thumbnail image: “Night Thunderstorm” by C. Clark(NOAA)↩︎