Interpolating missing values

Linear interpolation of missing dependent values based on an independent variable, not permitting extrapolation
R
Code snippet
Published July 10, 2024

Linear interpolation of missing dependent values based on an independent variable. Only interpolation is permitted: missing values that would require extrapolation (those outside the range of observed independent values) remain NA.
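For reference, this is the standard linear interpolation formula. Take a missing value at $x$ that lies between the two nearest observed points $(x_0, y_0)$ and $(x_1, y_1)$, with $x_0 \le x \le x_1$. The interpolated value is then:

$$
\hat{y} = y_0 + (x - x_0)\,\frac{y_1 - y_0}{x_1 - x_0}
$$

When $x$ is smaller than the smallest observed $x$ or larger than the largest, no bracketing pair exists. The value would need extrapolation, so it stays NA.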

One use case is data collected from multiple patients (IDs), with a column of ages and a column of weights in which some weights are missing (NA). Within each ID, missing weights are linearly interpolated from the ages.
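The per-patient idea can also be sketched in base R alone, without dplyr. This is an alternative sketch, not the approach used in this post. The data frame `patients`, its columns `age` and `weight`, and the helper `fill_weights()` are hypothetical names with made-up values. Each patient's rows are split off, their missing weights are filled with `approx()`, and the pieces are recombined in the original row order.

```r
# Hypothetical patient data: ages and weights, with some weights missing (NA)
patients <- data.frame(
  id     = c("A", "A", "A", "B", "B", "B"),
  age    = c(30, 32, 34, 40, 41, 43),
  weight = c(70, NA, 74, NA, 80, 86)
)

# Hypothetical helper: fill NA weights for one patient by linear interpolation on age.
# rule = 1 leaves ages outside the observed range as NA (no extrapolation).
fill_weights <- function(d) {
  ok <- !is.na(d$weight)
  miss <- which(!ok)
  if (length(miss) > 0 && sum(ok) >= 2) {
    d$weight[miss] <- approx(d$age[ok], d$weight[ok], xout = d$age[miss], rule = 1)$y
  }
  d
}

# Apply per patient, then put the rows back in their original order
patients_filled <- unsplit(lapply(split(patients, patients$id), fill_weights), patients$id)
print(patients_filled)
```

Patient A's weight at age 32 is filled as 72. Patient B's weight at age 40 lies before B's first observed age, so it stays NA.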

library(dplyr)

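# Function to linearly interpolate missing values of y (dependent) from x (independent).
# Missing y values outside the range of observed x are left as NA (no extrapolation).
interpolate_missing <- function(x, y) {
  observed <- !is.na(y)
  na_inds <- which(!observed)
  # approx() needs at least two observed points; otherwise leave y unchanged
  if (length(na_inds) > 0 && sum(observed) >= 2) {
    approx_x <- x[observed]
    approx_y <- y[observed]
    # rule = 1 (the default) returns NA for xout outside range(approx_x)
    y[na_inds] <- approx(approx_x, approx_y, xout = x[na_inds], rule = 1)$y
  }
  return(y)
}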
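The `rule` argument of base R's `approx()` controls what happens outside the observed range. The sketch below uses two illustrative points, (1, 10) and (4, 20), and queries x = 0, 2 and 5. It contrasts the default `rule = 1`, which returns NA outside the range, with `rule = 2`, which carries the nearest observed y outward. Note that `rule = 2` is a constant extension, not a linear extrapolation.

```r
# Two observed points: (1, 10) and (4, 20)
obs_x <- c(1, 4)
obs_y <- c(10, 20)
query <- c(0, 2, 5)  # 0 and 5 lie outside the observed range [1, 4]

# rule = 1 (default): outside the range the result is NA -> NA, 13.33333, NA
no_extrap <- approx(obs_x, obs_y, xout = query, rule = 1)$y
print(no_extrap)

# rule = 2: outside the range the nearest observed y is reused -> 10, 13.33333, 20
clamped <- approx(obs_x, obs_y, xout = query, rule = 2)$y
print(clamped)
```

Keeping `rule = 1` is what makes `interpolate_missing()` refuse to extrapolate.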

Example

Create a data frame with three columns:

  • id: the group identifier (for example, a patient)
  • x: the independent variable
  • y: the dependent variable, with some values missing (NA)

I intentionally put some rows out of order with respect to the independent variable x, and included one missing dependent value (id 1 at x = 5) that would require extrapolation.
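Out-of-order rows are not a problem, because `approx()` sorts the (x, y) pairs by x before interpolating. The sketch below takes the two observed points of id 1, (4, 20) and (1, 10), in their unsorted order. It compares the result at x = 2 with that from explicitly sorted input.

```r
# Observed points for id 1, in the unsorted order they appear in the data
obs_x <- c(4, 1)
obs_y <- c(20, 10)

# Interpolate at x = 2 from the unsorted and the explicitly sorted input
unsorted <- approx(obs_x, obs_y, xout = 2)$y
sorted   <- approx(sort(obs_x), obs_y[order(obs_x)], xout = 2)$y

# Both equal 10 + (2 - 1) * (20 - 10) / (4 - 1), about 13.33
print(c(unsorted = unsorted, sorted = sorted))
```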

df <- data.frame( # Example dataframe
  id = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(4, 2, 1, 5, 1, 3, 5, 1, 2, 3),
  y = c(20, NA, 10, NA, 5, NA, 15, 4, 5, 6)
)

df %>% knitr::kable()
| id|  x|  y|
|--:|--:|--:|
|  1|  4| 20|
|  1|  2| NA|
|  1|  1| 10|
|  1|  5| NA|
|  2|  1|  5|
|  2|  3| NA|
|  2|  5| 15|
|  3|  1|  4|
|  3|  2|  5|
|  3|  3|  6|
df %>% # Group by id and interpolate missing y values based on surrounding x and y values
  group_by(id) %>%
  mutate(y_interpolate = round(interpolate_missing(x, y), 2)) %>% 
  arrange(id, x) %>% 
  knitr::kable()
| id|  x|  y| y_interpolate|
|--:|--:|--:|-------------:|
|  1|  1| 10|         10.00|
|  1|  2| NA|         13.33|
|  1|  4| 20|         20.00|
|  1|  5| NA|            NA|
|  2|  1|  5|          5.00|
|  2|  3| NA|         10.00|
|  2|  5| 15|         15.00|
|  3|  1|  4|          4.00|
|  3|  2|  5|          5.00|
|  3|  3|  6|          6.00|
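The two interpolated rows can be checked by hand. For id 1 at x = 2, the bracketing observations are (1, 10) and (4, 20). For id 2 at x = 3, they are (1, 5) and (5, 15). The sketch below computes y0 + (x - x0)(y1 - y0)/(x1 - x0) directly and compares the results with `approx()`. The helper name `lerp()` is hypothetical.

```r
# Hypothetical helper: straight-line value at x between (x0, y0) and (x1, y1)
lerp <- function(x, x0, y0, x1, y1) {
  y0 + (x - x0) * (y1 - y0) / (x1 - x0)
}

# id 1 at x = 2, bracketed by (1, 10) and (4, 20)
id1_hand   <- lerp(2, 1, 10, 4, 20)
id1_approx <- approx(c(1, 4), c(10, 20), xout = 2)$y

# id 2 at x = 3, bracketed by (1, 5) and (5, 15)
id2_hand   <- lerp(3, 1, 5, 5, 15)
id2_approx <- approx(c(1, 5), c(5, 15), xout = 3)$y

# Rounded as in the table: 13.33 13.33 10.00 10.00
print(round(c(id1_hand, id1_approx, id2_hand, id2_approx), 2))
```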