
This post provides R code to estimate growth chart LMS parameters [1] given a table of measurements and percentile values (or Z-scores). The initial rationale was to explore problems with the LMS parameterization in a publication describing growth charts for Noonan syndrome [2], with the findings uploaded to medRxiv [3].
Load packages
Show the code
library(tidyverse)
library(knitr)
library(minpack.lm)  # nlsLM() for robust nonlinear least squares
library(ggh4x)       # facet_grid2(), for multiple panels with free scales
Load data
The source data was extracted from the supplemental materials of [2] using Tabula and collected into a single dataframe.
Show the code
##### Load original published data tables #####
# `cappa_full.csv` contains: chart, age / units, gender, measure / units, M, S, L, 3 / 10 / 25 / 50 / 75 / 90 / 97 percentiles
df_orig <- read.csv("cappa_full.csv") |> as_tibble()
df_orig |> select(measure, gender, age, M, S, L, p_3:p_97) |> head() |> knitr::kable()
| measure | gender | age | M | S | L | p_3 | p_10 | p_25 | p_50 | p_75 | p_90 | p_97 | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| height_velocity | f | 0.0 | 19.301 | 0.262 | 0.780 | 10.353 | 13.069 | 15.955 | 19.301 | 22.780 | 26.015 | 29.298 | 
| height_velocity | f | 0.2 | 18.601 | 0.264 | 0.781 | 9.923 | 12.557 | 15.355 | 18.601 | 21.976 | 25.114 | 28.298 | 
| height_velocity | f | 0.4 | 17.903 | 0.266 | 0.782 | 9.497 | 12.048 | 14.759 | 17.903 | 21.172 | 24.212 | 27.297 | 
| height_velocity | f | 0.6 | 17.207 | 0.267 | 0.784 | 9.077 | 11.544 | 14.166 | 17.207 | 20.370 | 23.311 | 26.296 | 
| height_velocity | f | 0.8 | 16.515 | 0.269 | 0.785 | 8.663 | 11.045 | 13.577 | 16.515 | 19.570 | 22.412 | 25.295 | 
| height_velocity | f | 1.0 | 15.827 | 0.271 | 0.786 | 8.254 | 10.551 | 12.994 | 15.828 | 18.775 | 21.516 | 24.297 | 
There were problems with the LMS parameters, so the goal was to re-estimate them from the percentile values (3 / 10 / 25 / 50 / 75 / 90 / 97). The LMS parameterization relates the measurement value X to the parameters L, M, S, and Z as follows:

$$X = M (1 + L S Z)^{1/L} \qquad (L \neq 0)$$

or

$$X = M e^{S Z} \qquad (L = 0)$$
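The second expression is the limiting case of the first as L approaches 0; a quick numerical illustration with hypothetical values (M = 50, S = 0.1, 97th percentile) shows the two forms agreeing:

# Hypothetical values; the power form with L near 0 matches the exponential form
M <- 50; S <- 0.1; Z <- qnorm(0.97)
M * (1 + 1e-6 * S * Z)^(1 / 1e-6)  # L very close to 0
M * exp(S * Z)                     # L = 0 form; both are approximately 60.35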
Prepare training data
Show the code
df <- df_orig |> # wide to long format with published X at percentiles
  pivot_longer(
    p_3:p_97,
    names_to = "percentile",
    names_prefix = "p_",
    values_to = "X",
    names_transform = as.numeric
  ) |> 
  mutate(Z = qnorm(percentile / 100))
df_train <- df |>
  select(measure, gender, age, X, Z)
df_train |> head(n = 14) |> knitr::kable()
| measure | gender | age | X | Z | 
|---|---|---|---|---|
| height_velocity | f | 0.0 | 10.353 | -1.8807936 | 
| height_velocity | f | 0.0 | 13.069 | -1.2815516 | 
| height_velocity | f | 0.0 | 15.955 | -0.6744898 | 
| height_velocity | f | 0.0 | 19.301 | 0.0000000 | 
| height_velocity | f | 0.0 | 22.780 | 0.6744898 | 
| height_velocity | f | 0.0 | 26.015 | 1.2815516 | 
| height_velocity | f | 0.0 | 29.298 | 1.8807936 | 
| height_velocity | f | 0.2 | 9.923 | -1.8807936 | 
| height_velocity | f | 0.2 | 12.557 | -1.2815516 | 
| height_velocity | f | 0.2 | 15.355 | -0.6744898 | 
| height_velocity | f | 0.2 | 18.601 | 0.0000000 | 
| height_velocity | f | 0.2 | 21.976 | 0.6744898 | 
| height_velocity | f | 0.2 | 25.114 | 1.2815516 | 
| height_velocity | f | 0.2 | 28.298 | 1.8807936 | 
Functions
The meat of the code is the estimate_LMS_parameters() function, which takes a dataframe containing columns X and Z, optionally groups it (for example, by anthropometric measurement, sex, and age), and performs nonlinear least squares regression via minpack.lm::nlsLM() to estimate L, M, and S for each group.
The estimation function could be improved to handle the case where L = 0 more explicitly, since the model reduces to the exponential form as L approaches 0, but supplying starting values for L just above and below 0 seems to work. Because three parameters are being estimated, at least 3 X / Z points per group are needed; this dataset has 7 points per group. (A synthetic check of the estimator appears after the function definitions below.)
##### Estimate_LMS_parameters #####
estimate_LMS_parameters <- function(df, ...) {
  # takes a dataframe of columns Z and X and optional (...) grouping variables
  # returns each group's estimated LMS parameters, as well as RMSE, R-squared,
  # convergence, n, and method comment
  #
  # Requires: `tidyverse` and `minpack.lm` libraries
  
  library(tidyverse)
  library(minpack.lm)  # nlsLM() for robust nonlinear least squares
  tolerance <- 1e-6 # used for L ≈ 0 and when assessing `best_obj`
  rmse_tolerance <- 0.01 # max RMSE for regression to be considered "converged"
  
  # Vectorized model function for optimization
  LMS_model_vec <- function(params, Z) {
    L <- params[1]
    M <- params[2] 
    S <- params[3]
    if (abs(L) < tolerance) { # When L ≈ 0, use exponential form
      return(M * exp(S * Z))
    } else { # Normal case
      return(M * (1 + L*S*Z)^(1/L))
    }
  }
  
  # Function to estimate parameters for a single group
  estimate_single_group <- function(group_data) {
    Z <- group_data$Z
    X <- group_data$X
    n <- nrow(group_data)
    
    if (n < 3) { # Need at least 3 points to estimate 3 parameters
      return(data.frame(
        L = NA, M = NA, S = NA, rmse = NA,
        r_squared = NA, converged = FALSE,
        n_obs = n, method = "insufficient_data"))
    }
    # Test discontinuity at L = 0 (normal distribution with no skew)
    #
    # tibble(
    #   measure = "test", age = 0, gender = "m",
    #   percentile = c(3, 10, 50, 90, 97)
    # ) |>
    #   mutate(
    #     Z = qnorm(percentile / 100),
    #     X = 50 * exp(1 * Z) # formula for L = 0; M = 50, S = 1
    #   ) |> 
    #   estimate_LMS_parameters(measure, age, gender)
    #
    # L_starts <- c(-5, 5)              # fails to converge
    # L_starts <- c(-5, -0.01, 0.01, 5) # converges to L = 1.165e-06
    
    # Sets of starting values
    M_starts <- c(mean(X), median(X)) # estimating the median
    L_starts <- c(-5, -0.01, 0.01, 5) # handle potential L = 0
    S_starts <- c(0.01, 1)            # always positive
    
    best_params <- NULL
    best_obj <- Inf
    best_method <- "failed"
    
    # Try different starting combinations
    for (M_start in M_starts) {
      for (L_start in L_starts) {
        for (S_start in S_starts) {
          tryCatch({ # Using minpack.lm::nlsLM() robust nonlinear least squares
            # objective function NOT normalized to # of points
            # RMSE = sqrt(obj_val / n)
            if (is.null(best_params) || best_obj > tolerance) {
              fit <- nlsLM(
                # X ~ ifelse( # this didn't work
                #   abs(L) < tolerance,
                #   M * exp(S * Z),
                #   M * (1 + L * S * Z)^(1/L)
                # ),
                X ~ M * (1 + L * S * Z)^(1/L), # will fail if L is exactly 0
                data = data.frame(X = X, Z = Z),
                start = list(L = L_start, M = M_start, S = S_start),
                control = nls.lm.control(maxiter = 1000))
              params <- coef(fit)
              obj_val <- sum(residuals(fit)^2)
              if (obj_val < best_obj) {
                best_obj <- obj_val
                best_params <- params
                best_method <- "nlsLM"
              }
            }
          }, error = function(e) {
            # Unsuccessful fitting --> continue
          })
        }
      }
    }
    
    # Check if found a solution
    if (is.null(best_params)) {
      return(data.frame(
        L = NA, M = NA, S = NA,
        rmse = NA, r_squared = NA, converged = FALSE,
        n_obs = n, method = "all_failed"))
    }
    
    # Calculate fit statistics
    predicted <- LMS_model_vec(best_params, Z)
    residuals <- X - predicted
    rmse <- sqrt(mean(residuals^2))
    
    # R-squared
    tss <- sum((X - mean(X))^2)
    rss <- sum(residuals^2)
    r_squared <- 1 - rss/tss
    
    # Check convergence quality
    converged <- (rmse < rmse_tolerance) && # OK criteria
      all(is.finite(best_params)) &&        # pretty unrestrictive criteria
      (r_squared > 0)                       # pretty unrestrictive criteria
    
    return(data.frame(
      L = best_params[1],
      M = best_params[2], 
      S = best_params[3],
      rmse = rmse,
      r_squared = r_squared,
      converged = converged,
      n_obs = n,
      method = best_method
    ))
  }
  
  # Apply to each group
  grouping_syms <- rlang::ensyms(...)
  results <- df
  if (length(grouping_syms) > 0) {
    results <- results |> group_by(!!!grouping_syms)
  }
  results <- results |> 
    group_modify(~ estimate_single_group(.x)) |>
    ungroup()
  return(results)
}
(Also included is a vectorized helper function from the peditools package to calculate X from L, M, S, and Z, which could also probably be improved by not using strict equality of L with 0.)
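Before the helper, here is a quick synthetic check of the estimator (a small addition, not part of the original analysis; the group labels and LMS values are invented): X values generated at the seven percentiles from known LMS parameters should be recovered almost exactly.

# Synthetic round-trip check with invented LMS values (not from the source data)
known <- tibble(
  group = c("a", "b"),
  L = c(0.5, -0.8), M = c(20, 60), S = c(0.15, 0.10)
)
synthetic <- known |>
  crossing(percentile = c(3, 10, 25, 50, 75, 90, 97)) |>
  mutate(
    Z = qnorm(percentile / 100),
    X = M * (1 + L * S * Z)^(1 / L)
  ) |>
  select(group, X, Z)
synthetic |> estimate_LMS_parameters(group)  # estimated L, M, S should match `known`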
Show the code
##### z_lms_to_x #####
# From `peditools`
z_lms_to_x <- function (Z, L, M, S) {
  expand.arguments <- function(...) {
    dotList <- list(...)
    max.length <- max(sapply(dotList, length))
    lapply(dotList, rep, length = max.length)
  }
  temp <- expand.arguments(Z, L, M, S)
  Z = temp[[1]]
  L = temp[[2]]
  M = temp[[3]]
  S = temp[[4]]
  ifelse(L != 0, M * (1 + L * S * Z)^(1/L), M * exp(S * Z))
}
Use the training data to estimate LMS parameters
The full dataframe df_train includes 4 anthropometric measures, 2 genders, and 100 ages, so LMS parameters will be estimated for 800 separate groups of seven X / Z pairs (one pair for each of the 7 percentiles: 3 / 10 / 25 / 50 / 75 / 90 / 97).
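A quick count of the distinct groups (a small addition, not in the original post) confirms that arithmetic:

# Sanity check: number of (measure, gender, age) groups; expected 800 = 4 x 2 x 100
df_train |> distinct(measure, gender, age) |> nrow()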
Show the code
df_est <- estimate_LMS_parameters(df_train, measure, gender, age) # group by measure, gender, and age
df_est |> head() |> knitr::kable()
| measure | gender | age | L | M | S | rmse | r_squared | converged | n_obs | method | 
|---|---|---|---|---|---|---|---|---|---|---|
| bmi | f | 0.0 | 0.0204592 | 14.14973 | 0.0923936 | 0.0001710 | 1 | TRUE | 7 | nlsLM | 
| bmi | f | 0.2 | 0.0124989 | 14.33114 | 0.0928675 | 0.0001556 | 1 | TRUE | 7 | nlsLM | 
| bmi | f | 0.4 | 0.0053028 | 14.50763 | 0.0933272 | 0.0002511 | 1 | TRUE | 7 | nlsLM | 
| bmi | f | 0.6 | 0.0007589 | 14.67681 | 0.0937863 | 0.0002292 | 1 | TRUE | 7 | nlsLM | 
| bmi | f | 0.8 | -0.0076834 | 14.83683 | 0.0942656 | 0.0001313 | 1 | TRUE | 7 | nlsLM | 
| bmi | f | 1.0 | -0.0158573 | 14.98664 | 0.0947315 | 0.0002322 | 1 | TRUE | 7 | nlsLM | 
The LMS estimation results include some per-group statistics (RMSE, R-squared, and number of observations). The RMSE values are quite good but not perfect; they would be essentially zero if the percentile values had all been calculated correctly from source LMS values. This imperfect convergence reflects problems with the source data.
Show the code
summary(df_est |> mutate(measure = as.factor(measure), gender = as.factor(gender), method = as.factor(method)))
            measure    gender       age              L          
 bmi            :200   f:400   Min.   : 0.00   Min.   :-3.8136  
 height         :200   m:400   1st Qu.: 4.95   1st Qu.:-1.1473  
 height_velocity:200           Median : 9.90   Median :-0.1807  
 weight         :200           Mean   : 9.90   Mean   :-0.3799  
                               3rd Qu.:14.85   3rd Qu.: 0.6690  
                               Max.   :19.80   Max.   : 2.6379  
       M                  S                rmse             r_squared     
 Min.   :  0.8091   Min.   :0.04280   Min.   :3.656e-05   Min.   :0.9996  
 1st Qu.: 11.2486   1st Qu.:0.09131   1st Qu.:1.768e-04   1st Qu.:1.0000  
 Median : 17.4175   Median :0.14562   Median :2.264e-04   Median :1.0000  
 Mean   : 42.7057   Mean   :0.17739   Mean   :4.009e-03   Mean   :1.0000  
 3rd Qu.: 54.2090   3rd Qu.:0.22781   3rd Qu.:2.879e-04   3rd Qu.:1.0000  
 Max.   :166.2587   Max.   :0.48378   Max.   :1.325e-01   Max.   :1.0000  
 converged           n_obs     method   
 Mode :logical   Min.   :7   nlsLM:800  
 FALSE:53        1st Qu.:7              
 TRUE :747       Median :7              
                 Mean   :7              
                 3rd Qu.:7              
                 Max.   :7              
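To see where the larger errors concentrate, the groups flagged as not converged (RMSE at or above the tolerance) can be tabulated; this is a small addition, not in the original post:

# Tabulate groups that failed the RMSE-based convergence criterion
df_est |> filter(!converged) |> count(measure, gender)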
Compare reported X values with those calculated from re-estimated LMS values
The newly estimated LMS values were used to recalculate the X values. (In the technical brief, the problems exhibited by the published LMS values were also shown.)
Show the code
# Functionally assess estimated LMS values
df_assess <- df_est |>
  transmute(measure, gender, age, estL = L, estM = M, estS = S) |>
  left_join(
    df_orig |> select(measure, gender, age, L, M, S, everything()),
    by = c("measure", "gender", "age")
  ) |>
  mutate(
    deltaL = estL - L, deltaM = estM - M, deltaS = estS - S
  ) |> 
  pivot_longer(p_3:p_97, names_to = "percentile", names_prefix = "p_", values_to = "X") |>
  mutate(
    # Calculate X from original and re-estimated LMS values
    percentile = as.numeric(percentile),
    calcX = z_lms_to_x(qnorm(percentile / 100.0), L, M, S), # using the originally published LMS values
    estX  = z_lms_to_x(qnorm(percentile / 100.0), estL, estM, estS)
  ) |> 
  select(
    chart, age, age_units, gender, measure, measure_units, L, M, S, estL, estM, estS, deltaL, deltaM, deltaS, percentile, X, calcX, estX
  )
Overlaying the reported X values with those calculated from the re-estimated LMS values shows near-perfect alignment. (Each panel is reproduced at a larger size below, to better visualize the minor imperfections arising from the flawed original percentile data.)
Show the code
# Plot X, calcX, and estX
g <- df_assess |>
  mutate(percentile = factor(percentile, ordered = TRUE)) |>
  ggplot(aes(age, X, group = percentile)) +
  geom_line(size = 0.2) +
  # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
  geom_point(aes(age, estX), color = "blue", size = 0.3, alpha = 0.2) +
  facet_grid2(gender ~ measure, scales = "free_y", independent = "y") +
  theme_bw() +
  labs(
    x = "Age (years)",
    y = "X measurement"
  )
g
So, that’s how to regenerate LMS parameters from tables of percentile values. Theoretically, if the table of percentiles were generated directly and correctly from LMS values, 3 values per group would be sufficient to recreate the LMS parameters exactly. In that case, it’d probably be best to use the 10 / 50 / 90 percentile values, to avoid the decreased precision at more extreme Z-scores.
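As a small demonstration (not in the original post, with invented LMS values), three exactly generated points at the 10 / 50 / 90 percentiles are enough for the estimator to recover the parameters:

# Three exact points generated from invented LMS values (L = 0.4, M = 17, S = 0.12)
three_pt <- tibble(percentile = c(10, 50, 90)) |>
  mutate(
    Z = qnorm(percentile / 100),
    X = 17 * (1 + 0.4 * 0.12 * Z)^(1 / 0.4),
    group = "demo"
  )
three_pt |> estimate_LMS_parameters(group)  # should recover L, M, S nearly exactly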
Individual plots of reported vs calculated X values from re-estimated LMS
Show the code
for (m in c("bmi", "height", "height_velocity", "weight")) {
  g <- df_assess |> 
    filter(measure == m) |> 
    mutate(percentile = factor(percentile, ordered = TRUE)) |>
    ggplot(aes(age, X, group = percentile)) +
    geom_line() +
    # geom_point(aes(age, calcX), color = "red", size = 0.4, alpha = 0.5) +
    geom_point(aes(age, estX), color = "blue", size = 0.4, alpha = 0.5) +
    facet_wrap(~gender) +
    theme_bw() +
    labs(
      x = "Age (years)",
      y = "X measurement",
      title = m
    )
  print(g)
}


