I have a fragile pipeline to generate monthly clinical dashboards that fails regularly in fairly typical ways. However, this month it failed in a completely new and unexpected way, resulting in a period of mild panic. TL;DR – add the pandoc folder to the Windows Path environment variable.
The pipeline generates dashboards by:
- having the Windows equivalent of a cron job run an R script on my office Windows workstation via the command line (Windows Task Scheduler),
- which then executes an SQL query (RODBC) against our Enterprise Data Warehouse (EDW),
- pre-processes the raw query results and generates analysis summaries and plots (tidyverse),
- knits it into an interactive HTML dashboard (flexdashboard),
- and then securely emails it to various clinicians (I don’t even want to get into how this happens. It will probably fail soon.)
The usual failure modes include:
- hospital IT restarting my office workstation remotely (happens on a somewhat unpredictable schedule)
- or, the database queries fail because of a timeout (depending on time of day the enterprise MS SQL server gets overwhelmed)
- or rarely, EDW database tables change without notice (hospital name changes for re-branding, or vocabulary changes because of social pressures)
This time, checking the log files gave a new and exciting failure message on every single dashboard…
pandoc version 1.12.3 or higher is required and was not found
From the hospital Windows workstation within the RStudio IDE, all seemed good:
> rmarkdown::pandoc_available()
[1] TRUE
> rmarkdown::pandoc_version()
[1] ‘2.18’
But then, trying to figure out where pandoc was installed gave a hint:
> rmarkdown::pandoc_exec()
[1] "C:/Program Files/RStudio/bin/quarto/bin/tools/pandoc"
Uh oh, quarto? I had recently upgraded RStudio to be able to start learning about quarto. Since my dashboard pipeline executes an R script via the command line, it probably had no idea about this quarto location. Google yielded a confirmation of this likely being the issue, and some suggested solutions.
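To see the difference between the IDE and the command line directly, something like the following could be run once from the RStudio console and once via Rscript, then compared. This is only a sketch: `report_pandoc()` is a hypothetical helper name of my own, and the note about `RSTUDIO_PANDOC` reflects my understanding that the RStudio IDE sets that variable for its own sessions.

```r
# Minimal diagnostic sketch: report how the current R session locates pandoc.
# report_pandoc() is a hypothetical helper, not part of rmarkdown.
report_pandoc <- function() {
  info <- list(
    on_path        = unname(Sys.which("pandoc")),  # "" if pandoc is not on PATH
    rstudio_pandoc = Sys.getenv("RSTUDIO_PANDOC"), # assumed: set by the IDE, usually empty under Rscript
    available      = NA,
    version        = NA_character_
  )
  # Only query rmarkdown if it is installed
  if (requireNamespace("rmarkdown", quietly = TRUE)) {
    info$available <- rmarkdown::pandoc_available()
    if (isTRUE(info$available)) {
      info$version <- as.character(rmarkdown::pandoc_version())
    }
  }
  info
}

# Print a compact summary, e.g. into the scheduled job's log file
str(report_pandoc())
```

If the command-line run shows an empty `on_path` and an empty `rstudio_pandoc` while the IDE run does not, that points to the same problem described here.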
One suggestion was to make a duplicate of the pandoc installation in the old location, but the path given was for a Unix installation.
A better solution seemed to be to add the new path to the Windows Path environment variable. One problem is that I’m far more familiar with Mac OS and can navigate around the terminal on Mac OS (and Unix, I guess) fairly comfortably, but am nearly completely ignorant about Windows. (Another is that hospital IT has made it more difficult to use admin privileges on our own private hospital workstations, but hospital IT has provided a workaround, called PLAS.) This solution gave nicely detailed instructions for how to add the pandoc folder to the Path environment variable in Windows:
Add the pandoc folder to your Path environment variable:
1) Press Windows key and type 'environment variables'. Hit Enter. This should take you to a window titled 'Environment Variables'
2) Select 'Path' and click 'Edit...'
3) You should see a list of paths. Click 'New' and enter C:\Program Files\RStudio\bin\quarto\bin\tools\. It should be at the bottom of the list
4) Click 'Okay'
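If editing the system Path is awkward (for example, without admin rights), a session-level variant of the same idea can go at the top of the scheduled R script instead. This is a sketch of an alternative, not the fix I actually applied: `add_to_path()` is a hypothetical helper name, `pandoc_dir` is taken from the `rmarkdown::pandoc_exec()` output above, and setting `RSTUDIO_PANDOC` relies on rmarkdown consulting that environment variable when it searches for pandoc.

```r
# Sketch: make pandoc visible to this R session only (nothing is changed system-wide).
# pandoc_dir is the folder from rmarkdown::pandoc_exec() above, without the
# executable name; adjust it if your installation differs.
pandoc_dir <- "C:/Program Files/RStudio/bin/quarto/bin/tools"

# add_to_path() is a hypothetical helper: append a folder to PATH if it is not there yet.
add_to_path <- function(dir) {
  current <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
  if (!dir %in% current) {
    Sys.setenv(PATH = paste(c(current, dir), collapse = .Platform$path.sep))
  }
  invisible(Sys.getenv("PATH"))
}

if (dir.exists(pandoc_dir)) {
  add_to_path(pandoc_dir)
  # rmarkdown also checks this variable when looking for pandoc
  Sys.setenv(RSTUDIO_PANDOC = pandoc_dir)
}
```

These lines would need to run before the first call to `rmarkdown::render()` in the script. The change lasts only for that R session, so it has to stay in the script, whereas the Path edit above is persistent.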
Yes, this is a long-winded way of saying how to add a new path in Windows so that command line R can find pandoc (when it works fine from the RStudio IDE). But the actual purpose of this post is so that future me can find this solution, because I suspect it’ll come up again.
Footnotes
Photo by Nicola Barts