Skip to contents

In this example, we start with a data frame with two columns, one with full dates and one with partial dates. The goal is to consolidate these dates into one ISO 8601 formatted date column.

library(sdtmval)
library(dplyr)

raw_dates <- data.frame(
  raw_full = c(
    rep(NA, 8),
    "02/05/2017",
    "02-05-2017"
  ),
  raw_partial = c(
    "UN-UNK-UNKN",
    "UN/UNK/UNKN",
    "UN UNK UNKN",
    "UN-UNK-2017",
    "UN-Feb-2017",
    "05-FEB-2017",
    "05-UNK-2017",
    "05-Feb-UNKN",
    rep(NA, 2)
  )
)
knitr::kable(raw_dates)
raw_full raw_partial
NA UN-UNK-UNKN
NA UN/UNK/UNKN
NA UN UNK UNKN
NA UN-UNK-2017
NA UN-Feb-2017
NA 05-FEB-2017
NA 05-UNK-2017
NA 05-Feb-UNKN
02/05/2017 NA
02-05-2017 NA

First, we will re-arrange the partial dates into the same format as the full dates using reshape_pdates(). That will let us combine the full and partial dates into one column with a MM/DD/YYYY format. Then, using reshape_adates(), we will convert all dates to the YYYY-MM-DD format.

working_dates <- raw_dates %>%
  mutate(
    partial = reshape_pdates(raw_partial),
    all = coalesce(raw_full, partial),
    all = reshape_adates(all)
  )
knitr::kable(working_dates)
raw_full raw_partial partial all
NA UN-UNK-UNKN UN/UN/UNKN UNKN-UN-UN
NA UN/UNK/UNKN UN/UN/UNKN UNKN-UN-UN
NA UN UNK UNKN UN/UN/UNKN UNKN-UN-UN
NA UN-UNK-2017 UN/UN/2017 2017-UN-UN
NA UN-Feb-2017 02/UN/2017 2017-02-UN
NA 05-FEB-2017 02/05/2017 2017-02-05
NA 05-UNK-2017 UN/05/2017 2017-UN-05
NA 05-Feb-UNKN 02/05/UNKN UNKN-02-05
02/05/2017 NA NA 2017-02-05
02-05-2017 NA NA 2017-02-05

For situations where missing date elements should be removed, use the trim_dates() function.

trimmed_dates <- mutate(working_dates, trimmed = trim_dates(all))
knitr::kable(trimmed_dates)
raw_full raw_partial partial all trimmed
NA UN-UNK-UNKN UN/UN/UNKN UNKN-UN-UN NA
NA UN/UNK/UNKN UN/UN/UNKN UNKN-UN-UN NA
NA UN UNK UNKN UN/UN/UNKN UNKN-UN-UN NA
NA UN-UNK-2017 UN/UN/2017 2017-UN-UN 2017
NA UN-Feb-2017 02/UN/2017 2017-02-UN 2017-02
NA 05-FEB-2017 02/05/2017 2017-02-05 2017-02-05
NA 05-UNK-2017 UN/05/2017 2017-UN-05 2017
NA 05-Feb-UNKN 02/05/UNKN UNKN-02-05 NA
02/05/2017 NA NA 2017-02-05 2017-02-05
02-05-2017 NA NA 2017-02-05 2017-02-05

If imputed dates are needed, use the impute_pdates() function. Both start and end dates can be imputed using standard imputation rules.

imputed_dates <- working_dates %>%
  mutate(
    start = impute_pdates(all, ptype = "start"),
    end = impute_pdates(all, ptype = "end")
  )
knitr::kable(imputed_dates)
raw_full raw_partial partial all start end
NA UN-UNK-UNKN UN/UN/UNKN UNKN-UN-UN NA NA
NA UN/UNK/UNKN UN/UN/UNKN UNKN-UN-UN NA NA
NA UN UNK UNKN UN/UN/UNKN UNKN-UN-UN NA NA
NA UN-UNK-2017 UN/UN/2017 2017-UN-UN 2017-01-01 2017-12-31
NA UN-Feb-2017 02/UN/2017 2017-02-UN 2017-02-01 2017-02-28
NA 05-FEB-2017 02/05/2017 2017-02-05 2017-02-05 2017-02-05
NA 05-UNK-2017 UN/05/2017 2017-UN-05 2017-01-05 2017-12-05
NA 05-Feb-UNKN 02/05/UNKN UNKN-02-05 NA NA
02/05/2017 NA NA 2017-02-05 2017-02-05 2017-02-05
02-05-2017 NA NA 2017-02-05 2017-02-05 2017-02-05