
Generates a detailed comparison report of two data frames

Usage

get_comparison(df1, df2, id_cols, tolerance = 1e-05)

Arguments

df1, df2

data frames to compare

id_cols

the column names in df1 and df2 that make up a unique ID

tolerance

the maximum amount by which two numbers can differ and still be considered equal; the default is 1e-05 (0.00001)
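To make the role of `tolerance` concrete, here is a minimal base R sketch of a tolerance-based equality check. The helper `within_tolerance()` is hypothetical and not part of the package. It assumes an absolute difference compared with `<=`; whether `get_comparison()` uses an absolute or relative difference, or a strict or non-strict inequality, is not stated on this page.

```r
# Hypothetical helper, not part of the package: treats two numbers as equal
# when their absolute difference is no larger than `tolerance`.
# Assumption: absolute difference with `<=`; the package may differ.
within_tolerance <- function(x, y, tolerance = 1e-05) {
  abs(x - y) <= tolerance
}

# Differ by roughly 1e-06, which is inside the default tolerance
within_tolerance(1.000001, 1.000002)

# Differ by roughly 0.1, which is outside the default tolerance
within_tolerance(1.0, 1.1)
```

Under this assumption, raising `tolerance` makes the comparison more forgiving of floating-point noise, while lowering it flags smaller numeric differences as changes.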

Value

a named list with elements:

  • col_summary_simple: summary statistics for columns

  • col_summary_by_col: summary statistics by column

  • row_summary: summary statistics for rows

  • all_pivoted: comparison data displayed where each value in df1 is shown directly to the left of its df2 counterpart (a pivoted view) with comparison annotation columns

  • all_tb: comparison data displayed where each row in df1 is shown directly above its df2 counterpart (a top-bottom view) with comparison annotation columns

  • all_lr: comparison data displayed where each column in df1 is shown to the left of its df2 counterpart (a left-right view) with comparison annotation columns

  • all_tb_change_indices: a named list where the names are the data columns in all_tb and the elements are numeric vectors of the row indices that changed between df1 and df2 in a column

  • all_lr_change_indices: a named list where the names are the data columns in all_lr and the elements are numeric vectors of the row indices that changed between df1 and df2 in a column

  • id_cols: the columns in df1 and df2 that form a unique row ID

  • cc_out: a named list with four elements:

    • same: a logical indicating whether the column names in df1 and df2 are the same

    • both: a character vector of column names that are in both df1 and df2

    • df1_only: a character vector of column names that are in df1 but not df2

    • df2_only: a character vector of column names that are in df2 but not df1

  • df1: the raw data from df1

  • df2: the raw data from df2
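The four elements of `cc_out` described above can be approximated by hand with base R set operations. The sketch below is illustrative only: the toy data frames and the name `cc_sketch` are made up, and it assumes `same` ignores column order (via `setequal()`), which this page does not confirm.

```r
# Toy data frames; the column names are made up for illustration.
df1 <- data.frame(id1 = 1:2, a = c(1, 2), b = c("x", "y"))
df2 <- data.frame(id1 = 1:2, a = c(1, 3), c = c(TRUE, FALSE))

# Hand-built approximation of the four documented elements.
# Assumption: `same` compares the sets of names and ignores their order.
cc_sketch <- list(
  same     = setequal(names(df1), names(df2)),
  both     = intersect(names(df1), names(df2)),
  df1_only = setdiff(names(df1), names(df2)),
  df2_only = setdiff(names(df2), names(df1))
)

str(cc_sketch)
```

Here `both` holds the shared columns `id1` and `a`, while `b` and `c` each appear in only one data frame, so `same` is `FALSE`.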

Examples

id_cols <- c("id1", "id2")
comparison <- get_comparison(compareDFx::df1, compareDFx::df2, id_cols)
#> Warning: ID duplicates detected, recommend fixing these and re-running `get_comparison()`
#> Warning: ID columns contain `NA`, recommend fixing these and re-running `get_comparison()`
#> df1 and df2 have different columns therefore no records are recorded as 'matched'
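The first two warnings above concern duplicated IDs and `NA` values in the ID columns. A base R sketch of how one might check for both before calling `get_comparison()` follows. The helper `check_ids()` and the toy data are hypothetical and are not part of the package; the package's own checks may be implemented differently.

```r
# Hypothetical pre-check, not part of the package: reports whether the ID
# columns contain duplicated combinations or missing values.
check_ids <- function(df, id_cols) {
  ids <- df[, id_cols, drop = FALSE]
  list(
    has_duplicates = anyDuplicated(ids) > 0,  # any repeated ID combination?
    has_na         = any(is.na(ids))          # any NA in an ID column?
  )
}

# Toy data: rows 1 and 2 share the same ID pair, and row 3 has an NA in id1.
toy <- data.frame(
  id1   = c(1, 1, NA),
  id2   = c("a", "a", "b"),
  value = 1:3
)

check_ids(toy, c("id1", "id2"))
```

Running a check like this on both data frames shows which input needs cleaning before the comparison is re-run.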