Compare two data frames — get_comparison • compareDFx

Generates a detailed comparison report of two data frames

Usage

get_comparison(df1, df2, id_cols, tolerance = 1e-05)

Arguments

df1, df2: data frames to compare
id_cols: the column names in df1 and df2 that make up a unique ID
tolerance: the maximum amount by which two numbers can differ and still be considered equal; the default is 1e-05 (0.00001)
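
To illustrate what a numeric tolerance means in practice, here is a minimal base R sketch. It assumes an absolute-difference check with an inclusive bound; `nearly_equal()` is a hypothetical helper written for this illustration, not a function exported by compareDFx, and the package's internal comparison may differ (for example in how it treats the boundary or `NA` values).

```r
# Hypothetical helper for illustration only; not part of compareDFx.
# Treats two numbers as equal when their absolute difference is
# no larger than `tolerance` (an assumed rule, see the note above).
nearly_equal <- function(x, y, tolerance = 1e-05) {
  abs(x - y) <= tolerance
}

old <- c(1.000000, 2.5, 10)
new <- c(1.000001, 2.6, 10)

# The first pair differs by about 1e-06 (within tolerance),
# the second by 0.1 (outside tolerance), the third not at all.
nearly_equal(old, new)
#> [1]  TRUE FALSE  TRUE
```

Under this assumption, raising `tolerance` to 0.2 would make all three pairs count as equal.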

Value

a named list with elements:

col_summary_simple: summary statistics for columns
col_summary_by_col: summary statistics by column
row_summary: summary statistics for rows
all_pivoted: comparison data displayed where each value in df1 is shown directly to the left of its df2 counterpart (a pivoted view) with comparison annotation columns
all_tb: comparison data displayed where each row in df1 is shown directly above its df2 counterpart (a top-bottom view) with comparison annotation columns
all_lr: comparison data displayed where each column in df1 is shown to the left of its df2 counterpart (a left-right view) with comparison annotation columns
all_tb_change_indices: a named list where the names are the data columns in all_tb and the elements are numeric vectors of the row indices that changed between df1 and df2 in a column
all_lr_change_indices: a named list where the names are the data columns in all_lr and the elements are numeric vectors of the row indices that changed between df1 and df2 in a column
id_cols: the columns in df1 and df2 that form a unique row ID
cc_out: a named list with four elements:
- same: a logical indicating whether the column names in df1 and df2 are the same
- both: a character vector of column names that are in both df1 and df2
- df1_only: a character vector of column names that are in df1 but not df2
- df2_only: a character vector of column names that are in df2 but not df1
df1: the raw data from df1
df2: the raw data from df2
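
The shape of the cc_out element can be pictured with a short base R sketch. `compare_cols_sketch()` is a hypothetical stand-in written for this illustration, not the package's implementation; in particular, it assumes that "same" ignores column order, which the documentation above does not specify.

```r
# Hypothetical stand-in for illustration only; not part of compareDFx.
# Builds a list with the same four element names described for cc_out.
compare_cols_sketch <- function(df1, df2) {
  cols1 <- names(df1)
  cols2 <- names(df2)
  list(
    same     = setequal(cols1, cols2),   # assumed: column order is ignored
    both     = intersect(cols1, cols2),  # columns present in both
    df1_only = setdiff(cols1, cols2),    # columns only in df1
    df2_only = setdiff(cols2, cols1)     # columns only in df2
  )
}

# Two small made-up data frames that share the ID columns
# but differ in one data column.
a <- data.frame(id1 = 1:2, id2 = c("x", "y"), value = c(0.5, 1.5))
b <- data.frame(id1 = 1:2, id2 = c("x", "y"), score = c(0.5, 1.5))

cc <- compare_cols_sketch(a, b)
cc$same
#> [1] FALSE
cc$both
#> [1] "id1" "id2"
cc$df1_only
#> [1] "value"
cc$df2_only
#> [1] "score"
```

With a real result from get_comparison(), the corresponding values would be read from the returned list in the same way, for example `comparison$cc_out$df1_only`.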

Examples

id_cols <- c("id1", "id2")
comparison <- get_comparison(compareDFx::df1, compareDFx::df2, id_cols)
#> Warning: ID duplicates detected, recommend fixing these and re-running `get_comparison()`
#> Warning: ID columns contain `NA`, recommend fixing these and re-running `get_comparison()`
#> df1 and df2 have different columns therefore no records are recorded as 'matched'