R/create-batch-glossary.R
create_batch_glossary.Rd
The function returns a base::data.frame()
that other
functions use to separate long-running read and write REDCap calls into
multiple, smaller REDCap calls. The goal is to (1) reduce the chance of
time-outs, and (2) introduce little breaks between batches so that the
server isn't continually tied up.
create_batch_glossary(row_count, batch_size)
The number records in the large dataset, before it's split.
The maximum number of subject records a single batch should contain.
Currently, a base::data.frame()
is returned with the following
columns,
id
: an integer
that uniquely identifies the batch, starting at 1
.
start_index
: the index of the first row in the batch. integer
.
stop_index
: the index of the last row in the batch. integer
.
id_pretty
: a character
representation of id
, but padded with zeros.
start_index
: a character
representation of start_index
, but padded
with zeros.
stop_index
: a character
representation of stop_index
, but padded
with zeros.
label
: a character
concatenation of id_pretty
, start_index
, and
stop_index_pretty
.
This function can also assist splitting and saving a large data frame to disk as smaller files (such as a .csv). The padded columns allow the OS to sort the batches/files in sequential order.
See redcap_read()
for a function that uses create_batch_glossary
.
REDCapR::create_batch_glossary(100, 50)
#> # A tibble: 2 × 7
#> id start_index stop_index index_pretty start_index_pretty stop_index_pretty
#> <int> <int> <int> <chr> <chr> <chr>
#> 1 1 1 50 1 001 050
#> 2 2 51 100 2 051 100
#> # ℹ 1 more variable: label <chr>
REDCapR::create_batch_glossary(100, 25)
#> # A tibble: 4 × 7
#> id start_index stop_index index_pretty start_index_pretty stop_index_pretty
#> <int> <int> <int> <chr> <chr> <chr>
#> 1 1 1 25 1 001 025
#> 2 2 26 50 2 026 050
#> 3 3 51 75 3 051 075
#> 4 4 76 100 4 076 100
#> # ℹ 1 more variable: label <chr>
REDCapR::create_batch_glossary(100, 3)
#> # A tibble: 34 × 7
#> id start_index stop_index index_pretty start_index_pretty
#> <int> <int> <int> <chr> <chr>
#> 1 1 1 3 01 001
#> 2 2 4 6 02 004
#> 3 3 7 9 03 007
#> 4 4 10 12 04 010
#> 5 5 13 15 05 013
#> 6 6 16 18 06 016
#> 7 7 19 21 07 019
#> 8 8 22 24 08 022
#> 9 9 25 27 09 025
#> 10 10 28 30 10 028
#> # ℹ 24 more rows
#> # ℹ 2 more variables: stop_index_pretty <chr>, label <chr>
REDCapR::create_batch_glossary( 0, 3)
#> # A tibble: 0 × 7
#> # ℹ 7 variables: id <int>, start_index <int>, stop_index <int>,
#> # index_pretty <chr>, start_index_pretty <chr>, stop_index_pretty <chr>,
#> # label <chr>
d <- data.frame(
record_id = 1:100,
iv = sample(x=4, size=100, replace=TRUE),
dv = rnorm(n=100)
)
REDCapR::create_batch_glossary(nrow(d), batch_size=40)
#> # A tibble: 3 × 7
#> id start_index stop_index index_pretty start_index_pretty stop_index_pretty
#> <int> <int> <int> <chr> <chr> <chr>
#> 1 1 1 40 1 001 040
#> 2 2 41 80 2 041 080
#> 3 3 81 100 3 081 100
#> # ℹ 1 more variable: label <chr>