The function returns a tibble::tibble() that other functions use to split long-running REDCap read and write calls into multiple, smaller calls. The goal is to (1) reduce the chance of time-outs, and (2) introduce short breaks between batches so that the server isn't continually tied up.
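
The batching idea can be sketched as follows. This is only an illustration, not how REDCapR's own functions are written: `fetch_batch()` is a hypothetical stand-in for a single small REDCap call, and `delay_seconds` is an assumed pause length.

```r
# Hypothetical stand-in for one small REDCap call; it returns fake data.
fetch_batch <- function(ids) {
  data.frame(record_id = ids, value = ids * 2)
}

record_ids    <- 1:100
delay_seconds <- 0.5 # Assumed pause between batches.

glossary <- REDCapR::create_batch_glossary(
  row_count  = length(record_ids),
  batch_size = 40
)

pieces <- vector("list", nrow(glossary))
for (i in seq_len(nrow(glossary))) {
  # Rows that belong to the current batch.
  rows        <- seq.int(glossary$start_index[i], glossary$stop_index[i])
  pieces[[i]] <- fetch_batch(record_ids[rows])

  # Give the server a short rest, except after the final batch.
  if (i < nrow(glossary)) Sys.sleep(delay_seconds)
}

# Stack the batches back into a single data frame.
result <- do.call(rbind, pieces)
nrow(result)
```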

create_batch_glossary(row_count, batch_size)

Arguments

row_count

The number of records in the large dataset, before it's split.

batch_size

The maximum number of subject records a single batch should contain.

Value

Currently, a tibble::tibble() is returned with the following columns:

  • id: an integer that uniquely identifies the batch, starting at 1.

  • start_index: the index of the first row in the batch. integer.

  • stop_index: the index of the last row in the batch. integer.

  • index_pretty: a character representation of id, but padded with zeros.

  • start_index_pretty: a character representation of start_index, but padded with zeros.

  • stop_index_pretty: a character representation of stop_index, but padded with zeros.

  • label: a character concatenation of index_pretty, start_index_pretty, and stop_index_pretty.
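
To make the column definitions concrete, here is a minimal base-R sketch that derives the same seven columns. It is not the package's implementation: `sketch_batch_glossary()` is a hypothetical name, it returns a plain data frame, and the underscore separator in `label` is an assumption.

```r
# Hypothetical helper that mimics the columns described above.
sketch_batch_glossary <- function(row_count, batch_size) {
  row_count  <- as.integer(row_count)
  batch_size <- as.integer(batch_size)
  stopifnot(row_count >= 0L, batch_size >= 1L)

  if (row_count == 0L) {
    # No records means no batches; return the columns with zero rows.
    return(data.frame(
      id = integer(0), start_index = integer(0), stop_index = integer(0),
      index_pretty = character(0), start_index_pretty = character(0),
      stop_index_pretty = character(0), label = character(0),
      stringsAsFactors = FALSE
    ))
  }

  start_index <- seq.int(from = 1L, to = row_count, by = batch_size)
  stop_index  <- pmin(start_index + batch_size - 1L, row_count)
  id          <- seq_along(start_index)

  # Pad to the width of the largest value so text sorting matches numeric sorting.
  pad       <- function(x, width) formatC(x, width = width, flag = "0")
  id_width  <- nchar(max(id))
  row_width <- nchar(row_count)

  d <- data.frame(
    id                 = id,
    start_index        = start_index,
    stop_index         = stop_index,
    index_pretty       = pad(id, id_width),
    start_index_pretty = pad(start_index, row_width),
    stop_index_pretty  = pad(stop_index, row_width),
    stringsAsFactors   = FALSE
  )
  # Assumed separator: underscores between the three padded pieces.
  d$label <- paste(d$index_pretty, d$start_index_pretty, d$stop_index_pretty, sep = "_")
  d
}

sketch_batch_glossary(100, 40)
```

The last batch is allowed to be smaller than `batch_size`; its `stop_index` is capped at `row_count`.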

Details

This function can also assist splitting and saving a large data frame to disk as smaller files (such as a .csv). The padded columns allow the OS to sort the batches/files in sequential order.
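
As a rough illustration of that use, the sketch below writes one .csv file per batch. The output directory and the `batch-` file-name prefix are assumptions made for the example, not conventions of the package.

```r
d <- data.frame(
  record_id = 1:100,
  dv        = rnorm(n = 100)
)

glossary <- REDCapR::create_batch_glossary(row_count = nrow(d), batch_size = 40)

# Hypothetical output location inside the session's temporary directory.
out_dir <- file.path(tempdir(), "batches")
dir.create(out_dir, showWarnings = FALSE)

for (i in seq_len(nrow(glossary))) {
  # Rows that belong to the current batch.
  rows <- seq.int(glossary$start_index[i], glossary$stop_index[i])

  # The zero-padded label keeps the files in batch order when sorted by name.
  path <- file.path(out_dir, paste0("batch-", glossary$label[i], ".csv"))
  utils::write.csv(d[rows, , drop = FALSE], file = path, row.names = FALSE)
}

sort(list.files(out_dir))
```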

See also

See redcap_read() for a function that uses create_batch_glossary().

Author

Will Beasley

Examples

REDCapR::create_batch_glossary(100, 50)
#> # A tibble: 2 × 7
#>      id start_index stop_index index_pretty start_index_pretty stop_inde…¹ label
#>   <int>       <int>      <int> <chr>        <chr>              <chr>       <chr>
#> 1     1           1         50 1            001                050         1_00…
#> 2     2          51        100 2            051                100         2_05…
#> # … with abbreviated variable name ¹​stop_index_pretty
REDCapR::create_batch_glossary(100, 25)
#> # A tibble: 4 × 7
#>      id start_index stop_index index_pretty start_index_pretty stop_inde…¹ label
#>   <int>       <int>      <int> <chr>        <chr>              <chr>       <chr>
#> 1     1           1         25 1            001                025         1_00…
#> 2     2          26         50 2            026                050         2_02…
#> 3     3          51         75 3            051                075         3_05…
#> 4     4          76        100 4            076                100         4_07…
#> # … with abbreviated variable name ¹​stop_index_pretty
REDCapR::create_batch_glossary(100,  3)
#> # A tibble: 34 × 7
#>       id start_index stop_index index_pretty start_index_pretty stop_ind…¹ label
#>    <int>       <int>      <int> <chr>        <chr>              <chr>      <chr>
#>  1     1           1          3 01           001                003        01_0…
#>  2     2           4          6 02           004                006        02_0…
#>  3     3           7          9 03           007                009        03_0…
#>  4     4          10         12 04           010                012        04_0…
#>  5     5          13         15 05           013                015        05_0…
#>  6     6          16         18 06           016                018        06_0…
#>  7     7          19         21 07           019                021        07_0…
#>  8     8          22         24 08           022                024        08_0…
#>  9     9          25         27 09           025                027        09_0…
#> 10    10          28         30 10           028                030        10_0…
#> # … with 24 more rows, and abbreviated variable name ¹​stop_index_pretty
REDCapR::create_batch_glossary(  0,  3)
#> # A tibble: 0 × 7
#> # … with 7 variables: id <int>, start_index <int>, stop_index <int>,
#> #   index_pretty <chr>, start_index_pretty <chr>, stop_index_pretty <chr>,
#> #   label <chr>
d <- data.frame(
  record_id = 1:100,
  iv        = sample(x=4, size=100, replace=TRUE),
  dv        = rnorm(n=100)
)
REDCapR::create_batch_glossary(nrow(d), batch_size=40)
#> # A tibble: 3 × 7
#>      id start_index stop_index index_pretty start_index_pretty stop_inde…¹ label
#>   <int>       <int>      <int> <chr>        <chr>              <chr>       <chr>
#> 1     1           1         40 1            001                040         1_00…
#> 2     2          41         80 2            041                080         2_04…
#> 3     3          81        100 3            081                100         3_08…
#> # … with abbreviated variable name ¹​stop_index_pretty