Trim extreme values from an atomic vector, and replace them with a specific value (typically NA_*).
trim_numeric(x, bounds=c(-Inf, Inf), replacement=NA_real_)
trim_integer(x, bounds=c(-2147483647L, 2147483647L), replacement=NA_integer_)
trim_date(
x,
bounds = as.Date(c("1940-01-01", "2029-12-31")),
replacement = as.Date(NA_character_)
)
trim_datetime(
x,
bounds = as.POSIXct(c("1940-01-01 00:00", "2029-12-31 23:59")),
replacement = as.POSIXct(NA_character_)
)
trim_character(
x,
pattern = "^.*$",
replacement = NA_character_
)
x: The input vector to be trimmed. Required.
bounds: A two-element vector that establishes the lower and upper inclusive bounds of x.
replacement: A scalar that replaces every element of x that falls outside of bounds (or, for trim_character(), that does not match pattern).
pattern: A Perl-style regular expression passed to base::grepl().
Vector elements that match the pattern are returned unchanged.
Vector elements that do not match the pattern are replaced with replacement (NA_character_ by default).
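The pattern rule can be approximated in base R. This is a minimal sketch, assuming trim_character() keeps the elements for which grepl(perl = TRUE) returns TRUE and substitutes replacement everywhere else; trim_character_sketch() is a hypothetical name and not part of OuhscMunge.

```r
# Hypothetical helper: approximates the documented trim_character() rule,
# not the package's actual source.
trim_character_sketch <- function(x, pattern = "^.*$", replacement = NA_character_) {
  stopifnot(is.character(x), is.character(pattern), length(pattern) == 1L)
  # grepl() returns FALSE for NA elements, so in this sketch missing values
  # also receive `replacement`.
  keep <- grepl(pattern, x, perl = TRUE)
  ifelse(keep, x, replacement)
}

zip_codes <- c("12345", "a2345", "54321-6789", "54321-67890")
trim_character_sketch(zip_codes, "^\\d{5}(-\\d{4})?$")
# returns c("12345", NA, "54321-6789", NA)
```

The default pattern "^.*$" matches any non-missing string, which is why the default call lets every element through.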
Returns an atomic vector with the same number of elements as x.
The data type of x, bounds, and replacement must match the atomic data type of the function.
In other words, trim_numeric() accepts only parameters of type 'numeric' (otherwise known as 'double-precision floating point'). Likewise, trim_date() accepts only parameters of type Date.
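The type rules above depend on how base R stores values. This illustration uses only base R (no OuhscMunge calls) to show why trim_integer() needs bounds such as c(4L, 8L) rather than c(4, 8), and why each function's default replacement is a typed missing value rather than a bare NA.

```r
# Base R storage types behind the trim_*() functions (illustration only).
typeof(c(4, 8))      # "double": plain numeric literals are doubles
typeof(c(4L, 8L))    # "integer": the L suffix produces integers
typeof(NA)           # "logical": a bare NA is neither double nor integer
typeof(NA_real_)     # "double"
typeof(NA_integer_)  # "integer"

d <- as.Date("1999-09-09")
typeof(d)            # "double": a Date is stored as days since 1970-01-01
class(d)             # "Date"
is.numeric(d)        # FALSE: base R does not treat a Date as numeric
```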
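The lower bound must be less than or equal to the upper bound.
The default bounds for numerics (-Inf and Inf) and integers (-2147483647L and 2147483647L, the negative and positive .Machine$integer.max) are at the extremes of the data type. The default bounds for dates and datetimes (1940-01-01 through 2029-12-31) are arbitrary, because these types have no natural extremes to fall back on.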
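The bounds rules can be sketched in base R. This is a minimal sketch, assuming the checks described above (matching classes, a two-element bounds, a scalar replacement, and lower <= upper) plus inclusive comparison; trim_bounds_sketch() is hypothetical, not the package's implementation, and OuhscMunge may validate its arguments differently.

```r
# Hypothetical helper: a sketch of the documented bounds rules, not OuhscMunge's source.
trim_bounds_sketch <- function(x, bounds, replacement) {
  stopifnot(
    length(bounds) == 2L,                    # lower and upper bound
    length(replacement) == 1L,               # scalar replacement
    identical(class(x), class(bounds)),      # e.g. both "integer" or both "Date"
    identical(class(x), class(replacement)),
    !anyNA(bounds),
    bounds[1] <= bounds[2]                   # lower bound must not exceed upper bound
  )
  # Inclusive bounds: values equal to either bound are kept; missing values stay missing.
  outside <- !is.na(x) & (x < bounds[1] | x > bounds[2])
  x[outside] <- replacement
  x
}

trim_bounds_sketch(c(NA, 1:10), bounds = c(4L, 8L), replacement = NA_integer_)
trim_bounds_sketch(
  as.Date(c("1902-02-02", "1999-09-09")),
  bounds = as.Date(c("1990-01-01", "2030-01-01")),
  replacement = as.Date(NA_character_)
)
```

Because the comparison is inclusive, trim_bounds_sketch(c(3.9, 4, 8, 8.1), c(4, 8), NA_real_) keeps 4 and 8 and returns NA for 3.9 and 8.1. Passing c(4, 8) as bounds for an integer x fails the class check, mirroring the type rule above.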
library(OuhscMunge)
trim_numeric(runif(10, -1, 10), bounds=c(4, 8)) # Random input, so the output differs between runs.
#> [1] 4.073051 NA NA 4.847309 4.430007 7.572395 NA 6.847370
#> [9] NA NA
trim_integer(c(NA, 1:10), bounds=c(4L, 8L))
#> [1] NA NA NA NA 4 5 6 7 8 NA NA
trim_date(
x = as.Date(c("1902-02-02", "1999-09-09", "2020-02-22", "1930-01-01", "1930-01-02")),
bounds = as.Date(c("1990-01-01", "2030-01-01"))
)
#> [1] NA "1999-09-09" "2020-02-22" NA NA
trim_datetime(
x = as.POSIXct(c("1902-02-02", "1999-09-09", "2020-02-22", "1930-01-01", "1930-01-02"), tz = "UTC"),
bounds = as.POSIXct(c("1990-01-01", "2030-01-01"), tz = "UTC")
)
#> [1] NA "1999-09-09 UTC" "2020-02-22 UTC" NA
#> [5] NA
zip_codes <- c("12345", "a2345", "54321-6789", "54321-67890")
trim_character(zip_codes, "^\\d{5}(-\\d{4})?$")
#> [1] "12345" NA "54321-6789" NA
trim_character(zip_codes) # Everything passes.
#> [1] "12345" "a2345" "54321-6789" "54321-67890"