Trim extreme values from an atomic vector and replace them with a specific value (typically NA_*).

trim_numeric(x, bounds=c(-Inf, Inf), replacement=NA_real_)
trim_integer(x, bounds=c(-2147483647L, 2147483647L), replacement=NA_integer_)
trim_date(
  x,
  bounds      = as.Date(c("1940-01-01", "2029-12-31")),
  replacement = as.Date(NA_character_)
)
trim_datetime(
  x,
  bounds      = as.POSIXct(c("1940-01-01 00:00", "2029-12-31 23:59")),
  replacement = as.POSIXct(NA_character_)
)
trim_character(
  x,
  pattern = "^.*$",
  replacement = NA_character_
)

Arguments

x

The input vector to be trimmed. Required.

bounds

A two-element vector that sets the inclusive lower and upper bounds for x. Elements of x outside these bounds are replaced.

replacement

A scalar that replaces every element of x that falls outside bounds or does not match pattern.

pattern

A Perl-style regular expression passed to base::grepl(). Vector elements that match the pattern are returned unchanged. Vector elements that do not match the pattern are replaced with replacement (NA_character_ by default).
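
The matching rule described for pattern can be approximated in base R. This is a minimal sketch, not the package's implementation: trim_character_sketch() is a hypothetical name, and it assumes the pattern is evaluated with grepl(perl = TRUE).

```r
# Hypothetical stand-in for illustration; not the OuhscMunge source.
trim_character_sketch <- function(x, pattern = "^.*$", replacement = NA_character_) {
  stopifnot(is.character(x), length(replacement) == 1L)
  keep <- grepl(pattern, x, perl = TRUE)  # FALSE for missing values
  x[!keep] <- replacement                 # non-matching elements are overwritten
  x
}

zip_codes <- c("12345", "a2345", "54321-6789", "54321-67890")

# Default replacement: non-matching elements become NA_character_.
trim_character_sketch(zip_codes, "^\\d{5}(-\\d{4})?$")

# A custom replacement flags the failures instead of blanking them.
trim_character_sketch(zip_codes, "^\\d{5}(-\\d{4})?$", replacement = "invalid")
```

The result always has the same length as the input, so it can be assigned back to a data frame column directly.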

Value

An atomic vector with the same number of elements as x.

Note

The data type of x, bounds, and replacement must match the atomic data type of the function. In other words, trim_numeric() accepts only parameters of type 'numeric' (otherwise known as 'double-precision floating point'). Likewise, trim_date() accepts only parameters of type Date.
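
Because the types must agree, it can help to check or coerce inputs before trimming. The base R snippet below is a sketch of such checks. It does not call the package, and the variable names are illustrative.

```r
ages <- c(3L, 41L, 250L)                 # an integer vector
typeof(ages)                             # "integer"
typeof(c(0, 120))                        # "double": suits trim_numeric()
typeof(c(0L, 120L))                      # "integer": suits trim_integer()

# Coerce explicitly when the vector's type and the intended function differ.
ages_dbl <- as.numeric(ages)
is.double(ages_dbl)                      # TRUE

# Dates are stored as doubles, so check the class rather than typeof().
visit <- as.Date("2020-02-22")
typeof(visit)                            # "double"
inherits(visit, "Date")                  # TRUE
inherits(as.POSIXct("2020-02-22", tz = "UTC"), "POSIXct")  # TRUE
```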

The lower bound must be less than or equal to the upper bound.
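
The bounds rules (inclusive limits, lower bound no greater than upper) can be sketched in base R as follows. trim_sketch() is a hypothetical helper written for illustration. The package's own validation and error messages may differ.

```r
# Hypothetical helper; not the OuhscMunge source.
trim_sketch <- function(x, bounds, replacement) {
  stopifnot(length(bounds) == 2L, length(replacement) == 1L)
  stopifnot(bounds[1] <= bounds[2])      # lower bound must not exceed upper
  # Missing values are left alone; only observed out-of-range values change.
  outside <- !is.na(x) & (x < bounds[1] | bounds[2] < x)
  x[outside] <- replacement
  x
}

# Values equal to a bound (4 and 8) are kept because the bounds are inclusive.
trim_sketch(c(NA, 1:10), bounds = c(4L, 8L), replacement = NA_integer_)

# The same logic works for Date vectors, since index assignment keeps the class.
trim_sketch(
  as.Date(c("1902-02-02", "1999-09-09")),
  bounds      = as.Date(c("1990-01-01", "2030-01-01")),
  replacement = as.Date(NA_character_)
)
```

Index assignment is used here instead of ifelse() because ifelse() drops the Date class from its result.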

The default bounds for numerics and integers are at the extremes of the data type. The default bounds for dates and datetimes (1940-01-01 through 2029-12-31) are arbitrary, because the origin is slippery.
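
The integer and numeric defaults can be checked against base R's own limits. The snippet below uses only base R and is meant as a quick confirmation, not as part of the package.

```r
.Machine$integer.max                     # 2147483647

# R reserves the smallest 32-bit value for NA_integer_, so the usable minimum
# is the negative of the maximum.
identical(
  c(-.Machine$integer.max, .Machine$integer.max),
  c(-2147483647L, 2147483647L)
)                                        # TRUE

# Every non-missing double lies within c(-Inf, Inf), infinities included.
x <- c(-Inf, -1e308, 0, 1e308, Inf)
all(-Inf <= x & x <= Inf)                # TRUE
```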

Author

Will Beasley

Examples

library(OuhscMunge)

trim_numeric(runif(10, -1, 10), bounds=c(4, 8))
#>  [1] 4.073051       NA       NA 4.847309 4.430007 7.572395       NA 6.847370
#>  [9]       NA       NA

trim_integer(c(NA, 1:10), bounds=c(4L, 8L))
#>  [1] NA NA NA NA  4  5  6  7  8 NA NA

trim_date(
  x      = as.Date(c("1902-02-02", "1999-09-09", "2020-02-22", "1930-01-01", "1930-01-02")),
  bounds = as.Date(c("1990-01-01", "2030-01-01"))
)
#> [1] NA           "1999-09-09" "2020-02-22" NA           NA          

trim_datetime(
  x      = as.POSIXct(c("1902-02-02", "1999-09-09", "2020-02-22", "1930-01-01", "1930-01-02")),
  bounds = as.POSIXct(c("1990-01-01", "2030-01-01"))
)
#> [1] NA               "1999-09-09 UTC" "2020-02-22 UTC" NA              
#> [5] NA              

zip_codes <- c("12345", "a2345", "54321-6789", "54321-67890")
trim_character(zip_codes, "^\\d{5}(-\\d{4})?$")
#> [1] "12345"      NA           "54321-6789" NA          
trim_character(zip_codes)                                      # Everything passes.
#> [1] "12345"       "a2345"       "54321-6789"  "54321-67890"