Overview of the 'tinycodet' Extension of 'stringi'
Source: R/aaa3_tinycodet_strings.R
Virtually every programming language, even those primarily focused on mathematics, will at some point have to deal with strings.
R's atomic classes essentially boil down to either numbers or characters.
R's numerical functions are generally very fast, but R's native string functions are comparatively slow, lack a unified naming scheme, and are not nearly as comprehensive as its numerical functions.
The primary R package that addresses this is 'stringi', arguably the fastest and most comprehensive string manipulation package available at the time of writing.
Many string-related packages depend entirely on 'stringi' (see its reverse dependencies on CRAN).
Because string manipulation is so central to programming, 'tinycodet' adds a little new functionality to 'stringi'.
'tinycodet' adds the following functions to extend 'stringi':
Find the \(i^{th}\) occurrence of a pattern (stri_locate_ith), or the \(i^{th}\) text boundary (stri_locate_ith_boundaries).
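To make the idea of an \(i^{th}\) occurrence concrete, here is a minimal base-R sketch covering the pattern case only (text boundaries are not handled). It is not how stri_locate_ith is implemented: the helper name locate_ith_sketch is hypothetical, it accepts only a single scalar i, negative i counts from the last match (mirroring i = -2 in the Examples section), and it returns a two-column start/end matrix with NA where the requested occurrence does not exist.

```r
# Hypothetical base-R sketch (NOT tinycodet's implementation): locate the
# i-th regex match in each string. Negative i counts back from the last match.
locate_ith_sketch <- function(x, i, pattern) {
  stopifnot(length(i) == 1L)  # this sketch only supports a scalar i
  matches <- gregexpr(pattern, x, perl = TRUE)
  pos <- vapply(matches, function(m) {
    n <- if (m[1L] == -1L) 0L else length(m)  # number of matches found
    k <- if (i < 0) n + i + 1L else i         # translate negative i
    if (n == 0L || k < 1L || k > n) return(c(NA_integer_, NA_integer_))
    start <- as.integer(m[k])
    c(start, start + attr(m, "match.length")[k] - 1L)
  }, integer(2L))
  out <- t(pos)  # one row per input string
  colnames(out) <- c("start", "end")
  out
}

x <- c("3rd 1st 2nd", "5th 4th 6th")
loc <- locate_ith_sketch(x, i = -2, pattern = "\\d")
substring(x, loc[, "start"], loc[, "end"])
#> [1] "1" "4"
```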
'tinycodet' adds the following operators to complement the existing 'stringi' operators:
Infix operators for string arithmetic.
Infix operators for string sub-setting, which get or remove the first and/or last n characters from strings.
Infix operators for detecting patterns, and strfind()<- for locating/extracting/replacing found patterns.
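The ideas behind string arithmetic, sub-setting, and pattern detection can be sketched in a few lines of base R. The operator names below (%str+%, %str*%, %str/%, %first%, %last%, %drop_first%, %drop_last%, %detect%) are hypothetical stand-ins chosen for illustration; they are not the operators 'tinycodet' exports, and they skip the input checks and recycling rules of the real ones.

```r
# Hypothetical base-R stand-ins for the kinds of operators described above.
# These are NOT tinycodet's operators; the names are illustrative only.

# string arithmetic
`%str+%` <- function(x, y) paste0(x, y)   # concatenate
`%str*%` <- function(x, n) strrep(x, n)   # repeat each string n times
`%str/%` <- function(x, p) {              # count regex matches per string
  lengths(regmatches(x, gregexpr(p, x, perl = TRUE)))
}

# string sub-setting: get or remove the first/last n characters
`%first%` <- function(x, n) substr(x, 1L, n)
`%last%` <- function(x, n) substr(x, pmax(nchar(x) - n + 1L, 1L), nchar(x))
`%drop_first%` <- function(x, n) substr(x, n + 1L, nchar(x))
`%drop_last%` <- function(x, n) substr(x, 1L, nchar(x) - n)

# pattern detection
`%detect%` <- function(x, p) grepl(p, x, perl = TRUE)

s <- c("3rd 1st 2nd", "5th 4th 6th")
s %first% 3         # -> "3rd" "5th"
s %last% 3          # -> "2nd" "6th"
s %drop_first% 4    # -> "1st 2nd" "4th 6th"
s %str/% "\\d"      # -> 3 3
"ab" %str*% 3       # -> "ababab"
s %detect% "1st"    # -> TRUE FALSE
```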
Finally, 'tinycodet' adds the somewhat separate strcut_ functions (such as strcut_loc and strcut_brk), which cut strings into pieces without removing the delimiters.
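The difference between cutting and splitting can be illustrated with base R: strsplit() discards the delimiter, whereas the hypothetical helper cut_at_first_sketch below keeps it as a middle piece and returns a three-column matrix (before, delimiter, after). This is a minimal sketch that cuts only at the first match; it is not how the strcut_ functions are implemented.

```r
# base strsplit() discards the delimiter:
strsplit("a-b-c", "-", fixed = TRUE)[[1]]
# -> "a" "b" "c"

# Hypothetical sketch (NOT tinycodet's strcut_ implementation): cut each string
# at the first regex match, keeping the matched delimiter as its own column.
cut_at_first_sketch <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  found <- as.vector(m) != -1L
  # without a match, the whole string goes into "pre"
  start <- ifelse(found, as.vector(m), nchar(x) + 1L)
  len <- ifelse(found, attr(m, "match.length"), 0L)
  cbind(
    pre  = substr(x, 1L, start - 1L),
    main = substr(x, start, start + len - 1L),
    post = substr(x, start + len, nchar(x))
  )
}

res <- cut_at_first_sketch(c("a-b-c", "xyz"), "-")
res
# no characters are lost: pasting the pieces back recovers the input
paste0(res[, "pre"], res[, "main"], res[, "post"])
# -> "a-b-c" "xyz"
```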
Regarding Vector Recycling in the 'stringi'-based Functions
Generally speaking, vector recycling is supported, since 'stringi' itself supports it.
There are, however, a few exceptions.
First, matrix inputs (as in strcut_loc and the string sub-setting operators) will generally not be recycled.
Second, the i argument of stri_locate_ith does not support vector recycling.
Scalar recycling, on the other hand, is virtually always supported.
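To clarify the terminology: scalar recycling means a length-1 argument is reused for every element, while vector recycling means a shorter vector argument is reused cyclically. Base R's paste0(), which follows R's usual recycling rules, illustrates both; this only illustrates the terms and says nothing specific about 'tinycodet' functions.

```r
x <- c("a", "b", "c", "d")

# scalar recycling: the single suffix is reused for every element
paste0(x, "_1")
# -> "a_1" "b_1" "c_1" "d_1"

# vector recycling: the length-2 suffix vector is reused cyclically
paste0(x, c("_1", "_2"))
# -> "a_1" "b_2" "c_1" "d_2"
```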
References
Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1–59, doi:10.18637/jss.v103.i02
Examples
library(tinycodet)
# character vector:
x <- c("3rd 1st 2nd", "5th 4th 6th")
print(x)
#> [1] "3rd 1st 2nd" "5th 4th 6th"
# detect if there are digits:
x %s{}% "\\d"
#> [1] TRUE TRUE
# locate the second-to-last digit, then extract it:
loc <- stri_locate_ith(x, i = -2, regex = "\\d")
stringi::stri_sub(x, from = loc)
#> [1] "1" "4"
# cut x into matrix of individual words:
mat <- strcut_brk(x, "word")
# sort rows of matrix using the fast %row~% operator:
rank <- stringi::stri_rank(as.vector(mat)) |> matrix(ncol = ncol(mat))
sorted <- mat %row~% rank
sorted[is.na(sorted)] <- ""
# join the elements of each row into one string per row:
stri_c_mat(sorted, margin = 1, sep = " ")
#> [1] " 1st 2nd 3rd" " 4th 5th 6th"