library(tinycodet)
#> Run `?tinycodet::tinycodet` to open the introduction help page of 'tinycodet'.
Overview
‘tinycodet’ adds 3 sets of string-related operators.
First, sub-setting operators:
- x %s><% ss returns the first n1 and last n2 characters from each string in character vector x.
- x %s<>% ss trims away the first n1 and last n2 characters from each string in character vector x.
Second, ‘stringi’ already has the %s+%,
%s*%, and %s$% operators, and ‘tinycodet’ adds
some additional string arithmetic operators to complete the set:
- x %s-% p removes pattern p from each string in character vector x;
- x %s/% p counts how often pattern p occurs in each string of character vector x.
- x %s//% brk counts how often the text boundary specified in list brk occurs in each string of character vector x.
- x %ss% p splits the strings in x by a delimiter character/pattern defined in p, and removes p in the process.
And finally, string search operators:
- x %s{}% p checks, for every string in character vector x, whether the pattern defined in p is present. It can also be used to check whether the strings specifically start or end with pattern p.
- x %s!{}% p checks, for every string in character vector x, whether the pattern defined in p is not present. It can also be used to check whether the strings specifically do not start or end with pattern p.
- strfind()<- locates, extracts, or replaces found patterns.
The x %s-% p and x %s/% p operators, and
the string search operators (%s{}%, %s!{}%,
strfind()<-), perform pattern matching for various
purposes. When a string or character vector is given on the right-hand
side, it is interpreted as case-sensitive regex patterns, following
stringi.
Sometimes, of course, one wants to change this: for example, to make the matching case-insensitive, to use fixed expressions, or to use another kind of pattern.
Instead of giving a string or character vector of regex patterns, one
can also supply a list to the right-hand side, to specify exactly how
the pattern should be interpreted. The list should use the exact same
naming convention as stringi. For example:
- list(regex=p, case_insensitive=FALSE, ...)
- list(fixed=p, ...)
- list(coll=p, ...)
- list(charclass=p, ...)
For convenience, ‘tinycodet’ adds the following functions for this purpose:
- s_regex(p, ...) is equivalent to list(regex = p, ...)
- s_fixed(p, ...) is equivalent to list(fixed = p, ...)
- s_coll(p, ...) is equivalent to list(coll = p, ...)
- s_chrcls(p, ...) is equivalent to list(charclass = p, ...)
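As a small sketch of how the right-hand side changes the interpretation, the snippet below counts "." first as a regex (the default for a plain string) and then as a fixed pattern, and checks that s_fixed() and the equivalent hand-written list give the same counts. The input vector is made up for illustration and does not come from the package documentation.

```r
library(tinycodet)

x <- c("a.b.c", "abc")  # illustrative input (an assumption, not from the text)

# A plain string is treated as a regex, so "." matches any character.
n_regex <- x %s/% "."

# As a fixed pattern, "." only matches a literal dot.
n_fixed <- x %s/% s_fixed(".")

# The helper and the hand-written list should behave identically.
n_list <- x %s/% list(fixed = ".")

print(n_regex)
print(n_fixed)
print(identical(n_fixed, n_list))
```

The regex count equals the number of characters in each string, while the fixed count only sees the two literal dots in the first string.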
The next sections will give more details on the given overview.
String subsetting operators
The x %s><% ss operator returns a subset of each
string in character vector x. Here ss is a
vector of length 2, or a matrix with nrow(ss)=length(x) and
2 columns. The object ss should consist entirely of
non-negative integers (thus 0, 1, 2, etc. are valid, but -1, -2, -3 etc
are not valid). The first element/column of ss gives the number of
characters counting from the left side to be extracted from x. The
second element/column of ss gives the number of characters counting from
the right side to be extracted from x.
Here are 2 examples:
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
ss <- c(2,3)
x %s><% ss
#> [1] "abklm" "noxyz"
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
ss <- c(1,0)
x %s><% ss
#> [1] "a" "n"
Thus x %s><% ss “gets” or extracts the given
number of characters from the left and the right, and removes the rest.
There is also x %s<>% ss, which is the opposite: it
trims away the number of characters from the left and right as defined
in ss, leaving you with whatever is left.
Here are 2 more examples:
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
ss <- c(2,3)
x %s<>% ss
#> [1] "cdefghij" "pqrstuvw"
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
ss <- c(1,0)
x %s<>% ss
#> [1] "bcdefghijklm" "opqrstuvwxyz"
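The examples above all pass ss as a vector of length 2. The following sketch uses the matrix form described earlier, with one row per string, and cross-checks the trimming result against base R's substr(). The base R comparison is only an illustration of the semantics, not how the package implements the operator.

```r
library(tinycodet)

x <- c("abcdefghijklm", "nopqrstuvwxyz")

# One row per string: column 1 counts from the left, column 2 from the right.
ss <- rbind(c(2, 3), c(1, 0))

kept <- x %s><% ss     # keep the outer characters of each string
trimmed <- x %s<>% ss  # drop the outer characters of each string

# Base R cross-check of the trimming result (illustrative only).
n <- nchar(x)
trimmed_base <- substr(x, ss[, 1] + 1, n - ss[, 2])

print(kept)
print(trimmed)
print(identical(trimmed, trimmed_base))
```

If the matrix form behaves as described, the first string is handled with the counts (2, 3) and the second with (1, 0), so each string can get its own pair of counts.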
String arithmetic
The tinycodet package includes 7 string arithmetic
operators (3 of which are re-exported from ‘stringi’):
- x %s+% y concatenates x and y (re-exported from ‘stringi’);
- x %s-% p removes pattern p from each string in character vector x;
- x %s*% n repeats each string in character vector x n times (re-exported from ‘stringi’);
- x %s/% p counts how often pattern p occurs in each string of character vector x;
- x %s//% brk counts how often the text boundary specified in list brk occurs in each string of character vector x;
- e1 %s$% e2 provides access to stri_sprintf (re-exported from ‘stringi’);
- x %ss% p splits the strings in x by a delimiter character/pattern defined in p, and removes p in the process.
For example:
"Hello " %s+% " world"
#> [1] "Hello  world"
c("Hello world", "Goodbye world") %s-% " world"
#> [1] "Hello"   "Goodbye"
c("Hello world", "Goodbye world") %s-% s_fixed(" world")
#> [1] "Hello"   "Goodbye"
c("Ha", "Ho", "Hi", "Hu", "He", "Ha") %s*% 2:7
#> [1] "HaHa"           "HoHoHo"         "HiHiHiHi"       "HuHuHuHuHu"    
#> [5] "HeHeHeHeHeHe"   "HaHaHaHaHaHaHa"
c("hello World & goodbye world", "world domination!") %s/% s_fixed("world", case_insensitive = TRUE)
#> [1] 2 1
c("hello world & goodbye world", "world domination!") %s//% list(type = "word")
#> [1] 9 4
The right-side arguments y and n can be a
single value, or a vector of the same length as x. The
right-side argument p can be a string or character vector, or
a list as described in the Overview section.
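A minimal sketch of the vectorized right-hand side: each element on the right is paired with the string at the same position in x. The inputs are made up for illustration.

```r
library(tinycodet)

x <- c("ab", "cd")  # illustrative input

# One right-hand value per string in x.
pasted <- x %s+% c("1", "2")   # element-wise concatenation
repeated <- x %s*% c(2, 3)     # element-wise repetition

# A character vector of patterns is matched element-wise as well:
# count "a" in the first string and "r" in the second.
fruit <- c("banana", "cherry")
counts <- fruit %s/% c("a", "r")

print(pasted)
print(repeated)
print(counts)
```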
Detect Patterns
Detect
The x %s{}% p operator checks for every string in
character vector x if the pattern defined in p
is present. The x %s!{}% p operator checks for every string
in character vector x if the pattern defined in
p is NOT present.
Examples:
x <- c(paste0(letters[1:13], collapse=""), paste0(letters[14:26], collapse=""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
x %s{}% "a"
#> [1]  TRUE FALSE
x %s!{}% "a"
#> [1] FALSE  TRUE
which(x %s{}% "a")
#> [1] 1
which(x %s!{}% "a")
#> [1] 2
x[x %s{}% "a"]
#> [1] "abcdefghijklm"
x[x %s!{}% "a"]
#> [1] "nopqrstuvwxyz"
Detect - start or end with pattern
When supplying a list on the right hand side (see the Overview
section above), one can include the list element
at = "start" or at = "end":
- Supplying at = "start" will check if strings start with the patterns (see stringi::stri_startswith).
- Supplying at = "end" will check if strings end with the patterns (see stringi::stri_endswith).
Examples:
x <- c(paste0(letters, collapse=""), paste0(rev(letters), collapse=""), NA)
p <- s_fixed("abc", at = "start")
x %s{}% p
#> [1]  TRUE FALSE    NA
stringi::stri_startswith(x, fixed = "abc") # same as above
#> [1]  TRUE FALSE    NA
p <- s_fixed("xyz", at = "end")
x %s{}% p
#> [1]  TRUE FALSE    NA
stringi::stri_endswith(x, fixed = "xyz") # same as above
#> [1]  TRUE FALSE    NA
p <- s_fixed("cba", at = "end")
x %s{}% p
#> [1] FALSE  TRUE    NA
stringi::stri_endswith(x, fixed = "cba") # same as above
#> [1] FALSE  TRUE    NA
p <- s_fixed("zyx", at = "start")
x %s{}% p
#> [1] FALSE  TRUE    NA
stringi::stri_startswith(x, fixed = "zyx") # same as above
#> [1] FALSE  TRUE    NA
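The negated operator accepts the same pattern lists. The sketch below reuses the vector from the examples above, checks that x %s!{}% p is the element-wise negation of x %s{}% p (with NA staying NA), and compares the result with base R's startsWith(). The base R comparison is an illustration only and applies to fixed prefixes.

```r
library(tinycodet)

x <- c(paste0(letters, collapse = ""), paste0(rev(letters), collapse = ""), NA)
p <- s_fixed("abc", at = "start")

starts <- x %s{}% p       # TRUE where the string starts with "abc"
not_starts <- x %s!{}% p  # the negated check

# Missing strings stay NA in both results, so negating one gives the other.
print(identical(not_starts, !starts))

# Base R's startsWith() gives the same answer for a fixed prefix.
print(identical(starts, startsWith(x, "abc")))
```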
Locate, Extract, or Replace Patterns
strfind()<- locates, extracts, or replaces found
patterns. Like the other operators, the argument p can be a
string or character vector, or a list as described in the Overview
section above.
It can be used in several different ways.
Extract
strfind() finds all pattern matches, and returns the
extractions of the findings in a list, just like
stringi::stri_extract_all():
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- s_fixed(c('quick', 'brown', 'fox'))
strfind(x, p)
#> [[1]]
#> [1] "quick"
#> 
#> [[2]]
#> [1] "brown"
#> 
#> [[3]]
#> [1] "fox"
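In the example above every string has exactly one match. As a further sketch, a regex pattern can yield several matches per string, in which case the corresponding list element holds all of them. The input strings and the pattern below are illustrative assumptions.

```r
library(tinycodet)

x <- c("The quick brown fox", "jumps over lazy dogs")  # illustrative input

# Regex for whole words of exactly five characters (illustrative pattern).
p <- s_regex("\\b\\w{5}\\b")

# One list element per string, holding every match found in that string.
found <- strfind(x, p)
print(found)

# Flatten the list when only the matches themselves matter.
all_matches <- unlist(found)
print(all_matches)
```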
Locate
strfind(..., i = "all") finds all pattern matches and reports their
locations in a list, like stringi::stri_locate_all().
strfind(..., i = i), where i is an integer
vector, locates the ith occurrence of a pattern and reports the
locations in a matrix, just like stri_locate_ith():
p <- s_fixed("the", case_insensitive = TRUE)
strfind(x, p, i = "all")
#> [[1]]
#>      start end
#> [1,]     1   3
#> [2,]    33  35
#> 
#> [[2]]
#>      start end
#> [1,]     1   3
#> [2,]    33  35
#> 
#> [[3]]
#>      start end
#> [1,]     1   3
#> [2,]    33  35
strfind(x, p, i = c(1, -1, 2))
#>      start end
#> [1,]     1   3
#> [2,]    33  35
#> [3,]    33  35
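One way to use the location matrix is to feed its two columns to base R's substr() to pull out the matched text. This is a sketch under the assumption that the first column is the start and the second the end, as in the output shown above.

```r
library(tinycodet)

x <- rep("The quick brown fox jumped over the lazy dog.", 2)
p <- s_fixed("the", case_insensitive = TRUE)

# First match in string 1, last match in string 2.
loc <- strfind(x, p, i = c(1, -1))

# Columns 1 and 2 hold the start and end positions of each match.
matched <- substr(x, loc[, 1], loc[, 2])
print(matched)
```

Because the match is case-insensitive, the first occurrence keeps its capital "T" while the last occurrence is lower case.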
Replace
strfind(x, p) <- value finds pattern matches in variable
x, replaces the pattern matches with the character vector
specified in value, and assigns the transformed character
vector back to x. This is somewhat similar to
stringi::stri_replace(), though the replacement is done
in-place. It supports vectorized, dictionary, first, and last
replacement:
# vectorized replacement:
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW',  'BLACK', 'BEAR')
strfind(x, p) <- rp
print(x)
#> [1] "The SLOW brown fox jumped over the lazy dog."  
#> [2] "The quick BLACK fox jumped over the lazy dog." 
#> [3] "The quick brown BEAR jumped over the lazy dog."
# dictionary replacement:
# quick => SLOW; brown => BLACK; fox => BEAR
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW',  'BLACK', 'BEAR')
strfind(x, p, rt = "dict") <- rp
print(x)
#> [1] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [2] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [3] "The SLOW BLACK BEAR jumped over the lazy dog."
# first replacement:
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- s_fixed("the", case_insensitive = TRUE)
rp <- c('ONE')
strfind(x, p, rt = "first") <- rp
print(x)
#> [1] "ONE quick brown fox jumped over the lazy dog."
#> [2] "ONE quick brown fox jumped over the lazy dog."
#> [3] "ONE quick brown fox jumped over the lazy dog."
# last replacement:
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
p <- s_fixed("the", case_insensitive = TRUE)
rp <- c('ONE')
strfind(x, p, rt = "last") <- rp
print(x)
#> [1] "The quick brown fox jumped over ONE lazy dog."
#> [2] "The quick brown fox jumped over ONE lazy dog."
#> [3] "The quick brown fox jumped over ONE lazy dog."
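To make the difference between first and last replacement, and the in-place assignment, easier to see, here is a compact sketch on a made-up string with a repeated delimiter. The stringi call at the end is included only for contrast: it returns the result rather than assigning it.

```r
library(tinycodet)

x <- "a-b-c"       # illustrative input
p <- s_fixed("-")

first <- x
strfind(first, p, rt = "first") <- "+"  # replaces only the first "-" in `first`

last <- x
strfind(last, p, rt = "last") <- "+"    # replaces only the last "-" in `last`

print(c(first, last))

# stringi returns a new vector and leaves x untouched.
copy <- stringi::stri_replace_first_fixed(x, "-", "+")
print(copy)
```

Only the variable named on the left-hand side of the assignment changes; the original x keeps its value.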