The x %s{}% p and x %s!{}% p Operators:
The x %s{}% p operator
checks, for every string in character vector x, whether
the pattern defined in p is present.
When supplying a list on the right-hand side (see s_pattern),
one can optionally include the list element at = "start" or at = "end":
- Supplying at = "start" will check if the pattern appears at the start of a string (like stri_startswith).
- Supplying at = "end" will check if the pattern appears at the end of a string (like stri_endswith).
The x %s!{}% p operator is the same as x %s{}% p,
except it checks for absence of the pattern,
rather than presence.
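For a single regular-expression pattern, the presence/absence semantics described above can be sketched in base R. The helper has_pattern() below is hypothetical and only illustrates the documented behaviour; it is not the package's implementation, and it assumes that an NA string yields NA (as in the Examples section).

```r
# Hypothetical helper: mimics x %s{}% p (negate = FALSE) and x %s!{}% p
# (negate = TRUE) for ONE regex pattern p. Not the package's own code.
has_pattern <- function(x, p, negate = FALSE) {
  out <- grepl(p, x, perl = TRUE)
  # grepl() returns FALSE for NA strings; report NA instead
  out[is.na(x)] <- NA
  if (negate) !out else out
}

x <- c("abcdefghijklm", "nopqrstuvwxyz", NA)
has_pattern(x, "a")                 # TRUE FALSE NA
has_pattern(x, "a", negate = TRUE)  # FALSE TRUE NA
```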
For string (in)equality operators,
see %s==% from the 'stringi' package.
strfind()<-:
strfind()<-
locates, extracts, or replaces found patterns.
It complements the other string-related operators,
and uses the same s_pattern API.
It functions as follows:
- strfind() finds all pattern matches and returns the extractions of the findings in a list, just like stri_extract_all.
- strfind(..., i = "all") locates all pattern matches, like stri_locate_all.
- strfind(..., i = i), where i is an integer vector, locates the \(i^{th}\) occurrence of a pattern and reports the locations in a matrix, just like stri_locate_ith.
- strfind() <- value finds pattern matches in variable x, replaces the pattern matches with the character vector specified in value, and assigns the transformed character vector back to x.
This is somewhat similar to stri_replace, though the replacement is done in-place.
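The extraction and "locate all" modes listed above can be approximated with base R's gregexpr() and regmatches(). This is an illustrative sketch under the assumption of a single regex pattern, not the package's code. One known difference: base R returns character(0) for a string without matches, whereas stringi's extract functions return NA by default; the location sketch does follow stringi's default of one NA row.

```r
# Sketch only: base R analogues of strfind(x, p) and strfind(x, p, i = "all")
x <- c("The quick brown fox jumped over the lazy dog.", "No match here")
p <- "[Tt]he"

m <- gregexpr(p, x, perl = TRUE)

# Like strfind(x, p): a list of all extracted matches per string
extracted <- regmatches(x, m)

# Like strfind(x, p, i = "all"): a list of start/end matrices per string
located <- lapply(m, function(pos) {
  if (is.na(pos[1]) || pos[1] == -1L) {
    # no match (or NA input): one row of NAs, as stringi does by default
    return(matrix(NA_integer_, 1L, 2L, dimnames = list(NULL, c("start", "end"))))
  }
  start <- as.integer(pos)
  cbind(start = start, end = start + attr(pos, "match.length") - 1L)
})
```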
Arguments
- x
a string or character vector.
For strfind()<-, x must be the variable containing the character vector/string, since strfind()<- performs assignment in-place.
- p
either a list with 'stringi' arguments (see s_pattern), or else a character vector with regular expressions.
See also the Details section.
- ...
additional arguments to be specified.
- i
one of the following can be given for i:
  - if i is not given or NULL, strfind() extracts all found pattern occurrences.
  - if i is the string "all", strfind() locates all found pattern occurrences.
  - if i is an integer, strfind() locates the \(i^{th}\) pattern occurrences. See the i argument in stri_locate_ith for details.
For strfind() <- value, i must not be specified.
- rt
use rt to specify the Replacement Type that strfind()<- should perform.
One of the following can be given for rt:
  - if rt is not given, NULL or "vec", strfind()<- performs regular, vectorized replacement of all occurrences.
  - if rt = "dict", strfind()<- performs dictionary replacement of all occurrences.
  - if rt = "first", strfind()<- replaces only the first occurrences.
  - if rt = "last", strfind()<- replaces only the last occurrences.
Note: rt = "first" and rt = "last" only exist for convenience; for more specific locational replacement, use stri_locate_ith or strfind(..., i) with numeric i (see the Examples section).
For strfind(), rt must not be specified.
- value
a character vector giving the replacement values.
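The effect of rt = "first" and rt = "last" on a single string can be sketched in base R. The helpers replace_first() and replace_last() are hypothetical; they assume one regex pattern and one replacement string, and are not how strfind()<- is implemented. Note that sub() interprets backslashes in the replacement as back-references.

```r
# Hypothetical sketches of rt = "first" / rt = "last" (not the package's code)
replace_first <- function(x, p, value, ignore.case = FALSE) {
  # sub() already replaces only the first match in each string
  sub(p, value, x, ignore.case = ignore.case, perl = TRUE)
}

replace_last <- function(x, p, value, ignore.case = FALSE) {
  m <- gregexpr(p, x, ignore.case = ignore.case, perl = TRUE)
  vapply(seq_along(x), function(k) {
    pos <- m[[k]]
    # leave NA strings and strings without a match untouched
    if (is.na(x[k]) || pos[1] == -1L) return(x[k])
    n <- length(pos)
    s <- pos[n]                                 # start of the last match
    e <- s + attr(pos, "match.length")[n] - 1L  # end of the last match
    paste0(substr(x[k], 1L, s - 1L), value, substr(x[k], e + 1L, nchar(x[k])))
  }, character(1))
}

x <- "The quick brown fox jumped over the lazy dog."
replace_first(x, "the", "One", ignore.case = TRUE)
# "One quick brown fox jumped over the lazy dog."
replace_last(x, "the", "Some Other", ignore.case = TRUE)
# "The quick brown fox jumped over Some Other lazy dog."
```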
Value
For the x %s{}% p and x %s!{}% p operators:
Returns a logical vector.
For strfind():
Returns a list with extractions of all found patterns.
For strfind(..., i = "all"):
Returns a list with all found pattern locations.
For strfind(..., i = i) with integer vector i:
Returns an integer matrix with two columns,
giving the start and end positions of the \(i^{th}\) matches;
a row contains two NAs if no match is found, or if the corresponding string in x is NA.
For strfind() <- value:
Returns nothing,
but performs in-place replacement
(using R's default in-place semantics)
of the found patterns in variable x.
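The two-column matrix returned by strfind(..., i = i) can be illustrated with a base R sketch. The helper locate_ith() is hypothetical and only approximates the documented behaviour of stri_locate_ith: positive i counts from the first match, negative i counts from the last, and a row of NAs marks a missing match or an NA string. Treating i = 0 as "no match" is an assumption of this sketch.

```r
# Hypothetical helper approximating strfind(x, p, i = i); not the package's code
locate_ith <- function(x, p, i, ignore.case = FALSE) {
  i <- rep_len(i, length(x))
  out <- matrix(NA_integer_, nrow = length(x), ncol = 2L,
                dimnames = list(NULL, c("start", "end")))
  m <- gregexpr(p, x, ignore.case = ignore.case, perl = TRUE)
  for (k in seq_along(x)) {
    pos <- m[[k]]
    if (is.na(x[k]) || pos[1] == -1L) next  # NA string or no match: keep NAs
    n <- length(pos)
    # negative i counts backwards: -1 is the last match, -2 the second-last
    j <- if (i[k] > 0) i[k] else n + i[k] + 1L
    if (j < 1L || j > n) next               # out of range (or i = 0): keep NAs
    out[k, "start"] <- pos[j]
    out[k, "end"] <- pos[j] + attr(pos, "match.length")[j] - 1L
  }
  out
}

x <- c("abcdefghijklm", "nopqrstuvwxyz")
loc <- locate_ith(x, "[aeiou]", c(2, -2), ignore.case = TRUE)
print(loc)  # rows (5, 5) and (2, 2), matching the vowel example in Examples
```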
Details
Right-hand Side List for the %s{}% and %s!{}% Operators
When supplying a list to the right-hand side of the
%s{}% and %s!{}% operators,
one can add the argument at.
If at = "start",
the operators will check if the pattern is present/absent at the start of the string.
If at = "end",
the operators will check if the pattern is present/absent at the end of the string.
Unlike stri_startswith or stri_endswith,
regex is supported by the %s{}% and %s!{}% operators.
See examples below.
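One way to picture how at = "start" or at = "end" can work with regular expressions is to anchor the pattern. The helper detect_at() below is hypothetical; whether the package anchors patterns this way internally is an assumption. Wrapping the pattern in a non-capturing group (?:...) keeps an alternation such as "a|b" anchored as a whole.

```r
# Hypothetical helper: regex-aware "starts with" / "ends with" via anchoring
detect_at <- function(x, p, at = c("start", "end")) {
  at <- match.arg(at)
  anchored <- if (at == "start") paste0("^(?:", p, ")") else paste0("(?:", p, ")$")
  out <- grepl(anchored, x, perl = TRUE)
  out[is.na(x)] <- NA  # NA strings give NA, as in the Examples section
  out
}

x <- c(paste0(letters, collapse = ""), paste0(rev(letters), collapse = ""), NA)
detect_at(x, "[a-c]{3}", at = "start")  # TRUE FALSE NA
detect_at(x, "[x-z]+", at = "end")      # TRUE FALSE NA
```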
Vectorized Replacement vs Dictionary Replacement
Vectorized replacement:
x, p, and value are of the same length (or recycled to become the same length).
All occurrences of pattern p[j] in x[j] are replaced with value[j], for every j.
Dictionary replacement:
p and value are of the same length, and their length is independent of the length of x.
For every single string in x, all occurrences of pattern p[1] are replaced with value[1],
all occurrences of pattern p[2] are replaced with value[2], etc.
Notice that for single replacement, i.e. rt = "first" or rt = "last",
it makes no sense to distinguish between vectorized or dictionary replacement,
since then only a single occurrence is being replaced per string.
See examples below.
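The difference between the two replacement types can be sketched with base R: mapply() pairs x[j], p[j] and value[j], while Reduce() applies every pattern/value pair to every string in turn. This is an illustration under stated assumptions, not the package's implementation; in this sketch the dictionary pairs are applied sequentially, so a later pattern could match text inserted by an earlier replacement.

```r
x <- rep("The quick brown fox jumped over the lazy dog.", 3)
p <- c("quick", "brown", "fox")
value <- c("SLOW", "BLACK", "BEAR")

# Vectorized: pattern p[j] and value[j] are applied to x[j] only
vec <- mapply(function(s, pat, repl) gsub(pat, repl, s, perl = TRUE),
              x, p, value, USE.NAMES = FALSE)

# Dictionary: each pair (p[j], value[j]) is applied to every string, in order
dict <- Reduce(function(acc, j) gsub(p[j], value[j], acc, perl = TRUE),
               seq_along(p), x)
```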
Note
strfind()<- performs in-place replacement.
Therefore, the character vector or string to perform replacement on
must already exist as a variable.
Take, for example, the following code:
strfind("hello", p = "e") <- "a" # this obviously does not work
y <- "hello"
strfind(y, p = "e") <- "a" # this works fine
In the above code, the first strfind()<- call does not work,
because the string needs to exist as a variable.
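The reason lies in how R evaluates replacement functions: a call f(y, ...) <- value is rewritten as y <- `f<-`(y, ..., value = value), so the target must be a name that can be reassigned. The replacement function `swap_all<-` below is a hypothetical stand-in used only to show this mechanism; it is not strfind()<-.

```r
# Hypothetical replacement function illustrating R's `f(x) <- value` protocol
`swap_all<-` <- function(x, p, value) {
  gsub(p, value, x, perl = TRUE)
}

y <- "hello"
swap_all(y, p = "e") <- "a"  # evaluated as: y <- `swap_all<-`(y, p = "e", value = "a")
print(y)                     # "hallo"

# swap_all("hello", p = "e") <- "a"  # fails: a literal cannot be assigned to
```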
Examples
# example of %s{}% and %s!{}% ====
x <- c(paste0(letters[1:13], collapse = ""),
paste0(letters[14:26], collapse = ""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
x %s{}% "a"
#> [1] TRUE FALSE
x %s!{}% "a"
#> [1] FALSE TRUE
which(x %s{}% "a")
#> [1] 1
which(x %s!{}% "a")
#> [1] 2
x[x %s{}% "a"]
#> [1] "abcdefghijklm"
x[x %s!{}% "a"]
#> [1] "nopqrstuvwxyz"
x[x %s{}% "a"] <- 1
x[x %s!{}% "a"] <- 1
print(x)
#> [1] "1" "1"
x <- c(paste0(letters[1:13], collapse = ""),
paste0(letters[14:26], collapse = ""))
x %s{}% "1"
#> [1] FALSE FALSE
x %s!{}% "1"
#> [1] TRUE TRUE
which(x %s{}% "1")
#> integer(0)
which(x %s!{}% "1")
#> [1] 1 2
x[x %s{}% "1"]
#> character(0)
x[x %s!{}% "1"]
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
x[x %s{}% "1"] <- "a"
x[x %s!{}% "1"] <- "a"
print(x)
#> [1] "a" "a"
#############################################################################
# Example of %s{}% and %s!{}% with "at" argument ====
x <- c(paste0(letters, collapse = ""),
paste0(rev(letters), collapse = ""), NA)
p <- s_fixed("abc", at = "start")
x %s{}% p
#> [1] TRUE FALSE NA
stringi::stri_startswith(x, fixed = "abc") # same as above
#> [1] TRUE FALSE NA
p <- s_fixed("xyz", at = "end")
x %s{}% p
#> [1] TRUE FALSE NA
stringi::stri_endswith(x, fixed = "xyz") # same as above
#> [1] TRUE FALSE NA
p <- s_fixed("cba", at = "end")
x %s{}% p
#> [1] FALSE TRUE NA
stringi::stri_endswith(x, fixed = "cba") # same as above
#> [1] FALSE TRUE NA
p <- s_fixed("zyx", at = "start")
x %s{}% p
#> [1] FALSE TRUE NA
stringi::stri_startswith(x, fixed = "zyx") # same as above
#> [1] FALSE TRUE NA
#############################################################################
# Example of transforming ith occurrence ====
# new character vector:
x <- c(paste0(letters[1:13], collapse = ""),
paste0(letters[14:26], collapse = ""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
# report ith (second and second-last) vowel locations:
p <- s_regex( # vowels
rep("A|E|I|O|U", 2),
case_insensitive = TRUE
)
loc <- strfind(x, p, i = c(2, -2))
print(loc)
#> start end
#> [1,] 5 5
#> [2,] 2 2
# extract ith vowels:
extr <- stringi::stri_sub(x, from = loc)
print(extr)
#> [1] "e" "o"
# replace ith vowels with numbers:
repl <- chartr("aeiou", "12345", extr) # transformation
stringi::stri_sub(x, loc) <- repl
print(x)
#> [1] "abcd2fghijklm" "n4pqrstuvwxyz"
#############################################################################
# Example of strfind for regular vectorized replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW', 'BLACK', 'BEAR')
x %s{}% p
#> [1] TRUE TRUE TRUE
strfind(x, p)
#> [[1]]
#> [1] "quick"
#>
#> [[2]]
#> [1] "brown"
#>
#> [[3]]
#> [1] "fox"
#>
strfind(x, p) <- rp
print(x)
#> [1] "The SLOW brown fox jumped over the lazy dog."
#> [2] "The quick BLACK fox jumped over the lazy dog."
#> [3] "The quick brown BEAR jumped over the lazy dog."
#############################################################################
# Example of strfind for dictionary replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW', 'BLACK', 'BEAR')
# thus dictionary is:
# quick => SLOW; brown => BLACK; fox => BEAR
strfind(x, p, rt = "dict") <- rp
print(x)
#> [1] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [2] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [3] "The SLOW BLACK BEAR jumped over the lazy dog."
#############################################################################
# Example of strfind for first and last replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "One"
strfind(x, p, rt = "first") <- rp
print(x)
#> [1] "One quick brown fox jumped over the lazy dog."
#> [2] "One quick brown fox jumped over the lazy dog."
#> [3] "One quick brown fox jumped over the lazy dog."
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "Some Other"
strfind(x, p, rt = "last") <- rp
print(x)
#> [1] "The quick brown fox jumped over Some Other lazy dog."
#> [2] "The quick brown fox jumped over Some Other lazy dog."
#> [3] "The quick brown fox jumped over Some Other lazy dog."