The x %s{}% p and x %s!{}% p Operators:
The x %s{}% p operator checks, for every string in character vector x, whether the pattern defined in p is present.
When supplying a list on the right-hand side (see s_pattern), one can optionally include the list element at = "start" or at = "end":
- Supplying at = "start" will check if the pattern appears at the start of a string (like stri_startswith).
- Supplying at = "end" will check if the pattern appears at the end of a string (like stri_endswith).
The x %s!{}% p operator is the same as x %s{}% p, except it checks for absence of the pattern, rather than presence.
For string (in)equality operators, see %s==% from the 'stringi' package.
strfind()<-:
strfind()<- locates, extracts, or replaces found patterns.
It complements the other string-related operators, and uses the same s_pattern API.
It functions as follows:
- strfind() finds all pattern matches, and returns the extractions of the findings in a list, just like stri_extract_all.
- strfind(..., i = "all") finds all pattern matches, and returns their locations in a list, just like stri_locate_all.
- strfind(..., i = i), where i is an integer vector, locates the \(i^{th}\) occurrence of a pattern, and reports the locations in a matrix, just like stri_locate_ith.
- strfind() <- value finds pattern matches in variable x, replaces the pattern matches with the character vector specified in value, and assigns the transformed character vector back to x. This is somewhat similar to stri_replace, though the replacement is done in-place.
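To make the shapes of these four results concrete, here is a rough analogue written in base R only. It is a sketch for illustration and not how strfind() is implemented: gregexpr(), regmatches() and gsub() are base R functions, the helper locate_ith() is a hypothetical name introduced here, and the sketch ignores negative i and strings without a match.

```r
x <- c("banana", "cherry")
p <- "an|r"

# Extract all matches: a list with one character vector per string
# (analogous in shape to strfind(x, p)).
m <- gregexpr(p, x)
extracted <- regmatches(x, m)

# Locate all matches: start positions per string
# (analogous in spirit to strfind(x, p, i = "all")).
starts <- lapply(m, as.integer)

# Locate the i-th match per string and report start/end in a matrix
# (analogous in shape to strfind(x, p, i = i)); hypothetical helper.
locate_ith <- function(m_j, i) {
  s <- as.integer(m_j)
  len <- attr(m_j, "match.length")
  c(start = s[i], end = s[i] + len[i] - 1L)
}
loc <- t(mapply(locate_ith, m, c(2, 2)))

# Replace all matches and assign the result back to the variable
# (analogous in spirit to strfind(y, p) <- "_").
y <- x
y <- gsub(p, "_", y)
```

Here extracted and starts are lists of length 2, loc is a two-column matrix with one row per string, and y holds the transformed strings.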
Arguments
- x
a string or character vector.
For strfind()<-, x must be the variable containing the character vector/string, since strfind()<- performs assignment in-place.
- p
either a list with 'stringi' arguments (see s_pattern), or else a character vector with regular expressions.
See also the Details section.
- ...
additional arguments to be specified.
- i
one of the following can be given for i:
  - if i is not given or NULL, strfind() extracts all found pattern occurrences.
  - if i is the string "all", strfind() locates all found pattern occurrences.
  - if i is an integer vector, strfind() locates the \(i^{th}\) pattern occurrences. See the i argument in stri_locate_ith for details.
For strfind() <- value, i must not be specified.
- rt
use rt to specify the Replacement Type that strfind()<- should perform.
One of the following can be given for rt:
  - if rt is not given, NULL, or "vec", strfind()<- performs regular, vectorized replacement of all occurrences.
  - if rt = "dict", strfind()<- performs dictionary replacement of all occurrences.
  - if rt = "first", strfind()<- replaces only the first occurrences.
  - if rt = "last", strfind()<- replaces only the last occurrences.
Note: rt = "first" and rt = "last" only exist for convenience; for more specific locational replacement, use stri_locate_ith or strfind(..., i) with numeric i (see the Examples section).
For strfind(), rt must not be specified.
- value
a character vector giving the replacement values.
Value
For the x %s{}% p and x %s!{}% p operators:
Return logical vectors.
For strfind():
Returns a list with extractions of all found patterns.
For strfind(..., i = "all"):
Returns a list with all found pattern locations.
For strfind(..., i = i) with integer vector i:
Returns an integer matrix with two columns, giving the start and end positions of the \(i^{th}\) matches, two NAs if no matches are found, and also two NAs if the corresponding string in x is NA.
For strfind() <- value:
Returns nothing, but performs in-place replacement (using R's default in-place semantics) of the found patterns in variable x.
Details
Right-hand Side List for the %s{}% and %s!{}% Operators
When supplying a list to the right-hand side of the %s{}% and %s!{}% operators, one can add the argument at.
If at = "start", the operators will check if the pattern is present/absent at the start of the string.
If at = "end", the operators will check if the pattern is present/absent at the end of the string.
Unlike stri_startswith or stri_endswith, regex is supported by the %s{}% and %s!{}% operators.
See examples below.
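To illustrate why regex support at the start or end of a string is useful, the following base R sketch tests for a class of prefixes and suffixes (any run of digits) rather than one literal string. It uses grepl() with the ^ and $ anchors as a stand-in; that this mirrors what at = "start" and at = "end" do with a regex pattern is an assumption, and the variable names are illustrative only.

```r
x <- c("abc123", "123abc", NA)

# Presence of one or more digits at the start of each string.
starts_with_digits <- grepl("^[0-9]+", x)
# grepl() returns FALSE for NA input; restore NA so that missing
# strings stay missing, as in the operator output shown in the Examples.
starts_with_digits[is.na(x)] <- NA

# Presence of one or more digits at the end of each string.
ends_with_digits <- grepl("[0-9]+$", x)
ends_with_digits[is.na(x)] <- NA

# Absence is the logical negation; NA stays NA.
no_digits_at_start <- !starts_with_digits
```

A fixed-string check such as startsWith(x, "123") can only test one literal prefix at a time, which is the limitation that a regex-aware start/end check avoids.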
Vectorized Replacement vs Dictionary Replacement
- Vectorized replacement: x, p, and value are of the same length (or recycled to become the same length). All occurrences of pattern p[j] in x[j] are replaced with value[j], for every j.
- Dictionary replacement: p and value are of the same length, and their length is independent of the length of x. For every single string in x, all occurrences of pattern p[1] are replaced with value[1], all occurrences of pattern p[2] are replaced with value[2], etc.
Notice that for single replacement, i.e. rt = "first" or rt = "last", it makes no sense to distinguish between vectorized or dictionary replacement, since then only a single occurrence is being replaced per string.
See examples below.
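The difference between the two replacement types can also be sketched in base R. This is only an illustration of the semantics described above, using gsub() with fixed = TRUE, and not the package's implementation. In particular, the loop applies the dictionary pairs one after another, so a later pattern could match text inserted by an earlier replacement; whether strfind()<- behaves the same way in that edge case is not stated here.

```r
x <- rep("the quick brown fox", 3)
p <- c("quick", "brown", "fox")
value <- c("SLOW", "BLACK", "BEAR")

# Vectorized replacement: p[j] is replaced with value[j] in x[j] only.
vec <- mapply(
  function(s, pat, val) gsub(pat, val, s, fixed = TRUE),
  x, p, value,
  USE.NAMES = FALSE
)

# Dictionary replacement: every (p[j], value[j]) pair is applied
# to every string in x.
dict <- x
for (j in seq_along(p)) {
  dict <- gsub(p[j], value[j], dict, fixed = TRUE)
}

print(vec)
print(dict)
```

In vec each string has exactly one word replaced, while in dict all three words are replaced in every string.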
Note
strfind()<- performs in-place replacement.
Therefore, the character vector or string to perform replacement on must already exist as a variable.
So take for example the following code:
strfind("hello", p = "e") <- "a" # this obviously does not work
y <- "hello"
strfind(y, p = "e") <- "a" # this works fine
In the above code, the first strfind()<- call does not work, because the string needs to exist as a variable.
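The reason lies in how R treats replacement functions in general: a call of the form f(y) <- v is evaluated as y <- `f<-`(y, value = v), so there must be a variable to assign the result back to. The sketch below shows this with a small replacement function defined purely for illustration; first_char<- is a hypothetical name and is not part of any package.

```r
# Hypothetical replacement function, for illustration only:
# replaces the first character of each string in x.
`first_char<-` <- function(x, value) {
  substr(x, 1, 1) <- value
  x
}

y <- "hello"
first_char(y) <- "J"  # shorthand for: y <- `first_char<-`(y, value = "J")
print(y)

# With a string literal on the left-hand side there is no variable
# to assign back to, so R raises an error; try() captures it here.
res <- try(eval(parse(text = 'first_char("hello") <- "J"')), silent = TRUE)
print(inherits(res, "try-error"))
```

The same mechanism applies to any function whose name ends in <-, which is why the string must first be stored in a variable.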
Examples
# example of %s{}% and %s!{}% ====
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
x %s{}% "a"
#> [1] TRUE FALSE
x %s!{}% "a"
#> [1] FALSE TRUE
which(x %s{}% "a")
#> [1] 1
which(x %s!{}% "a")
#> [1] 2
x[x %s{}% "a"]
#> [1] "abcdefghijklm"
x[x %s!{}% "a"]
#> [1] "nopqrstuvwxyz"
x[x %s{}% "a"] <- 1
x[x %s!{}% "a"] <- 1
print(x)
#> [1] "1" "1"
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
x %s{}% "1"
#> [1] FALSE FALSE
x %s!{}% "1"
#> [1] TRUE TRUE
which(x %s{}% "1")
#> integer(0)
which(x %s!{}% "1")
#> [1] 1 2
x[x %s{}% "1"]
#> character(0)
x[x %s!{}% "1"]
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
x[x %s{}% "1"] <- "a"
x[x %s!{}% "1"] <- "a"
print(x)
#> [1] "a" "a"
#############################################################################
# Example of %s{}% and %s!{}% with "at" argument ====
x <- c(paste0(letters, collapse = ""),
       paste0(rev(letters), collapse = ""), NA)
p <- s_fixed("abc", at = "start")
x %s{}% p
#> [1] TRUE FALSE NA
stringi::stri_startswith(x, fixed = "abc") # same as above
#> [1] TRUE FALSE NA
p <- s_fixed("xyz", at = "end")
x %s{}% p
#> [1] TRUE FALSE NA
stringi::stri_endswith(x, fixed = "xyz") # same as above
#> [1] TRUE FALSE NA
p <- s_fixed("cba", at = "end")
x %s{}% p
#> [1] FALSE TRUE NA
stringi::stri_endswith(x, fixed = "cba") # same as above
#> [1] FALSE TRUE NA
p <- s_fixed("zyx", at = "start")
x %s{}% p
#> [1] FALSE TRUE NA
stringi::stri_startswith(x, fixed = "zyx") # same as above
#> [1] FALSE TRUE NA
#############################################################################
# Example of transforming ith occurrence ====
# new character vector:
x <- c(paste0(letters[1:13], collapse = ""),
       paste0(letters[14:26], collapse = ""))
print(x)
#> [1] "abcdefghijklm" "nopqrstuvwxyz"
# report ith (second and second-last) vowel locations:
p <- s_regex( # vowels
  rep("A|E|I|O|U", 2),
  case_insensitive = TRUE
)
loc <- strfind(x, p, i = c(2, -2))
print(loc)
#> start end
#> [1,] 5 5
#> [2,] 2 2
# extract ith vowels:
extr <- stringi::stri_sub(x, from = loc)
print(extr)
#> [1] "e" "o"
# replace ith vowels with numbers:
repl <- chartr("aeiou", "12345", extr) # transformation
stringi::stri_sub(x, loc) <- repl
print(x)
#> [1] "abcd2fghijklm" "n4pqrstuvwxyz"
#############################################################################
# Example of strfind for regular vectorized replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW', 'BLACK', 'BEAR')
x %s{}% p
#> [1] TRUE TRUE TRUE
strfind(x, p)
#> [[1]]
#> [1] "quick"
#>
#> [[2]]
#> [1] "brown"
#>
#> [[3]]
#> [1] "fox"
#>
strfind(x, p) <- rp
print(x)
#> [1] "The SLOW brown fox jumped over the lazy dog."
#> [2] "The quick BLACK fox jumped over the lazy dog."
#> [3] "The quick brown BEAR jumped over the lazy dog."
#############################################################################
# Example of strfind for dictionary replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- c('quick', 'brown', 'fox')
rp <- c('SLOW', 'BLACK', 'BEAR')
# thus dictionary is:
# quick => SLOW; brown => BLACK; fox => BEAR
strfind(x, p, rt = "dict") <- rp
print(x)
#> [1] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [2] "The SLOW BLACK BEAR jumped over the lazy dog."
#> [3] "The SLOW BLACK BEAR jumped over the lazy dog."
#############################################################################
# Example of strfind for first and last replacement ====
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "One"
strfind(x, p, rt = "first") <- rp
print(x)
#> [1] "One quick brown fox jumped over the lazy dog."
#> [2] "One quick brown fox jumped over the lazy dog."
#> [3] "One quick brown fox jumped over the lazy dog."
x <- rep('The quick brown fox jumped over the lazy dog.', 3)
print(x)
#> [1] "The quick brown fox jumped over the lazy dog."
#> [2] "The quick brown fox jumped over the lazy dog."
#> [3] "The quick brown fox jumped over the lazy dog."
p <- s_fixed("the", case_insensitive = TRUE)
rp <- "Some Other"
strfind(x, p, rt = "last") <- rp
print(x)
#> [1] "The quick brown fox jumped over Some Other lazy dog."
#> [2] "The quick brown fox jumped over Some Other lazy dog."
#> [3] "The quick brown fox jumped over Some Other lazy dog."