Index Arguments in the Generic Sub-setting Methods
Source: R/aaa03_squarebrackets_indx_args.R
aaa03_squarebrackets_indx_args.Rd
There are several types of arguments that can be used in the generic methods of 'squarebrackets' to specify the indices to perform operations on:
i: to specify flat (i.e. dimensionless) indices.
s, d: to specify indices of arbitrary dimensions in any dimensional object supported by 'squarebrackets' (i.e. arrays and data.frame-like objects).
margin, slice: to specify indices of one particular dimension (for arrays and data.frame-like objects). Only used in the idx method.
obs, vars: to specify observations and/or variables specifically in data.frame-like objects.
For the fundamentals of indexing in 'squarebrackets',
see squarebrackets_indx_fundamentals.
In this help page, x refers to the object on which subset operations are performed.
Argument i
Any of the following can be specified for argument i:
NULL, which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers giving indices.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as x, giving the indices to select for the operation.
a character vector of index names. If an object has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
a function that takes x as input and returns a logical vector, giving the element indices to select for the operation. For atomic objects, i is interpreted as i(x). For recursive objects, i is interpreted as lapply(x, i).
Using the i argument corresponds to doing something like the following:
x[i]
If i is a function, it corresponds to the following:
x[i(x)] # if x is atomic
x[unlist(lapply(x, i))] # if x is recursive
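To make the rules for i concrete, here is a minimal base R sketch that converts each supported kind of i (except complex vectors) into integer positions. The helper resolve_i() is hypothetical and not part of 'squarebrackets'. In particular, returning name matches in the order the names were given is an assumption of this sketch, not documented package behaviour.

```r
# Hypothetical helper, NOT part of 'squarebrackets': converts an `i`
# specification into integer positions following the rules listed above.
# Complex vectors are deliberately not handled.
resolve_i <- function(x, i) {
  # NULL corresponds to a missing argument: select every element
  if (is.null(i)) return(seq_along(x))
  # A function is applied to x (atomic) or to each element of x (recursive)
  if (is.function(i)) {
    i <- if (is.atomic(x)) i(x) else unlist(lapply(x, i), use.names = FALSE)
  }
  # Zero-length index: empty selection
  if (length(i) == 0L) return(integer(0))
  if (is.logical(i)) {
    stopifnot(length(i) == length(x))
    return(unname(which(i)))
  }
  if (is.character(i)) {
    # ALL elements carrying one of the given names are selected
    pos <- lapply(i, function(nm) which(names(x) == nm))
    return(unlist(pos, use.names = FALSE))
  }
  if (is.numeric(i)) {
    stopifnot(all(i >= 1), all(i == round(i)))
    return(as.integer(i))
  }
  stop("unsupported type for `i` in this sketch")
}

x <- c(a = 1, b = 2, a = 3, c = 4)
x[resolve_i(x, "a")]                  # both elements named "a"
x[resolve_i(x, function(v) v > 2)]    # atomic: i(x)
lst <- list(1, "text", 2)
lst[resolve_i(lst, is.numeric)]       # recursive: lapply(x, i)
```

Note that base R's x["a"] would only return the first element named "a", which is why the sketch matches names explicitly.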
Argument Pair s, d
The s, d argument pair, inspired by the abind::asub function from the 'abind' package, is the primary pair of indexing arguments for sub-set operations on dimensional objects.
The s argument specifies the subscripts (i.e. dimensional indices).
The d argument gives the dimensions to which s applies (i.e. d specifies the "non-missing" margins).
The d argument must be an integer vector. s must be a list of length 1, or a list of the same length as d. If s is a list of length 1, it is internally recycled to become the same length as d.
Each element of s can be any of the following:
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with indices of the specified dimension to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the corresponding dimension size, giving the indices of the specified dimension to select for the operation.
a character vector giving the dimnames to select. If a dimension has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
Note the following:
As stated, d specifies which index margins are non-missing. If d is of length 0, it is taken as "all index margins are missing".
The default value for d is 1:ndim(x).
To keep the syntax short, the user can use the n function instead of list() to specify s.
EXAMPLES
Here are some examples for clarity, using an atomic array x of 3 dimensions:
sb_x(x, n(1:10, 1:5), c(1, 3)) extracts the first 10 rows, all columns, and the first 5 layers of array x.
sb_x(x, n(1:10), 2) extracts all rows, the first 10 columns, and all layers of array x.
sb_x(x, n(1:10)) extracts the first 10 rows, columns, and layers of array x.
sb_x(x, n(1:10), c(1, 3)) extracts the first 10 rows, all columns, and the first 10 layers of array x.
I.e.:
sb_x(x, n(1:10, 1:5), c(1, 3)) # ==> x[1:10, , 1:5, drop = FALSE]
sb_x(x, n(1:10), 2) # ==> x[ , 1:10, , drop = FALSE]
sb_x(x, n(1:10)) # ==> x[1:10, 1:10, 1:10, drop = FALSE]
sb_x(x, n(1:10), c(1, 3)) # ==> x[1:10, , 1:10, drop = FALSE]
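The equivalences above can be reproduced with a small base R sketch. Both n() (defined here simply as a stand-in for list()) and sub_sd() are hypothetical helpers written for illustration, not the package's own code. Margins not listed in d are filled with their full index range, which for extraction has the same effect as a missing subscript.

```r
# Hypothetical stand-in for the package's n(): used here exactly like list()
n <- function(...) list(...)

# Hypothetical helper, NOT the package's sb_x(): emulates the s, d pair with
# base R's `[`, always using drop = FALSE.
sub_sd <- function(x, s, d = seq_along(dim(x))) {
  if (!is.list(s)) {
    # An atomic s is only allowed when exactly one dimension is given
    stopifnot(length(d) == 1L)
    s <- list(s)
  }
  # A list of length 1 is recycled to the length of d
  if (length(s) == 1L) s <- rep(s, length(d))
  stopifnot(length(s) == length(d))
  # Every margin not listed in d is "missing", i.e. fully selected
  idx <- lapply(dim(x), seq_len)
  for (k in seq_along(d)) idx[[d[k]]] <- s[[k]]
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- array(seq_len(12^3), dim = c(12, 12, 12))
dim(sub_sd(x, n(1:10, 1:5), c(1, 3)))   # 10 12 5
```

With d of length 0, every margin stays fully selected, so the call behaves like x[].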
NOTE
If length(d) is 1, s can also be given as an atomic vector (of any length) instead of a list of length 1.
Although s and d may both be atomic vectors of length 1, in such a case it is highly recommended, for the readability of your code, to name s and d explicitly in your method call.
I.e.:
sb_x(x, 1, 1) # allowed, but hard to read
sb_x(x, s = 1, d = 1) # recommended
For a brief explanation of the relationship between flat indices (i) and subscripts (s, d) in arrays, see sub2ind.
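As a rough illustration of that relationship, the following base R sketch computes the flat index of one array element from its subscripts, using the column-major layout of R arrays, and uses base R's arrayInd() to go back. The helper sub2flat() is hypothetical; it is not the package's sub2ind() and may differ from it in signature and behaviour.

```r
# Hypothetical sketch, NOT the package's sub2ind(): converts the subscripts
# of a single array element into its flat index, using the column-major
# layout base R uses for arrays.
sub2flat <- function(sub, dims) {
  stopifnot(length(sub) == length(dims), all(sub >= 1), all(sub <= dims))
  # Stride of each dimension: 1, d1, d1 * d2, ...
  strides <- cumprod(c(1, dims[-length(dims)]))
  as.integer(1 + sum((sub - 1) * strides))
}

x <- array(1:60, dim = c(3, 4, 5))
k <- sub2flat(c(2, 3, 4), dim(x))
k                     # 44
x[k] == x[2, 3, 4]    # TRUE
arrayInd(k, dim(x))   # base R's inverse: back to subscripts 2, 3, 4
```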
Argument Pair margin, slice
Relevant only for the idx method.
The margin argument specifies the dimension on which argument slice is used. I.e. when margin = 1, slice selects rows; when margin = 2, slice selects columns; etc.
The slice argument can be any of the following:
a numeric vector of strictly positive whole numbers with dimension indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the corresponding dimension size, giving the dimension indices to select for the operation.
a character vector of index names. If a dimension has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
One could also give a vector of length 0 for slice. However, argument slice is only used in the idx method, and the result of idx is meant to be used inside the regular [ and [<- operators. Thus the effect of a zero-length index specification depends on the rule-set of [.class(x) and [<-.class(x).
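The following base R sketch illustrates both points: a margin/slice pair only takes effect once it is expanded into subscripts for [, and what a zero-length slice does is then decided by [ itself. slice_subscripts() is a hypothetical helper, not the package's idx method.

```r
# Hypothetical helper, NOT the package's idx(): expands a margin/slice pair
# into one subscript per dimension, so it can be spliced into base `[`.
slice_subscripts <- function(x, margin, slice) {
  idx <- lapply(dim(x), seq_len)  # other margins: select everything
  idx[[margin]] <- slice
  idx
}

m <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b", "c")))
sub <- slice_subscripts(m, margin = 2, slice = c("a", "c"))
do.call(`[`, c(list(m), sub, list(drop = FALSE)))   # columns "a" and "c"

# Zero-length slice: the outcome is decided by `[` for the object's class;
# for a base matrix it yields a matrix with zero columns.
empty <- slice_subscripts(m, margin = 2, slice = integer(0))
dim(do.call(`[`, c(list(m), empty, list(drop = FALSE))))   # 2 0
```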
Arguments obs, vars
The obs argument specifies indices for observations (i.e. rows) in data.frame-like objects.
The vars argument specifies indices for variables (i.e. columns) in data.frame-like objects.
The obs and vars arguments are inspired by the subset and select arguments, respectively, of base R's subset.data.frame method. However, the obs and vars arguments do not use non-standard evaluation, so as to keep 'squarebrackets' fully programmatically friendly.
The obs Argument
The obs argument can be any of the following:
NULL (default), which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with row indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the number of rows, giving the row indices to select for the operation.
a one-sided formula with a single logical expression using the column names of the data.frame, giving the condition that determines which observations/rows are selected for the operation. So to perform an operation on the observations for which height > 2 and sex != "female" hold, specify the following formula:
obs = ~ (height > 2) & (sex != "female")
If the formula is linked to an environment, any variables not found in the data set will be looked up in that environment.
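A minimal base R sketch of how such a formula can be evaluated: the expression to the right of ~ is evaluated with the data.frame's columns as variables and the formula's environment as the enclosure, so names that are not columns fall back to that environment. obs_rows() is hypothetical and only mirrors the behaviour described above; it is not the package's implementation.

```r
# Hypothetical helper, NOT part of 'squarebrackets': evaluates a one-sided
# formula against a data.frame and returns the matching row indices.
# Variables not found among the columns are looked up in the formula's
# environment.
obs_rows <- function(df, obs) {
  stopifnot(inherits(obs, "formula"), length(obs) == 2L)  # one-sided: ~ expr
  keep <- eval(obs[[2]], envir = df, enclos = environment(obs))
  stopifnot(is.logical(keep), length(keep) == nrow(df))
  which(keep)
}

df <- data.frame(height = c(1.5, 2.5, 3.0), sex = c("male", "female", "male"))
obs_rows(df, ~ (height > 2) & (sex != "female"))   # 3
threshold <- 2
obs_rows(df, ~ height > threshold)                  # 2 3 (threshold from the environment)
```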
The vars Argument
The vars argument can be any of the following:
NULL (default), which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with column indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the number of columns, giving the column indices to select for the operation.
a character vector giving the column names to select. Note that 'squarebrackets' assumes data.frame-like objects have unique column names.
a function that returns a logical vector, giving the column indices to select for the operation. For example, to select all numeric variables, specify vars = is.numeric.
a two-sided formula, where each side consists of a single term, giving a range of names to select. For example, to select all variables between and including the variables "height" and "weight", specify vars = height ~ weight.
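The function and formula forms of vars can be sketched in base R as follows. vars_cols() is a hypothetical helper, not part of 'squarebrackets'. It applies a function to every column, and resolves a two-sided formula to the contiguous range of columns between the two names; if the right-hand name comes first, this sketch returns the range in reverse order, which is an assumption rather than documented behaviour.

```r
# Hypothetical helper, NOT part of 'squarebrackets': resolves a `vars`
# specification into column positions of a data.frame with unique names.
vars_cols <- function(df, vars) {
  nms <- names(df)
  if (inherits(vars, "formula")) {
    # Two-sided formula `from ~ to`: the range of columns between the names
    stopifnot(length(vars) == 3L)
    from <- match(deparse(vars[[2]]), nms)
    to <- match(deparse(vars[[3]]), nms)
    stopifnot(!is.na(from), !is.na(to))
    return(from:to)
  }
  # A function is applied to each column and must return TRUE or FALSE
  if (is.function(vars)) return(unname(which(vapply(df, vars, logical(1)))))
  if (is.character(vars)) return(match(vars, nms))
  if (is.logical(vars)) return(which(vars))
  as.integer(vars)
}

df <- data.frame(id = 1:2, height = c(1.7, 1.8), age = c(30, 40),
                 weight = c(60, 70), sex = c("f", "m"))
names(df)[vars_cols(df, height ~ weight)]   # "height" "age" "weight"
names(df)[vars_cols(df, is.numeric)]        # "id" "height" "age" "weight"
```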
EXAMPLE
So using the obs, vars arguments corresponds to doing something like the following:
sb2_x(x, obs = obs, vars = vars) # ==> subset(x, ...obs..., ...vars...)
Argument inv
Relevant for the sb_mod/sb2_mod, sb_set/sb2_set,
and idx methods.
By default, inv = FALSE, which interprets the indices as usual. When inv = TRUE, the inverse of the indices is taken.
Consider, for example, an atomic matrix x; using sb_mod(x, 1:2, 2L, tf = tf) corresponds to something like the following:
x[, 1:2] <- tf(x[, 1:2])
x
and using sb_mod(x, 1:2, 2L, inv = TRUE, tf = tf) corresponds to something like the following:
x[, -1:-2] <- tf(x[, -1:-2])
x
NOTE
The order in which the user gives indices when inv = TRUE generally does not matter: the order of the indices as they appear in the original object x is maintained, just like in base R.
Therefore, when replacing multiple values where the order of the replacement matters, it is better to keep inv = FALSE, which is the default.
For replacement with a single value or with a transformation function, inv = TRUE can be used without considering the ordering.
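The following base R sketch shows why the order given by the user is irrelevant under inv = TRUE: taking the complement of an index set with setdiff() always returns positions in the order they occur in x. invert_idx() is a hypothetical helper, used here to reproduce the negative-index example above.

```r
# Hypothetical helper, NOT part of 'squarebrackets': the inverse of an index
# set. setdiff() keeps the order of seq_len(n), so the order in which the
# user gave `idx` does not matter.
invert_idx <- function(n, idx) setdiff(seq_len(n), idx)

tf <- function(v) v * 10L
x <- matrix(1:12, nrow = 3)

# Base R negative indexing ...
x1 <- x
x1[, -1:-2] <- tf(x1[, -1:-2])

# ... gives the same result as modifying the inverted index set
x2 <- x
keep <- invert_idx(ncol(x2), 1:2)
x2[, keep] <- tf(x2[, keep])

identical(x1, x2)         # TRUE
invert_idx(5, c(4, 1))    # 2 3 5
```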
All Missing Indices
NULL in the indexing arguments corresponds to a missing argument. For s, d, specifying d of length 0 also corresponds to all subscripts being missing.
Thus, for both sb_x/sb2_x and sb_wo/sb2_wo, using missing or NULL values for all indexing arguments corresponds to something like the following:
x[]
Similarly, for sb_mod/sb2_mod and sb_set/sb2_set, using missing or NULL values for all indexing arguments corresponds to something like the following:
x[] <- rp # for replacement
x[] <- tf(x) # for transformation
The above is true even if inv = TRUE and/or red = TRUE.
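Why x[] rather than plain x? In base R, assigning through x[] replaces the contents of an object while keeping its attributes, such as dim, whereas plain assignment replaces the whole object. The short base R illustration below uses placeholder values for rp and tf.

```r
# Base R illustration: assigning through `x[]` replaces the contents of x
# while keeping its attributes (here: dim), unlike plain `x <- value`.
x <- matrix(1:4, nrow = 2)
rp <- 0L                       # placeholder replacement value
tf <- function(v) v + 100L     # placeholder transformation function

y <- x
y[] <- rp        # replacement: still a 2 x 2 matrix, filled with 0
z <- x
z[] <- tf(z)     # transformation: still a 2 x 2 matrix

w <- x
w <- rp          # plain assignment: the matrix structure is lost
dim(y); dim(z); dim(w)   # 2 2, 2 2, NULL
```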
Disallowed Combinations of Index Arguments
One cannot specify i and the other indexing arguments simultaneously; it's either i, or the other arguments.
One cannot specify row and filter simultaneously; it's either one or the other.
One cannot specify col and vars simultaneously; it's either one or the other.
One cannot specify the s, d pair and the slice, margin pair simultaneously; it's either one pair or the other.
In the above cases, if one set is specified, the other set is ignored.
Drop
Sub-setting with the generic methods from the 'squarebrackets' R package using dimensional arguments (s, d, row, col, filter, vars) always uses drop = FALSE.
To drop potentially redundant (i.e. single-level) dimensions, use the drop function, like so:
drop(sb_x(x, ...))
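The difference between drop = FALSE and dropping afterwards can be seen with base R alone. This sketch uses base R's drop(); that it is the drop function meant above is an assumption.

```r
# Base R illustration of drop = FALSE versus dropping afterwards.
# Assumption: base R's drop() stands in for the drop function mentioned above.
x <- array(1:60, dim = c(3, 4, 5))
sub <- x[1, 1:4, 1, drop = FALSE]   # single-level dimensions are kept
dim(sub)                            # 1 4 1
dim(drop(sub))                      # NULL: only one dimension was left
dropped <- drop(x[1:2, 1:4, 1, drop = FALSE])
dim(dropped)                        # 2 4
```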
References
Plate T, Heiberger R (2016). abind: Combine Multidimensional Arrays. R package version 1.4-5, https://CRAN.R-project.org/package=abind.