Index Arguments in the Generic Sub-setting Methods
Source: R/aaa04_squarebrackets_indx_args.R (aaa04_squarebrackets_indx_args.Rd)
There are several types of arguments that can be used in the generic methods of 'squarebrackets' to specify the indices to perform operations on:
i: to specify flat (i.e. dimensionless) indices.
s, d: to specify indices of arbitrary dimensions in any dimensional object supported by 'squarebrackets' (i.e. arrays and data.frame-like objects).
margin, slice: to specify indices of one particular dimension (for arrays and data.frame-like objects). Only used in the idx method.
obs, vars: to specify observations and/or variables, specifically in data.frame-like objects.
For the fundamentals of indexing in 'squarebrackets', see squarebrackets_indx_fundamentals.
In this help page, x refers to the object on which subset operations are performed.
Argument i
Any of the following can be specified for argument i:
NULL, which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers giving indices.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector, of the same length as x, giving the indices to select for the operation.
a character vector of index names. If an object has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
a function that takes x as input and returns a logical vector giving the element indices to select for the operation. For atomic objects, i is interpreted as i(x). For recursive objects, i is interpreted as lapply(x, i).
Using the i argument corresponds to doing something like the following:
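As a rough illustration in base R only (not the 'squarebrackets' implementation), the index types listed above behave approximately as follows. The vector x and its names are made up for this example. Plain x["a"] in base R returns only the first match, so %in% is used here to mimic the rule that ALL elements with a given name are selected.

```r
# Hypothetical named vector with a duplicated name.
x <- c(a = 10, b = 20, a = 30, c = 40)

# Numeric indices: strictly positive whole numbers.
by_number <- x[c(1, 3)]

# Logical indices: same length as x.
by_logical <- x[c(TRUE, FALSE, TRUE, FALSE)]

# Character indices: every element named "a" is selected.
# (x["a"] in base R would return only the first match.)
by_name <- x[names(x) %in% "a"]

# Zero-length index: empty selection.
by_empty <- x[integer(0)]
```

All three non-empty selections pick the same two elements here, which makes the equivalence between the index types easy to check.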
If i is a function, it corresponds to the following:
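A minimal base R sketch of the two interpretations, assuming a made-up predicate is_positive and made-up objects x and lst. It only mirrors the rule stated above (i(x) for atomic objects, lapply(x, i) for recursive objects) and is not taken from the package source.

```r
# Atomic object: the function is applied to x as a whole.
x <- c(-2, 5, 0, 7)
is_positive <- function(v) v > 0          # hypothetical predicate
atomic_result <- x[is_positive(x)]

# Recursive object (list): the function is applied to each element,
# and the per-element results are combined into one logical vector.
lst <- list(a = 1:3, b = "text", c = 4.5)
keep <- unlist(lapply(lst, is.numeric))
list_result <- lst[keep]
```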
Argument Pair s, d
The s, d argument pair, inspired by the abind::asub function from the 'abind' package, is the primary indexing argument for sub-set operations on dimensional objects.
The s argument specifies the subscripts (i.e. dimensional indices).
The d argument gives the dimensions to which s applies (i.e. d specifies the "non-missing" margins).
The d argument must be an integer vector.
s must be an atomic vector, a list of length 1, or a list of the same length as d.
If s is a list of length 1, it is internally recycled to become the same length as d.
If s is an atomic vector, it is internally treated as list(s), and (as with the previous case) recycled to become the same length as d.
Each element of s when s is a list, or s as a whole when s is atomic, can be any of the following:
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with indices of the specified dimension to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the corresponding dimension size, giving the indices of the specified dimension to select for the operation.
a character vector giving the dimnames to select. If a dimension has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
Note the following:
As stated, d specifies which index margins are non-missing. If d is of length 0, it is taken as "all index margins are missing".
The default value for d is 1:ndim(x).
To keep the syntax short, the user can use the n function instead of list() to specify s.
EXAMPLES
Here are some examples for clarity, using an atomic array x of 3 dimensions:
ss_x(x, n(1:10, 1:5), c(1, 3)) extracts the first 10 rows, all columns, and the first 5 layers of array x.
ss_x(x, n(1:10), 2) extracts the first 10 columns of array x.
ss_x(x, 1:10) extracts the first 10 rows, columns, and layers of array x.
ss_x(x, 1:10, c(1, 3)) extracts the first 10 rows, all columns, and the first 10 layers of array x.
I.e.:
ss_x(x, n(1:10, 1:5), c(1, 3)) # ==> x[1:10, , 1:5, drop = FALSE]
ss_x(x, 1:10, 2) # ==> x[ , 1:10, , drop = FALSE]
ss_x(x, 1:10) # ==> x[1:10, 1:10, 1:10, drop = FALSE]
ss_x(x, 1:10, c(1, 3)) # ==> x[1:10, , 1:10, drop = FALSE]
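The recycling rules for s and d can be sketched in base R as follows. The helper expand_sd is hypothetical and written only for this illustration; it is not how 'squarebrackets' is implemented, and it handles numeric and logical subscripts only.

```r
# Hypothetical helper: expand (s, d) into one index per dimension of x.
expand_sd <- function(x, s, d = seq_along(dim(x))) {
  if (!is.list(s)) s <- list(s)                 # atomic s is treated as list(s)
  if (length(s) == 1L) s <- rep(s, length(d))   # recycle to the length of d
  stopifnot(length(s) == length(d))
  idx <- rep(list(TRUE), length(dim(x)))        # TRUE selects a whole margin
  idx[d] <- s                                   # fill in the non-missing margins
  do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

x <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))
res_list   <- expand_sd(x, list(1:2, 1), c(1, 3))  # like x[1:2, , 1, drop = FALSE]
res_single <- expand_sd(x, 1:2, 2)                 # like x[, 1:2, , drop = FALSE]
res_all    <- expand_sd(x, 1:2)                    # like x[1:2, 1:2, 1:2, drop = FALSE]
```

Margins not listed in d are filled with TRUE, which is one simple way to express a "missing" subscript when building the call programmatically.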
For a brief explanation of the relationship between flat indices (i) and subscripts (s, d) in arrays, see sub2ind.
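As a small base R illustration of that relationship (it does not use or describe the sub2ind function itself), R stores arrays in column-major order, so a flat index can be computed from subscripts with cumulative products of the dimension sizes. The array and subscripts below are made up for the example.

```r
x <- array(seq_len(4 * 3 * 2), dim = c(4, 3, 2))
sub <- c(2, 3, 2)                                   # row 2, column 3, layer 2

# Stride of each dimension in column-major storage: 1, 4, 12.
strides <- cumprod(c(1, dim(x)[-length(dim(x))]))

# Flat index corresponding to the subscripts.
flat <- 1 + sum((sub - 1) * strides)

# Base R's arrayInd() converts a flat index back into subscripts.
back <- arrayInd(flat, dim(x))
```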
Argument Pair margin, slice
Relevant only for the idx method.
The margin argument specifies the dimension on which argument slice is used. I.e. when margin = 1, slice selects rows; when margin = 2, slice selects columns; etc.
The slice argument can be any of the following:
a numeric vector of strictly positive whole numbers with dimension indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the corresponding dimension size, giving the dimension indices to select for the operation.
a character vector of index names. If a dimension has multiple indices with the given name, ALL the corresponding indices will be selected for the operation.
A vector of length 0 can also be given for slice. However, argument slice is only used in the idx method, and the result of idx is meant to be used inside the regular [ and [<- operators. Thus the effect of a zero-length index specification depends on the rule-set of [.class(x) and [<-.class(x).
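For instance, with a plain base R matrix (made up for this illustration), a zero-length column index yields an empty selection on extraction and a no-op on replacement. Other classes may define different rules for their own [ and [<- methods.

```r
m <- matrix(1:6, nrow = 2)

# Extraction with a zero-length index: a 2 x 0 matrix.
empty_cols <- m[, integer(0), drop = FALSE]

# Replacement with a zero-length index: nothing is replaced.
m2 <- m
m2[, integer(0)] <- 99L
```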
Arguments obs, vars
The obs argument specifies indices for observations (i.e. rows) in data.frame-like objects.
The vars argument specifies indices for variables (i.e. columns) in data.frame-like objects.
The obs and vars arguments are inspired by the subset and select arguments, respectively, of base R's subset.data.frame method.
However, the obs and vars arguments do not use non-standard evaluation, so as to keep 'squarebrackets' fully programmatically friendly.
The obs Argument
The obs argument can be any of the following:
NULL (default), which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with row indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the number of rows, giving the row indices to select for the operation.
a one-sided formula, with a single logical expression using the column names of the data.frame, giving the condition that determines which observation/row indices are selected for the operation. So to perform an operation on the observations for which height > 2 and sex != "female" hold, specify the following formula:
obs = ~ (height > 2) & (sex != "female")
If the formula is linked to an environment, any variables not found in the data set will be looked up in that environment.
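One way to picture how such a formula could be resolved, using base R only: the expression on the right-hand side is evaluated with the data.frame as the first place to look, and the formula's environment as the fallback. The data.frame df and the variable min_height are made up for this sketch, and the evaluation shown here is an assumption about the general idea, not the package's actual code.

```r
df <- data.frame(
  height = c(1.5, 2.5, 3.0, 2.2),
  sex    = c("male", "male", "female", "other")
)
min_height <- 2   # not a column; found via the formula's environment

obs <- ~ (height > min_height) & (sex != "female")

# The right-hand side of a one-sided formula is its second element.
keep <- eval(obs[[2]], envir = df, enclos = environment(obs))
selected <- df[keep, , drop = FALSE]
```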
The vars Argument
The vars argument can be any of the following:
NULL (default), which corresponds to a missing argument.
a vector of length 0, in which case no indices are selected for the operation (i.e. empty selection).
a numeric vector of strictly positive whole numbers with column indices to select for the operation.
a complex vector, as explained in squarebrackets_indx_fundamentals.
a logical vector of the same length as the number of columns, giving the column indices to select for the operation.
a character vector giving the colnames to select. Note that 'squarebrackets' assumes data.frame-like objects have unique column names.
a function that returns a logical vector, giving the column indices to select for the operation. For example, to select all numeric variables, specify vars = is.numeric.
a two-sided formula, where each side consists of a single term, giving a range of names to select. For example, to select all variables between and including the variables "height" and "weight", specify the following: vars = height ~ weight.
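The function and formula forms of vars can be mimicked in base R roughly as follows. The data.frame df is made up, and the way the formula is taken apart here is only an illustrative assumption about how a name range might be resolved.

```r
df <- data.frame(
  id     = 1:3,
  height = c(1.6, 1.8, 1.7),
  age    = c(30L, 41L, 25L),
  weight = c(60, 80, 70),
  sex    = c("f", "m", "f")
)

# Function form: applied per column, one logical value per column.
numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]

# Two-sided formula form: one term per side, read as a range of names.
vars <- height ~ weight
from <- match(as.character(vars[[2]]), names(df))   # left-hand side
to   <- match(as.character(vars[[3]]), names(df))   # right-hand side
range_cols <- names(df)[from:to]
```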
EXAMPLE
Using the obs, vars arguments corresponds to doing something like the following:
ss2_x(x, obs = obs, vars = vars) # ==> subset(x, ...obs..., ...vars...)
Argument inv
Relevant for the _mod, _set, and idx methods.
By default, inv = FALSE, which translates the indices as usual.
When inv = TRUE, the inverse of the indices is taken.
Consider, for example, an atomic matrix x; using ss_mod(x, 1:2, 2L, tf = tf) corresponds to something like the following:
x[, 1:2] <- tf(x[, 1:2])
x
and using ss_mod(x, 1:2, 2L, inv = TRUE, tf = tf) corresponds to something like the following:
x[, -1:-2] <- tf(x[, -1:-2])
x
NOTE
The order in which the user gives indices when inv = TRUE generally does not matter.
The order of the indices as they appear in the original object x is maintained, just like in base 'R'.
Therefore, when replacing multiple values where the order of the replacement matters, it is better to keep inv = FALSE, which is the default.
For replacement with a single value or with a transformation function, inv = TRUE can be used without considering the ordering.
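The ordering behaviour can be seen with base R's negative indices, which play a role similar to inv = TRUE in this illustration. The vector x is made up for the example.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)

# Inverted (negative) indices: the order given by the user is irrelevant.
same_order <- identical(x[-c(3, 1)], x[-c(1, 3)])

# Regular indices: the user's order decides which value goes where.
y <- x
y[c(3, 1)] <- c(100, 200)    # "c" receives 100, "a" receives 200

# Inverted indices: values are assigned in the object's own order.
z <- x
z[-c(4, 2)] <- c(100, 200)   # "a" receives 100, "c" receives 200
```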
All Missing Indices
NULL in the indexing arguments corresponds to a missing argument.
For s, d, specifying d of length 0 also corresponds to all subscripts being missing.
Thus, for both the _x and _wo methods, using missing or NULL for all indexing arguments corresponds to something like the following:
x[]
Similarly, for the _mod and _set methods, using missing or NULL indexing arguments corresponds to something like the following:
x[] <- rp # for replacement
x[] <- tf(x) # for transformation
The above is true even if inv = TRUE and/or red = TRUE.
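In base R terms, the empty-bracket forms behave as sketched below on a made-up matrix: every element is selected, and on assignment the dimensions and dimnames of the object are kept.

```r
x <- matrix(1:6, nrow = 2, dimnames = list(c("r1", "r2"), NULL))

# Extraction with all indices missing returns the whole object.
all_elems <- x[]

# Replacement: every element is overwritten, attributes are kept.
y <- x
y[] <- 0L

# Transformation: every element is transformed, attributes are kept.
z <- x
z[] <- sqrt(x)
```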
Disallowed Combinations of Index Arguments
One cannot specify the s, d pair and the obs, vars pair simultaneously; it is either one pair or the other.
One cannot specify the s, d pair and the slice, margin pair simultaneously; it is either one pair or the other.
In the above cases it holds that if one set is specified, the other set is ignored.
Drop
Sub-setting with the generic methods from the 'squarebrackets' R-package using dimensional arguments (s, d, obs, vars) always uses drop = FALSE.
To drop potentially redundant (i.e. single level) dimensions, use the drop function, like so:
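A minimal base R sketch of the idea, using plain [ with drop = FALSE as a stand-in for a 'squarebrackets' method call on a made-up array:

```r
x <- array(1:24, dim = c(4, 3, 2))

# Sub-setting that keeps all dimensions, including the single-level one.
kept <- x[1:2, , 1, drop = FALSE]   # dimensions 2 x 3 x 1

# drop() removes the dimensions that have only one level.
dropped <- drop(kept)               # dimensions 2 x 3
```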
References
Plate T, Heiberger R (2016). abind: Combine Multidimensional Arrays. R package version 1.4-5, https://CRAN.R-project.org/package=abind.