squarebrackets: Subset Methods as Alternatives to the Square Brackets Operators for Programming
Source: R/aaa00_squarebrackets_help.R
aaa00_squarebrackets_help.Rd
'squarebrackets' provides subset methods (supporting both atomic and recursive S3 classes)
that may be more convenient alternatives to the [ and [<- operators,
whilst maintaining similar performance.
Some nice properties of these methods include, but are not limited to, the following:
The [ and [<- operators use different rule-sets for different data.frame-like types (data.frames, data.tables, tibbles, tidytables, etc.). The 'squarebrackets' methods use the same rule-sets for the different data.frame-like types.
Performing dimensional subset operations on an array using [ and [<- requires a priori knowledge of the number of dimensions the array has. The 'squarebrackets' methods work on any arbitrary dimensions without requiring such prior knowledge.
When selecting names with the [ and [<- operators, only the first occurrence of each name is selected in case of duplicate names. The 'squarebrackets' methods always operate on all matching names in case of duplicates, not just the first.
The [[ and [[<- operators allow operating on a recursive subset of a nested list. But these only operate on a single recursive subset, and are not vectorized for multiple recursive subsets of a nested list at once. 'squarebrackets' provides a way to reshape a nested list into a recursive matrix, thereby allowing vectorized operations on recursive subsets of such a nested list.
The [<- operator only supports copy-on-modify semantics for most classes. The 'squarebrackets' methods provide explicit pass-by-reference and pass-by-value semantics, whilst still respecting things like binding-locks and mutability rules.
'squarebrackets' supports index-less sub-set operations, which are more memory efficient (and better for the environment) for long vectors than sub-set operations using the [ and [<- operators.
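Two of the base 'R' behaviours described above can be reproduced with a small sketch. It uses base 'R' only and does not call any 'squarebrackets' function; the object names (x, nested, paths) are made up for illustration.

```r
# Duplicate names: `[` with a name selects only the first match.
x <- c(a = 1, b = 2, a = 3)
first_only <- x["a"]                 # one element, the first "a"
all_matches <- x[names(x) == "a"]    # manual workaround to get every "a"

# Recursive subsets: `[[` with a vector walks down ONE path of a nested list.
nested <- list(a = list(b = 10, c = 20), d = list(e = 30))
one_leaf <- nested[[c("a", "b")]]    # same as nested[["a"]][["b"]]

# Several paths at once need an explicit loop in base R.
paths <- list(c("a", "b"), c("d", "e"))
several <- vapply(paths, function(path) nested[[path]], numeric(1))
```

The explicit comparison on names(x) and the loop over paths are the kind of boilerplate the properties listed above are meant to avoid.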
Goal
Among programming languages,
'R' has perhaps one of the most
flexible and comprehensive sub-setting functionality,
provided by the square brackets operators ([
, [<-
).
But in some situations the square brackets operators
are less than optimally convenient.
The goal of the 'squarebrackets' package
is not to replace the square-brackets operators,
but to provide alternative sub-setting methods and functions,
to be used in situations where the square bracket operators are inconvenient.
Supported Structures
'squarebrackets' only supports the most common S3 classes,
and only those that primarily use square brackets for sub-setting
(hence the name of the package).
'squarebrackets' supports the following structures:
basic atomic classes (atomic vectors, matrices, and arrays).
mutable_atomic classes (mutable_atomic vectors, matrices, and arrays).
basic list classes (recursive vectors, matrices, and arrays).
data.frame (including the classes tibble, sf-data.frame, and sf-tibble).
data.table (including the classes tidytable, sf-data.table, and sf-tidytable).
See squarebrackets_supported_structures for more details.
Sub-set Operation Methods & Binding Implementations
The main focus of this package is on its generic methods
and dimensional binding implementations.
Generic methods for atomic objects start with sb_.
Generic methods for recursive objects (list, data.frame, etc.) start with sb2_.
There is also the somewhat separate idx method,
which works on both recursive and non-recursive objects.
The binding implementations for dimensional objects start with bind_.
And finally there are the slice_ methods,
which (currently) only work on (mutable) atomic vectors.
Methods to access subsets (i.e. extract selection, or extract all except selection):
sb_wo, sb2_wo: return an object without the specified subset.
sb2_rec: access recursive subsets of lists.
slice_x, slice_wo: efficiently extract a subset from a long vector, or return the long vector without the subset.
Methods to modify subsets:
idx: translate given indices/subscripts, for the purpose of copy-on-modify substitution.
sb2_recin: replace, transform, remove, or add recursive subsets to a list, through R's default Copy-On-Modify semantics.
sb_mod, sb2_mod: return the object with modified (transformed or replaced) subsets.
Methods to rename a mutable object using pass-by-reference semantics.
sb_set, sb2_set: modify (transform or replace) subsets of a mutable object using pass-by-reference semantics.
slice_set: efficiently modify a (long) vector subset using pass-by-reference semantics.
Methods and binding implementations, to extend or re-arrange an object beyond its current size:
bind_: implementations for binding dimensional objects.
sb2_recin: replace, transform, remove, or add recursive subsets to a list, through R's default Copy-On-Modify semantics.
See squarebrackets_method_dispatch for more information on how 'squarebrackets'
uses S3 method dispatch.
Functions
Additional specialized sub-setting functions are provided:
lst_untree: unnest a tree-like nested list into a recursive matrix, to speed up vectorized sub-setting on recursive subsets of the list.
The dt_-functions to programmatically perform data.table-specific [-operations, with the security measures provided by the 'squarebrackets' package.
setapply: apply functions over mutable matrix margins using pass-by-reference semantics.
ma_setv: Find & Replace values in mutable_atomic objects using pass-by-reference semantics.
This is considerably faster and more memory efficient than using sb_set for this.
A couple of convenience functions, and helper functions for creating ranges, sequences, and indices
(often needed in sub-setting)
are provided:
currentBindings: list or lock all currently existing bindings that share the same address as the input variable.
ndims: Get the number of dimensions of an object.
sub2coord, coord2ind: Convert subscripts (array indices) to coordinates, coordinates to flat indices, and vice-versa.
match_all: Find all matches, of one vector in another, taking into account the order and any duplicate values of both vectors.
Computing indices:
idx_r to compute an integer index range.
idx_by to compute grouped indices.
idx_ord_-functions to compute ordered indices.
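The conversion between coordinates and flat indices mentioned above can be sketched with base 'R' alone. This is not the implementation of sub2coord or coord2ind, and their arguments are not shown here; arrayInd() and matrix indexing are used as stand-ins, and the variable names are made up for illustration.

```r
dims <- c(2L, 3L, 4L)            # dimensions of a hypothetical 3-d array
flat <- c(1L, 8L, 24L)           # flat (vector-style) indices into it

# Flat indices -> coordinates: one row per index, one column per dimension.
coords <- arrayInd(flat, .dim = dims)

# Coordinates -> flat indices: index an array that stores its own positions.
position_array <- array(seq_len(prod(dims)), dim = dims)
flat_again <- position_array[coords]
```

Here the flat index 8 corresponds to row 2, column 1, layer 2, because 'R' stores arrays in column-major order.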
Overview Help Pages
Besides the website,
'squarebrackets' comes with several help pages
that can be accessed from within 'R'.
MAIN DOCUMENTATION:
squarebrackets_supported_structures: lists the structures that are supported by 'squarebrackets', and explains some related terminology.
squarebrackets_indx_fundamentals: explains the essential fundamentals of the indexing forms in 'squarebrackets'.
squarebrackets_indx_args: explains the common indexing arguments used in the main S3 methods.
squarebrackets_modify: explains the essentials of modification in 'squarebrackets'.
squarebrackets_options: lists and explains the options the user can specify in 'squarebrackets'.
squarebrackets_method_dispatch: gives details regarding the S3 method dispatch in 'squarebrackets'.
PASS-BY-REFERENCE DOCUMENTATION:
If you are not planning on using the pass-by-reference functionality in 'squarebrackets', you do not need to read the following help pages:
squarebrackets_PassByReference: explains Pass-by-Reference semantics, and its important consequences.
squarebrackets_coercion: explains the difference in coercion rules between modification through Pass-by-Reference semantics and modification through copy (i.e. pass-by-value) for the supported mutable structures.
Properties Details
The alternative sub-setting methods and functions provided by 'squarebrackets' have the following properties:
Programmatically friendly:
Unlike base [, it's not required to know the number of dimensions of an array a priori to perform subset operations on an array.
Missing arguments can be filled with NULL, instead of using dark magic like base::quote(expr = ).
No non-standard evaluation.
Functions are pipe-friendly.
No (silent) vector recycling.
Extracting and removing subsets uses the same syntax.
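For comparison, the following sketch shows what the base 'R' approach looks like when the number of dimensions is not known in advance. It builds the missing arguments with quote(expr = ) and passes them to [ via do.call(). The helper name take_rows is hypothetical and not part of 'squarebrackets'; it assumes x has a dim attribute.

```r
# Select rows of an array with any number of dimensions (base R only).
take_rows <- function(x, rows) {
  n <- length(dim(x))
  # One empty ("missing") argument for every dimension after the first.
  empty <- rep(list(quote(expr = )), n - 1L)
  # Equivalent to x[rows, , ..., drop = FALSE] with n - 1 empty slots.
  do.call(`[`, c(list(x, rows), empty, list(drop = FALSE)))
}

m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))
rows_m <- take_rows(m, 1L)   # same as m[1L, , drop = FALSE]
rows_a <- take_rows(a, 2L)   # same as a[2L, , , drop = FALSE]
```

The same function body handles the matrix and the 3-d array, but only by constructing the call by hand.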
Class consistent:
Sub-setting of multi-dimensional objects by specifying dimensions (i.e. rows, columns, ...) uses drop = FALSE.
So matrix in, matrix out.
The methods deliver the same results for data.frames, data.tables, tibbles, and tidytables.
No longer does one have to re-learn the different brackets-based sub-setting rules for different types of data.frame-like objects.
Powered by the subclass agnostic 'C'-code from 'collapse' and 'data.table'.
Explicit copy semantics:
Sub-set operations that change an object's memory allocation always return a modified (partial) copy of the object.
For sub-set operations that just change values in-place (similar to the [<- and [[<- methods) the user can choose a method that modifies the object by reference, or choose a method that returns a (partial) copy.
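As background, the default base 'R' behaviour that these explicit semantics are contrasted with can be sketched as follows. The example uses base 'R' only; the names original, alias, and env are made up for illustration, and no 'squarebrackets' function is called.

```r
# Copy-on-modify: modifying `alias` through `[<-` leaves `original` untouched.
original <- c(1, 2, 3)
alias <- original
alias[1] <- 100

# Binding locks: a locked binding cannot be given a new value.
env <- new.env()
assign("locked", c(1, 2, 3), envir = env)
lockBinding("locked", env)
outcome <- tryCatch(
  {
    assign("locked", c(0, 2, 3), envir = env)
    "modified"
  },
  error = function(e) "blocked"
)
```

After running this, original still starts with 1, and outcome is "blocked" because the assignment to the locked binding raises an error.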
Careful handling of names:
Sub-setting an object by index names returns ALL matches with the given names, not just the first.
Data.frame-like objects (see Supported Structures) are forced to have unique column names.
Sub-setting arrays using x[indx1, indx2, etc.] will drop names(x).
The methods from 'squarebrackets' will not drop names(x).
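The base 'R' side of the last point can be checked with a short sketch (base 'R' only; the object names are made up for illustration):

```r
# A matrix can carry flat names(x) in addition to its dimensions.
x <- matrix(1:6, nrow = 2)
names(x) <- letters[1:6]

# Dimensional sub-setting with `[` keeps the dimensions but loses names(x).
sub <- x[1:2, 1:2]
names_before <- names(x)
names_after <- names(sub)
```

Here names_before holds the six letters, while names_after is NULL.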
Concise function and argument names.
Performance & Energy aware:
Despite the many checks performed, the functions are kept reasonably speedy, through the use of the 'Rcpp', 'collapse', and 'data.table' R-packages.
The functions were also made to be as memory efficient as reasonably possible, to lower the carbon footprint of this package.
References
The badges shown in the documentation of this R-package were made using the services of: https://shields.io/
Author
Author, Maintainer: Tony Wilkes tony_a_wilkes@outlook.com (ORCID)