(x <- array(1:12, c(3, 4)))
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]    3    6    9   12
(y <- array(1:4 * 100, c(1, 4)))
#>      [,1] [,2] [,3] [,4]
#> [1,]  100  200  300  400
R-package ‘broadcast’: Broadcasted Array Operations Like ‘NumPy’
Introduction
Overview
‘broadcast’ is an efficient ‘C’/‘C++’-based package that, as the name suggests, performs “array broadcasting” (similar to broadcasting in the ‘NumPy’ module for ‘Python’).
In the context of operations involving 2 (or more) arrays, “broadcasting” refers to efficiently recycling array dimensions, without making copies.
This is considerably faster and more memory-efficient than R’s regular dimension replication mechanism.
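To make the idea concrete, here is a minimal base-R sketch of the NumPy-style shape rule by which broadcasting is commonly described: two dimension sizes are compatible when they are equal or when one of them is 1, and the result takes the larger size. The helper name bc_dims() is hypothetical (it is not part of ‘broadcast’), and the sketch assumes both arrays have the same number of dimensions; it is not meant to describe the package’s internals.

```r
# Hypothetical helper, not part of 'broadcast': result dimensions under the
# NumPy-style rule (sizes must be equal, or one of them must be 1).
# Assumes dx and dy have the same length.
bc_dims <- function(dx, dy) {
  stopifnot(length(dx) == length(dy))
  compatible <- dx == dy | dx == 1L | dy == 1L
  if (!all(compatible)) {
    stop("non-conformable dimensions")
  }
  pmax(dx, dy)
}

bc_dims(c(3L, 4L), c(1L, 4L))  # a 3 x 4 array with a 1 x 4 array
bc_dims(c(3L, 1L), c(1L, 4L))  # a 3 x 1 array with a 1 x 4 array
```

Both calls return the dimensions 3 and 4; a pair such as c(3, 4) and c(2, 4) raises an error because 3 and 2 differ and neither is 1.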
At its core, the ‘broadcast’ package provides the following functionalities, all related to “broadcasting”:
Broadcasted Infix Operators
Consider the arrays x and y:
Suppose one wishes to compute the element-wise addition of these 2 arrays.
As shown below, this cannot be done efficiently in base ‘R’; with the ‘broadcast’ package it can be done fast and memory-efficiently.
Base ‘R’:
x + y
Error in x + y : non-conformable arrays
# You *could* do the following....
x + y[rep(1L, 3L),]
# ... but if x or y is very large:
Error: cannot allocate vector of size
‘broadcast’:
broadcaster(x) <- TRUE
broadcaster(y) <- TRUE
x + y
#>      [,1] [,2] [,3] [,4]
#> [1,]  101  204  307  410
#> [2,]  102  205  308  411
#> [3,]  103  206  309  412
#> broadcaster
‘broadcast’ supports a wide range of infix operators, including arithmetic, relational, Boolean, string, and bit-wise operators.
Broadcasted Array Binding
Using broadcasting, bind_array() from the ‘broadcast’ package can bind arrays together in ways that cannot efficiently be done with rbind(), cbind(), or abind::abind(). Let’s consider these arrays:
(x <- array(1:12, c(3, 4)))
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]    3    6    9   12
(y <- array(1:4 * 100, c(1, 4)))
#>      [,1] [,2] [,3] [,4]
#> [1,]  100  200  300  400
Suppose one wishes to column-bind these 2 arrays.
As shown below, this cannot be done efficiently in base ‘R’; with the ‘broadcast’ package it can be done fast and memory-efficiently.
Base ‘R’:
cbind(x, y)
Error in cbind(x, y) :
  number of rows of matrices must match (see arg 2)
# You *could* do the following....
cbind(x, y[rep(1L, 3L),])
# ... but if x or y is very large:
Error: cannot allocate vector of size
‘broadcast’:
bind_array(list(x, y), along = 2L)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    4    7   10  100  200  300  400
#> [2,]    2    5    8   11  100  200  300  400
#> [3,]    3    6    9   12  100  200  300  400
bind_array() is also considerably faster and more memory-efficient than abind::abind(). See the benchmarks.
Broadcasted General Functions
The idea of broadcasted infix operations and broadcasted array binding has been generalized to also include bcapply() (a broadcasted apply-like function), bc_ifelse() (a broadcasted version of ifelse()), and bc_strrep() (a broadcasted version of strrep()).
Casting Methods
‘broadcast’ provides casting functions that cast subset-groups of an array to a new dimension, cast nested lists to dimensional lists, and vice versa.
These functions are useful for facilitating complex broadcasted operations, though they also have much merit beyond broadcasting.
For example, you cannot broadcast through hierarchies of a list, but you can broadcast along dimensions. So suppose you have the following list:
x <- list(
  student1 = list(
    homework1 = sample(0:100, 5),
    homework2 = sample(0:100, 5),
    homework3 = sample(0:100, 5)
  ),
  student2 = list(
    homework1 = sample(0:100, 5),
    homework2 = sample(0:100, 5),
    homework3 = sample(0:100, 5)
  ),
  student3 = list(
    homework1 = sample(0:100, 5),
    homework2 = sample(0:100, 5),
    homework3 = sample(0:100, 5)
  )
)
Since all values in the list are numbers, you might want to turn this into a numeric array, to make mathematical computations and analyses on it easier.
This can be done with the ‘broadcast’ package with the following steps. First, turn the nested list into a shallow (i.e. non-nested), dimensional list using cast_hier2dim():
x2 <- cast_hier2dim(x, in2out = FALSE, direction.names = 1L)
print(x2)
#>          homework1 homework2 homework3
#> student1 integer,5 integer,5 integer,5
#> student2 integer,5 integer,5 integer,5
#> student3 integer,5 integer,5 integer,5
Second, turn the shallow (i.e. non-nested), dimensional list into an atomic array using cast_shallow2atomic():
x3 <- cast_shallow2atomic(x2, 1L)
print(x3)
#> , , homework1
#>
#>      student1 student2 student3
#> [1,]       67        6       73
#> [2,]       38       72       41
#> [3,]        0       78       37
#> [4,]       33       84       19
#> [5,]       86       36       27
#>
#> , , homework2
#>
#>      student1 student2 student3
#> [1,]       42       88       19
#> [2,]       13       36       43
#> [3,]       81       33       86
#> [4,]       58      100       69
#> [5,]       50       43       39
#>
#> , , homework3
#>
#>      student1 student2 student3
#> [1,]       96       78       43
#> [2,]       84       32       24
#> [3,]       20       83       69
#> [4,]       53       34       38
#> [5,]       73       69       50
A Few Linear Algebra Functions for Statistics
‘broadcast’ comes with a few linear algebra functions for statistics. For example, the sd_lc() function computes the standard deviation of a linear combination of variables, regardless of how the variables are distributed.
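The identity behind such a computation is that, for a coefficient vector w and a covariance matrix Sigma of the variables, the variance of the linear combination is w' Sigma w, which holds for any distribution with finite variances. The base-R sketch below illustrates that identity only; sd_lincomb() is a hypothetical name, and the code is neither the implementation nor the interface of sd_lc().

```r
# Hypothetical helper illustrating sd(w' X) = sqrt(w' Sigma w).
# W:  matrix whose rows are coefficient vectors (one linear combination per row)
# vc: covariance matrix of the variables
sd_lincomb <- function(W, vc) {
  stopifnot(is.matrix(W), is.matrix(vc), ncol(W) == nrow(vc))
  # rowSums((W %*% vc) * W) gives w' Sigma w for each row w of W
  sqrt(rowSums((W %*% vc) * W))
}

vc <- matrix(c(4, 1,
               1, 9), nrow = 2, byrow = TRUE)
W <- rbind(c(1,  1),   # X1 + X2
           c(1, -1))   # X1 - X2
sd_lincomb(W, vc)      # sqrt(15) and sqrt(11)
```

With variances 4 and 9 and covariance 1, the sum has variance 4 + 9 + 2 = 15 and the difference has variance 4 + 9 - 2 = 11, which is what the two rows of W reproduce.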
The Quick-Start Guide can be found here.
Some practical examples of the ‘broadcast’ package in action can be found here.
Why use ‘broadcast’
Efficiency
Broadcasting as implemented in the ‘broadcast’ package is about as fast as, and sometimes even faster than, ‘NumPy’.
The implementations in the ‘broadcast’ package are also much faster and much more memory-efficient than base solutions like sweep().
Efficient programs use less energy and fewer resources, and are thus better for the environment.
Benchmarks can be found in the “About” section on the website.
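For reference, the base-R approaches this comparison refers to look roughly like the sketch below, which reuses the 3 x 4 and 1 x 4 arrays from the overview. Both give the same values as the broadcasted x + y shown earlier, but they get there by expanding y to the full shape of x. No timings are claimed here; the package benchmarks are the place to look for those.

```r
x <- array(1:12, c(3, 4))
y <- array(1:4 * 100, c(1, 4))

# Base R option 1: sweep() adds y's values along the columns (margin 2);
# internally it expands them to the full shape of x first.
via_sweep <- sweep(x, 2L, as.vector(y), "+")

# Base R option 2: replicate the single row of y by hand, then add.
via_rep <- x + y[rep(1L, nrow(x)), ]

all.equal(via_sweep, via_rep)  # TRUE
```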
Convenience
Have you ever been bothered by any of the following while programming in R:
- Receiving the “non-conformable arrays” error message in a simple array operation, when it intuitively should work?
- Receiving the “cannot allocate vector of size…” error message because R unnecessarily allocated too much memory in array operations?
- abind::abind() being too slow, or ruining the structure of recursive arrays?
- The sweep() and outer() functions being too slow or too limiting?
- That there is no array analogy to data.table::dcast()?
- Difficulties in handling deeply nested lists?
- That certain ‘NumPy’ operations have no equivalent operation in R?
If you answered “YES” to any of the above, ‘broadcast’ may be the R-package for you.
Minimal Dependencies
Besides linking to ‘Rcpp’, ‘broadcast’ does not depend on, vendor, link to, include, or otherwise use any external libraries; ‘broadcast’ was essentially made from scratch and can be installed out-of-the-box.
Not using external libraries brings a number of advantages:
- Avoid dependency hell.
- Avoid wasting time, memory and computing resources for translating between language structures.
- Ensure consistent behaviour with the rest of R.
Tested
The ‘broadcast’ package is frequently checked using a large suite of unit tests via the tinytest package. These tests have a coverage of over 90%, so the chance of a function from this package breaking completely is relatively low.
‘broadcast’ is still a relatively new package, however, so (small) bugs are still very much possible. I encourage users who find bugs to report them promptly to the issues tab on the GitHub page, and I will fix them as soon as time permits.
Installation
install.packages("broadcast", type = "source")
Status
‘broadcast’ is now available on CRAN!
If you have any suggestions or feedback on the package, its documentation, or even the benchmarks, I encourage you to let me know (either as an Issue or a Discussion).
I’m eager to read your input!
Documentation
The documentation on the ‘broadcast’ website is divided into 3 main parts:
- Guides and Vignettes: contains the topic-oriented guides, in the form of a few vignettes.
- Reference Manual: contains the function-oriented reference manual.
- About: contains the Acknowledgements, Change logs and License file. Here you’ll also find some information regarding the relationship between ‘broadcast’ and other packages/modules. Benchmarks can also be found here.