Quickstart Guide

1 Prerequisites

First, a basic understanding of R is important.
A very basic refresher for R can be found in the Getting Started in R: Tinyverse Edition document.

 

2 Installation

To install ‘broadcast’ from CRAN, one may run the following code in R:


install.packages("broadcast")

 

3 Broadcasting

3.1 Introduction

In the context of operations involving 2 (or more) arrays, “broadcasting” refers to recycling array dimensions without allocating additional memory, which is considerably faster and more memory-efficient than R’s regular dimension replication mechanism.

 

3.2 Example

Consider the matrices x and y:

x <- array(1:20, c(4, 5))
y <- array(1:5 * 100, c(1, 5))
print(x)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    9   13   17
#> [2,]    2    6   10   14   18
#> [3,]    3    7   11   15   19
#> [4,]    4    8   12   16   20
print(y)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  100  200  300  400  500

Suppose one wishes to compute the element-wise addition of these 2 arrays.

This won’t work in base R:

x + y
Error in x + y : non-conformable arrays

You could do the following …

x + y[rep(1L, 4L),]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  101  205  309  413  517
#> [2,]  102  206  310  414  518
#> [3,]  103  207  311  415  519
#> [4,]  104  208  312  416  520

… but if x and/or y is very large, it will be slow and may even lead to an error:

Error: cannot allocate vector of size
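The cost of manual replication can be made visible with base R alone. The sketch below is illustrative only: the row count n is a hypothetical value chosen for the demonstration, and no ‘broadcast’ functions are used. It shows that replicating the single row of y materialises a full copy whose size grows with the number of rows.

```r
# Base R only; n is a hypothetical row count chosen for illustration.
y <- array(1:5 * 100, c(1, 5))
n <- 100000L

# Manual replication: every one of the n rows is physically allocated.
y_rep <- y[rep(1L, n), , drop = FALSE]

dim(y_rep)                              # n rows, 5 columns
size_original   <- as.numeric(object.size(y))
size_replicated <- as.numeric(object.size(y_rep))
size_replicated / size_original         # how many times larger the copy is
```

The replicated copy exists only to make the dimensions conform, which is the allocation that broadcasting avoids.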

The ‘broadcast’ package performs “broadcasting”, which does the above faster, without unnecessary copies, and scales to arrays of any size (with up to 16 dimensions).

Like so:


broadcaster(x) <- TRUE
broadcaster(y) <- TRUE

x + y
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  101  205  309  413  517
#> [2,]  102  206  310  414  518
#> [3,]  103  207  311  415  519
#> [4,]  104  208  312  416  520
#> broadcaster

 

3.3 Rules

To paraphrase NumPy’s own documentation, one can summarise how broadcasting behaves using 2 rules.

  1. If the input arrays for an operation do not have the same number of dimensions, a 1 is repeatedly appended to the end of the dimensions of the array with fewer dimensions, until the arrays have the same number of dimensions.

  2. Arrays with a size of 1 for a particular dimension act as if they had the size of the array with the largest size for that dimension. This is done by virtually recycling said dimension without making copies.

After application of these 2 broadcasting rules, the sizes of the input arrays must match.
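The two rules can be sketched in base R as a small function that takes two dimension vectors and returns the dimensions of the result. The helper broadcast_dims() is hypothetical and written only for illustration; it is not part of ‘broadcast’ and says nothing about how the package implements these rules internally.

```r
# Hypothetical helper, NOT part of 'broadcast': computes the dimensions
# that result from applying the two broadcasting rules to dim(x) and dim(y).
broadcast_dims <- function(dx, dy) {
  n <- max(length(dx), length(dy))

  # Rule 1: append 1s to the end of the shorter dimension vector.
  dx <- c(dx, rep(1L, n - length(dx)))
  dy <- c(dy, rep(1L, n - length(dy)))

  # Rule 2: per dimension, sizes must be equal, or one of them must be 1.
  ok <- dx == dy | dx == 1L | dy == 1L
  if (!all(ok)) {
    stop("not broadcastable along dimension(s): ",
         paste(which(!ok), collapse = ", "))
  }

  # A size of 1 takes on the size of the other array for that dimension.
  ifelse(dx == 1L, dy, dx)
}

broadcast_dims(c(4L, 5L), c(1L, 5L))      # the x and y from the example
broadcast_dims(c(4L, 1L), c(1L, 5L, 3L))  # a 1 is appended to the first
```

Calling it with dimensions such as c(4L, 5L) and c(3L, 5L) stops with an error, because neither 4 nor 3 is 1 and the sizes differ.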

 

4 Broadcasted Infix Operators

Base R comes with relational (==, !=, etc.), arithmetic (+, -, *, /, etc.), logical (&, |) and bit-wise (&, |) operators. ‘broadcast’ provides 2 ways to use these operators with broadcasting.

The first (and simplest) way is to use the broadcaster() class, which comes with its own method dispatch for the above-mentioned operators. This approach supports operator precedence, and for the average user, this is sufficient.

For example:


x <- array(1:20, c(4, 5))
y <- array(1:5 * 100, c(1, 5))
z <- array(20:1, c(4, 5))

broadcaster(x) <- TRUE
broadcaster(y) <- TRUE
broadcaster(z) <- TRUE

x + y / z
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,] 6.000000 17.50000 34.00000 63.00000 142.0000
#> [2,] 7.263158 19.33333 37.27273 71.14286 184.6667
#> [3,] 8.555556 21.28571 41.00000 81.66667 269.0000
#> [4,] 9.882353 23.38462 45.33333 96.00000 520.0000
#> broadcaster

The second way is to use the large set of bc. functions. These offer much greater control and more operators than the previous method, and have less risk of running into conflicting methods. But they do not support operator precedence.
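What the lack of operator precedence means in practice can be illustrated with base R alone. Any operator written as a function call has to have its order of evaluation spelled out by nesting. The sketch below uses base R operators in function-call form on equally sized arrays; the array w is hypothetical, and no ‘broadcast’ functions are called.

```r
# Base R only: operators in function-call form have no precedence,
# so the order of evaluation is fixed by how the calls are nested.
x <- array(1:20, c(4, 5))
z <- array(20:1, c(4, 5))
w <- array(2, c(4, 5))          # hypothetical third array

infix  <- x + z / w             # infix: division binds tighter than addition
nested <- `+`(x, `/`(z, w))     # same computation, nesting made explicit
other  <- `/`(`+`(x, z), w)     # different nesting computes (x + z) / w

identical(infix, nested)
identical(infix, other)
```

The same applies to any function-style interface that takes its operands as arguments: the innermost call is evaluated first, regardless of which operator it names.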

For example:

x <- array(1:20, c(4, 5))
y <- array(1:5 * 100, c(1, 5))
print(x)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    9   13   17
#> [2,]    2    6   10   14   18
#> [3,]    3    7   11   15   19
#> [4,]    4    8   12   16   20
print(y)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]  100  200  300  400  500

bc.i(x, y, "gcd") # calculate greatest common divisor between x and y
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    3    1    1
#> [2,]    2    2   10    2    2
#> [3,]    1    1    1    5    1
#> [4,]    4    8   12   16   20

 

5 Array binding

The battle-tested abind() function is often used to bind arrays along any arbitrary dimension.

‘broadcast’ provides an alternative to this, namely the bind_array() function, which allows for broadcasting (obviously), and is also notably faster and more memory efficient than abind().

Consider the following arrays:


x <- array(1:20, c(4, 5))
y <- array(1:5*10, c(1, 5))
print(x)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    5    9   13   17
#> [2,]    2    6   10   14   18
#> [3,]    3    7   11   15   19
#> [4,]    4    8   12   16   20
print(y)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   10   20   30   40   50

Binding them together with abind() won’t work:

abind::abind(x, y, along = 2)
Error in abind::abind(x, y, along = 2) : 
  arg 'X2' has dims=1, 5; but need dims=4, X

To bind x and y together along columns, y needs its single row to be recycled (broadcasted) 4 times.

This can be done in a highly efficient way using bind_array(), like so:

bind_array(list(x, y), 2L)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,]    1    5    9   13   17   10   20   30   40    50
#> [2,]    2    6   10   14   18   10   20   30   40    50
#> [3,]    3    7   11   15   19   10   20   30   40    50
#> [4,]    4    8   12   16   20   10   20   30   40    50
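For comparison, the same result can be built in base R by replicating the row of y by hand before binding. This sketch is only the manual, copy-making equivalent; it is not how bind_array() works internally.

```r
# Base R only: manual replication followed by cbind().
x <- array(1:20, c(4, 5))
y <- array(1:5 * 10, c(1, 5))

# Physically repeat the single row of y once per row of x.
y_rep <- y[rep(1L, nrow(x)), , drop = FALSE]

manual <- cbind(x, y_rep)
dim(manual)
```

The intermediate y_rep is a full copy that is discarded right after binding, which is the overhead that broadcasting avoids.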

 

6 Casting

6.1 Overview

‘broadcast’ provides several “casting” functions. These can facilitate complex forms of broadcasting that would normally not be possible.
But these “casting” functions also have merit of their own, besides enabling complex broadcasting.

The following casting functions are currently available:

  • acast(): casts group-based subsets of an array into a new dimension.
    Useful, for example, for computing grouped broadcasted operations.

  • cast_hier2dim(): casts a nested/hierarchical list into a dimensional list (i.e. recursive array).
    Useful because one cannot broadcast through nesting, but one can broadcast along dimensions.

  • cast_dim2hier(): casts a dimensional list into a nested/hierarchical list; the opposite of cast_hier2dim.

  • cast_dim2flat(): casts a dimensional list into a flattened list, but with names that indicate their original dimensional positions.
    Mostly useful for printing or summarizing dimensional lists.

  • dropnests(): drops redundant nesting in lists; mostly used for facilitating the above casting functions.
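The term “dimensional list” (recursive array) used above may be unfamiliar. In base R it is simply a list that carries a dim attribute, so its elements can be addressed by position along each dimension. A minimal base R sketch, with made-up contents and names:

```r
# Base R only: a list with a dim attribute is a "dimensional list".
x <- list(1:3, c("a", "b"), TRUE, c(2.5, 3.5))
dim(x) <- c(2, 2)
dimnames(x) <- list(c("r1", "r2"), c("c1", "c2"))

is.list(x)         # still a list
is.array(x)        # but also an array, because it has dimensions

# Elements are filled column by column, like any other R array.
x[["r2", "c1"]]    # second element of the original list
x[["r1", "c2"]]    # third element of the original list
```

Each cell can hold an object of any type and length, which is why printing such a list shows summaries like integer,3 rather than the values themselves.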

 

6.2 Example

As an example, consider the following nested list:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80)  |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    ),
    class2 = list(
      height = rnorm(5, 170)  |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    ),
    class2 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    )
  )
)

One can turn the nested list into a dimensional list using cast_hier2dim() like so:

x2 <- cast_hier2dim(x)
dimnames(x2) <- hiernames2dimnames(x)
print(x2)
#> , , group1
#> 
#>        class1      class2     
#> height integer,5   integer,5  
#> weight integer,5   integer,5  
#> sex    character,5 character,5
#> 
#> , , group2
#> 
#>        class1      class2     
#> height integer,5   integer,5  
#> weight integer,5   integer,5  
#> sex    character,5 character,5

And turn it back into a nested list using cast_dim2hier() like so:

x3 <- cast_dim2hier(x2, distr.names = TRUE)
lobstr::tree(x3)
#> <list>
#> ├─group1: <list>
#> │ ├─class1: <list>
#> │ │ ├─height<int [5]>: 169, 170, 169, 170, 168
#> │ │ ├─weight<int [5]>: 81, 79, 80, 82, 81
#> │ │ └─sex<chr [5]>: "M", "M", "F", "NA", "M"
#> │ └─class2: <list>
#> │   ├─height<int [5]>: 170, 169, 169, 170, 169
#> │   ├─weight<int [5]>: 79, 77, 82, 78, 79
#> │   └─sex<chr [5]>: "NA", "M", "F", "F", "NA"
#> └─group2: <list>
#>   ├─class1: <list>
#>   │ ├─height<int [5]>: 170, 171, 171, 171, 170
#>   │ ├─weight<int [5]>: 78, 79, 82, 79, 81
#>   │ └─sex<chr [5]>: "F", "NA", "F", "F", "NA"
#>   └─class2: <list>
#>     ├─height<int [5]>: 169, 171, 170, 170, 168
#>     ├─weight<int [5]>: 77, 79, 81, 80, 80
#>     └─sex<chr [5]>: "NA", "NA", "NA", "F", "M"

 

7 Other Functions

‘broadcast’ provides the bcapply() function, which is a broadcasted apply-like function that applies a function between 2 arrays with broadcasting.

‘broadcast’ also provides the bc_ifelse() function, which is a broadcasted version of ifelse().

‘broadcast’ provides a small set of simple linear algebra functions for usage in statistics. See linear_algebra_stats.

‘broadcast’ offers type-casting functions. Unlike base R’s type-casting functions (as.logical(), as.integer(), etc.), the type-casting functions from ‘broadcast’ preserve names and dimensions. See typecast.
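The base R behaviour referred to here is easy to reproduce. The sketch below uses only base R and a made-up matrix: as.integer() drops dimensions and dimnames, while assigning to storage.mode() is one base R way to keep them. It does not show the ‘broadcast’ type-casting functions themselves.

```r
# Base R only: as.integer() strips attributes such as dim and dimnames.
x <- array(c(1.9, 2.1, 3.7, 4.2), c(2, 2),
           dimnames = list(c("a", "b"), c("A", "B")))

y <- as.integer(x)
dim(y)                       # dimensions are gone
dimnames(y)                  # and so are the names

# A base R workaround: change the storage mode in place.
z <- x
storage.mode(z) <- "integer"
dim(z)
dimnames(z)
```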

 

8 Why use ‘broadcast’?

To keep it very short:

  • using broadcasting is several times faster and uses several times less memory than base R alternatives such as sweep(), outer(), or manual replication of dimensions.
    See the benchmarks.
  • ‘broadcast’ provides bind_array(), which is an abind()-like function that can bind arrays along any arbitrary dimension.
    The difference with abind() is that bind_array() supports broadcasting, supports recursive arrays, and is faster & more memory-efficient.
  • ‘broadcast’ can perform (admittedly niche but still useful) forms of manipulations on arrays and hierarchical lists not found (to my knowledge) in other packages.
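For readers unfamiliar with the base R alternatives named in the first point, the sketch below shows how sweep(), outer() and manual replication express the kind of computation used in the earlier examples. It uses base R only; the column vector a is made up for the outer() case.

```r
# Base R only: three ways to add arrays whose dimensions do not conform.
x <- array(1:20, c(4, 5))
y <- array(1:5 * 100, c(1, 5))

# 1. sweep(): add y's values across the columns of x.
via_sweep <- sweep(x, 2, as.vector(y), "+")

# 2. Manual replication of y's single row.
via_rep <- x + y[rep(1L, nrow(x)), ]

# 3. outer(): a 4 x 1 column combined with a 1 x 5 row.
a <- array(1:4, c(4, 1))
via_outer <- outer(as.vector(a), as.vector(y), "+")

all(via_sweep == via_rep)
dim(via_outer)
```

Each approach covers only particular shapes of input, whereas the two broadcasting rules apply uniformly to any pair of conformable arrays.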