<- array(1:27, c(3,3,3))
x <- array(1L, c(3,3,1))
y ::abind(x, y, along = 2)
abind
#> Error in abind(x, y, along = 2) :
#> arg 'X2' has dims=3, 3, 1; but need dims=3, X, 3
Practical Applications
1 Introduction
Broadcasting comes up frequent enough in real world problems. This page gives a few examples of these.
2 Binding arrays along an arbitrary dimension
The abind()
function, from the package of the same name, allows one to bind arrays along any arbitrary dimensions (not just along rows or columns).
Unfortunately, abind()
does not support broadcasting, which can lead to frustrations such as the following:
Here, abind()
is complaining about the dimensions not fitting perfectly.
But intuitively, binding x
and y
should be possible, with dimension 3
from array y
being broadcasted to size 3.
The bind_array() function provided by the ‘broadcast’ package can bind the arrays without problems:
<- array(1:27, c(3,3,3))
x <- array(1L, c(3,3,1))
y bind_array(list(x, y), 2)
#> , , 1
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 4 7 1 1 1
#> [2,] 2 5 8 1 1 1
#> [3,] 3 6 9 1 1 1
#>
#> , , 2
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 10 13 16 1 1 1
#> [2,] 11 14 17 1 1 1
#> [3,] 12 15 18 1 1 1
#>
#> , , 3
#>
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 19 22 25 1 1 1
#> [2,] 20 23 26 1 1 1
#> [3,] 21 24 27 1 1 1
bind_array() is also considerably faster and more memory efficient than abind()
.
3 Array casting
3.1 Casting with equal group sizes
Suppose you read the following matrix from a file:
<- cbind(
x id = rep(1:3, each = 2),
grp = rep(1:2, 3),
val = 1:6 * 2
)print(x)
#> id grp val
#> [1,] 1 1 2
#> [2,] 1 2 4
#> [3,] 2 1 6
#> [4,] 2 2 8
#> [5,] 3 1 10
#> [6,] 3 2 12
For computing purposes, you may need the rows of each group - defined in column “grp” - to be cast to a new dimension.
‘broadcast’ allows users to cast subsets of an array onto a new dimension, based on some grouping factor. In this case, the following will do the job:
<- 1L # we cast from the rows, so margin = 1
margin <- as.factor(x[, 2]) # factor to define which rows belongs to which group
grp levels(grp) <- c("a", "b") # names for the new dimension
<- acast(x, margin, grp) # casting is performed here
out print(out)
#> , , a
#>
#> id grp val
#> [1,] 1 1 2
#> [2,] 2 1 6
#> [3,] 3 1 10
#>
#> , , b
#>
#> id grp val
#> [1,] 1 2 4
#> [2,] 2 2 8
#> [3,] 3 2 12
Notice that the dimension-names of the new dimension (dimension 3) are equal to levels(grp)
.
With the cast array, one can use broadcasting to easily do things like multiply the values in each group with a different value, like so:
# create the multiplication factor array
<- array(
mult 1, c(1, 3, 2),
list(NULL, c("mult_id", "mult_grp", "mult_val"), c("a", "b"))
)"mult_val", c("a", "b")] <- c(2, 10)
mult[, print(mult)
#> , , a
#>
#> mult_id mult_grp mult_val
#> [1,] 1 1 2
#>
#> , , b
#>
#> mult_id mult_grp mult_val
#> [1,] 1 1 10
<- bc.num(out, mult, "*")
out2 dimnames(out2) <- dimnames(out)
print(out2)
#> , , a
#>
#> id grp val
#> [1,] 1 1 4
#> [2,] 2 1 12
#> [3,] 3 1 20
#>
#> , , b
#>
#> id grp val
#> [1,] 1 2 40
#> [2,] 2 2 80
#> [3,] 3 2 120
Perhaps you’d like to reverse-cast the array back when you’re done computing; reverse-casting an array can be done be combining asplit()
with bind_array():
asplit(out2, ndim(out2)) |> bind_array(along = margin)
#> id grp val
#> a.1 1 1 4
#> a.2 2 1 12
#> a.3 3 1 20
#> b.1 1 2 40
#> b.2 2 2 80
#> b.3 3 2 120
…though the order of, in this case, the rows (because margin = 1
) will not necessarily be the same as the original array.
3.2 Casting with unequal group sizes
The casting arrays also works when the groups have unequal sizes, though there are a few things to keep in mind.
Let’s start again with the input array:
<- cbind(
x id = c(rep(1:3, each = 2), 1),
grp = c(rep(1:2, 3), 2),
val = rnorm(7)
)print(x)
#> id grp val
#> [1,] 1 1 -0.8136779
#> [2,] 1 2 -0.2471219
#> [3,] 2 1 -1.1611043
#> [4,] 2 2 1.1476509
#> [5,] 3 1 0.6848985
#> [6,] 3 2 0.7391061
#> [7,] 1 2 -0.3350584
Once again, the acast() function can be used to cast the group subsets from the rows over a new dimension. But this time, we need to specify fill = TRUE
to allow acast() to fill the gaps, otherwise an error is called.
Thus one can cast in this case like so:
<- as.factor(x[, 2])
grp levels(grp) <- c("a", "b")
<- 1L
margin <- acast(x, margin, grp, fill = TRUE)
out print(out)
#> , , a
#>
#> id grp val
#> [1,] 1 1 -0.8136779
#> [2,] 2 1 -1.1611043
#> [3,] 3 1 0.6848985
#> [4,] NA NA NA
#>
#> , , b
#>
#> id grp val
#> [1,] 1 2 -0.2471219
#> [2,] 2 2 1.1476509
#> [3,] 3 2 0.7391061
#> [4,] 1 2 -0.3350584
Notice that some values are missing ( NA
); if some groups have unequal number of elements, acast() needs to fill the gaps with missing values. By default, gaps are filled with NA
if x
is atomic, and with list(NULL)
if x
is recursive. The user can change the filling value through the fill_value
argument.
Once again, we can get the original array back when we’re done like so:
asplit(out, ndim(out)) |> bind_array(along = margin)
#> id grp val
#> a.1 1 1 -0.8136779
#> a.2 2 1 -1.1611043
#> a.3 3 1 0.6848985
#> a.4 NA NA NA
#> b.1 1 2 -0.2471219
#> b.2 2 2 1.1476509
#> b.3 3 2 0.7391061
#> b.4 1 2 -0.3350584
… but we do keep the missing values when the groups have an unequal number of elements.
4 Vector quantization
Here is an example taken from Numpy’s own online documentation.
The basic operation in Vector Quantization (VQ) finds the closest point in a set of points, called codes in VQ jargon, to a given point, called the observation. In the very simple, two-dimensional case shown below, the values in observation describe the weight and height of an athlete to be classified. The codes represent different classes of athletes. Finding the closest point requires calculating the distance between observation and each of the codes. The shortest distance provides the best match. In this example, codes[1]
is the closest class indicating that the athlete is likely a basketball player.
<- array(c(111.0, 188.0), dim = c(1, 2))
observation <- array(
codes c(102.0, 203.0,
132.0, 193.0,
45.0, 155.0,
57.0, 173.0),
dim = c(4, 2)
)
<- bc.num(codes, observation, "-") # broadcasting happens here
diff <- matrixStats::colSums2(diff^2) |> sqrt()
dist which.min(dist) |> print()
#> [1] 1
1] |> print()
codes[#> [1] 102
5 Perform computation on all possible pairs
Suppose you have 2 vectors of strings, and you want to find concatenate every possible pair strings.
In base R, this would require a either a loop (which is slow), or repeating the vectors several times (which requires more memory).
The ’broadcasted way to do this, is to make the vectors orthogonal, and concatenate the strings of the orthogonal vectors, using the following code:
<- array(letters[1:10], c(10, 1))
x <- array(letters[1:10], c(1, 10))
y
<- bc.str(x, y, "+")
out dimnames(out) <- list(x, y)
print(out)
#> a b c d e f g h i j
#> a "aa" "ab" "ac" "ad" "ae" "af" "ag" "ah" "ai" "aj"
#> b "ba" "bb" "bc" "bd" "be" "bf" "bg" "bh" "bi" "bj"
#> c "ca" "cb" "cc" "cd" "ce" "cf" "cg" "ch" "ci" "cj"
#> d "da" "db" "dc" "dd" "de" "df" "dg" "dh" "di" "dj"
#> e "ea" "eb" "ec" "ed" "ee" "ef" "eg" "eh" "ei" "ej"
#> f "fa" "fb" "fc" "fd" "fe" "ff" "fg" "fh" "fi" "fj"
#> g "ga" "gb" "gc" "gd" "ge" "gf" "gg" "gh" "gi" "gj"
#> h "ha" "hb" "hc" "hd" "he" "hf" "hg" "hh" "hi" "hj"
#> i "ia" "ib" "ic" "id" "ie" "if" "ig" "ih" "ii" "ij"
#> j "ja" "jb" "jc" "jd" "je" "jf" "jg" "jh" "ji" "jj"