: cannot allocate vector of size Error
Other benchmarks
Introduction
This page benchmarks some of the functions from ‘broadcast’ with some near-equivalent functions from base and other packages. The code is given here also.
Because all these benchmarks are within , not only speed but also memory usage is compared.
Why care about speed and memory usage?
Faster code means less waiting for the user, which is less annoying. Moreover, faster code is (generally speaking) more efficient code - i.e. you don’t have to run your computer overnight (or at least less hours) - which reduces electricity usage. Using less electricity is better for your wallet and better for the environment.
Code that uses less memory when run means less resources (i.e. less memory) is required. And using less memory reduces the risk of running out of usable memory; when you run out of usable memory, your code may stop running with the following error:
You may want to prevent that. And without more memory-efficient code you’ll be forced to buy more memory (which costs money).
Materials used
The ‘benchmark’ package was used for measuring speed and memory usage, and for producing the figures showing the results.
The benchmarks were all run on the same computer (processor: 12th Gen Intel(R) Core(TM) i5-12500H @ 2.50 GHz) with 32GB of RAM and running the Windows 11 OS (64 bit).
version 4.4.0 with ‘Rstudio’ version 2024.12.1 was used to run the code.
The various comparisons are split over several sections. The code used to run the benchmarks is given in each section, just before the results.
Base replication
Here replicating array dimensions using base is benchmarked against broadcasting.
The following code was used:
<- 450
n <- array(rnorm(10), c(1, n, 1))
x <- array(rnorm(10), c(n, 1, n))
y
gc()
<- bench::mark(
bm_base base = x[rep(1, n), , rep(1, n)] + y[, rep(1, n), ],
broadcast = bc.d(x, y, "+"),
min_iterations = 100
)summary(bm_base)
plot(bm_base)
And here are the results:
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 base 643ms 659ms 1.52 1.36GB 3.22
#> 2 broadcast 118ms 122ms 8.12 695.24MB 2.16
#> Loading required namespace: tidyr
‘broadcasting’ is 5 to 5.5 times (!!!) faster than replicating array dimensions, and uses approximately 2 times less memory.
sweep()
Here the sweep()
function from base is benchmarked against broadcasting.
The following code was used:
<- 2000
n <- matrix(rnorm(n), n, n)
x <- array(colMeans(x), c(1, n))
cm
gc()
<- bench::mark(
bm_sweep sweep = sweep(x, 2, cm, "=="),
broadcast = bc.d(x, cm, "=="),
min_iterations = 100
)summary(bm_sweep)
plot(bm_sweep)
And here are the results:
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 sweep 32.8ms 32.8ms 30.4 76.3MB 3045.
#> 2 broadcast 12.1ms 13.7ms 72.3 15.3MB 8.94
‘broadcasting’ is about 3 (!) times faster than using sweep()
, and uses approximately 5 (!!!) times less memory.
abind::abind()
In this section, the performance of the bind_array() function from ‘broadcast’ is compared to the performance of the abind()
function from the ‘abind’ package.
The following code was used:
<- 110L
n <- function(n) sample(letters, n, TRUE)
nms <- array(as.double(1:25), c(n, n, n))
x <- array(as.double(-1:-25), c(n, n, n))
y dimnames(x) <- lapply(dim(x), nms)
dimnames(y) <- lapply(dim(y), nms)
<- list(x, y, x)
input
gc()
<- bench::mark(
bm_abind abind = abind::abind(input, along = 2),
broadcast = bind_array(input, 2),
min_iterations = 100,
check = FALSE # because abind adds empty dimnames
)summary(bm_abind)
plot(bm_abind)
And here are the results:
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 abind 34.7ms 41.3ms 23.8 121.9MB 0.736
#> 2 broadcast 14.2ms 14.9ms 63.2 60.9MB 1.29
Clearly, the bind_array() function from ‘broadcast’ is about 2 to 3 times faster than the abind()
function from the ‘abind’ package. It is also about 2 times more memory efficient.
Rfast::Outer()
An outer computation is a special case of broadcasting, namely a broadcasting computation between a row-vector and a column-vector.
Base has the outer()
function to perform outer computations. But the outer()
function from base uses soo much memory, that no meaningful benchmark can be made against the methods provided by the ‘broadcast’ package.
The ‘Rfast’ package, however, provides a very fast and memory-efficient implementation of the outer()
function. It may be interesting how broadcasted operations hold up to the famously fast ‘Rfast’ package.
Here the outer-sum between a row-vector x
and column-vector y
(both have 9000 elements) is computed using Rfast::outer()
and broadcast::bc.d()
, and their speeds and memory consumption are compared.
The following code was used:
<- 9e3
n <- array(rnorm(10), c(1, n))
x <- array(rnorm(10), c(n, 1))
y
gc()
<- bench::mark(
bm_outer Rfast = Rfast::Outer(x, y, "+"),
broadcast = bc.d(x, y, "+"),
min_iterations = 100
)summary(bm_outer)
plot(bm_outer)
And here are the results:
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 Rfast 97.9ms 103ms 8.88 618MB 4.57
#> 2 broadcast 97.9ms 105ms 9.33 618MB 4.60
It seems that the implementations of ‘broadcast’ and the blazingly fast ‘Rfast’ package reach similar speeds and use the same amount of memory.
Note, however, that Rfast::Outer()
unfortunately only supports numeric vectors, and does not provide higher-dimensional broadcasting. ‘broadcast’, on the other hand, supports all atomic types as well as the list
recursive type, and supports arrays of any dimensions up to 16 dimensions.
%r+%
operator from ‘collapse’
The impressive ‘collapse’ package supports a large set of blazingly fast functions for a large variety of tasks. One of these is the x %r% v
operator. Given a matrix x
and a vector v
, x %r+% v
will add v
to every row of x
. Using this function in this way is equivalent to the bc.d() function, using a row-vector for v
.
Here these 2 approaches are benchmarked.
The code used was as follows:
<- 8e3
n <- matrix(rnorm(10), n, n)
x <- array(rnorm(10), c(1, n))
v
gc()
<- bench::mark(
bm_collapse_row collapse = x %r+% v,
broadcast = bc.d(x, v, "+"),
min_iterations = 100
)summary(bm_collapse_row)
plot(bm_collapse_row)
And here are the results:
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 collapse 94.1ms 96.4ms 9.71 488MB 3.24
#> 2 broadcast 97.1ms 114.4ms 8.85 488MB 2.95
The ‘collapse’ package is slightly faster than ‘broadcast’ in this case. This does show how super fast ‘collapse’ truly is.