In the context of operations involving 2 (or more) arrays, “broadcasting” refers to recycling array dimensions without allocating additional memory, which is considerably faster and more memory-efficient than R’s regular mechanism of replicating array dimensions.
Before the emergence of the ‘broadcast’ package, users who wished to employ broadcasting essentially had to rely on broadcasting as implemented in a different programming language. For example, they might use the broadcasting available in the ‘Python’ module ‘Numpy’ (perhaps via the ‘reticulate’ package), or the ‘C++’ library ‘xtensor’ via the R package of the same name (or an extension thereof, like ‘rray’).
With the emergence of the ‘broadcast’ package, users can now call broadcasted implementations without using external libraries, which spares the computing power needed for translating between object structures of different languages.
The “broadcasting” implementation in the ‘broadcast’ package is conceptually (though not programmatically) inspired by the broadcasting employed by the ‘Numpy’ module for ‘Python’, which might well be the first implementation of broadcasting. More importantly, ‘Numpy’ is remarkably fast, and the ‘broadcast’ package aims to be at least somewhat comparable in speed.
This page presents comparisons of the speed of broadcasted operations between ‘broadcast’ and ‘Numpy’. The operation compared is a simple, element-wise, broadcasted addition, given by the code x + y in the ‘Numpy’ module for ‘Python’, and bc.num(x, y, "+") in the ‘broadcast’ package.
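To make the compared operation concrete, here is a minimal sketch of the ‘broadcast’ side of the comparison, using two small, fully orthogonal matrices. The array sizes here are purely illustrative; the sizes actually benchmarked are described in the Methodology section below.

    library(broadcast)

    x <- array(rnorm(5), dim = c(5, 1))   # column: dimensions (5, 1)
    y <- array(rnorm(4), dim = c(1, 4))   # row:    dimensions (1, 4)

    # Broadcasted element-wise addition; the result has dimensions (5, 4).
    out <- bc.num(x, y, "+")
    dim(out)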
Please bear in mind that these are rough comparisons of speed. Since the comparisons involve 2 separate programming languages, ‘R’ and ‘Python’, a “proper” speed comparison is rather difficult to do fairly.
Methodology
Difficulties in comparing with ‘Python’
Benchmarking a ‘Python’ code snippet in ‘Python’ using a ‘Python’ module, and benchmarking an ‘R’ code snippet in ‘R’ using an ‘R’ package, means that mechanisms from different modules/packages are used for the benchmarking, and those 2 benchmarks might not use the exact same approach to measuring the execution time.
‘Python’ and ‘R’ are both languages that use garbage collection (GC). But GC really does mess up benchmarking. The way to circumvent this issue differs between ‘Python’ and ‘R’. In ‘Python’, GC can temporarily be disabled. ‘R’ does not support this, so instead, for ‘R’, benchmarks with heavy GC calls simply had to be filtered out.
Due to the above (and other) considerations, any form of benchmarks between ‘R’ and ‘Python’ - including the ones given on this page - should be taken with a grain of salt.
The Set-Up
The operation that was benchmarked in this study is the operation x + y in ‘Numpy’ and the equivalent bc.num(x, y, "+") in ‘broadcast’.
Here x and y were both decimal numeric arrays (64-bit doubles in ‘R’ and 64-bit floats in ‘Python’), and had the same number of dimensions.
This operation was run for pairs of arrays with different numbers of dimensions, going from 2-dimensional to 7-dimensional.
So we have x + y where both arrays were 2-dimensional (i.e. matrices), and x + y where both arrays were 3-dimensional, and so on up to 7-dimensional arrays.
The pairs of arrays are fully orthogonal (“orthogonal” in the sense explained here), thus the maximum amount of broadcasting is employed.
Given, for example, 4-dimensional arrays, the dimensions of x are (n, 1, n, 1) and the dimensions of y are (1, n, 1, n).
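As an illustration, a fully orthogonal pair of 4-dimensional arrays could be constructed as follows. This is a sketch of the set-up, not the verbatim benchmark code; the value of n is chosen here only for illustration.

    library(broadcast)

    n <- 9L  # the smallest n used for the 4-dimensional case

    # x has dimensions (n, 1, n, 1); y has dimensions (1, n, 1, n).
    # The two arrays share no non-unit dimension, so they are fully orthogonal.
    x <- array(rnorm(n * n), dim = c(n, 1L, n, 1L))
    y <- array(rnorm(n * n), dim = c(1L, n, 1L, n))

    # Broadcasted addition; the result has dimensions (n, n, n, n).
    out <- bc.num(x, y, "+")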
The value of n, i.e. the size of each dimension, varied as follows:
For 2-dimensional arrays, n goes from 1250 to 9500, with step size 750.
For 3-dimensional arrays, n goes from 65 to 450, with step size 35.
For 4-dimensional arrays, n goes from 9 to 99, with step size 10.
For 5-dimensional arrays, n goes from 6 to 39, with step size 3.
For 6-dimensional arrays, n goes from 3 to 21, with step size 2.
For 7-dimensional arrays, n goes from 2 to 14, with step size 1.
These values of n were chosen as follows. The maximum n was specified such that the broadcasted element-wise addition of x and y resulted in an array with between 90 and 100 million elements. The minimum n was chosen to be (approximately) one-seventh of the maximum n. And the step size was set such that the sequence had a length between 10 and 15.
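As a quick sanity check of this criterion, the sketch below computes the number of elements of the broadcasted output for the maximum n of each dimensionality, using the maxima listed above:

    # Maximum n per number of dimensions, as listed above.
    max_n <- c("2" = 9500, "3" = 450, "4" = 99, "5" = 39, "6" = 21, "7" = 14)
    ndims <- as.integer(names(max_n))

    # The broadcasted result of an (n,1,n,1,...) + (1,n,1,n,...) pair has n^d elements.
    out_size <- max_n ^ ndims
    out_size / 1e6  # number of elements of the broadcasted output, in millions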
For each pair of arrays, the element-wise addition was computed using ‘broadcast’ and ‘Numpy’. This computation was repeated 100 times (though see some technical details about this in the next sub-section). The first 10 repetitions were treated as warm-up iterations and thus discarded, leaving 90 benchmarks. From these 90 benchmarks, the median, first quartile, and third quartile were computed. After the benchmarks for a particular number of dimensions, the garbage collector was called manually to clean up any lingering memory.
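A stripped-down sketch of this procedure for a single pair of arrays is given below. The run_one() helper and the use of Sys.time() are illustrative simplifications; the actual benchmark code (linked at the bottom of this page) used a dedicated benchmarking package, as described under ‘Resources and Code’.

    library(broadcast)

    # Small illustrative pair of orthogonal matrices (the real sizes were much larger).
    x <- array(rnorm(100), dim = c(100L, 1L))
    y <- array(rnorm(100), dim = c(1L, 100L))

    # Hypothetical helper: time one execution of the broadcasted addition, in seconds.
    run_one <- function(x, y) {
      t0 <- Sys.time()
      invisible(bc.num(x, y, "+"))
      as.numeric(difftime(Sys.time(), t0, units = "secs"))
    }

    times <- replicate(100, run_one(x, y))  # 100 repetitions ...
    times <- times[-(1:10)]                 # ... of which the first 10 are warm-up

    # Median, first quartile, and third quartile of the remaining 90 measurements.
    quantile(times, probs = c(0.25, 0.5, 0.75))

    gc()  # manually call the garbage collector before the next set of benchmarks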
There are some caveats here, in order to keep the comparisons between ‘broadcast’ and ‘Numpy’ fair, and these caveats are explained in the next sub-section.
Keeping comparisons (somewhat) fair
To keep the comparisons between ‘broadcast’ and ‘Numpy’ fair, a number of measures have been taken.
Distributions of benchmark timings tend to be heavily skewed. Therefore, the median (together with the quartiles) was taken. The median is also more stable than the mean in the face of outliers.
Garbage collection was disabled in ‘Python’. In ‘R’, only benchmarks with no garbage collection calls, or with only level 0 garbage collection calls, were used. I feel this keeps the comparisons relatively fair (but it’s not perfect).
Since only benchmarks with no or only level 0 garbage collection calls were used for ‘R’, the benchmarks were run 200 times, and a check was performed that at least 100 benchmark measurements remained before discarding the warm-up (i.e. at least 90 after discarding the warm-up). If there were fewer than 90 (= 100 - 10) benchmarks for a particular computation, the benchmarks would be thrown away and another attempt would be made at benchmarking (but this never happened). A rough sketch of this check is given at the end of this sub-section.
‘R’ has more support for missing values than ‘Numpy’, which also leads to a difference in speed. But both ‘R’ and ‘Numpy’ handle missing values in decimal numbers (64-bit floats in ‘Numpy’ and 64-bit doubles in ‘R’) equally, through the NaN construct. Therefore, only operations on decimal numbers are compared.
Operations like power (^) and division (/) need to handle special cases (like when the right-hand side of the operation is 0). The plus (+) operator, however, has no such special cases. Therefore, to keep things fair, the comparisons on this page only involve summation.
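The sketch below illustrates the filtering and validity check described above for the ‘R’ benchmarks. The times and gc_levels vectors are hypothetical placeholders for the per-iteration output of the benchmarking package, filled with simulated values purely so the snippet runs on its own.

    # Hypothetical placeholders for the benchmarking package's per-iteration output:
    # 200 timings (in seconds) and the heaviest garbage-collection level per iteration.
    times     <- runif(200, min = 0.05, max = 0.06)
    gc_levels <- sample(c("none", "level0", "level1", "level2"), 200,
                        replace = TRUE, prob = c(0.7, 0.15, 0.1, 0.05))

    # Keep only iterations without GC calls, or with only level 0 GC calls.
    keep <- gc_levels %in% c("none", "level0")
    clean_times <- times[keep]

    # At least 100 clean measurements must remain before discarding the warm-up
    # (i.e. at least 90 afterwards); otherwise the whole benchmark would be re-run.
    if (length(clean_times) < 100) {
      stop("not enough GC-free benchmark iterations; re-run required")
    }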
Resources and Code
The ‘benchmark’ package was used for measuring speed in ‘R’, as this package can also be used to check and filter for garbage collector calls.
In ‘Python’, the time.perf_counter() function was used to accurately measure the time an operation takes. To ensure no time was wasted on printing the result in ‘Python’, the operation a + b was wrapped inside a function without a return statement.
The figures in the Results section were created using the ‘tinyplot’ package, and display the median, first quartile, and third quartile of the computation times.
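The actual figures were produced with ‘tinyplot’; the base-graphics sketch below is only a generic illustration of what each figure shows, namely a median line with a ribbon spanning the first and third quartiles, using an entirely made-up data frame d in place of the real benchmark summaries.

    # d: entirely made-up stand-in for the summarised benchmark results of one
    #    implementation (times in ms); NOT actual benchmark data.
    d <- data.frame(n = seq(1250, 9500, by = 750))
    d$med <- d$n / 100     # placeholder median times
    d$q1  <- d$med * 0.9   # placeholder first quartiles
    d$q3  <- d$med * 1.1   # placeholder third quartiles

    # Median line with a ribbon spanning the first and third quartiles.
    plot(d$n, d$med, type = "n", xlab = "n", ylab = "time (ms)",
         ylim = range(d$q1, d$q3))
    polygon(c(d$n, rev(d$n)), c(d$q1, rev(d$q3)),
            col = adjustcolor("steelblue", alpha.f = 0.3), border = NA)
    lines(d$n, d$med, col = "steelblue", lwd = 2)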
The benchmarks were all run on the same computer (processor: 12th Gen Intel(R) Core(TM) i5-12500H @ 2.50 GHz) with 32GB of RAM and running the Windows 11 OS (64 bit).
The code used to run the benchmarks can be found at the bottom of this page. ‘R’ version 4.4.0 was used to run the ‘R’ code, and ‘Python’ version 3.12.0 with ‘Numpy’ version 2.2.1 was used to run the ‘Python’ code.
The ‘R’ code was sourced from ‘RStudio’ using source(), from a freshly started computer. The ‘Python’ code was run via ‘Jupyter’, also from a freshly started computer.
Figure 1: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 2-dimensional. The dimensions of x are (n, 1); the dimensions of y are (1, n). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Figure 2: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 3-dimensional. The dimensions of x are (n, 1, n); the dimensions of y are (1, n, 1). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Figure 3: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 4-dimensional. The dimensions of x are (n, 1, n, 1); the dimensions of y are (1, n, 1, n). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Figure 4: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 5-dimensional. The dimensions of x are (n, 1, n, 1, n); the dimensions of y are (1, n, 1, n, 1). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Figure 5: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 6-dimensional. The dimensions of x are (n, 1, n, 1, n, 1); the dimensions of y are (1, n, 1, n, 1, n). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Figure 6: Benchmarks of the element-wise broadcasted addition of 2 decimal numeric arrays, comparing the code x + y in the ‘Numpy’ ‘Python’-module against the code bc.num(x, y, "+") in the ‘broadcast’ ‘R’-package. Both arrays are 7-dimensional. The dimensions of x are (n, 1, n, 1, n, 1, n); the dimensions of y are (1, n, 1, n, 1, n, 1). Here, n is shown on the x-axis. The y-axis shows the time (in ms) it took to run the code. The solid line gives the median time, and the shaded ribbon gives the first and third quartiles. The higher a value is on the y-axis, the more time it took to run the code, and thus the slower the code.
Conclusion & Discussion
It appears that ‘broadcast’ is slightly faster than ‘Numpy’, though the differences in the computation times between ‘broadcast’ and ‘Numpy’ are rather small. It seems reasonable to conclude that, in general, ‘broadcast’ and ‘Numpy’ have somewhat similar speeds. It can also be observed that ‘broadcast’ has a bit more spread in its computation time than ‘Numpy’.
As stated earlier, benchmarks comparing ‘R’ and ‘Python’ should be taken with a grain of salt. I am open to suggestions on how to improve the computation time comparisons between ‘broadcast’ and ‘Numpy’ and make them fairer (so feel free to post suggestions on the GitHub Discussions page).