library("broadcast")
# variances ====
vc <- datasets::ability.cov$cov
X <- matrix(rnorm(100), 100, ncol(vc))
solve(vc)
#> general picture blocks maze reading
#> general 0.082259001 -0.0312406436 -7.750932e-03 -0.013309494 -2.061705e-02
#> picture -0.031240644 0.2369906996 -2.484938e-02 0.017844845 8.603286e-04
#> blocks -0.007750932 -0.0248493822 1.344272e-02 -0.012544830 -3.802671e-05
#> maze -0.013309494 0.0178448450 -1.254483e-02 0.101625400 5.508423e-03
#> reading -0.020617049 0.0008603286 -3.802671e-05 0.005508423 5.713620e-02
#> vocab -0.002420800 0.0019394999 -1.157864e-03 -0.002857265 -2.406969e-02
#> vocab
#> general -0.002420800
#> picture 0.001939500
#> blocks -0.001157864
#> maze -0.002857265
#> reading -0.024069692
#> vocab 0.020323179
cinv(vc) # faster than `solve()`, but only works on positive definite matrices
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] 0.082259001 -0.0312406436 -7.750932e-03 -0.013309494 -2.061705e-02
#> [2,] -0.031240644 0.2369906996 -2.484938e-02 0.017844845 8.603286e-04
#> [3,] -0.007750932 -0.0248493822 1.344272e-02 -0.012544830 -3.802671e-05
#> [4,] -0.013309494 0.0178448450 -1.254483e-02 0.101625400 5.508423e-03
#> [5,] -0.020617049 0.0008603286 -3.802671e-05 0.005508423 5.713620e-02
#> [6,] -0.002420800 0.0019394999 -1.157864e-03 -0.002857265 -2.406969e-02
#> [,6]
#> [1,] -0.002420800
#> [2,] 0.001939500
#> [3,] -0.001157864
#> [4,] -0.002857265
#> [5,] -0.024069692
#> [6,] 0.020323179
all(round(solve(vc), 6) == round(cinv(vc), 6)) # they're the same
#> [1] TRUE
sd_lc(X, vc)
#> [1] 75.1359211 27.4386473 50.7331146 13.3192297 33.3566175 2.8891812
#> [7] 20.5471208 21.2868429 3.3790892 24.1724684 20.2989638 2.9785054
#> [13] 27.5281840 17.7735136 11.9377368 22.5238926 34.0795253 4.4239893
#> [19] 61.2691719 19.6008464 9.3567278 21.3942806 6.4672993 0.1958693
#> [25] 27.0894076 37.4187359 3.3006852 23.3661104 3.8497865 7.2027153
#> [31] 11.9908184 51.4312903 19.1933210 14.5627906 37.7366011 19.8011184
#> [37] 48.5855620 64.4155905 6.8994013 13.2092404 13.6456784 20.1236365
#> [43] 7.4302461 16.6826254 17.1137022 10.0682649 68.1391616 30.4080246
#> [49] 11.7730557 39.2535910 3.7998623 12.8598549 20.6669218 107.5367669
#> [55] 43.2500954 31.3910870 12.4629448 17.0173415 14.8722451 15.1473380
#> [61] 7.5268708 0.6173182 12.9086581 31.7074580 4.3694738 10.8935361
#> [67] 10.3933230 24.1452663 2.2048462 4.3607478 27.2460422 18.6467387
#> [73] 9.4688651 32.4696183 12.5998757 9.3846803 33.6487541 10.6138200
#> [79] 27.3808021 42.2958340 37.7224107 49.4096424 10.0020110 15.3562417
#> [85] 44.7684748 39.8387444 59.6049253 13.1797595 1.1564148 9.1536823
#> [91] 21.4401950 6.3147613 12.7942113 10.6055748 55.9793081 8.1745395
#> [97] 42.2829607 51.5504603 5.7884400 0.1824216
# ecumprob() ====
sim <- rnbinom(10 * 1e4, mu = 3, size = 2) |> matrix(10, 1e4)
y <- sample(0:9)
# vector:
pnbinom(y[1], mu = 3, size = 2) # real probability
#> [1] 0.8936243
ecumprob(y[1], sim[1, , drop = TRUE]) # approximation
#> [1] 0.9001
# matrix:
cbind(
real = pnbinom(y, mu = 3, size = 2), # real probability
approx = ecumprob(y, sim) # approximation
)
#> real approx
#> [1,] 0.8936243 0.9001
#> [2,] 0.1600000 0.1609
#> [3,] 0.9536426 0.9518
#> [4,] 0.5248000 0.5268
#> [5,] 0.6630400 0.6621
#> [6,] 0.9697669 0.9703
#> [7,] 0.7667200 0.7692
#> [8,] 0.8413696 0.8466
#> [9,] 0.9294561 0.9295
#> [10,] 0.3520000 0.3502
# data.frame:
cbind(
real = pnbinom(y, mu = 3, size = 2), # real probability
approx = ecumprob(y, as.data.frame(sim)) # approximation
)
#> real approx
#> [1,] 0.8936243 0.9001
#> [2,] 0.1600000 0.1609
#> [3,] 0.9536426 0.9518
#> [4,] 0.5248000 0.5268
#> [5,] 0.6630400 0.6621
#> [6,] 0.9697669 0.9703
#> [7,] 0.7667200 0.7692
#> [8,] 0.8413696 0.8466
#> [9,] 0.9294561 0.9295
#> [10,] 0.3520000 0.3502linear_algebra_stats
Simple Linear Algebra Functions for Statistics
Description
โbroadcastโ provides some simple Linear Algebra Functions for Statistics:
cinv()
sd_lc()
ecumprob()
Usage
cinv(x)
sd_lc(X, vc, bad_rp = NaN)
ecumprob(y, sim, eps = 0)
Arguments
x
|
a real symmetric positive-definite square matrix. |
X
|
a numeric (or logical) matrix of multipliers/constants |
vc
|
the variance-covariance matrix for the (correlated) random variables. |
bad_rp
|
if vc is not a Positive (semi-) Definite matrix, give here the value to replace bad standard deviations with. |
y
|
values to estimate the cumulative probability for. |
sim
|
a matrix (or data.frame) with at least 500 columns of simulated values. If sim is given as a dimensionless vector, it will be treated as a matrix with 1 row and length(sim) columns, and this will be noted with a message.
|
eps
|
a non-negative numeric scaler smaller than 0.1, giving the cut-off value for probabilities. Probabilities smaller than eps will be replaced with eps, and probabilities larger than 1 - eps will be replaced with 1 - eps. Set eps = 0 to disable probability trimming.
|
Details
cinv()
cinv() computes the Choleski inverse of a real symmetric positive-definite square matrix.
sd_lc()
Given the linear combination X %*% b, where:
-
Xis a matrix of multipliers/constants; -
bis a vector of (correlated) random variables; -
vcis the symmetric variance-covariance matrix forb;
sd_lc(X, vc) computes the standard deviations for the linear combination X %*% b, without making needless copies.
sd_lc(X, vc) will use much less memory than a base โRโ approach.
sd_lc(X, vc) will usually be faster than a base โRโ approach (depending on the Linear Algebra Library used for base โRโ).
ecumprob()
The ecumprod(y, sim) function takes a matrix (or data.frame) of simulated values sim, and for each row i (after broadcasting), estimates the cumulative distribution function of sim[i, ], and returns the cumulative probability for y[i].
In terms of statistics, it is equivalent to the following operation for each index i:
ecdf(sim[i,])(y[i])
However, ecumprob() is much faster, and supports NAs/NaNs.
In terms of linear algebra, it is equivalent to the following broadcasted operation:
rowMeans(sim <= y)
where y and sim are broadcaster arrays.
However, ecumprob() is much more memory-efficient, supports a data.frame for sim, and has statistical safety checks.
Value
For cinv():
A matrix.
For sd_lc():
A vector of standard deviations.
For ecumprob():
A vector of cumulative probabilities.
If for any observation i (after broadcasting,) y[i] is NA/NaN or any of sim[i,] is NA/NaN, the result for i will be NA.
If zero-length y or sim is given, a zero-length numeric vector is returned.
References
John A. Rice (2007), Mathematical Statistics and Data Analysis (6th Edition)
See Also
chol, chol2inv