List Casting Overview
1 Introduction
Hierarchical data is surprisingly common, and is often represented by nested lists.
Broadcasted operations can be performed over dimensions, but not through nesting or hierarchies.
Therefore, it is useful to be able to cast nested lists into dimensional lists.
The ‘broadcast’ package provides the cast_hier2dim() function to cast nested lists into dimensional lists, and the cast_dim2hier() function to cast dimensional lists back to nested lists.
Casting between nested and dimensional lists is not only useful for broadcasting, however.
Casting nested lists to dimensional lists has its own merits, as dimensional lists have some advantages over nested lists besides broadcasting, such as the following:
- Performing subset operations on multiple recursive subsets (using the [[ and [[<- operators) requires a (potentially slow) loop, whereas multi-dimensional subsets (using operator forms like [..., ...] and [..., ...]<-) are vectorized and generally much faster.
- Re-organizing the dimensions of a recursive array is generally much easier, faster, and more straightforward than re-organizing the hierarchies of a nested list.
This vignette gives an overview of the functions ‘broadcast’ provides to cast between nested and dimensional lists.
2 Cast Hierarchical List to Dimensional List
2.1 Introduction
The cast_hier2dim() function casts a nested list into a dimensional list.
This section gently introduces the properties of this function through a series of examples, where each example builds on the previous one.
Familiarity with nested lists and dimensional lists (i.e. arrays of type list) is essential to follow these examples.
2.2 Example 1: Basics
For a first example, consider the following list:
library(broadcast) # provides hier2dim(), cast_hier2dim(), hiernames2dimnames(), etc.
x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
Before actually casting x into a dimensional list, one may want to know what its dimensions will be.
The hier2dim() function shows exactly that:
hier2dim(x)
#>
#> 3 2 2
It returns the dimensions c(3, 2, 2).
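To see where these numbers come from, the base R sketch below uses a hypothetical helper, nested_dims(). It is not part of ‘broadcast’ and is not how hier2dim() is implemented. Assuming every node above the leaves is a list, it records the largest number of elements found at each level of nesting; for the default in2out = TRUE it reverses that order, so the deepest level becomes the first dimension.

```r
# Hypothetical helper, NOT part of 'broadcast': approximates the dimensions
# that casting a nested list would produce, by taking the maximum number of
# elements found at each level of nesting.
nested_dims <- function(x, in2out = TRUE) {
  dims <- integer(0)
  level <- list(x)
  # keep descending while every node at the current level is itself a list
  while (length(level) > 0 && all(vapply(level, is.list, logical(1)))) {
    dims <- c(dims, max(lengths(level)))
    level <- unlist(level, recursive = FALSE)  # move one level deeper
  }
  # in2out = TRUE puts the deepest level in the first dimension
  if (in2out) rev(dims) else dims
}

# A small regular nested list with the same shape as x above
leaf <- list(height = 1:10, weight = 1:10, sex = rep("M", 10))
y <- list(group1 = list(class1 = leaf, class2 = leaf),
          group2 = list(class1 = leaf, class2 = leaf))
nested_dims(y)                  # 3 2 2
nested_dims(y, in2out = FALSE)  # 2 2 3
```

Because the maximum is taken per level, the same helper also gives c(3, 2, 2) when one group has fewer classes than another, which is the situation handled by padding later on.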
Let’s now cast x as a dimensional list:
x2 <- cast_hier2dim(x) # actually cast nested list into dimensional list
print(x2)
#> , , 1
#>
#> [,1] [,2]
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#> [3,] character,10 character,10
#>
#> , , 2
#>
#> [,1] [,2]
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#> [3,] character,10 character,10
Using the default arguments, element x[[i]][[j]][[k]] corresponds to element x2[k, j, i] (for all i, j, and k).
This can be changed, as will be shown in a later example.
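The index correspondence can be made concrete with a base R sketch. The hypothetical function cast_regular() below (not part of ‘broadcast’) fills a list array with explicit loops, placing x[[i]][[j]][[k]] at position [k, j, i]. It assumes a regular nested list of depth 3 and, like cast_hier2dim(), sets no dimnames.

```r
# Hypothetical sketch, NOT the 'broadcast' implementation: cast a regular
# nested list of depth 3 into a list array so that
# x[[i]][[j]][[k]] ends up in out[[k, j, i]] (the in2out = TRUE layout).
cast_regular <- function(x) {
  n_i <- length(x)
  n_j <- length(x[[1]])
  n_k <- length(x[[1]][[1]])
  out <- array(list(NULL), dim = c(n_k, n_j, n_i))  # list array of NULLs
  for (i in seq_len(n_i)) {
    for (j in seq_len(n_j)) {
      for (k in seq_len(n_k)) {
        out[[k, j, i]] <- x[[i]][[j]][[k]]
      }
    }
  }
  out
}

leaf <- function(s) list(height = s + 1:3, weight = s + 4:6, sex = c("M", "F", NA))
y <- list(group1 = list(class1 = leaf(0L), class2 = leaf(10L)),
          group2 = list(class1 = leaf(20L), class2 = leaf(30L)))
y2 <- cast_regular(y)
dim(y2)                                     # 3 2 2
identical(y2[[2, 1, 2]], y[[2]][[1]][[2]])  # TRUE: weight, class1, group2
```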
As the results above show, cast_hier2dim() does not preserve names.
It is trivially easy to set the dimnames of x2, using hiernames2dimnames() (available from version 0.1.5):
dimnames(x2) <- hiernames2dimnames(x)
print(x2)
#> , , group1
#>
#> class1 class2
#> height numeric,10 numeric,10
#> weight numeric,10 numeric,10
#> sex character,10 character,10
#>
#> , , group2
#>
#> class1 class2
#> height numeric,10 numeric,10
#> weight numeric,10 numeric,10
#> sex character,10 character,10
There, the names are now correct.
As shown above, a dimensional list is displayed more compactly than a nested list.
Depending on the situation, this may be either desirable or undesirable.
One can print x2 less compactly without much effort by flattening it, using the cast_dim2flat() function.
We only need to see a portion of the list in detail, so let’s look at class1 from group1 in the flattened form:
cast_dim2flat(x2[, 1, "group1", drop = FALSE])
#> $`['height', 'class1', 'group1']`
#> [1] 171.0768 170.1363 169.0081 169.1834 171.1317 170.1661 169.9788 171.1625
#> [9] 168.6572 169.6540
#>
#> $`['weight', 'class1', 'group1']`
#> [1] 82.08303 79.99590 79.96753 80.39208 79.86180 78.53419 81.04165 81.37061
#> [9] 80.51771 80.32829
#>
#> $`['sex', 'class1', 'group1']`
#> [1] "M" "M" "M" "F" "M" "M" "F" "F" "F" NA
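The labels produced by cast_dim2flat() appear to follow a simple pattern: each element is named after its dimnames in bracket form, listed in R's column-major order (first dimension varying fastest). The base R sketch below reproduces only that naming pattern on a small hypothetical array, via a hypothetical helper flat_names(). It is an assumption-based illustration, not the package's code.

```r
# Hypothetical helper, NOT part of 'broadcast': build labels like
# "['height', 'class1', 'group1']" for every cell of an array, in
# column-major order (first dimension varies fastest).
flat_names <- function(dn) {
  grid <- expand.grid(dn, stringsAsFactors = FALSE)  # one row per cell
  paste0("['", do.call(paste, c(grid, sep = "', '")), "']")
}

a <- array(as.list(1:12), dim = c(3, 2, 2),
           dimnames = list(c("height", "weight", "sex"),
                           c("class1", "class2"),
                           c("group1", "group2")))
flat <- a
dim(flat) <- NULL                # drops dim (and dimnames): a plain list
names(flat) <- flat_names(dimnames(a))
names(flat)[1:2]
# "['height', 'class1', 'group1']" "['weight', 'class1', 'group1']"
```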
Dimensional lists can be easier to work with than hierarchical lists.
Consider, for example, printing the height of the first class of every group in a list; let’s compare how to do this in a nested list versus a dimensional list.
With a nested list, doing this takes a slow, messy for-loop:
for(i in seq_along(x)) {
  print(names(x)[i])
  x[[i]][[1]][["height"]] |> print() # slow for-loop, messy code
}
#> [1] "group1"
#> [1] 171.0768 170.1363 169.0081 169.1834 171.1317 170.1661 169.9788 171.1625
#> [9] 168.6572 169.6540
#> [1] "group2"
#> [1] 169.8802 169.2645 170.1326 170.6925 169.8156 169.7065 170.8591 171.6005
#> [9] 169.9893 170.2781
With a dimensional list, the very same thing can be done with sleek, vectorized code; no messy loop needed:
x2["height", 1L, ] |> print()
#> $group1
#> [1] 171.0768 170.1363 169.0081 169.1834 171.1317 170.1661 169.9788 171.1625
#> [9] 168.6572 169.6540
#>
#> $group2
#> [1] 169.8802 169.2645 170.1326 170.6925 169.8156 169.7065 170.8591 171.6005
#> [9] 169.9893 170.2781
x2["height", 1L, , drop = FALSE] |> cast_dim2flat() # same but more informative
#> $`['height', 'class1', 'group1']`
#> [1] 171.0768 170.1363 169.0081 169.1834 171.1317 170.1661 169.9788 171.1625
#> [9] 168.6572 169.6540
#>
#> $`['height', 'class1', 'group2']`
#> [1] 169.8802 169.2645 170.1326 170.6925 169.8156 169.7065 170.8591 171.6005
#> [9] 169.9893 170.2781
It is also easier to re-arrange dimensions (for example, using aperm()) than it is to re-arrange hierarchies.
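As a base R sketch of that point (independent of ‘broadcast’, using a small hypothetical list array a), a single aperm() call moves the group dimension to the front, and the dimnames move along with it. The equivalent change to a nested list would mean rebuilding every level of the hierarchy.

```r
# A small 3 x 2 x 2 list array with dimnames (hypothetical data)
a <- array(as.list(1:12), dim = c(3, 2, 2),
           dimnames = list(c("height", "weight", "sex"),
                           c("class1", "class2"),
                           c("group1", "group2")))

# Move the third dimension (groups) to the front; aperm() permutes dimnames too
b <- aperm(a, c(3, 1, 2))
dim(b)                                                                  # 2 3 2
identical(b[["group2", "sex", "class1"]], a[["sex", "class1", "group2"]])  # TRUE
```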
2.3 Example 2: Cast from outside to inside
In Example 1, the default arguments were used for cast_hier2dim().
One of these arguments is in2out, which defaults to TRUE.
Consider a nested list x with a depth of 3, and a dimensional list x2 with 3 dimensions, where the relationship between x and x2 can be expressed as x2 <- cast_hier2dim(x, ...).
Given this, the following can be stated about in2out:
- If in2out = TRUE, which is the default and was used in Example 1, element x[[i]][[j]][[k]] corresponds to element x2[k, j, i] (for all i, j, and k).
- If in2out = FALSE, element x[[i]][[j]][[k]] corresponds to element x2[i, j, k] (for all i, j, and k).
The default in2out = TRUE was chosen because elements in adjacent rows are close to each other, whereas elements in adjacent layers (the third dimension) generally are not; in2out = TRUE attempts to retain that behaviour by placing the innermost level of the hierarchy along the rows.
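Since the two correspondences differ only in the order of the indices, an in2out = FALSE layout should equal the in2out = TRUE layout with its dimensions reversed. The base R sketch below checks that reasoning with aperm() and perm = 3:1 on a small hypothetical list array; it relies only on the index relations stated above, not on the package's internals.

```r
# Suppose 'tt' holds x[[i]][[j]][[k]] at tt[k, j, i] (the in2out = TRUE layout).
tt <- array(as.list(1:12), dim = c(3, 2, 2))

# Reversing the dimensions gives ff[i, j, k] == tt[k, j, i],
# i.e. the in2out = FALSE layout.
ff <- aperm(tt, 3:1)
dim(ff)   # 2 2 3

all(vapply(seq_len(12), function(n) {
  idx <- arrayInd(n, dim(tt))   # (k, j, i) of the n-th cell of tt
  identical(tt[[idx[1], idx[2], idx[3]]], ff[[idx[3], idx[2], idx[1]]])
}, logical(1)))   # TRUE
```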
For this example, the same list will be used as in Example 1:
x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
Let’s once again cast this list to a dimensional list, but this time use in2out = FALSE:
hier2dim(x, in2out = FALSE) # check once again the dimensions
#>
#> 2 2 3
x2 <- cast_hier2dim(x, in2out = FALSE) # actually cast nested list into dimensional list
print(x2)
#> , , 1
#>
#> [,1] [,2]
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#>
#> , , 2
#>
#> [,1] [,2]
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#>
#> , , 3
#>
#> [,1] [,2]
#> [1,] character,10 character,10
#> [2,] character,10 character,10
x2 is the resulting dimensional list. Since in2out = FALSE, element x[[i]][[j]][[k]] corresponds to element x2[i, j, k] (for all i, j, and k).
Once again it is trivially easy to set the dimnames of x2, using hiernames2dimnames() (available from version 0.1.5):
# this time, in2out = FALSE
# so we go from the surface names to the deepest names
dimnames(x2) <- hiernames2dimnames(x, in2out = FALSE)
print(x2)
#> , , height
#>
#> class1 class2
#> group1 numeric,10 numeric,10
#> group2 numeric,10 numeric,10
#>
#> , , weight
#>
#> class1 class2
#> group1 numeric,10 numeric,10
#> group2 numeric,10 numeric,10
#>
#> , , sex
#>
#> class1 class2
#> group1 character,10 character,10
#> group2 character,10 character,10
There, the names are now correct.
One can print x2 less compactly without much effort by flattening it, again using the cast_dim2flat() function.
We only need to see a portion of the list in detail, so let’s look at class1 from group 1 in the flattened form:
cast_dim2flat(x2["group1", 1, , drop = FALSE])
#> $`['group1', 'class1', 'height']`
#> [1] 169.3362 168.6543 169.8892 168.3379 170.4712 170.3906 168.7521 170.3552
#> [9] 169.9272 168.5335
#>
#> $`['group1', 'class1', 'weight']`
#> [1] 81.12428 81.78492 81.36158 81.40730 79.84937 81.08899 80.04604 80.47231
#> [9] 79.43960 79.39841
#>
#> $`['group1', 'class1', 'sex']`
#> [1] "F" "F" NA "F" "M" "F" "F" NA "M" "F"
2.4 Example 3: Padding
For Example 3, we take the same list as before, but remove x$group1$class2:
x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
Let’s first check what dimensions it will get when cast, using hier2dim():
hier2dim(x)
#> padding
#> 3 2 2
The dimensions are the same as in Example 1: c(3, 2, 2).
But notice that the names of the output are different: the second element has the name “padding”. This indicates that some columns will not have enough elements to fill them completely, and so additional elements will be added as padding.
So let’s cast this list to a dimensional list:
x2 <- cast_hier2dim(x)
print(x2)
#> , , 1
#>
#> [,1] [,2]
#> [1,] numeric,10 NULL
#> [2,] numeric,10 NULL
#> [3,] character,10 NULL
#>
#> , , 2
#>
#> [,1] [,2]
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#> [3,] character,10 character,10
Subset x2[, 2, 1] is filled with NULL; this is the position where x$group1$class2 was in Example 1, but since it is absent here, something must be put in its place.
To make this more obvious, let’s give the array proper dimnames:
dimnames(x2) <- hiernames2dimnames(x)
print(x2)
#> , , group1
#>
#> class1 class2
#> height numeric,10 NULL
#> weight numeric,10 NULL
#> sex character,10 NULL
#>
#> , , group2
#>
#> class1 class2
#> height numeric,10 numeric,10
#> weight numeric,10 numeric,10
#> sex character,10 character,10
As shown, element “class2” is missing from element “group1” (but not from “group2”), and so it is padded with NULL when the list is cast to a dimensional list.
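The padding behaviour can be sketched in base R as well. The hypothetical cast_padded() below (not part of ‘broadcast’) first creates a list array in which every cell holds the padding value, and then overwrites the cells that exist in the nested list. It assumes a nested list of depth 3 and, as an assumption of this sketch, matches elements by position rather than by name.

```r
# Hypothetical sketch, NOT the 'broadcast' implementation: cast a possibly
# ragged nested list of depth 3 into a list array (in2out = TRUE layout),
# filling positions that do not exist in x with a padding value.
cast_padded <- function(x, padding = list(NULL)) {
  n_i <- length(x)
  n_j <- max(lengths(x))                      # widest second level
  n_k <- max(unlist(lapply(x, lengths)))      # widest third level
  out <- array(padding, dim = c(n_k, n_j, n_i))  # every cell starts as padding
  for (i in seq_len(n_i)) {
    for (j in seq_along(x[[i]])) {            # only positions that exist
      for (k in seq_along(x[[i]][[j]])) {
        out[[k, j, i]] <- x[[i]][[j]][[k]]
      }
    }
  }
  out
}

leaf <- list(height = 1:3, weight = 4:6, sex = c("M", "F", NA))
y <- list(group1 = list(class1 = leaf),              # class2 missing here
          group2 = list(class1 = leaf, class2 = leaf))
p1 <- cast_padded(y)
is.null(p1[[1, 2, 1]])      # TRUE: padded with NULL
p2 <- cast_padded(y, padding = list("this is padding!"))
p2[[1, 2, 1]]               # "this is padding!"
```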
Sometimes, a value other than NULL is desired for padding.
So let’s replace the padding value with something really obvious, using the padding argument:
x2 <- cast_hier2dim(x, padding = list(~ "this is padding!"))
dimnames(x2) <- hiernames2dimnames(x)
print(x2)
#> , , group1
#>
#> class1 class2
#> height numeric,10 ~"this is padding!"
#> weight numeric,10 ~"this is padding!"
#> sex character,10 ~"this is padding!"
#>
#> , , group2
#>
#> class1 class2
#> height numeric,10 numeric,10
#> weight numeric,10 numeric,10
#> sex character,10 character,10
Once again, one can print or present x2 less compactly by flattening it:
cast_dim2flat(x2)
#> $`['height', 'class1', 'group1']`
#> [1] 170.2971 168.8136 171.0433 169.3725 169.9247 168.0068 168.7293 170.6662
#> [9] 170.1744 169.6784
#>
#> $`['weight', 'class1', 'group1']`
#> [1] 81.09891 81.32271 78.43897 78.09466 80.92402 79.22143 80.41081 78.86779
#> [9] 80.14396 81.39648
#>
#> $`['sex', 'class1', 'group1']`
#> [1] "M" "F" NA "F" "F" NA "F" "F" "F" "F"
#>
#> $`['height', 'class2', 'group1']`
#> ~"this is padding!"
#>
#> $`['weight', 'class2', 'group1']`
#> ~"this is padding!"
#>
#> $`['sex', 'class2', 'group1']`
#> ~"this is padding!"
#>
#> $`['height', 'class1', 'group2']`
#> [1] 171.1757 170.2359 170.6996 169.3819 170.9317 170.5781 170.9899 168.4734
#> [9] 171.1337 169.3934
#>
#> $`['weight', 'class1', 'group2']`
#> [1] 79.86487 79.45448 79.55878 81.72460 79.00086 80.66076 79.76096 79.51111
#> [9] 80.45790 82.22055
#>
#> $`['sex', 'class1', 'group2']`
#> [1] NA "F" "F" NA NA "F" "F" "M" "M" NA
#>
#> $`['height', 'class2', 'group2']`
#> [1] 170.5003 170.2711 171.1467 168.7967 168.3768 169.8891 171.5996 171.4211
#> [9] 169.0799 168.4599
#>
#> $`['weight', 'class2', 'group2']`
#> [1] 78.31181 77.97686 79.37646 80.07585 79.38497 79.77858 80.61944 81.24933
#> [9] 78.96805 79.07225
#>
#> $`['sex', 'class2', 'group2']`
#> [1] "F" "F" "F" "M" NA "M" "M" NA NA "F"
2.5 Example 4: Comparing in2out with padding
In this example, the same nested list as in the previous example is used to demonstrate the difference between in2out = TRUE (which is the default) and in2out = FALSE.
Consider first the original list again:
x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
First, the list is cast to a dimensional list using the default in2out = TRUE, with proper names assigned.
Second, the list is cast using in2out = FALSE, again with proper names assigned.
x2 <- cast_hier2dim(x)
dimnames(x2) <- hiernames2dimnames(x)
print(x2)
#> , , group1
#>
#> class1 class2
#> height numeric,10 NULL
#> weight numeric,10 NULL
#> sex character,10 NULL
#>
#> , , group2
#>
#> class1 class2
#> height numeric,10 numeric,10
#> weight numeric,10 numeric,10
#> sex character,10 character,10
x2 <- cast_hier2dim(x, in2out = FALSE)
dimnames(x2) <- hiernames2dimnames(x, in2out = FALSE)
print(x2)
#> , , height
#>
#> class1 class2
#> group1 numeric,10 NULL
#> group2 numeric,10 numeric,10
#>
#> , , weight
#>
#> class1 class2
#> group1 numeric,10 NULL
#> group2 numeric,10 numeric,10
#>
#> , , sex
#>
#> class1 class2
#> group1 character,10 NULL
#> group2 character,10 character,10
3 Cast Dimensional List to Hierarchical List
‘broadcast’ provides the cast_dim2hier() function, which takes a dimensional list (i.e. an array of type list) and casts it to a nested list.
Consider the following recursive array as an example:
x <- array(c(as.list(1:11), ~hello, as.list(month.abb)), c(4:2))
dimnames(x) <- list(
  letters[1:4],
  LETTERS[1:3],
  c("group1", "group2")
)
print(x)
#> , , group1
#>
#> A B C
#> a 1 5 9
#> b 2 6 10
#> c 3 7 11
#> d 4 8 ~hello
#>
#> , , group2
#>
#> A B C
#> a "Jan" "May" "Sep"
#> b "Feb" "Jun" "Oct"
#> c "Mar" "Jul" "Nov"
#> d "Apr" "Aug" "Dec"
Like cast_hier2dim(), cast_dim2hier() also has the in2out argument, which (again) defaults to TRUE.
Let’s cast the above dimensional list to a nested list, and compare the results when using in2out = TRUE (first) versus in2out = FALSE (second):
x2 <- cast_dim2hier(
  x, distr.names = TRUE
)
lobstr::tree(x2)
#> <list>
#> ├─group1: <list>
#> │ ├─A: <list>
#> │ │ ├─a: 1
#> │ │ ├─b: 2
#> │ │ ├─c: 3
#> │ │ └─d: 4
#> │ ├─B: <list>
#> │ │ ├─a: 5
#> │ │ ├─b: 6
#> │ │ ├─c: 7
#> │ │ └─d: 8
#> │ └─C: <list>
#> │ ├─a: 9
#> │ ├─b: 10
#> │ ├─c: 11
#> │ └─d: S3<formula> ~hello
#> └─group2: <list>
#> ├─A: <list>
#> │ ├─a: "Jan"
#> │ ├─b: "Feb"
#> │ ├─c: "Mar"
#> │ └─d: "Apr"
#> ├─B: <list>
#> │ ├─a: "May"
#> │ ├─b: "Jun"
#> │ ├─c: "Jul"
#> │ └─d: "Aug"
#> └─C: <list>
#> ├─a: "Sep"
#> ├─b: "Oct"
#> ├─c: "Nov"
#> └─d: "Dec"
x2 <- cast_dim2hier(
  x, in2out = FALSE, distr.names = TRUE
)
lobstr::tree(x2)
#> <list>
#> ├─a: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 1
#> │ │ └─group2: "Jan"
#> │ ├─B: <list>
#> │ │ ├─group1: 5
#> │ │ └─group2: "May"
#> │ └─C: <list>
#> │ ├─group1: 9
#> │ └─group2: "Sep"
#> ├─b: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 2
#> │ │ └─group2: "Feb"
#> │ ├─B: <list>
#> │ │ ├─group1: 6
#> │ │ └─group2: "Jun"
#> │ └─C: <list>
#> │ ├─group1: 10
#> │ └─group2: "Oct"
#> ├─c: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 3
#> │ │ └─group2: "Mar"
#> │ ├─B: <list>
#> │ │ ├─group1: 7
#> │ │ └─group2: "Jul"
#> │ └─C: <list>
#> │ ├─group1: 11
#> │ └─group2: "Nov"
#> └─d: <list>
#> ├─A: <list>
#> │ ├─group1: 4
#> │ └─group2: "Apr"
#> ├─B: <list>
#> │ ├─group1: 8
#> │ └─group2: "Aug"
#> └─C: <list>
#> ├─group1: S3<formula> ~hello
#> └─group2: "Dec"
The added distr.names = TRUE argument distributes the dimnames over the nested elements: as the trees above show, each level of the nested list is named after the dimension it came from.
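For the in2out = TRUE case with distr.names = TRUE, the mapping can be sketched in base R with nested lapply() calls. The hypothetical dim2hier_sketch() below (not part of ‘broadcast’) assumes a 3-dimensional list array with dimnames on every dimension, and places a[[k, j, i]] at out[[i]][[j]][[k]], naming each level after the matching dimnames. setNames() comes from the stats package, which R attaches by default.

```r
# Hypothetical sketch, NOT the 'broadcast' implementation: turn a named
# 3-d list array into a nested list in the in2out = TRUE layout, so that
# a[[k, j, i]] ends up at out[[i]][[j]][[k]], each level named by dimnames.
dim2hier_sketch <- function(a) {
  dn <- dimnames(a)
  lapply(setNames(dn[[3]], dn[[3]]), function(i)       # surface: last dim
    lapply(setNames(dn[[2]], dn[[2]]), function(j)
      lapply(setNames(dn[[1]], dn[[1]]), function(k)   # deepest: first dim
        a[[k, j, i]])))
}

x <- array(c(as.list(1:11), ~hello, as.list(month.abb)), c(4:2))
dimnames(x) <- list(letters[1:4], LETTERS[1:3], c("group1", "group2"))
nested <- dim2hier_sketch(x)
nested[["group2"]][["B"]][["a"]]   # "May"
nested[["group1"]][["C"]][["d"]]   # ~hello
```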
4 Data Wrangling Example: Turning a List Inside Out
The cast functions can be used to turn a list inside out.
Let’s start with the following list:
x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    ),
    class2 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    ),
    class2 = list(
      height = rnorm(5, 170) |> as.integer(),
      weight = rnorm(5, 80) |> as.integer(),
      sex = sample(c("M", "F", NA), 5, TRUE)
    )
  )
)
Turning this list inside out means manipulating it such that height, weight and sex become the surface-level elements and the groups become the deepest level.
This can be done quickly and easily with ‘broadcast’, by casting the nested list to a dimensional list with in2out = TRUE, and then casting the dimensional list back to a nested list using in2out = FALSE, like so:
x2 <- cast_hier2dim(x)
dimnames(x2) <- hiernames2dimnames(x)
x3 <- cast_dim2hier(x2, in2out = FALSE, distr.names = TRUE)
lobstr::tree(x3)
#> <list>
#> ├─height: <list>
#> │ ├─class1: <list>
#> │ │ ├─group1<int [5]>: 169, 167, 169, 168, 171
#> │ │ └─group2<int [5]>: 170, 172, 172, 170, 169
#> │ └─class2: <list>
#> │ ├─group1<int [5]>: 169, 170, 169, 168, 171
#> │ └─group2<int [5]>: 170, 170, 170, 168, 171
#> ├─weight: <list>
#> │ ├─class1: <list>
#> │ │ ├─group1<int [5]>: 80, 78, 79, 77, 79
#> │ │ └─group2<int [5]>: 79, 79, 80, 79, 77
#> │ └─class2: <list>
#> │ ├─group1<int [5]>: 80, 79, 80, 79, 80
#> │ └─group2<int [5]>: 81, 80, 81, 79, 78
#> └─sex: <list>
#> ├─class1: <list>
#> │ ├─group1<chr [5]>: "F", "NA", "M", "NA", "NA"
#> │ └─group2<chr [5]>: "F", "M", "F", "NA", "NA"
#> └─class2: <list>
#> ├─group1<chr [5]>: "F", "F", "M", "M", "F"
#> └─group2<chr [5]>: "NA", "M", "M", "F", "F"
Easy, right?
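To make explicit what the two casts achieve together: element x[[i]][[j]][[k]] ends up at x3[[k]][[j]][[i]]. The base R sketch below builds that result directly with a hypothetical helper inside_out() (not part of ‘broadcast’). It only handles a regular, fully named nested list of depth 3 and does no padding.

```r
# Hypothetical helper, NOT part of 'broadcast': reverse the nesting order of a
# regular, fully named nested list of depth 3, so that
# x[[i]][[j]][[k]] becomes out[[k]][[j]][[i]].
inside_out <- function(x) {
  lv1 <- names(x)                # e.g. group names
  lv2 <- names(x[[1]])           # e.g. class names
  lv3 <- names(x[[1]][[1]])      # e.g. height / weight / sex
  lapply(setNames(lv3, lv3), function(k)
    lapply(setNames(lv2, lv2), function(j)
      lapply(setNames(lv1, lv1), function(i) x[[i]][[j]][[k]])))
}

leaf <- function(s) list(height = s + 1:5, weight = s + 6:10, sex = rep("F", 5))
y <- list(group1 = list(class1 = leaf(0L), class2 = leaf(10L)),
          group2 = list(class1 = leaf(20L), class2 = leaf(30L)))
y3 <- inside_out(y)
names(y3)                        # "height" "weight" "sex"
y3$weight$class2$group1          # 16 17 18 19 20
```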