Casting Explained

1 Introduction

Hierarchical data is surprisingly common, and is commonly represented by nested lists.

Broadcasted operations can be performed over dimensions, but not through nesting or hierarchies.
Therefore, it is useful to be able to cast nested lists into dimensional lists.
The ‘broadcast’ package provides the cast_hier2dim() function to cast nested lists into dimensional lists, and the cast_dim2hier() function to cast dimensional lists back to nested lists.

Casting between nested and dimensional lists is not only useful for broadcasting, however.
Casting nested lists to dimensional lists has its own merits, as dimensional lists have some advantages over nested lists besides broadcasting, such as the following:

  • Performing sub-set operations on multiple recursive subsets (using the [[ and [[<- operators) requires a (potentially slow) loop, whereas multi-dimensional subsets (using operator forms like [..., ...] and [..., ...]<-) are vectorized and generally much faster.
  • Re-organizing recursive arrays is generally much easier, faster, and more straight-forward than re-organizing nested lists.
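Both points can be illustrated in base R alone. The sketch below uses a small hand-built example; the names nested and dimlist are illustrative and do not come from the package.

```r
# A nested list: two groups, each holding elements "x" and "y"
nested <- list(
  a = list(x = 1, y = 2),
  b = list(x = 3, y = 4)
)

# The same data as a dimensional list: a 2 x 2 array of type list
dimlist <- array(
  list(1, 2, 3, 4), dim = c(2, 2),
  dimnames = list(c("x", "y"), c("a", "b"))
)

# Nested list: extracting "x" from every group needs a loop over `[[`
xs_loop <- lapply(nested, function(g) g[["x"]])

# Dimensional list: a single vectorized subset does the same job
xs_vec <- dimlist["x", ]

# Re-organizing a dimensional list is a single call to aperm()
dimlist_t <- aperm(dimlist, c(2, 1))
```

Here xs_loop and xs_vec hold the same elements, and dimlist_t is the same list array with groups as rows instead of columns.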

This vignette explains the functions ‘broadcast’ provides to cast between nested and dimensional lists.

 

2 Cast Hierarchical List to Dimensional List

2.1 Introduction

The cast_hier2dim() function casts a nested list into a dimensional list.
This section gently introduces the properties of this function through a series of examples of gradually increasing complexity.
Familiarity with nested lists and dimensional lists (i.e. arrays of type list) is essential to follow these examples.
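As a brief refresher, the base R sketch below builds a small dimensional list by hand and shows the difference between extracting an element's contents and extracting a sub-list. The object d is purely illustrative.

```r
# A dimensional list is an ordinary list that carries a dim attribute
d <- array(list(1:3, "a", TRUE, NULL), dim = c(2, 2))

is.list(d)   # it is still a list ...
is.array(d)  # ... and it is also an array
dim(d)

# `[[` with one index per dimension extracts the contents of one element
first_contents <- d[[1, 1]]

# `[` with one index per dimension returns a list holding that element
first_sublist <- d[1, 1]
```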

 

2.2 Example 1: Basics

For a first example, consider the following list:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)

Before actually casting x into a dimensional list, one may want to know what its dimensions will be once cast.
The hier2dim() function shows this:

hier2dim(x)
#>       
#> 3 2 2

It returns the dimensions c(3, 2, 2).
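One way to reason about these numbers is to take the largest number of elements found at each nesting depth and list them from the deepest level to the surface. The base R sketch below does exactly that with a hypothetical helper, max_len_per_depth(). It reproduces the result for this example, but it is only an illustration and not necessarily how hier2dim() is implemented.

```r
# Hypothetical helper (not part of 'broadcast'): the largest number of
# elements found at each nesting depth, from the surface inwards
max_len_per_depth <- function(x) {
  out <- integer(0)
  level <- list(x)  # every list found at the current depth
  while (length(level) > 0) {
    out <- c(out, max(lengths(level)))
    # step one level down and keep only the elements that are lists
    children <- unlist(level, recursive = FALSE, use.names = FALSE)
    level <- Filter(is.list, children)
  }
  out
}

# A small stand-in with the same shape as x (scalar values for brevity)
cls <- function() list(height = 170, weight = 80, sex = "M")
x_small <- list(
  group1 = list(class1 = cls(), class2 = cls()),
  group2 = list(class1 = cls(), class2 = cls())
)

depth_lengths <- max_len_per_depth(x_small)  # surface to deepest: 2 2 3
rev(depth_lengths)                           # deepest to surface: 3 2 2
```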

Let’s now cast x as a dimensional list:

x2 <- cast_hier2dim(x) # actually cast nested list into dimensional list
print(x2)
#> , , 1
#> 
#>      [,1]         [,2]        
#> [1,] numeric,10   numeric,10  
#> [2,] numeric,10   numeric,10  
#> [3,] character,10 character,10
#> 
#> , , 2
#> 
#>      [,1]         [,2]        
#> [1,] numeric,10   numeric,10  
#> [2,] numeric,10   numeric,10  
#> [3,] character,10 character,10

Using the default arguments, element x[[i]][[j]][[k]] corresponds to element x2[k, j, i] (for all i, j, and k).
This can be changed, as will be shown in a later example.
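To make the index correspondence concrete, the sketch below builds such an array by hand in base R, copying x[[i]][[j]][[k]] into position [k, j, i]. It uses a small stand-in list (x_small, with tagged strings instead of data vectors) and explicit loops. It illustrates the mapping only and makes no claim about how cast_hier2dim() works internally.

```r
# Stand-in list with the same shape as x; each leaf is a tagged string
cls <- function(tag) {
  list(height = paste0("h", tag), weight = paste0("w", tag), sex = paste0("s", tag))
}
x_small <- list(
  group1 = list(class1 = cls(11), class2 = cls(12)),
  group2 = list(class1 = cls(21), class2 = cls(22))
)

# Empty 3 x 2 x 2 array of type list
x2_manual <- array(vector("list", 3 * 2 * 2), dim = c(3, 2, 2))

# Copy x_small[[i]][[j]][[k]] into x2_manual[k, j, i]
for (i in seq_along(x_small)) {
  for (j in seq_along(x_small[[i]])) {
    for (k in seq_along(x_small[[i]][[j]])) {
      x2_manual[[k, j, i]] <- x_small[[i]][[j]][[k]]
    }
  }
}

x2_manual[[3, 1, 2]]  # the same element as x_small[[2]][[1]][[3]]
```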

As shown in the results above, cast_hier2dim() does not preserve names.

In this case, it is trivially easy to set the dimnames of x2:

# remember: element `x[[i]][[j]][[k]]` corresponds to element `x2[k, j, i]`
# so we go from the deepest names to the surface names
dimnames(x2) <- list(
  names(x[[1]][[1]]),
  names(x[[1]]),
  names(x)
)
print(x2)
#> , , group1
#> 
#>        class1       class2      
#> height numeric,10   numeric,10  
#> weight numeric,10   numeric,10  
#> sex    character,10 character,10
#> 
#> , , group2
#> 
#>        class1       class2      
#> height numeric,10   numeric,10  
#> weight numeric,10   numeric,10  
#> sex    character,10 character,10

There, the names are now correct.

As shown above, a dimensional list is printed more compactly than a nested list.
Depending on the situation, this may be either desirable or undesirable.

One can print x2 less compactly without much effort by flattening it, using the cast_dim2flat() function.
We only need to see a portion of the list in detail, so let’s look at class1 from group1 in the flattened form:

cast_dim2flat(x2[, 1, "group1", drop = FALSE])
#> $`['height', 'class1', 'group1']`
#>  [1] 169.3892 170.4896 169.7610 169.7262 171.0470 171.2289 170.6705 169.6205
#>  [9] 169.9288 169.5261
#> 
#> $`['weight', 'class1', 'group1']`
#>  [1] 80.85426 82.29471 80.09630 79.52578 78.47134 81.00325 79.39270 78.86466
#>  [9] 80.90112 80.96140
#> 
#> $`['sex', 'class1', 'group1']`
#>  [1] NA  "M" NA  NA  NA  "M" NA  NA  "M" "M"
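The names in this output combine the dimnames of each element's position. A rough base R approximation of that idea is sketched below. The helper flatten_with_names() is hypothetical and only mimics the naming pattern visible in the output above; it is not the package's implementation.

```r
# Hypothetical helper: drop the dimensions of a list array and name each
# element after its position, e.g. "['height', 'class1', 'group1']"
flatten_with_names <- function(a) {
  # one row per element, first dimension varying fastest (column-major)
  grid <- expand.grid(dimnames(a), stringsAsFactors = FALSE)
  labels <- do.call(paste, c(unname(as.list(grid)), sep = "', '"))
  out <- a
  dim(out) <- NULL  # also drops the dimnames
  names(out) <- paste0("['", labels, "']")
  out
}

# A 3 x 1 x 1 slice, similar in shape to x2[, 1, "group1", drop = FALSE]
a <- array(
  list(170, 80, "M"), dim = c(3, 1, 1),
  dimnames = list(c("height", "weight", "sex"), "class1", "group1")
)
flat <- flatten_with_names(a)
names(flat)
```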

 

2.3 Example 2: Cast from outside to inside

In Example 1, the default arguments were used for cast_hier2dim().
One of these arguments is in2out, which defaults to TRUE.

Consider a nested list x with a depth of 3, and a dimensional list x2 with 3 dimensions, where the relationship between x and x2 can be expressed as x2 <- cast_hier2dim(x, ...).
Given this, the following can be stated about in2out:

  • If in2out = TRUE, which is the default and used in Example 1, element x[[i]][[j]][[k]] corresponds to element x2[k, j, i] (for all i, j, and k).
  • If in2out = FALSE, element x[[i]][[j]][[k]] corresponds to element x2[i, j, k] (for all i, j, and k).

The default of in2out = TRUE was chosen, because elements in subsequent rows are close to each other, while elements in subsequent layers (third dimension) are generally not close to each other, and the default of in2out = TRUE attempts to retain that behaviour.
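Taken at face value, the two mappings above differ only in the order of the axes. The base R sketch below checks that index relationship on a plain list array that stands in for a cast result. That the package returns results related in exactly this way is an assumption based on the stated mappings and is not verified here.

```r
# Stand-in for a result with the in2out = TRUE layout: dimensions 3 x 2 x 2
a <- array(as.list(1:12), dim = c(3, 2, 2))

# Reversing the axis order gives the in2out = FALSE layout: 2 x 2 x 3
b <- aperm(a, c(3, 2, 1))
dim(b)

# Check that b[i, j, k] is a[k, j, i] for all i, j, and k
ok <- TRUE
for (i in 1:2) for (j in 1:2) for (k in 1:3) {
  ok <- ok && identical(b[[i, j, k]], a[[k, j, i]])
}
ok
```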

For this example, the same list will be used as in Example 1:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)

Let’s once again cast this list to a dimensional list, but this time use in2out = FALSE:

hier2dim(x, in2out = FALSE) # check once again the dimensions
#>       
#> 2 2 3
x2 <- cast_hier2dim(x, in2out = FALSE) # actually cast nested list into dimensional list
print(x2)
#> , , 1
#> 
#>      [,1]       [,2]      
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#> 
#> , , 2
#> 
#>      [,1]       [,2]      
#> [1,] numeric,10 numeric,10
#> [2,] numeric,10 numeric,10
#> 
#> , , 3
#> 
#>      [,1]         [,2]        
#> [1,] character,10 character,10
#> [2,] character,10 character,10

x2 is the cast list. Since in2out = FALSE, element x[[i]][[j]][[k]] corresponds to element x2[i, j, k] (for all i, j, and k).

Once again the dimnames need to be set manually:

# this time, in2out = FALSE
# so we go from the surface names to the deepest names
dimnames(x2) <- list(
  names(x),
  names(x[[1]]),
  names(x[[1]][[1]])
)
print(x2)
#> , , height
#> 
#>        class1     class2    
#> group1 numeric,10 numeric,10
#> group2 numeric,10 numeric,10
#> 
#> , , weight
#> 
#>        class1     class2    
#> group1 numeric,10 numeric,10
#> group2 numeric,10 numeric,10
#> 
#> , , sex
#> 
#>        class1       class2      
#> group1 character,10 character,10
#> group2 character,10 character,10

There, the names are now correct.

One can print x2 less compactly without much effort by flattening it, again using the cast_dim2flat() function.
We only need to see a portion of the list in detail, so let’s look at class1 from group1 in the flattened form:

cast_dim2flat(x2["group1", 1, , drop = FALSE])
#> $`['group1', 'class1', 'height']`
#>  [1] 169.7509 169.9332 169.3597 170.5977 169.9556 170.5791 170.6486 170.6238
#>  [9] 170.5733 170.0844
#> 
#> $`['group1', 'class1', 'weight']`
#>  [1] 78.66834 78.97904 79.25864 81.28301 80.70020 79.89842 79.11619 79.28468
#>  [9] 80.27843 79.88116
#> 
#> $`['group1', 'class1', 'sex']`
#>  [1] "F" "M" "M" "F" "F" "M" "M" "F" NA  "M"

 

2.4 Example 3: Padding

For Example 3, we take the same list as before, but remove x$group1$class2:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)

Let’s first use hier2dim() to check what dimensions it will get when cast:

hier2dim(x)
#>         padding         
#>       3       2       2

The dimensions are the same as in Example 1: c(3, 2, 2).
But notice that the names of the output are different: the second element now has the name “padding”. This indicates that some columns won’t have enough elements to be filled completely, so additional elements will be added as padding.

So let’s cast this list as dimensional:

x2 <- cast_hier2dim(x)
print(x2)
#> , , 1
#> 
#>      [,1]         [,2]
#> [1,] numeric,10   NULL
#> [2,] numeric,10   NULL
#> [3,] character,10 NULL
#> 
#> , , 2
#> 
#>      [,1]         [,2]        
#> [1,] numeric,10   numeric,10  
#> [2,] numeric,10   numeric,10  
#> [3,] character,10 character,10

Subset x2[, 2, 1] is filled with NULL: this is where x$group1$class2 was in Example 1, and since that element no longer exists, its place must be filled with something.
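Why a gap appears can be seen by filling an array by hand in base R. In the sketch below (a stand-in list x_small and explicit loops, not the package's implementation), positions that have no counterpart in the nested list are simply never written and therefore stay NULL.

```r
# Stand-in list: group1 has no class2
cls <- function() list(height = 170, weight = 80, sex = "M")
x_small <- list(
  group1 = list(class1 = cls()),
  group2 = list(class1 = cls(), class2 = cls())
)

# Start from an array in which every element is NULL
x2_manual <- array(vector("list", 3 * 2 * 2), dim = c(3, 2, 2))

# Only positions that exist in the nested list get filled
for (i in seq_along(x_small)) {
  for (j in seq_along(x_small[[i]])) {
    for (k in seq_along(x_small[[i]][[j]])) {
      x2_manual[[k, j, i]] <- x_small[[i]][[j]][[k]]
    }
  }
}

# Locate the untouched (padding) positions
is_pad <- vapply(x2_manual, is.null, logical(1))
pad_positions <- which(is_pad)             # linear positions of the padding
array(is_pad, dim(x2_manual))[, 2, 1]      # all TRUE: the missing class2
```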

To make this obvious, let’s give the array proper dimnames:

dimnames(x2) <- list(
  c("height", "weight", "sex"),
  c("class1", "class2"),
  c("group1", "group2")
)
print(x2)
#> , , group1
#> 
#>        class1       class2
#> height numeric,10   NULL  
#> weight numeric,10   NULL  
#> sex    character,10 NULL  
#> 
#> , , group2
#> 
#>        class1       class2      
#> height numeric,10   numeric,10  
#> weight numeric,10   numeric,10  
#> sex    character,10 character,10

Again, element “class2” is missing from element “group1”, but not from “group2”, and so it is padded with NULL when the list is cast as dimensional.

Sometimes, a different value than NULL is desired for padding.
So let’s replace the padding value with something really obvious, using the padding argument:

x2 <- cast_hier2dim(x, padding = list(~ "this is padding!"))
dimnames(x2) <- list(
  c("height", "weight", "sex"),
  c("class1", "class2"),
  c("group1", "group2")
)
print(x2)
#> , , group1
#> 
#>        class1       class2             
#> height numeric,10   ~"this is padding!"
#> weight numeric,10   ~"this is padding!"
#> sex    character,10 ~"this is padding!"
#> 
#> , , group2
#> 
#>        class1       class2      
#> height numeric,10   numeric,10  
#> weight numeric,10   numeric,10  
#> sex    character,10 character,10
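If an array has already been cast with the default NULL padding, the padding can also be replaced afterwards with ordinary base R subsetting. The sketch below uses a small hand-built list array as a stand-in for a cast result, and it assumes that NULL never occurs as genuine data.

```r
# Stand-in for a cast result: the second column is NULL padding
x2_small <- array(
  list(170, 80, "M", NULL, NULL, NULL), dim = c(3, 2),
  dimnames = list(c("height", "weight", "sex"), c("class1", "class2"))
)

# Find the NULL elements and overwrite them; the dimensions are kept
is_pad <- vapply(x2_small, is.null, logical(1))
x2_small[is_pad] <- list(NA)

x2_small[["height", "class2"]]  # now NA instead of NULL
```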

Once again, one can print or present x2 less compactly by flattening it:

cast_dim2flat(x2)
#> $`['height', 'class1', 'group1']`
#>  [1] 168.2179 170.9156 170.9254 168.9508 168.2129 171.4698 169.2185 169.5054
#>  [9] 168.9004 168.2410
#> 
#> $`['weight', 'class1', 'group1']`
#>  [1] 77.98068 78.99395 78.73490 80.64577 78.37037 80.99324 80.58340 79.56303
#>  [9] 80.69950 80.91804
#> 
#> $`['sex', 'class1', 'group1']`
#>  [1] "M" "M" "M" "M" NA  "F" "F" "F" NA  NA 
#> 
#> $`['height', 'class2', 'group1']`
#> ~"this is padding!"
#> 
#> $`['weight', 'class2', 'group1']`
#> ~"this is padding!"
#> 
#> $`['sex', 'class2', 'group1']`
#> ~"this is padding!"
#> 
#> $`['height', 'class1', 'group2']`
#>  [1] 170.4472 169.6473 169.9302 170.1636 169.9023 171.2588 169.2644 168.7934
#>  [9] 169.4998 169.0340
#> 
#> $`['weight', 'class1', 'group2']`
#>  [1] 80.39548 80.67082 80.14162 80.43888 80.88588 80.78235 81.17882 79.52955
#>  [9] 81.82502 78.17481
#> 
#> $`['sex', 'class1', 'group2']`
#>  [1] "F" NA  "M" NA  "M" "F" NA  "F" "M" "F"
#> 
#> $`['height', 'class2', 'group2']`
#>  [1] 169.1512 171.0605 171.1082 169.8170 171.7047 171.3833 169.5000 170.8287
#>  [9] 168.5446 167.9806
#> 
#> $`['weight', 'class2', 'group2']`
#>  [1] 79.25940 80.05236 80.82860 80.94232 80.62797 80.16816 80.07743 80.36594
#>  [9] 81.38249 81.35905
#> 
#> $`['sex', 'class2', 'group2']`
#>  [1] NA  "M" "M" NA  "F" NA  NA  "F" NA  "F"

 

2.5 Example 4: Comparing in2out with padding

In this example, the same nested list as from the previous example is used, to demonstrate the difference between in2out = TRUE (which is the default), and in2out = FALSE.

The original nested list is shown first, followed by the list cast as a dimensional list (using the default in2out = TRUE) with the proper names assigned:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
x2 <- cast_hier2dim(x)
dimnames(x2) <- list(
  c("height", "weight", "sex"),
  c("class1", "class2"),
  c("group1", "group2")
)
print(x2)
#> , , group1
#> 
#>        class1       class2
#> height numeric,10   NULL  
#> weight numeric,10   NULL  
#> sex    character,10 NULL  
#> 
#> , , group2
#> 
#>        class1       class2      
#> height numeric,10   numeric,10  
#> weight numeric,10   numeric,10  
#> sex    character,10 character,10

 

Now we do the same, but with in2out = FALSE.
The original nested list is shown first, followed by the list cast as a dimensional list (using in2out = FALSE) with the proper names assigned:

x <- list(
  group1 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  ),
  group2 = list(
    class1 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    ),
    class2 = list(
      height = rnorm(10, 170),
      weight = rnorm(10, 80),
      sex = sample(c("M", "F", NA), 10, TRUE)
    )
  )
)
x2 <- cast_hier2dim(x, in2out = FALSE)
dimnames(x2) <- list(
  c("group1", "group2"),
  c("class1", "class2"),
  c("height", "weight", "sex")
)
print(x2)
#> , , height
#> 
#>        class1     class2    
#> group1 numeric,10 NULL      
#> group2 numeric,10 numeric,10
#> 
#> , , weight
#> 
#>        class1     class2    
#> group1 numeric,10 NULL      
#> group2 numeric,10 numeric,10
#> 
#> , , sex
#> 
#>        class1       class2      
#> group1 character,10 NULL        
#> group2 character,10 character,10

 

3 Cast Dimensional List to Hierarchical List

‘broadcast’ provides the cast_dim2hier() function, which takes a dimensional list (i.e. an array of type list) and casts it to a nested list.

Consider the following recursive array as an example:


x <- array(c(as.list(1:11), ~hello, as.list(month.abb)), c(4:2))
dimnames(x) <- list(
  letters[1:4],
  LETTERS[1:3],
  c("group1", "group2")
)
print(x)
#> , , group1
#> 
#>   A B C     
#> a 1 5 9     
#> b 2 6 10    
#> c 3 7 11    
#> d 4 8 ~hello
#> 
#> , , group2
#> 
#>   A     B     C    
#> a "Jan" "May" "Sep"
#> b "Feb" "Jun" "Oct"
#> c "Mar" "Jul" "Nov"
#> d "Apr" "Aug" "Dec"

Like cast_hier2dim(), cast_dim2hier() also has the in2out argument, which (again) defaults to TRUE.
Let’s cast the above dimensional list to a nested list, and compare the results when using in2out = TRUE (shown first) versus in2out = FALSE (shown second):


x2 <- cast_dim2hier(
  x, distr.names = TRUE
)
lobstr::tree(x2)
#> <list>
#> ├─group1: <list>
#> │ ├─A: <list>
#> │ │ ├─a: 1
#> │ │ ├─b: 2
#> │ │ ├─c: 3
#> │ │ └─d: 4
#> │ ├─B: <list>
#> │ │ ├─a: 5
#> │ │ ├─b: 6
#> │ │ ├─c: 7
#> │ │ └─d: 8
#> │ └─C: <list>
#> │   ├─a: 9
#> │   ├─b: 10
#> │   ├─c: 11
#> │   └─d: S3<formula> ~hello
#> └─group2: <list>
#>   ├─A: <list>
#>   │ ├─a: "Jan"
#>   │ ├─b: "Feb"
#>   │ ├─c: "Mar"
#>   │ └─d: "Apr"
#>   ├─B: <list>
#>   │ ├─a: "May"
#>   │ ├─b: "Jun"
#>   │ ├─c: "Jul"
#>   │ └─d: "Aug"
#>   └─C: <list>
#>     ├─a: "Sep"
#>     ├─b: "Oct"
#>     ├─c: "Nov"
#>     └─d: "Dec"

x2 <- cast_dim2hier(
  x, in2out = FALSE, distr.names = TRUE
)
lobstr::tree(x2)
#> <list>
#> ├─a: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 1
#> │ │ └─group2: "Jan"
#> │ ├─B: <list>
#> │ │ ├─group1: 5
#> │ │ └─group2: "May"
#> │ └─C: <list>
#> │   ├─group1: 9
#> │   └─group2: "Sep"
#> ├─b: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 2
#> │ │ └─group2: "Feb"
#> │ ├─B: <list>
#> │ │ ├─group1: 6
#> │ │ └─group2: "Jun"
#> │ └─C: <list>
#> │   ├─group1: 10
#> │   └─group2: "Oct"
#> ├─c: <list>
#> │ ├─A: <list>
#> │ │ ├─group1: 3
#> │ │ └─group2: "Mar"
#> │ ├─B: <list>
#> │ │ ├─group1: 7
#> │ │ └─group2: "Jul"
#> │ └─C: <list>
#> │   ├─group1: 11
#> │   └─group2: "Nov"
#> └─d: <list>
#>   ├─A: <list>
#>   │ ├─group1: 4
#>   │ └─group2: "Apr"
#>   ├─B: <list>
#>   │ ├─group1: 8
#>   │ └─group2: "Aug"
#>   └─C: <list>
#>     ├─group1: S3<formula> ~hello
#>     └─group2: "Dec"

The added distr.names = TRUE argument will distribute the dimnames in a logical way over the nested elements.
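For intuition, the structure of the first result (the last dimension at the surface, the first dimension deepest, with dimnames distributed over the levels) can be rebuilt by hand in base R. The helper to_nested() below is hypothetical and limited to 3-dimensional list arrays with dimnames. It is a sketch of the idea, not the implementation of cast_dim2hier().

```r
# Hypothetical helper: nest a 3-d list array so that the last dimension
# ends up at the surface and the first dimension at the deepest level
to_nested <- function(a) {
  d <- dim(a)
  dn <- dimnames(a)
  lapply(setNames(seq_len(d[3]), dn[[3]]), function(i) {
    lapply(setNames(seq_len(d[2]), dn[[2]]), function(j) {
      setNames(lapply(seq_len(d[1]), function(k) a[[k, j, i]]), dn[[1]])
    })
  })
}

# A simple stand-in array with the same shape and dimnames as x
a <- array(
  as.list(1:24), dim = c(4, 3, 2),
  dimnames = list(letters[1:4], LETTERS[1:3], c("group1", "group2"))
)
nested <- to_nested(a)

names(nested)                # "group1" "group2"
nested[["group1"]][["B"]][["a"]]  # the element stored at a[["a", "B", "group1"]]
```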