Looping simultaneously over multiple lists in legacy shells

I was recently confronted with the problem of looping over multiple lists simultaneously in a legacy shell, and coming up with an elegant solution was an interesting challenge. Legacy shells do not support arrays, which means there is no structure you can directly address with an index i in order to get the iᵗʰ element. How, then, can one process multiple lists simultaneously in order to get the first element of each list, then the second element of each list, then the third, and so on?

A Python analogy

In Python, there is a neat builtin called zip. Essentially, zip takes an arbitrary number of lists as arguments and returns an iterator of tuples, such that the first tuple is composed of the first element of each list, the second tuple of the second element of each list, and so on. It is always clearer with an example:

>>> for t in zip(['a', 'b', 'c', 'd'], [1, 2, 3, 4], ['!', '@', '#', '%']):
...     print(t)
...
('a', 1, '!')
('b', 2, '@')
('c', 3, '#')
('d', 4, '%')

The question is, how can we produce a similar function in a legacy shell?

Let's specify

Our goal is to produce a shell function that we will also conveniently name zip. Of course, this function must work in legacy shells such as ash or dash. It will accept any number of lists as arguments, each list being a single string of space-separated elements, and print a flattened list of tuples, one per line, where the iᵗʰ tuple is composed of the iᵗʰ element of each list. As in Python, if the lists provided as arguments are not of equal length, the result is cut at the shortest length. Here are examples of the expected behaviour:

# 3 lists of same length
$ zip "a b c d" "1 2 3 4" "! @ # %"
 a 1 !
 b 2 @
 c 3 #
 d 4 %

# 1 list
$ zip "a b c d"
 a
 b
 c
 d

# 2 lists of different length
$ zip "a b" "1 2 3"
 a 1
 b 2

# 0 lists
$ zip

Using it is then just a matter of calling the set builtin:

$ list1="1 2 3 4"
$ list2="a b c d"
$ set -- $(zip "$list1" "$list2")
$ for i in $(seq 1 4); do \
    echo "processing $1 and $2"; \
    shift 2; \
done
processing 1 and a
processing 2 and b
processing 3 and c
processing 4 and d

The implementation

It is often not obvious to the beginner how to build variable names out of other variables. For example, list$i is not a valid name on the left-hand side of an assignment, so you cannot do:

i=1
list$i="a b c"

What is the way to get around that? The answer is eval, of course!

i=1
eval "list$i='a b c'"
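
Reading such a variable back needs the same trick, since $list$i would simply expand to the value of $list followed by the value of $i. Here is a minimal sketch of the round trip; the names list2 and value are only there for illustration:

```shell
i=2
# build the variable name list2 and assign to it
eval "list$i='x y z'"

# printing the string first shows the command eval will actually run
printf '%s\n' "value=\$list$i"

# the escaped \$ survives the first expansion, so eval runs: value=$list2
eval "value=\$list$i"
echo "$value"
```

The first line of output is value=$list2, the second is x y z. Printing the string before evaluating it is a handy habit when the escaping gets hairy.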

So our zip function begins like this:

zip(){
    nlists=$#
    for i in $(seq 1 $nlists); do
        # \$1 is expanded by eval itself, so quotes inside a list are safe
        eval "list$i=\$1"
        shift
    done
    # the input lists are now in the variables named list1, list2, ...
    [...]
}

We would like to loop over all lists, extract the first element, append it to the tuple, and update the list to reflect the tail. The least error-prone way I have found to extract the first element is to use a printf construct:

$ printf "%s\n" a b c d
a
b
c
d

This prints all elements of the list, one per line. One can easily combine it with head -n1 and tail -n+2 to get the head or the tail of the list:

head=$(printf '%s\n' $list | head -n1)
tail=$(printf '%s\n' $list | tail -n+2)
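
To make this concrete, here is a small sketch of the split on a four-element list. Note that the variables are called head and tail as above, which does not clash with the commands of the same name:

```shell
list="a b c d"
# one element per line, then keep the first line only
head=$(printf '%s\n' $list | head -n1)
# same, but keep everything from the second line on
tail=$(printf '%s\n' $list | tail -n+2)
echo "head: $head"
# unquoted on purpose: word splitting turns the newlines back into spaces
echo "tail:" $tail
```

The tail is stored with newlines between its elements instead of spaces. That is harmless here, because the list is always expanded unquoted and the shell splits on both.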

It works more reliably, for example, than using the builtin set to put the elements of the list in the positional parameters $1, $2, ...:

set $list
head=$1
shift
tail="$*"
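
A quick way to compare the two approaches is to feed both a list that starts with a dash. This is only a sketch: the variable first is illustrative, and the set -- form at the end (the standard end-of-options marker) is an addition that the rest of this post does not rely on:

```shell
list="- a b"

# printf pipeline: the dash is treated as ordinary data
first=$(printf '%s\n' $list | head -n1)
echo "printf: $first"

# plain set: bash and dash consume the lone dash instead of storing it
set $list
echo "set:    $1"

# set with an explicit end-of-options marker keeps the dash as $1
set -- $list
echo "set --: $1"
```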

Why? The reason is that if the first element of the list is the string literal -, set does not treat it as data, and will not behave the way we want! The second part of the function therefore looks like this, with a bunch of evals:

while true; do
    tuple=""
    for i in $(seq 1 $nlists); do
        eval "[ -z \"\$list$i\" ] && return 0"
        eval "head=\$(printf '%s\\n' \$list$i | head -n1)"
        eval "list$i=\$(printf '%s\\n' \$list$i | tail -n+2)"
        tuple="$tuple $head"
    done
    echo "$tuple"
done

This looks about right: when one of the lists is empty, we know we will not be able to complete the current tuple, so we return immediately without echoing anything further. The devil is in the details, though: if given zero arguments, this function loops forever, because the inner loop has nothing to iterate over and never reaches the return. We really want to start looping if and only if there are arguments. This could be achieved by changing

while true; do

to

while [ $nlists != 0 ]; do

I don't find this construct very readable, as it makes it look as if the condition could change while looping. I'd rather add the check

[ $nlists = 0 ] && return 0

at the beginning of the function.

The result

Here is how the code looks at the end. I think this could be a fun interview question!

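zip(){
    nlists=$#
    # nothing to zip: return before entering the infinite loop
    [ $nlists = 0 ] && return 0
    # store each input list in its own variable: list1, list2, ...
    for i in $(seq 1 $nlists); do
        # \$1 is expanded by eval itself, so quotes inside a list are safe
        eval "list$i=\$1"
        shift
    done
    while true; do
        tuple=""
        for i in $(seq 1 $nlists); do
            # stop as soon as one list runs out of elements
            eval "[ -z \"\$list$i\" ] && return 0"
            # pop the first element of list$i and append it to the tuple
            eval "head=\$(printf '%s\\n' \$list$i | head -n1)"
            eval "list$i=\$(printf '%s\\n' \$list$i | tail -n+2)"
            tuple="$tuple $head"
        done
        echo "$tuple"
    done
}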
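
Besides the set builtin, another way to consume the output is to read it line by line, which leaves the positional parameters alone. The sketch below repeats the function in a lightly adapted form so that it runs on its own: the intermediate variables current and rest are my additions and only serve to cut down on the evals. The loop variables num and letter are illustrative too.

```shell
# zip as built in this post, lightly adapted so the snippet is self-contained
zip(){
    nlists=$#
    [ "$nlists" = 0 ] && return 0
    for i in $(seq 1 "$nlists"); do
        eval "list$i=\$1"
        shift
    done
    while true; do
        tuple=""
        for i in $(seq 1 "$nlists"); do
            # copy list$i into a plain variable to limit the eval juggling
            eval "current=\$list$i"
            if [ -z "$current" ]; then return 0; fi
            head=$(printf '%s\n' $current | head -n1)
            rest=$(printf '%s\n' $current | tail -n+2)
            eval "list$i=\$rest"
            tuple="$tuple $head"
        done
        echo "$tuple"
    done
}

list1="1 2 3 4"
list2="a b c d"
# read strips the leading space of each tuple and splits it into fields
zip "$list1" "$list2" | while read -r num letter; do
    echo "processing $num and $letter"
done
```

One caveat with this form: in ash and dash the while loop at the end of a pipeline runs in a subshell, so variables assigned inside it are gone once the loop finishes.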

Do you think it can be improved? Do you have your own different technique to achieve this? Let me know in the comments!
