Looping simultaneously over multiple lists in legacy shells

I was recently confronted with the problem of looping over multiple lists simultaneously in a legacy shell, and coming up with an elegant solution was an interesting challenge. Legacy shells do not support arrays, which means there is no structure you can address directly with an index i to get the iᵗʰ element. How, then, can one process multiple lists simultaneously to get the first element of each list, then the second element of each list, then the third, and so on?

A Python analogy

In Python, there is a neat builtin called zip. Essentially, zip takes an arbitrary number of lists as arguments and returns an iterator of tuples, such that the first tuple is composed of the first element of each list, the second tuple has the second element of each list, and so on. It always goes better with an example:

>>> for t in zip(['a', 'b', 'c', 'd'], [1, 2, 3, 4], ['!', '@', '#', '%']):
...     print(t)
...
('a', 1, '!')
('b', 2, '@')
('c', 3, '#')
('d', 4, '%')

The question is, how can we produce a similar function in a legacy shell?

Let's specify

Our goal is to produce a shell function that we will also conveniently name zip. Of course, this function must work in legacy shells such as ash or dash. It will expect any number of lists as arguments, each list being a single string of whitespace-separated elements, and print a flattened list of tuples, one tuple per line, where the iᵗʰ tuple is composed of the iᵗʰ element of each list. As in Python, if the lists provided as arguments are not of equal length, the result will be cut at the shortest length. Here are examples of the expected behaviour:

# 3 lists of same length
$ zip "a b c d" "1 2 3 4" "! @ # %"
 a 1 !
 b 2 @
 c 3 #
 d 4 %

# 1 list
$ zip "a b c d"
 a
 b
 c
 d

# 2 lists of different length
$ zip "a b" "1 2 3"
 a 1
 b 2

# 0 lists
$ zip

Using it is then just a matter of loading the result into the positional parameters with the set builtin:

$ list1="1 2 3 4"
$ list2="a b c d"
$ set $(zip "$list1" "$list2")
$ for i in $(seq 1 4); do \
    echo "processing $1 and $2"; \
    shift 2; \
done
processing 1 and a
processing 2 and b
processing 3 and c
processing 4 and d
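An alternative to set and shift, sketched here under the assumption that no list element contains whitespace, is to read the tuples line by line with while read. The function name process_pairs is hypothetical, and the printf line merely stands in for the output of zip "$list1" "$list2".

```shell
# process_pairs (hypothetical name) reads one tuple per line from stdin.
# read splits each line on whitespace, so the leading space that zip
# puts in front of every tuple is ignored.
process_pairs() {
    while read -r num letter; do
        echo "processing $num and $letter"
    done
}

# Stand-in for: zip "1 2 3 4" "a b c d" | process_pairs
printf ' %s %s\n' 1 a 2 b 3 c 4 d | process_pairs
```

Because the loop sits on the right-hand side of a pipe, many shells run it in a subshell, so variables assigned inside it are not visible afterwards.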

The implementation

It is often not obvious to the beginner how to create variable names out of other variables. For example, the name list$i is an illegal variable name, so you cannot do:

i=1
list$i="a b c"

What is the way to get around that? The answer is eval, of course!

i=1
eval "list$i='a b c'"
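Reading such a variable back needs eval as well. A minimal sketch, where the variable name value is only an illustration:

```shell
i=1
eval "list$i='a b c'"

# The backslash keeps the first $ away from the current shell, so eval
# receives the text: value=$list1
eval "value=\$list$i"
echo "$value"
```

The escaped \$ is the important detail: without the backslash, the current shell would expand $list (most likely empty) before eval ever saw the string.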

So our zip function begins like this:

zip(){
    nlists=$#
    for i in $(seq 1 $nlists); do
        eval "list$i=\$1"   # eval expands $1 itself, so quotes inside a list are harmless
        shift
    done
    # the input lists are now in the variable named list1, list2....
    [...]
}
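Note that seq is not specified by POSIX, so it may be missing on a minimal system. A sketch of the same first step without it, assuming the shell supports POSIX arithmetic expansion $((...)) as ash and dash do, could use a counter instead. The name store_lists is hypothetical.

```shell
# store_lists (hypothetical name) copies its arguments into
# list1, list2, ... and records how many there are in nlists.
store_lists() {
    nlists=$#
    i=1
    while [ "$i" -le "$nlists" ]; do
        # \$1 is left for eval to expand, so the content of the list
        # is never re-parsed as shell code.
        eval "list$i=\$1"
        shift
        i=$((i + 1))
    done
}

store_lists "a b c" "1 2 3"
echo "$nlists lists: [$list1] [$list2]"
```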

We would like to loop over all lists, extract the first element of each, append it to the tuple, and update the list so that it holds only its tail. The least error-prone way I have found to extract the first element is to use a printf construct:

$ printf "%s\n" a b c d
a
b
c
d

This prints all elements of the list, one per line. One can easily combine it with head -n1 and tail -n+2 to get the head or the tail of the list.

head=$(printf '%s\n' $list | head -n1)
tail=$(printf '%s\n' $list | tail -n+2)
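The two pipelines can be wrapped into small helpers to try them out in isolation. This is only a sketch: list_head and list_tail are hypothetical names, and the example deliberately uses a list whose first element is a lone dash.

```shell
# list_head and list_tail are hypothetical helper names.
# $1 is deliberately left unquoted so that the shell splits the list into
# words; as a side effect, glob characters such as * would also be expanded.
list_head() { printf '%s\n' $1 | head -n 1; }
list_tail() { printf '%s\n' $1 | tail -n +2; }

list_head "- b c"    # prints: -
list_tail "- b c"    # prints b and c on separate lines
```

Since the format string is always the first argument of printf, a leading dash in the list is printed as ordinary data.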

The printf construct works more reliably than, for example, using the builtin set to put the elements of the list in the argument list $1, $2...

set $list
head=$1
shift
tail="$*"

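Why? The reason is that if the first element is the string literal -, set does not treat it as an ordinary element, so $1 will not reliably be -. The same goes for elements that look like options, such as -x. The second part of the function therefore looks like this, with a bunch of evals: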

while true; do
    tuple=""
    for i in $(seq 1 $nlists); do
        eval "[ -z \"\$list$i\" ] && return 0"
        eval "head=\$(printf '%s\\n' \$list$i | head -n1)"
        eval "list$i=\$(printf '%s\\n' \$list$i | tail -n+2)"
        tuple="$tuple $head"
    done
    echo "$tuple"
done
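The escaping inside those evals is easier to follow when the string is printed before being evaluated. Here is a small sketch for i=2, where the variable cmd and the sample list are only illustrations:

```shell
i=2
list2="x y z"

# Build the same string the loop hands to eval, then print it.
# printf is used rather than echo because some echo implementations
# would interpret the \n inside the string.
cmd="head=\$(printf '%s\\n' \$list$i | head -n1)"
printf '%s\n' "$cmd"

# Evaluating the string runs the pipeline on list2.
eval "$cmd"
echo "$head"
```

The first line printed is head=$(printf '%s\n' $list2 | head -n1), which is the command eval actually runs; the second is x, the first element of list2.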

The loop looks about right: when one of the lists is empty, we know we will not be able to complete the current tuple, so we return immediately without echoing anything further. The devil is in the details, though: if called with zero arguments, the inner for loop never runs, so the function never returns and prints empty lines forever. We really want to start looping if and only if there are arguments. This could be achieved by changing

while true; do

to

while [ $nlists != 0 ]; do

I don't find this construct very readable, as it makes it look like the condition could change during the loop. I'd rather add the check

[ $nlists = 0 ] && return 0

at the beginning of the function.

The result

Here is how the code looks at the end. I think this could be a fun interview question!

zip(){
    nlists=$#
    [ $nlists = 0 ] && return 0
    for i in $(seq 1 $nlists); do
        eval "list$i=\$1"   # eval expands $1 itself, so quotes inside a list are harmless
        shift
    done
    while true; do
        tuple=""
        for i in $(seq 1 $nlists); do
            eval "[ -z \"\$list$i\" ] && return 0"
            eval "head=\$(printf '%s\\n' \$list$i | head -n1)"
            eval "list$i=\$(printf '%s\\n' \$list$i | tail -n+2)"
            tuple="$tuple $head"
        done
        echo "$tuple"
    done
}

Do you think it can be improved? Do you have your own different technique to achieve this? Let me know in the comments!
