Lisp lists and destructive functions

Goal of this post is to be some small introduction to lists in Lisp. Since functions and macros for manipulating lists can be easily found by searching the Internet, instead of listing all those helpful functions, this post will focus more on making you prepared for terminology and gotchas surrounding Lisp lists. For beginners to be able to use those functions without falling to common mistakes, it is crucial to know the meaning of phases like “modify a list”, “this list shares structure with that list”, “destructive function”, “literal list”.

All Lisp snippets in this post are in elisp, unless stated otherwise. Nevertheless, everything that is said on this post also applies to Common Lisp, not just elisp, unless stated otherwise. To translate some snippets into Common Lisp, see the Common Lisp note at the end. This post assumes the reader have read the previous article about uniform reference semantics and its implications (for container objects and function parameter passing).

Since this post is part of the Living with Emacs Lisp series, I am mainly speaking to elisp beginners. Nevertheless, if you see something that’s elisp-only that’s not explicitly stated as being elisp-only, feel free to point out.

Credits: Chapter 12 of Practical Common Lisp was of huge help for me to eliminate all the confusion I had on this subject some time ago. Large part of this post draws from that chapter.

1. nil and non-nil

The only falsy value in Lisp is the symbol nil. Also, the value of the variable nil is always nil, in other words, the symbol nil evaluates to itself. In case this is confusing, consider running the following code:

(setq aa (+ 1 1))
(setq bb 'ha)

(print aa)
(print bb)

The value of the variable aa is now the number 2, and the value of the variable bb is now the symbol ha. Since the mapping is really from the symbols aa and bb to other objects as mentioned the previous article, this means that the symbol aa now evaluates to the number 2, and the symbol bb now evaluates to the symbol ha. It is in this context that the symbol nil always evaluates to itself. If you try to change that with this code:

(setq nil (+ 1 1))

you get an error message like this in Emacs:

Attempt to set a constant symbol: nil

or in case of a Common Lisp interpreter:

NIL is a constant and thus can't be set.

Since nil is the only false object and everything else is truthy, you will encounter words like non-nil or nil more often than words like true or false or truthy or falsy in articles about Lisp. For example, the elisp docstring for if says:

if is a special form in `eval.c'.

(if COND THEN ELSE...)

If COND yields non-nil, do THEN, else do ELSE...

and that is another way of saying “If COND is true, do THEN, else do ELSE…”

2. cons cell

A cons cell is a Lisp object with two slots; each slot holds, or refers to1, some Lisp object. One slot is known as the car, and the other is known as the cdr.

The following code creates a cons cell that holds 11 and 22, and then illustrates how to access its car and its cdr.

(setq cc (cons (+ 10 1)
               (+ 20 2)))

(print cc) ; ⇒ (11 . 22)

;; the car of cc is 11
(print (car cc)) ; ⇒ 11

;; the cdr of cc is 22
(print (cdr cc)) ; ⇒ 22

The following code then shows how to modify the cons cell.

;; set the car of cc to 1111
(setf (car cc) 1111)

(print cc) ; ⇒ (1111 . 22)

A cons cell is sometimes just called a cons (plural: conses).

3. cons cells in diagrams

Let’s start with this code:

(setq aa (cons (+ 1 1) (+ 2 2)))
(print aa)

(setq bb aa)

(setq the-car (car aa))
(setq the-cdr (cdr aa))

The first line creates a cons holding two numbers 2 and 4. The second line prints it. It is printed as (2 . 4). The last two lines show how to access the components of the cons. The following ASCII art diagram shows what happens:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.        +-------------+
.        |    car: cdr:|
.        |     |    |  |
.        |     |    |  |
.        +-----|----|--+
.              |    |
.          ^   |    |
.          |   v    v
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

Now we modify the cons by adding 10 to its car and cdr:

(setf (car aa)
      (+ (car aa) 10))

;; another way
(require 'cl-lib)
(cl-incf (cdr aa) 10)

(print aa) ; ⇒ (12 . 14)
(print (car aa)) ; ⇒ 12
(print the-car) ; ⇒ 2

The aftermath in diagram:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.        +-------------+
.        |    car: cdr:|
.        |     |    |  |
.        |     |    |  |
.        +-----|----|--+
.              v    v
.          ^         
.          |   12   14 
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

Before we get to more complex diagrams, let’s see how we can simply diagrams of cons cells. Above diagram can be simplified as:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+
.         |      |     |
.         |    | |  |  |
.         +----|-+--|--+
.              v    v
.          ^         
.          |   12   14 
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

And now let’s draw numbers next to names for further simplification:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+
.         |      |     |
.         |  12  |  14 |
.         |      |     |
.         +------+-----+
.                     
.          ^          
.          |           
. bb: -----+               
.                          
.                          
.           2         4    
.        the-car:  the-cdr:

A cons cell can hold any type of object. For example, if we further run the following code, then the cons cell aa holds one number and one other cons cell.

(setf (cdr aa) (cons 0 0))

So the diagram is then:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+     +-----+------+
.         |      |     |     |     |      |
.         |  12  |   ----->  |  0  |  0   |
.         |      |     |     |     |      |
.         +------+-----+     +-----+------+
.                                      
.          ^                           
.          |                           
. bb: -----+

4. lists

The following code makes a list of three elements and assigns it to the variable aa

(setq aa (list 10 11 12))

The above code is actually equivalent to the following code.

(setq aa (cons 10 (cons 11 (cons 12 nil))))

The elisp reference says “Lists in Lisp are not a primitive data type; they are built up from cons cells”. This just means, to quote from Practical Common Lisp, “The key to understanding lists is to understand that they’re largely an illusion built on top of objects that are instances of a more primitive data type. Those simpler objects are pairs of values called cons cells”. (Vectors are primitive data type on the other hand. This is why I kept using vectors in the previous article.)

A list in Lisp is really a (singly) linked list made of cons cells. When we say aa is a list of three elements 10, 11, 12, this actually means the following diagram:

       +-----+-----+   +-----+-----+   +-----+-----+  
       |     |     |   |     |     |   |     |     |  
aa --> |     |   ----> |     |   ----> |     |   ----> nil
       |  |  |     |   |  |  |     |   |  |  |     |  
       +--|--+-----+   +--|--+-----+   +--|--+-----+  
          |               |               |           
          v               v               v           

          10              11              12

When we say the structure of the list aa or list structure of aa, we mean the three cons cells in the above diagram.

Now run the following code.

(setq aa1 (cdr aa))
(setq aa2 (cdr aa1))

aa1 refers to the second cons cell in the diagram. aa2 is the last cons cell. On the other hand, aa1 and aa2 are lists too: aa1 is a list of two elements, and aa2 is a list of one element. What is a list of zero elements? Run the following code.

(setq aa3 (cdr aa2))

aa3 is nil, but also an empty list. nil is how Lisp represents an empty list.

                           aa1             aa2              

                             |               |         aa3
                             v               v          
                                                        |
       +-----+-----+   +-----+-----+   +-----+-----+    v
       |     |     |   |     |     |   |     |     |        
aa --> |     |   ----> |     |   ----> |     |   ----> nil     
       |  |  |     |   |  |  |     |   |  |  |     |  
       +--|--+-----+   +--|--+-----+   +--|--+-----+  
          |               |               |           
          v               v               v           

          10              11              12
(print aa)  ; ⇒ (10 11 12)
(print aa1) ; ⇒ (11 12)
(print aa2) ; ⇒ (12)
(print aa3) ; ⇒ nil

This is how you access elements of a list:

(car (cdr (cdr aa))) ; ⇒ 12


;; alternative way
(elt aa 2)   ; ⇒ 12

;; all elements
(elt aa 0)   ; ⇒ 10
(elt aa 1)   ; ⇒ 11
(elt aa 2)   ; ⇒ 12

5. improper lists and proper lists

The elisp reference says “By convention, the cdr of the last cons cell in a list is nil. We call such a nil-terminated structure a true list” It sounds like there are some kind of improper lists. There are two kinds of improper lists. The first kind, dotted lists, are like usual lists except that the cdr of their last cons cells are not nil.

;; This is a dotted list.
(setq dotted (cons 0 (cons 1 (cons 2 100))))

;; Let's see how it is printed. (This is why it is called a dotted list)
(print dotted) ; ⇒ (0 1 2 . 100)

dotted is NOT a list of four elements 0, 1, 2, 100. dotted is just an improper list.

The second kind of improper lists are circular lists which are like usual lists, except they don’t have their last cons cells because they are circular in the sense illustrated by the following code.

;; this is a proper list for now
(setq circ (list 0 11 22 33 44))

;; this is its last cons cell
(setq last-cons (last circ))

;; this is some cons cell in the middle
(setq mid-cons (cddr circ))

;; we modify the structure of circ to make it a circular list
(setf (cdr last-cons) mid-cons)

Lisp functions that take a list as its argument usually expect you to pass a proper list, but some functions can take improper lists as well. It is good to be aware of improper lists because docstrings of functions may mention them.

6. check for empty list, non-empty list, list

The function listp is used for checking if the argument is a list or not.

(listp (list 1 2 3))    ; ⇒ t
(listp nil)             ; ⇒ t
(listp 100)             ; ⇒ nil
(listp (vector 1 2 3))  ; ⇒ nil
(listp "hello")         ; ⇒ nil

In fact, listp simply check if the argument is either a cons cell or nil. If it is a cons cell, then one can argue that it is a non-empty list (which may be an improper list), and if it is nil, then it is an empty list.

The function consp is used for checking if the argument is a cons cell, and one can argue that this is the same as checking if it is a non-empty list.

(consp (cons 100 200)) ; ⇒ t
(consp 100)            ; ⇒ nil
(consp (list 1 2 3 4)) ; ⇒ t
(consp nil)            ; ⇒ nil

To check if a variable holds an empty list, you simply check if it is nil.

7. sequence

Sometimes the docstring of a function may say it takes a sequence as one of its arguments. A sequence is a supertype of vector, list, and maybe some other types. So if the docstring of a function says it takes a sequence, you can pass a vector or list to it. Examples of such functions include length, elt, copy-sequence.

(length (list 0 1 2))   ; ⇒ 3
(length (vector 0 1 2)) ; ⇒ 3

(elt (list 0 11 22) 1)    ; ⇒ 11
(elt (vector 0 11 22) 1)  ; ⇒ 11

Also, a string is a sequence.

(length "Hello")  ; ⇒ 5
(elt "Hello" 2)   ; ⇒ 108 (is the character l in elisp)
(copy-sequence "Hello")  ; ⇒ "Hello"

8. literal list

What is a literal object? What is a literal list? For that, consider the following code

(list 0 (+ 10 1) (+ 20 2)) ; ⇒ (0 11 22)

;; This is a literal list.
'(0 11 22)   ; ⇒ (0 11 22)

(equal (list 0 (+ 10 1) (+ 20 2))
       '(0 11 22))
;; ⇒ t


;; A literal list should be quoted (with a single quote) in general in code.
(length '(0 11 22))   ; ⇒ 3

;; If you don't quote it, this happens.
(length (0 11 22))    ; ERROR: 0 is not a function.
;; Expecting the above line to evaluate to 3 is like
;; expecting the below line to evaluate to 11.
(length Hello world)  ; ERROR

(length "Hello world") ; ⇒ 11


;; This is a literal cons cell.
'(11 . 22)  ; ⇒ (11 . 22)

(equal (cons 11 22)
       '(11 . 22))
;; ⇒ t

;; These are quoted forms. Quoted forms give you literal objects.
'(11 . 22)
'(0 11 22)
'((0 1) (2 3) (4 5))

;; two ways of writing nested lists
(equal (list (list 0 1) (list 2 3) (list 4 5))
       '((0 1) (2 3) (4 5)))
;; ⇒ t

There are two kinds of literal objects. The first kind comes from writing quoted forms. The second kind comes from writing self-evaluating forms. What are self-evaluating forms?

;; These are self-evaluating forms. They evaluate to themselves.
"Hello world"  ; ⇒ "Hello world"
1234           ; ⇒ 1234


(setq aa 1234)

;; This is NOT a self-evaluating form. This is just a mod form.
(mod 25 10) ; ⇒ 5

;; This is NOT a self-evaluating form. This is just the variable aa.
aa        ; ⇒ 1234

The rule “One should not modify a literal object” means that you should not change any part of a literal object.

;; Assign a literal object to the variable aa.
(setq aa '((0 11) (22 33) (44 55)))

;; Let bb refer to the last element of aa.
(setq bb (elt aa 2)) ; ⇒ (44 55)

;; See what the first element of bb is.
(elt bb 0)  ; ⇒ 44

;; This modifies our literal object. This is NOT ok.
(setf (car bb) 4400)

;; This also modifies our literal object. This is NOT ok.
(setf (cdr bb) nil)

;; the aftermath
aa ⇒ ((0 11) (22 33) (4400))

Some code for illustration

(setq aa '(0 11 22))
(setq bb (list 0 11 22))

;; This modifies a literal object. This is NOT ok.
(setf (elt aa 2) 2200)

;; This is ok.
(setf (elt bb 2) 2200)


(setq aa "foo")
(setq bb (copy-sequence "foo"))

;; This modifies a literal string. This is NOT ok.
(setf (elt aa 0) (elt "m" 0))

;; This is ok.
(setf (elt bb 0) (elt "m" 0))
bb ; ⇒ "moo"

More detail on literal objects will be the subject of another post which I will link as soon I write it, for now it’s enough to know two facts:

  • You should not modify literal data.
  • A literal list is also referred to as “constant list” or “quoted list” (for obvious reasons).

9. shared structure

In the following diagram, we say that the list x and the list y share structure, because they share some tail. Recall that the list structure of x is cons cells making up the list x. The list structure of x and that of y share two cons cells, and that is why we say that the two lists share structure.

      +-----+-----+   +-----+-----+
      |     |     |   |     |     |
x --> |     |   ----> |     |     |  
      |     |     |   |     |  |  |  
      +-----+-----+   +-----+--|--+  
                               |     +-----+-----+   +-----+-----+
                               +---> |     |     |   |     |     |
                                     |     |   ----> |     | nil |
                               +---> |     |     |   |     |     |
                               |     +-----+-----+   +-----+-----+ 
      +-----+-----+   +-----+--|--+                                
      |     |     |   |     |  |  |                                
y --> |     |   ----> |     |     |                                
      |     |     |   |     |     |                                
      +-----+-----+   +-----+-----+

If we modify the list x by doing (setf (elt x 2) something), we change the element of x at index 2 (in the sense of new designation), that is, we change the car of the last-to-second cons cell. And that affects the list y too because the last-to-second cons cells for the list x and that for the list y are the same cons cell.

If we modify the list x by doing2 (setf (cdr (cddr x)) something), we change the cdr of the last-to-second cons cell, and depending on what something is, this can make x a longer list or a shorter list or an improper list. And that affects the list y too.

So doing something to the list x can affect the list y. This is why we sometimes need to watch out for lists that share structure with other lists, and there are many functions that return a list that may share structure with one of its arguments. Notice that (eq x y) being false (i.e. is nil) does NOT guarantee that the two list do not share structure. In the appendix is a function for checking if two lists share structure.

In the following diagram, the list z and the list w do NOT share structure, but they DO share elements. This will remind you of the shallow copy diagram in the previous article. We can say that the list z is a shallow copy of the list w. Often we just say that z is a copy of the list w.

      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
      |     |     |   |     |     |   |     |     |   |     |     |
z --> |     |   ----> |     |   ----> |     |   ----> |     | nil |
      |  |  |     |   |  |  |     |   |  |  |     |   |  |  |     |
      +--|--+-----+   +--|--+-----+   +--|--+-----+   +--|--+-----+
         |               |               |               |         
         v               v               v               v         

         10              11              12              13        

         ^               ^               ^               ^         
         |               |               |               |         
      +--|--+-----+   +--|--+-----+   +--|--+-----+   +--|--+-----+
      |  |  |     |   |  |  |     |   |  |  |     |   |  |  |     |
w --> |     |   ----> |     |   ----> |     |   ----> |     | nil |
      |     |     |   |     |     |   |     |     |   |     |     |
      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+

Doing (setf (elt z 2) something) or (setf (cdr (cddr z)) something) affects only z and not w. This is unlike the case with x and y. Nevertheless, the two lists share elements, so if some of the elements were mutable objects, and if you mutate one of those mutable elements, then the change would been seen via both z and w. But you don’t have to worry about that often, because you usually expect and want that change to be seen both lists.

The following code first makes two lists that share structure in the way that matches our diagram for x and y, and then modifies the first list to see what happens.

(setq tail (list 22 33))

(setq x (cons 0 (cons 1 tail)))
(setq y (cons 2 (cons 3 tail)))

x  ; ⇒ (0 1 22 33)
y  ; ⇒ (2 3 22 33)

(setf (elt x 2) 2200)

x  ; ⇒ (0 1 2200 33)
y  ; ⇒ (2 3 2200 33)


(setf (cdr (cddr x)) nil)

x  ; ⇒ (0 1 2200)
y  ; ⇒ (2 3 2200)

The following code first makes the list z and w in the way that matches our diagram for z and w, and then modifies the first list to see what happens

(setq z (list 10 11 12 13))
(setq w (copy-sequence z))

z  ; ⇒ (10 11 12 13)
w  ; ⇒ (10 11 12 13)

(setf (elt z 2) 1200)

z  ; ⇒ (10 11 1200 13)
w  ; ⇒ (10 11 12 13)

(setf (cdr (cddr z)) nil)

z  ; ⇒ (10 11 1200)
w  ; ⇒ (10 11 12 13)

Since vectors are primitive data type unlike lists, a vector cannot share structure only partially with another vector. Two vectors are either the same object or completely distinct: there is no in-between. The need to watch out for shared structure between lists is a consequence of lists being an illusion. Another consequence is the need to assign back to the variable holding the list sometimes, which is the subject of the next two sections.

In the appendix is some tips for figuring out the correct diagram given a snippet where you are not sure which list share structure with which.

10. what it means to modify a list

The previous section gave examples of modifying a list and its effects on another list that happens to share structure with the former list. Here is an example of NOT modifying a list.

(setq aa (list 0 11 22))

aa  ; ⇒ (0 11 22)

;; Insert 10000 to aa.
(push 10000 aa)

aa  ; ⇒ (10000 0 11 22)

;; Insert 20000 to aa.
(push 20000 aa)

aa  ; ⇒ (20000 10000 0 11 22)

;; Pop out the first element from aa.
(pop aa) ; ⇒ 20000

aa  ; ⇒ (10000 0 11 22)

push and pop are treating aa like a stack structure. Nevertheless, no list is being modified in the above code. In fact, what is happening is analogous to the use of incf in the previous article. push and pop are macros and not functions, and the code (push 10000 aa) just expands to the code (setq aa (cons 10000 aa)), and the code (pop aa) expands to something like:

(let ((val (car aa)))
  (setq aa (cdr aa))
  val)

In the same sense that incf does NOT modify any numbers (which are immutable anyway), the following code does NOT modify any list.

;; Let's start with a literal list this time.
(setq aa '(0 11 22))

aa  ; ⇒ (0 11 22)

(setq aa (cons 10000 aa))

aa  ; ⇒ (10000 0 11 22)

(setq aa (cons 20000 aa))

aa  ; ⇒ (20000 10000 0 11 22)

(setq aa (cdr aa))

aa  ; ⇒ (10000 0 11 22)

Nevertheless, we say that we are adding 10000 to the list stored in aa by running (setq aa (cons 10000 aa)). Here is the docstring of push:

push is a Lisp macro in `subr.el'.

(push NEWELT PLACE)

Add NEWELT to the list stored in the generalized variable PLACE.
This is morally equivalent to (setf PLACE (cons NEWELT PLACE)),

Notice the phrases “generalized variable”, “place” being used in the docstring. Use of such phrases implies that the push macro, just like incf and setf, can be used with not just variables (like aa) but also with places like so:

(setq vv (vector (list 0 0 0)
                 (list 1 1 1)))

;; Push 9 into the PLACE (elt vv 0).
(push 9 (elt vv 0))

vv   ; ⇒ [(9 0 0 0) (1 1 1)]

The code (push 9 (elt vv 0)) just expands to an appropriate code. I mention this because this is why a push form CAN expand to some code that modifies a list, and the same is true with pop. For example, the following code shows that a pop form can expand to some code that modifies a list.

(setq aa (list 0 11 22))

(pop (cdr aa))  ; ⇒ 11

aa   ; ⇒ (0 22)

(pop (cdr aa)) expands to some code that mutates the first cons cell in an appropriate way. This is why push and pop are convenient: that pop form just magically expands to the right code and you don’t have to write the right code yourself. On the other hand, this also means that just because you are using pop and push does not mean that no list is being modified.

(setq aa (list 0 11 22))
(setq bb aa)

;; The following line will NOT affect bb, as it does not modify any list.
(push 10000 aa)

aa  ; ⇒ (10000 0 11 22)
bb  ; ⇒ (0 11 22)

;; Remember that the list aa and the list bb still share structure.
;; That is why the following line WILL affect bb.
(setf (elt aa 2) 9)

aa  ; ⇒ (10000 0 9 22)
bb  ; ⇒ (0 9 22)

Now you how to add an element to a list using push and what that actually means. Just in case you are still not sure what’s going on, here is the diagram representing the state right BEFORE running the line (push 10000 aa) in the above code:

                      +-----+-----+   +-----+-----+   +-----+-----+
                      |     |     |   |     |     |   |     |     |
aa  -------------->   |  0  |   ----> | 11  |   ----> | 22  | nil |
                      |     |     |   |     |     |   |     |     |
                      +-----+-----+   +-----+-----+   +-----+-----+

                            ^
                            |
bb -------------------------+

And here is the diagram for the state right after running that line:

      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
      |     |     |   |     |     |   |     |     |   |     |     |
aa -> |10000|   ----> |  0  |   ----> | 11  |   ----> | 22  | nil |
      |     |     |   |     |     |   |     |     |   |     |     |
      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+

                            ^
                            |
bb -------------------------+

Emacs users might raise a question: what is the difference between add-to-list and push and how can the function (not a macro) add-to-list do what it does? That will be the subject of another post which I will link as soon as I write it.

11. destructive functions and non-destructive functions

In this section, I will explain the notion of destructive functions, how to use them and when to use them.

Many functions for manipulating lists come in pairs of destructive function and non-destructive function. For example, the functions reverse and nreverse are such a pair, and they are both for reversing a list. reverse is the non-destructive one and nreverse is the destructive one. The following table shows many of such pairs. Functions in the last two rows don’t come in pairs but I included them in the table as a note that those functions are destructive.

non-destructive destructive what the pair is for
reverse nreverse reverse a list
append nconc concatenate two lists or more
remove delete remove elements from a list
cl-substitute cl-nsubstitute replace elements in a list
N/A sort sort a list
N/A cl-merge merge two lists

I believe the pair remove and delete is the most general example, hence the best example to illustrate my points, so let’s start with that pair.

Example use of remove (obtaining a list with the number 5 removed):

(setq arg (list 5 1 2 5))
arg  ; ⇒ (5 1 2 5)
(setq result (remove 5 arg))
result ; ⇒ (1 2)
arg    ; ⇒ (5 1 2 5)

Example use of delete:

(setq arg (list 5 1 2 5))
arg  ; ⇒ (5 1 2 5)
(setq result (delete 5 arg))
result ; ⇒ (1 2)

That is how these two functions are supposed to be used. None of the two functions make it so that the variable arg refers to the result you want. Instead, the return value is the result. Let’s call it Property P1.

(Property P1) Its return value is the result list.

Property P1 is shared by both functions. The following property on the other hand is only guaranteed for the non-destructive function remove and not for the destructive function delete:

(Property P2) The original list that is passed to the function (as its argument) is NOT modified.

That is why remove is called a non-destructive function: it preserves the list structure. No list gets modified when you use remove.

The function delete on the other hand may modify the original list. If you use this function, any other list that happens to share structure with the original list may get modified as well. Hence the term “destructive”. Sometimes we say “the function delete is allowed to destroy the structure of the original list.” and that just means that it may modify the list, and the reason we can say the strong word “destroy” instead of “modify” in that sentence is that the original list as we knew it before is no more.

So the destructive function delete is a dangerous function in some sense, but it’s more efficient than the non-destructive function remove. In general, it’s best to use remove in your code, and if profiling shows you need to optimize, that is when you replace the use of remove with the destructive delete but only if you are sure that the argument list isn’t referenced from anywhere else.

Finally, there is something to watch out for even in the case of the non-destructive remove: the following property is NOT guaranteed for remove or delete.

(Property P3) The result list (the return value) does NOT share structure with the list passed to the function.

In other words, the return value of remove may share structure with its argument.

We’ve mentioned that the following property does NOT hold for remove or delete.

(Property P4) It makes arg become the result.

A common mistake is to assume that P4 holds for the destructive delete or to assume that P3 holds for the non-destructive remove.

Now that delete and remove are explained, I’d add that there are also functions cl-remove-if, cl-remove-if-not, cl-delete-if, cl-delete-if-not.

Here are some basic things to know about the functions mentioned in our table:

  • Every function in the table has Property P1, regardless of whether it is destructive or not.
  • Every function in the first column has Property P2. None of the functions in the second column has that property. A common mistake is to assume that the function sort is non-destructive and has Property P2.
  • Most functions in the table (even the ones in the first column) do NOT guarantee Property P3. The only exception I know is reverse.
  • It is impossible to write a version of nconc, delete, or cl-merge in a way that guarantees P4 (for the same reason that incf cannot be a function.)
  • For most functions in the second column, what happens to the argument is not specified, so you shouldn’t rely on its effects on the argument. (Practical Common Lisp points out two exceptions: nconc and nsubstitute)

Some care need to be taken when you use an operation that result in a list sharing structure and another operation that is destructive. For example

(setq aa (list 5 3 1))
(setq bb (list 4 2 0))

;; Concatenate.
(setq cc (append aa bb))

;; Sort cc.
(setq cc (sort cc '<))

cc  ; ⇒ (0 1 2 3 4 5)

;; Let's see if there was any nasty side effect.
aa  ; ⇒ (5 3 1)
bb  ; ⇒ (4 5)  <----- YMMV

To see why bb got destroyed, recall:

  • The function append is on our table. Its return value may share structure with some of its arguments (in particular, its last argument).3
  • The function sort is on the second column. It is destructive.

If you don’t want bb to be destroyed, you should use sort with copy-sequence like this:

(setq cc (sort (copy-sequence cc) '<))

Think of the idiom (sort (copy-sequence ...) ...) as the non-destructive counterpart to sort. For example, if you want to print all paths in the elisp global variable load-path in a sorted way into the messages buffer, you run:

(dolist (path (sort (copy-sequence load-path) 'string<))
  (message "%s" path))

But if you use sort directly without copy-sequence, the load-path will get messed up.

Some popular examples of safe use of destructive functions in the second column is included in the appendix.

Actually the phrase “this operator is destructive” can have broader meaning than this post touched upon. In fact, from the Common Lisp HyperSpect glossary:
“destructive adj. (of an operator) capable of modifying some program-visible aspect of one or more objects that are either explicit arguments to the operator or that can be obtained directly or indirectly from the global environment by the operator. ”

So setq is a destructive operator in that sense. For this reason, functions on the second column are sometimes called recycling functions (because they are allowed to recycle cons cells of the given list).4

12. Do this post apply to alists and trees as well?

There are other useful illusion data types other than lists. Two of them are:

  • alist (association list)
  • tree

And they are also made of cons cells.

Here is an example of an alist. It is just a list of cons cells, to be interpreted as storing information about which names are male names and which names are not (0 indicates female name, and 1 indicates male name.).

(setq my-alist (list (cons "Alice" 0)
                     (cons "Bob" 1)
                     (cons "Dan" 1)))

my-alist ; ⇒ (("Alice" . 0) ("Bob" . 1) ("Dan" . 1))

When we say the alist structure of my-alist, we mean the six cons cells making up that alist. But when we say the list structure of my-list, we mean the three cons cells making up the list structure. So when we say the alist my-alist share structure with some other alist, we mean that some of the six cons cells are shared with the other alist. But when we say that the list my-alist share structure with some other list, we mean the list structure (made of three cons cells) being shared.

Let’s see the docstring of copy-alist

(copy-alist ALIST)

Return a copy of ALIST.
This is an alist which represents the same mapping from objects to objects,
but does not share the alist structure with ALIST.
The objects mapped (cars and cdrs of elements of the alist)
are shared, however.

So (copy-alist my-alist) will return an alist composed of six new cons cells, while (copy-sequence my-alist) or (cl-copy-list my-alist) will produce only three new cons cells, while the other three cons cells are shared because they are considered as elements of my-alist as as a list. Elements of my-alist as a list are the three cons cells being used for associations, but elements of my-alist as an alist are the three strings and the associated numbers. So “copy of an alist” and “copy of a list” end up meaning different things.

We’ve only covered functions for manipulating lists, but there are also functions for manipulating alists, and some of them come in pairs. Now that you know the meaning of “alist structure”, “modifying an alist”, “alist structure being shared”, you will be able to use those functions safely as well.

Finally, when we say “tree structure”, we mean all cons cells reachable from following car or cdr. So when you have a list of two elements and if the first element is another list with three numbers, and the second element is yet another list with ten strings, then you have a nested list, and the tree structure consists of 2 + 3 + 10 cons cells. You will be able to guess what the function copy-tree does.

13. Common Lisp note

To translate elisp snippets in this post into Common Lisp snippets, change some function names: cl-foo into foo, and copy-sequence into copy-seq. You might also want to wrap the snipets with appropriate let forms (so that aa and bb and so on are local variables), or use defparameter and the earmuff convention.

14. Appendix

One example. This example from Practical Common Lisp uses nreverse in a safe way.

(defun upto (max)
  (let ((result nil))
    (dotimes (i max)
      (push i result))
    (nreverse result)))

(upto 10)   ; ==> (0 1 2 3 4 5 6 7 8 9)

Here is a function for testing shared structure between lists.

(defun my-list-share-structure-p (list1 list2)
  "Return non-nil if the two lists share structure, otherwise nil. Assume non-cyclic."
  (and (consp list1)
       (consp list2)
       (eq (last list1)
           (last list2))))

Test:

(let ((lst (list (list 3 1) (list 4 1) (list 5 9))))
  (list (my-list-share-structure-p lst lst)
        (my-list-share-structure-p lst (cdr lst))
        (my-list-share-structure-p lst (copy-sequence lst))
        (my-list-share-structure-p lst nil)
        (my-list-share-structure-p lst (cons 99 lst))
        (my-list-share-structure-p lst (append (list 99 99) lst))))

Given a snippet, if you are not sure what diagram it should produce, you can use eq to test whether two given cons cells are the same cons cells. If you want to see the diagram at once, use the variable print-circle (for Common Lisp it’s *PRINT-CIRCLE*):

(setq aa (list 0 11 22))
(setq bb (list 1 2))
(setq cc (append aa bb))

(let ((print-circle t)
      (print-level nil)
      (print-length nil))
  (print (vector aa bb cc)))

It would be good to have something like Online Python Tutor – Visualize program execution for Lisp.

Footnotes:

1

Recall from the previous article that container objects refer to other objects

2

Yes this is equivalent to (setf (cdr (cdr (cdr x))) something) and also to (setf (cl-cdddr x) something)

3

As long as the last argument is not nil, sharing structure with the last argument is guaranteed. And only the last argument.

4

Started by Practical Common Lisp

This entry was posted in Emacs, Lisp and tagged , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s