Putting a bar or a tilde over a letter in LaTeX

As you are aware, there are commands to put a bar or a tilde over a symbol in math mode in LaTeX. Sometimes, the output doesn’t come out the way some of us might expect or want.

Fortunately, there are alternative commands that do the same task differently that we can try and there are also other ways of using the same commands.

To put a tilde over a letter, we can use either \tilde or \widetilde. As for which one to use in which situation, compiling a document with the following as its part can help comparison.

$\tilde{A}$ vs $\widetilde{A}$

$\tilde{\mathcal A}$ vs $\widetilde{\mathcal A}$

$\tilde{ABC}$ vs $\widetilde{ABC}$

To put a bar over a letter, we can use either \bar or \overline. It seems that \bar is to \overline what \tilde is to \widetilde. There don’t seem to be \overtilde or \widebar.

$\bar{A}$ vs $\overline{A}$

$\bar{\mathcal A}$ vs $\overline{\mathcal A}$

$\bar{ABC}$ vs $\overline{ABC}$

Over a symbol with a subscript or superscript index, one can imagine four ways of putting a bar or tilde over it. Among the four ways, one can say that one or two of them looks right but that may depend. I can’t comment on which ones look good. You just have to try each and decide with your colleagues.

$\tilde{A_2}$ vs $\widetilde{A_2}$ vs $\tilde{A}_2$ vs $\widetilde{A}_2$

$\bar{A_2}$ vs $\overline{A_2}$ vs $\bar{A}_2$ vs $\overline{A}_2$
Posted in Uncategorized | Tagged | Leave a comment

why learn elisp (Emacs Lisp)

This post is the intro module of Living with Emacs Lisp.

The goal of this post is to hopefully convince you why you should learn elisp. Readers are assumed to be beginning users of Emacs. (Of course he who does not use Emacs does not need to learn elisp)

So you are a beginning user of Emacs and let me raise a relevant question: suppose someone asked you “Why did you choose Emacs?” The longer question would be “What is the point of using Emacs in particular when there are other great text editors out there that are also cross-platform, customizable and extensible?”

You probably have your answer to the question. If you want to hear my answer, I’d say “I choose Emacs because Emacs is customizable and extensible to an extreme degree. I believe that is the defining character of Emacs. That may not mean that everybody should use Emacs but that is certainly why I will continue to use Emacs.”

I think my customization of Emacs is only to a mild degree and I may never actually customize it to an extreme degree. However, because I know that Emacs allows extreme customization, I can be sure that Emacs allows any kind of mild customization I might want later. It’s like how some columnist would feel sure that organizations like ACLU will come defend his right to say his mildly controversial speeches if needed after he has seen ACLU defend those with extreme opinions.

The extent to which Emacs can be extended to suit your needs. You can do a lot with Emacs without learning elisp: you can define keyboard macros, you can install and use Emacs packages for your needs, you can use the Customize facility to customize your Emacs and packages to some extent. That is, without learning elisp, you can already experience Emacs to be one of those editors that are cross-platform, customizable and extensible. As you learn elisp and as you start to use hooks in interesting ways, or make your own small minor modes, or even make your own Emacs package, you start to experience Emacs to be an editor that is customizable and extensible to an extreme level. Think of that as a kind of premium feature of Emacs which you unlock by paying, and you pay not with dollars, but with time: time devoted to learning elisp.

On the other hand, there is an old saying: “Time is money”. Of course we can’t and shouldn’t devote all day to learning elisp. And we wouldn’t expect that one hour of learning elisp would unlock all the power. It’s a gradual thing. A learning curve is there, but not as steep as you might expect. Do you have something in mind that you always wanted to have in your text editor? Your mission, should you choose to accept it, is to learn enough elisp to make that something happen.

Posted in Emacs | Tagged | Leave a comment

Type backslash easily in Emacs AUCTeX

Writing LaTeX documents seem to require many presses of backslashes and the backslash key is located at awkward place and hard to type in some types of keyboards. For users of Emacs AUCTeX, there are some ways to ease this pain.

1. type slash to type backslash

With this elisp code added to your init file, whenever you type slash in a LaTeX buffer, backslash is inserted instead. The slash key is easier to reach.

(defun my-make-slash-backslash ()
  (local-set-key (kbd "/") "\\"))

(add-hook 'TeX-mode-hook 'my-make-slash-backslash)

When you have to enter slash itself, which is not often, use C-q. When you have to enter many slashes, for example you could be typing a url, in this case I believe copy and paste is the simplest workaround. It is possible to extend the code so that typing backslash results in insertion of slash as well, so that is another workaround, but I find that disorienting, so I don’t use that workaround, but you may want to use that.

If you want to be able to turn on and off “slash turns into backslash” on the fly, you can write a minor mode for that.

2. use the two commands

Another way to enter backslashes with ease is to use C-c C-m to enter a TeX macro (things like \section{...} and \frac{..}{..}) and C-c C-e to enter a LaTeX environment (things like \begin{..} .. \end{..}) in LaTeX buffers.

Posted in Emacs | Tagged | Leave a comment

Doob-Dynkin Lemma for probability space

Let f, g be measurable functions from a probability space (\Omega, \mathcal F, \mathbb P) to measurable spaces (X, \mathcal A) and (Y, \mathcal B) respectively. Consider the following three conditions:
(1) \sigma(f) \supset \sigma(g) \bmod \mathbb P (in other words, each element in \sigma(g) is equivalent (up to null sets) to some element in \sigma(f). The notation \bmod \mathbb P is used when an author wants to make explicit the practice of treating sub-sigma algebras ignoring null sets.)
(2) There is a measurable H: (X, \mathcal A) \to (Y, \mathcal B) with g = H \circ f a.e.
(3) There is a measure preserving map H: (X, \mathcal A, \mu) \to (Y, \mathcal B, \nu) with g = H \circ f a.e. where \mu, \nu are the pushforward measures obtained from \mathbb P.

(2) implies (3) trivially. (3) implies (2) trivially (even if the measure preserving map is only a.e. defined.)

(2) implies (1) trivially. The Doob-Dynkin lemma for probability space is that (1) implies (2) if (Y, \mathcal B) is a standard Borel space. In this post, we present some proofs of this lemma and some corollaries and others. Compare with the Doob-Dynkin lemma for measurable spaces (the previous article). As we’ll see, Doob-Dynkin lemma for probability space is a special case of that for measurable spaces. The one for probability spaces suffices for most applications in probability theory and ergodic theory.

1. Proof 1

In this proof, we simply show that the Doob-Dynkin lemma for probability space follows from that for measurable spaces. It is enough to show that the condition \sigma(f) \supset \sigma(g) \bmod \mathbb P can be improved to \sigma(f) \supset \sigma(g) after discarding all points in some null set from \Omega.

For each B \in \mathcal B, there is A_B \in \mathcal A such that the symmetric difference E_B := g^{-1}B \cap f^{-1}A_B is a \mathbb P-null set. It would be nice to be able to discard all points in the union \bigcup_{B \in \mathcal B} E_B but is this a null set? Perhaps not.

Exercise 10. Show that it is enough to discard all points in the countable union \bigcup_{n} E_{B_n} where (B_n)_n is a sequence that generates \mathcal B (Existence of such a sequence is guaranteed by the assumption that (Y, \mathcal B) is a standard Borel space).

2. Proof 2

In this proof, we adapt an argument in the previous article to our setting of ignoring null sets. We divide into two cases.

Case I: when the image of g countable

In this case, there is A_i \in \mathcal A such that g^{-1}y_i = f^{-1}A_i a.e. for each i.

Notice that f^{-1}A_i form a countable partition modulo \mathbb P, in other words, the intersection of f^{-1}A_i and f^{-1}A_j is a null set when i \ne j and their union over all i has measure 1, or equivalently that for a.e. \omega, there exists unique i for which \omega \in f^{-1}A_i.

Exercise 20. Show that A_i form a countable partition mod \mu. (the general idea is that the measure algebra homomorphism A \mapsto f^{-1}A behaves like an inclusion map)

Let H be any (everywhere-defined) measurable function from X to Y such that for \mu-a.e. x and for each i we have x \in A_i \implies H(x) = y_i. It is easy to define such an H.

Now it only remains to show g = H \circ f a.e. For a.e. \omega \in \Omega, we have:
there is i for which f(\omega) is in A_i;
for that i, H(f(\omega)) = y_i (by property of H);
but that y_i = g(\omega) (by how A_i chosen);
so H\circ f = g on this \omega.

Case II: when the image of g is not countable.

WLOG assume (Y, \mathcal B) is the unit interval.

Let g_n := \lfloor 2^n g \rfloor / 2^n. There is a measurable H_n: (X, \mathcal A) \to (Y, \mathcal B) with g_n = H_n \circ f a.e..

Notice that g_n \nearrow g holds everywhere. It would be nice to take H = \lim H_n but we only know H_n converges at least on f(\Omega_0) for some subset \Omega_0 of measure 1, and we don’t know if this image is measurable.

A work around is that we let H be any (everywhere-defined) measurable such that H(x) = \lim H_n(x) for all x for which the limit exists. Such H exists and with this H we can proceed to show g = H \circ f a.e..

Another work around is to notice that the subset [H_n \nearrow] (alternatively also [\exists \lim H_n]) is measurable and so one can show that its measure is 1. Then we set H to be a \mu-a.e. limit of H_n. Then H \circ f is a P-a.e. limit of H_n \circ f. So we can proceed with this H too.

3. Proof 3

In this proof, we show that (1) implies (3) using the fact that (Y, \mathcal B, \nu) is a Lebesgue space (which follows because (Y, \mathcal B) is a standard Borel space) and the fact that a Lebesgue space can be approximated by a sequence of partitions. This proof is probably equivalent to the previous one, but let’s do it for illustration.

The idea in this proof is that points in X (resp. Y) more or less should correspond to appropriate nested sequence of elements in \mathcal A (resp. \mathcal B), so in order to build H, we only need establish an appropriate procedure to transform an appropriate nested sequence of elements in \mathcal A to that of \mathcal B, but that is done for free by observing that with abuse of notation we may safely pretend that \mathcal B \subset \mathcal A \subset \mathcal F \bmod \mathbb P (if we identify \sigma(f), \sigma(g) with \mathcal A, \mathcal B). We’ll proceed without abuse of notation.

Notice that WLOG we may replace (X, \mathcal A, \mu) and (Y, \mathcal B, \nu) with any other almost-isomorphic probability spaces at the minor cost of losing everywhere definedness of f, g.

WLOG there is a non-decreasing sequence of countable partitions \beta_n \subset B such that for any non-increasing choice B_n \in \beta_n the intersection \cap_n B_n is a single point and that (\beta_n)_n generates \mathcal B. This is possible because we may assume that the probability space Y is a disjoint union of the Cantor space and atoms, which follows from the result that any Lebesgue space is almost-isomorphic to a disjoint union of an interval and atoms).

Now we want to pick a good sequence \alpha_n such that for all n we have g^{-1}(\beta_n) = f^{-1}(\alpha_n) \bmod \mathbb P. We can assume that for each n all elements of \beta_n (and hence also those of g^{-1}(\beta_n) too) have positive measure. So we can ensure that all elements of \alpha_n have positive measure. For each n, \alpha_n is a countable partition modulo \mu and the sequence (\alpha_n)_n is non-decreasing modulo \mu. After discarding only countably many \mu-null sets from X, we can ensure that \alpha_n form a non-decreasing sequence of countable partitions. The reason we ensure elements of \alpha_n and \beta_n only have elements of positive measure is because we want to ensure the natural bijection between \alpha_n and \beta_n.

For each x \in X, let \alpha_n(x) be the unique element in \alpha_n containing x, and let \beta_n(x) be the unique element in \beta_n which corresponds to \alpha_n(x) in the sense that g^{-1}(\beta_n(x)) = f^{-1}(\alpha_n(x)) \mod \mathbb P. Define H by \{H(x)\} = \cap_n \beta_n(x) which is a well defined map from X to Y (which is because \beta_n(x) is non-decreasing (for a fixed x) because its counterpart in \Omega is non-decreasing up to null sets because its counterpart in Y is.).

(Measurability of H) For each B_n \in \beta_n, it is easy to show that H^{-1}(B_n) is measurable, moreover, equal to A_n (where A_n \in \alpha_n is the element corresponding to B_n).

(On g = H \circ f a.e.) Notice that H is measurable and everywhere defined, so g' := H\circ f is an almost-everywhere defined measurable function. So we are comparing two a.e. defined measurable functions from (\Omega, \mathcal F, \mathbb P) to (Y, \mathcal B). Inverse images of B_n \in \beta_n under the two functions are equal up to null sets because of H^{-1}(B_n) = A_n. Now notice that the set [g \neq g'] is a subset of \bigcup_{n, B, B': B, B' \in \beta_n, B \neq B'} (g^{-1}B \cap g'^{-1}B') where the terms in the latter are null sets because of the previous sentence. We have shown g = g' a.e..

(On measure preserving) It is easy to show that H is measure preserving from the fact that g = g' a.e.. Alternatively, it also follows from H^{-1}(B_n) = A_n.

4. Uniqueness of H

If H and H’ satisfy (2) , then H = H' holds \mu-a.e.. This follows from the following two observations:

  • H \circ f = H' \circ f holds \mathbb P-a.e.
  • [H = H'] is measurable (because (Y, \mathcal B) is a standard Borel space).

5. applications

Exercise 25. Let f, g be measurable functions from a probability space (\Omega, \mathcal F, \mathbb P) to measurable spaces (X, \mathcal A) and (Y, \mathcal B) respectively. Show that \sigma(f) = \sigma(g) \bmod \mathbb P if and only if there are X_0 \in \mathcal A with \mu(X_0) = 1 and Y_0 \in \mathcal B with \nu(Y_0) = 1 and a bi-measurable bijection H: X_0 \to Y_0 such that g = H \circ f a.e..

Exercise 30. Given two measure-theoretic dynamical systems (X, \mathcal A, \mu, T) and (Y, \mathcal B, \nu, S) (measure preserving transformations on Lebesgue spaces), show that the latter is a factor of the former if and only if there is a joining \lambda (a T\times S-invariant probability measure on (X \times Y)) such that \mathcal B \subset \mathcal A mod \lambda (with abuse of notation).

Posted in Mathematics | Tagged , | Leave a comment

Proofs of Doob-Dynkin Lemma

Let f, g be functions from a non-empty set \Omega to measurable spaces (X, \mathcal A) and (Y, \mathcal B) respectively. Consider the following two conditions:
(1) \sigma(f) \supset \sigma(g)
(2) There is a measurable function H: (X, \mathcal A) \to (Y, \mathcal B) with g = H \circ f

By \sigma(f), I mean the sigma-algebra generated by f, so \sigma(f) = f^{-1}(\mathcal A).

The Doob-Dynkin Lemma is that the two conditions (1) and (2) are equivalent under the assumption that the measurable spaces are reasonable ones. The intuition for the lemma is this: when \mathcal C and \mathcal D are two sigma-algebras on a common underlying set such that \mathcal C \supset \mathcal D, then intuitively it means that the information specified by \mathcal D is contained in the information specified by \mathcal C. So the condition (1) should mean that the information as to what is the value of f(\omega) (given some unknown point \omega \in \Omega) contains the information as to what is the value of g(\omega), in other words, f(\omega) determines g(\omega), that is, g is a function of f. The condition (2) says g is a measurable function of f.

Applications and corollaries of this lemma include

  • characterization of the property of a function being measurable with respect to a sub-sigma-algebra of a given sigma-algebra
  • writing a conditional expectation of a random variable given another random variable as a Borel-measurable function of the latter random variable
  • characterization of the property of two measure theoretical dynamical systems being isomorphic by existence of some special kind of joining of the two.

Notice that (2) implies (1) trivially. We’ll prove that (1) implies (2) if (Y, \mathcal B) is a standard Borel space. But before that, let’s investigate how unique H is.

1. uniqueness of H

If H and H' are two measurable functions satisfying (2), then H=H' on f(\Omega). But notice that we don’t know if the image f(\Omega) is X. We don’t even know if the image is measurable.

If (Y, \mathcal B) is a standard Borel space, then it is known that the set [H=H'] is measurable. In that case, we can say that H=H' holds almost everywhere in the following sense. For the push forward measure \mu (on \mathcal A) of any probability measure on \sigma(f) under f, we have that H=H' holds \mu-a.e. because \mu[H = H'] = 1 because f^{-1}[H=H'] = \Omega.

2. counterexamples

If both \mathcal A and \mathcal B are trivial sigma-algebras, i.e., if \mathcal A = \{\emptyset, X\} and \mathcal B = \{\emptyset, Y\}, then it is easy to find an example where (1) holds but (2) does not. (Exercise 10. find such an example)

With sigma-algebras that separate points, I don’t know any counter-examples but we can at least say that in this case (1) implies g being a (possible non-measurable) function of f.

Exercise 20. Under the assumption that \mathcal B separates points of Y (in the sense that for any two distinct points y,y' there is B \in \mathcal B such that y \in B and y' \not\in B), show first that f(x) = f(x') \implies g(x) = g(x') and then that there is a function H: X \to Y such that g = H \circ f.

3. proof when g has a countable range, with nice Y

Let’s first prove the Doob-Dynkin lemma for the case when the image g(\Omega) is a countable subset in a standard Borel space (Y, \mathcal B).

Suppose y_1, y_2, \dots (finitely many or countably infinite) are all points in the image g(\Omega). For each i, the singleton \{y_i\} is measurable. This follows from the possibly overkill assumption that (Y, \mathcal B) is a standard Borel space and this happens to be the only time we use that assumption in this proof.

Since \{y_i\} is measurable, (1) then implies that there is A_i \in \mathcal A such that g^{-1}y_i = f^{-1}A_i

Notice that f^{-1}A_i form a countable partition. We do not know if A_i form a partition of \Omega. We do not know if f(\Omega) is X or if it is measurable. But at least we can conclude that A_i form a countable partition modulo f(\Omega), by which I mean that A_i \cap f(\Omega) form a partition of f(\Omega). In other words, for each x in the image f(\Omega), there exists unique i such that x \in A_i. (I haven’t checked if we can assume WLOG that f is onto.)

Let H be a measurable function from X to Y such that for each x \in f(\Omega) and for each i we have x \in A_i \implies H(x) = y_i. It is easy to define such H. (Exercise 30. Define such H. Beware that f(\Omega) may not be measurable.)

Now it only remains to show g = H \circ f which we can check point-wise. For each \omega \in \Omega, we have:
there is i for which f(\omega) is in A_i;
for that i, H(f(\omega)) = y_i (by the property we specified for H);
but that y_i = g(\omega) (by how A_i is chosen);
so H\circ f = g on this \omega.

4. proof when Y is the Cantor space

There is a slick proof of the fact that (1) implies (2) when (Y, \mathcal B) = (\{0,1\}^{\mathbb N}, \mathcal B) is the Cantor space along with its Borel sigma-algebra, or equivalently the countably infinite product of the discrete measurable space (\{0,1\}, \mathcal D) (where \mathcal D is just the discrete sigma-algebra for \{0,1\}.)

Let p_i: \{0,1\}^{\mathbb N} \to \{0,1\} be the projection to the i’th coordinate. This is a measurable function. It is easy to see that (1) implies that p_i \circ g = H_i \circ f for some measurable function H_i from X to \{0,1\}. Now define H(x) = (H_1(x), H_2(x), \dots), then H is a measurable function and we have g = H \circ f.

5. proof when Y is the unit interval

Now let’s show that (1) implies (2) when (Y, \mathcal B) is the unit interval along with its Borel sigma-algebra. You may know that the unit interval is isomorphic to the Cantor space (as measurable spaces) (and also to the real line), so showing this is redundant, but let’s do this anyway for illustration purposes.

Let g_n := \lfloor 2^n g \rfloor / 2^n. These are approximations to g. Since we have already proved Doob-Dynkin’s lemma for countable range cases, there is a measurable function H_n: (X, \mathcal A) \to (Y, \mathcal B) with g_n = H_n \circ f.

Notice that g_n \nearrow g. You might want to define H to be \lim H_n, but there is a little problem that we don’t know if the limit exists everywhere. At least we know that H_n converges on f(\Omega). You might now want to define H to take the limit value just on the image f(\Omega), and to take 0 as its value outside the image, but here another problem is we don’t know if this H is measurable: The image f(\Omega) may not be measurable.

There is a way to work around that problem. Just let H be any measurable function from X to Y such that H(x) = \lim H_n(x) holds for all x for which the limit exists. There are at least two ways of taking such an H:

  1. Let H be \lim H_n where the limit exits, and 0 where the limit does not exist; or
  2. Let H be \limsup H_n everywhere.

Now it only remains to show g = H \circ f. For each \omega \in \Omega, we have:
g(\omega) = \lim g_n (\omega) = \lim H_n \circ f (\omega) = H \circ f (\omega).

(Another work around: Define a measurable H such that H = \lim H_n holds on f(\Omega) by letting H = \sup_n H_n everywhere. Then proceed with this H.)

6. proof for standard Borel space case

If (Y, \mathcal B) is a standard Borel space, then it is known that this measurable space is either discrete or isomorphic to the unit interval, but we have proved Doob-Dynkin’s lemma for both cases already.

Posted in Mathematics | Tagged , | Leave a comment

Lisp lists and destructive functions

Goal of this post is to be some small introduction to lists in Lisp. Since functions and macros for manipulating lists can be easily found by searching the Internet, instead of listing all those helpful functions, this post will focus more on making you prepared for terminology and gotchas surrounding Lisp lists. For beginners to be able to use those functions without falling to common mistakes, it is crucial to know the meaning of phases like “modify a list”, “this list shares structure with that list”, “destructive function”, “literal list”.

All Lisp snippets in this post are in elisp, unless stated otherwise. Nevertheless, everything that is said on this post also applies to Common Lisp, not just elisp, unless stated otherwise. To translate some snippets into Common Lisp, see the Common Lisp note at the end. This post assumes the reader have read the previous article about uniform reference semantics and its implications (for container objects and function parameter passing).

Since this post is part of the Living with Emacs Lisp series, I am mainly speaking to elisp beginners. Nevertheless, if you see something that’s elisp-only that’s not explicitly stated as being elisp-only, feel free to point out.

Credits: Chapter 12 of Practical Common Lisp was of huge help for me to eliminate all the confusion I had on this subject some time ago. Large part of this post draws from that chapter.

1. nil and non-nil

The only falsy value in Lisp is the symbol nil. Also, the value of the variable nil is always nil, in other words, the symbol nil evaluates to itself. In case this is confusing, consider running the following code:

(setq aa (+ 1 1))
(setq bb 'ha)

(print aa)
(print bb)

The value of the variable aa is now the number 2, and the value of the variable bb is now the symbol ha. Since the mapping is really from the symbols aa and bb to other objects as mentioned the previous article, this means that the symbol aa now evaluates to the number 2, and the symbol bb now evaluates to the symbol ha. It is in this context that the symbol nil always evaluates to itself. If you try to change that with this code:

(setq nil (+ 1 1))

you get an error message like this in Emacs:

Attempt to set a constant symbol: nil

or in case of a Common Lisp interpreter:

NIL is a constant and thus can't be set.

Since nil is the only false object and everything else is truthy, you will encounter words like non-nil or nil more often than words like true or false or truthy or falsy in articles about Lisp. For example, the elisp docstring for if says:

if is a special form in `eval.c'.

(if COND THEN ELSE...)

If COND yields non-nil, do THEN, else do ELSE...

and that is another way of saying “If COND is true, do THEN, else do ELSE…”

2. cons cell

A cons cell is a Lisp object with two slots; each slot holds, or refers to1, some Lisp object. One slot is known as the car, and the other is known as the cdr.

The following code creates a cons cell that holds 11 and 22, and then illustrates how to access its car and its cdr.

(setq cc (cons (+ 10 1)
               (+ 20 2)))

(print cc) ; ⇒ (11 . 22)

;; the car of cc is 11
(print (car cc)) ; ⇒ 11

;; the cdr of cc is 22
(print (cdr cc)) ; ⇒ 22

The following code then shows how to modify the cons cell.

;; set the car of cc to 1111
(setf (car cc) 1111)

(print cc) ; ⇒ (1111 . 22)

A cons cell is sometimes just called a cons (plural: conses).

3. cons cells in diagrams

Let’s start with this code:

(setq aa (cons (+ 1 1) (+ 2 2)))
(print aa)

(setq bb aa)

(setq the-car (car aa))
(setq the-cdr (cdr aa))

The first line creates a cons holding two numbers 2 and 4. The second line prints it. It is printed as (2 . 4). The last two lines show how to access the components of the cons. The following ASCII art diagram shows what happens:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.        +-------------+
.        |    car: cdr:|
.        |     |    |  |
.        |     |    |  |
.        +-----|----|--+
.              |    |
.          ^   |    |
.          |   v    v
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

Now we modify the cons by adding 10 to its car and cdr:

(setf (car aa)
      (+ (car aa) 10))

;; another way
(require 'cl-lib)
(cl-incf (cdr aa) 10)

(print aa) ; ⇒ (12 . 14)
(print (car aa)) ; ⇒ 12
(print the-car) ; ⇒ 2

The aftermath in diagram:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.        +-------------+
.        |    car: cdr:|
.        |     |    |  |
.        |     |    |  |
.        +-----|----|--+
.              v    v
.          ^         
.          |   12   14 
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

Before we get to more complex diagrams, let’s see how we can simply diagrams of cons cells. Above diagram can be simplified as:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+
.         |      |     |
.         |    | |  |  |
.         +----|-+--|--+
.              v    v
.          ^         
.          |   12   14 
. bb: -----+               
.              2    4      
.                          
.              ^    ^      
.              |    |      
.              |    |      
.        the-car:  the-cdr:

And now let’s draw numbers next to names for further simplification:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+
.         |      |     |
.         |  12  |  14 |
.         |      |     |
.         +------+-----+
.                     
.          ^          
.          |           
. bb: -----+               
.                          
.                          
.           2         4    
.        the-car:  the-cdr:

A cons cell can hold any type of object. For example, if we further run the following code, then the cons cell aa holds one number and one other cons cell.

(setf (cdr aa) (cons 0 0))

So the diagram is then:

. aa: -----+        
.          |        
.          |
.          v         
.                    
.         +------+-----+     +-----+------+
.         |      |     |     |     |      |
.         |  12  |   ----->  |  0  |  0   |
.         |      |     |     |     |      |
.         +------+-----+     +-----+------+
.                                      
.          ^                           
.          |                           
. bb: -----+

4. lists

The following code makes a list of three elements and assigns it to the variable aa

(setq aa (list 10 11 12))

The above code is actually equivalent to the following code.

(setq aa (cons 10 (cons 11 (cons 12 nil))))

The elisp reference says “Lists in Lisp are not a primitive data type; they are built up from cons cells”. This just means, to quote from Practical Common Lisp, “The key to understanding lists is to understand that they’re largely an illusion built on top of objects that are instances of a more primitive data type. Those simpler objects are pairs of values called cons cells”. (Vectors are primitive data type on the other hand. This is why I kept using vectors in the previous article.)

A list in Lisp is really a (singly) linked list made of cons cells. When we say aa is a list of three elements 10, 11, 12, this actually means the following diagram:

       +-----+-----+   +-----+-----+   +-----+-----+  
       |     |     |   |     |     |   |     |     |  
aa --> |     |   ----> |     |   ----> |     |   ----> nil
       |  |  |     |   |  |  |     |   |  |  |     |  
       +--|--+-----+   +--|--+-----+   +--|--+-----+  
          |               |               |           
          v               v               v           

          10              11              12

When we say the structure of the list aa or list structure of aa, we mean the three cons cells in the above diagram.

Now run the following code.

(setq aa1 (cdr aa))
(setq aa2 (cdr aa1))

aa1 refers to the second cons cell in the diagram. aa2 is the last cons cell. On the other hand, aa1 and aa2 are lists too: aa1 is a list of two elements, and aa2 is a list of one element. What is a list of zero elements? Run the following code.

(setq aa3 (cdr aa2))

aa3 is nil, but also an empty list. nil is how Lisp represents an empty list.

                           aa1             aa2              

                             |               |         aa3
                             v               v          
                                                        |
       +-----+-----+   +-----+-----+   +-----+-----+    v
       |     |     |   |     |     |   |     |     |        
aa --> |     |   ----> |     |   ----> |     |   ----> nil     
       |  |  |     |   |  |  |     |   |  |  |     |  
       +--|--+-----+   +--|--+-----+   +--|--+-----+  
          |               |               |           
          v               v               v           

          10              11              12
(print aa)  ; ⇒ (10 11 12)
(print aa1) ; ⇒ (11 12)
(print aa2) ; ⇒ (12)
(print aa3) ; ⇒ nil

This is how you access elements of a list:

(car (cdr (cdr aa))) ; ⇒ 12


;; alternative way
(elt aa 2)   ; ⇒ 12

;; all elements
(elt aa 0)   ; ⇒ 10
(elt aa 1)   ; ⇒ 11
(elt aa 2)   ; ⇒ 12

5. improper lists and proper lists

The elisp reference says “By convention, the cdr of the last cons cell in a list is nil. We call such a nil-terminated structure a true list” It sounds like there are some kind of improper lists. There are two kinds of improper lists. The first kind, dotted lists, are like usual lists except that the cdr of their last cons cells are not nil.

;; This is a dotted list.
(setq dotted (cons 0 (cons 1 (cons 2 100))))

;; Let's see how it is printed. (This is why it is called a dotted list)
(print dotted) ; ⇒ (0 1 2 . 100)

dotted is NOT a list of four elements 0, 1, 2, 100. dotted is just an improper list.

The second kind of improper lists are circular lists which are like usual lists, except they don’t have their last cons cells because they are circular in the sense illustrated by the following code.

;; this is a proper list for now
(setq circ (list 0 11 22 33 44))

;; this is its last cons cell
(setq last-cons (last circ))

;; this is some cons cell in the middle
(setq mid-cons (cddr circ))

;; we modify the structure of circ to make it a circular list
(setf (cdr last-cons) mid-cons)

Lisp functions that take a list as its argument usually expect you to pass a proper list, but some functions can take improper lists as well. It is good to be aware of improper lists because docstrings of functions may mention them.

6. check for empty list, non-empty list, list

The function listp is used for checking if the argument is a list or not.

(listp (list 1 2 3))    ; ⇒ t
(listp nil)             ; ⇒ t
(listp 100)             ; ⇒ nil
(listp (vector 1 2 3))  ; ⇒ nil
(listp "hello")         ; ⇒ nil

In fact, listp simply check if the argument is either a cons cell or nil. If it is a cons cell, then one can argue that it is a non-empty list (which may be an improper list), and if it is nil, then it is an empty list.

The function consp is used for checking if the argument is a cons cell, and one can argue that this is the same as checking if it is a non-empty list.

(consp (cons 100 200)) ; ⇒ t
(consp 100)            ; ⇒ nil
(consp (list 1 2 3 4)) ; ⇒ t
(consp nil)            ; ⇒ nil

To check if a variable holds an empty list, you simply check if it is nil.

7. sequence

Sometimes the docstring of a function may say it takes a sequence as one of its arguments. A sequence is a supertype of vector, list, and maybe some other types. So if the docstring of a function says it takes a sequence, you can pass a vector or list to it. Examples of such functions include length, elt, copy-sequence.

(length (list 0 1 2))   ; ⇒ 3
(length (vector 0 1 2)) ; ⇒ 3

(elt (list 0 11 22) 1)    ; ⇒ 11
(elt (vector 0 11 22) 1)  ; ⇒ 11

Also, a string is a sequence.

(length "Hello")  ; ⇒ 5
(elt "Hello" 2)   ; ⇒ 108 (is the character l in elisp)
(copy-sequence "Hello")  ; ⇒ "Hello"

8. literal list

What is a literal object? What is a literal list? For that, consider the following code

(list 0 (+ 10 1) (+ 20 2)) ; ⇒ (0 11 22)

;; This is a literal list.
'(0 11 22)   ; ⇒ (0 11 22)

(equal (list 0 (+ 10 1) (+ 20 2))
       '(0 11 22))
;; ⇒ t


;; A literal list should be quoted (with a single quote) in general in code.
(length '(0 11 22))   ; ⇒ 3

;; If you don't quote it, this happens.
(length (0 11 22))    ; ERROR: 0 is not a function.
;; Expecting the above line to evaluate to 3 is like
;; expecting the below line to evaluate to 11.
(length Hello world)  ; ERROR

(length "Hello world") ; ⇒ 11


;; This is a literal cons cell.
'(11 . 22)  ; ⇒ (11 . 22)

(equal (cons 11 22)
       '(11 . 22))
;; ⇒ t

;; These are quoted forms. Quoted forms give you literal objects.
'(11 . 22)
'(0 11 22)
'((0 1) (2 3) (4 5))

;; two ways of writing nested lists
(equal (list (list 0 1) (list 2 3) (list 4 5))
       '((0 1) (2 3) (4 5)))
;; ⇒ t

There are two kinds of literal objects. The first kind comes from writing quoted forms. The second kind comes from writing self-evaluating forms. What are self-evaluating forms?

;; These are self-evaluating forms. They evaluate to themselves.
"Hello world"  ; ⇒ "Hello world"
1234           ; ⇒ 1234


(setq aa 1234)

;; This is NOT a self-evaluating form. This is just a mod form.
(mod 25 10) ; ⇒ 5

;; This is NOT a self-evaluating form. This is just the variable aa.
aa        ; ⇒ 1234

The rule “One should not modify a literal object” means that you should not change any part of a literal object.

;; Assign a literal object to the variable aa.
(setq aa '((0 11) (22 33) (44 55)))

;; Let bb refer to the last element of aa.
(setq bb (elt aa 2)) ; ⇒ (44 55)

;; See what the first element of bb is.
(elt bb 0)  ; ⇒ 44

;; This modifies our literal object. This is NOT ok.
(setf (car bb) 4400)

;; This also modifies our literal object. This is NOT ok.
(setf (cdr bb) nil)

;; the aftermath
aa ⇒ ((0 11) (22 33) (4400))

Some code for illustration

(setq aa '(0 11 22))
(setq bb (list 0 11 22))

;; This modifies a literal object. This is NOT ok.
(setf (elt aa 2) 2200)

;; This is ok.
(setf (elt bb 2) 2200)


(setq aa "foo")
(setq bb (copy-sequence "foo"))

;; This modifies a literal string. This is NOT ok.
(setf (elt aa 0) (elt "m" 0))

;; This is ok.
(setf (elt bb 0) (elt "m" 0))
bb ; ⇒ "moo"

More detail on literal objects will be the subject of another post which I will link as soon I write it, for now it’s enough to know two facts:

  • You should not modify literal data.
  • A literal list is also referred to as “constant list” or “quoted list” (for obvious reasons).

9. shared structure

In the following diagram, we say that the list x and the list y share structure, because they share some tail. Recall that the list structure of x is cons cells making up the list x. The list structure of x and that of y share two cons cells, and that is why we say that the two lists share structure.

      +-----+-----+   +-----+-----+
      |     |     |   |     |     |
x --> |     |   ----> |     |     |  
      |     |     |   |     |  |  |  
      +-----+-----+   +-----+--|--+  
                               |     +-----+-----+   +-----+-----+
                               +---> |     |     |   |     |     |
                                     |     |   ----> |     | nil |
                               +---> |     |     |   |     |     |
                               |     +-----+-----+   +-----+-----+ 
      +-----+-----+   +-----+--|--+                                
      |     |     |   |     |  |  |                                
y --> |     |   ----> |     |     |                                
      |     |     |   |     |     |                                
      +-----+-----+   +-----+-----+

If we modify the list x by doing (setf (elt x 2) something), we change the element of x at index 2 (in the sense of new designation), that is, we change the car of the last-to-second cons cell. And that affects the list y too because the last-to-second cons cells for the list x and that for the list y are the same cons cell.

If we modify the list x by doing2 (setf (cdr (cddr x)) something), we change the cdr of the last-to-second cons cell, and depending on what something is, this can make x a longer list or a shorter list or an improper list. And that affects the list y too.

So doing something to the list x can affect the list y. This is why we sometimes need to watch out for lists that share structure with other lists, and there are many functions that return a list that may share structure with one of its arguments. Notice that (eq x y) being false (i.e. is nil) does NOT guarantee that the two list do not share structure. In the appendix is a function for checking if two lists share structure.

In the following diagram, the list z and the list w do NOT share structure, but they DO share elements. This will remind you of the shallow copy diagram in the previous article. We can say that the list z is a shallow copy of the list w. Often we just say that z is a copy of the list w.

      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
      |     |     |   |     |     |   |     |     |   |     |     |
z --> |     |   ----> |     |   ----> |     |   ----> |     | nil |
      |  |  |     |   |  |  |     |   |  |  |     |   |  |  |     |
      +--|--+-----+   +--|--+-----+   +--|--+-----+   +--|--+-----+
         |               |               |               |         
         v               v               v               v         

         10              11              12              13        

         ^               ^               ^               ^         
         |               |               |               |         
      +--|--+-----+   +--|--+-----+   +--|--+-----+   +--|--+-----+
      |  |  |     |   |  |  |     |   |  |  |     |   |  |  |     |
w --> |     |   ----> |     |   ----> |     |   ----> |     | nil |
      |     |     |   |     |     |   |     |     |   |     |     |
      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+

Doing (setf (elt z 2) something) or (setf (cdr (cddr z)) something) affects only z and not w. This is unlike the case with x and y. Nevertheless, the two lists share elements, so if some of the elements were mutable objects, and if you mutate one of those mutable elements, then the change would been seen via both z and w. But you don’t have to worry about that often, because you usually expect and want that change to be seen both lists.

The following code first makes two lists that share structure in the way that matches our diagram for x and y, and then modifies the first list to see what happens.

(setq tail (list 22 33))

(setq x (cons 0 (cons 1 tail)))
(setq y (cons 2 (cons 3 tail)))

x  ; ⇒ (0 1 22 33)
y  ; ⇒ (2 3 22 33)

(setf (elt x 2) 2200)

x  ; ⇒ (0 1 2200 33)
y  ; ⇒ (2 3 2200 33)


(setf (cdr (cddr x)) nil)

x  ; ⇒ (0 1 2200)
y  ; ⇒ (2 3 2200)

The following code first makes the list z and w in the way that matches our diagram for z and w, and then modifies the first list to see what happens

(setq z (list 10 11 12 13))
(setq w (copy-sequence z))

z  ; ⇒ (10 11 12 13)
w  ; ⇒ (10 11 12 13)

(setf (elt z 2) 1200)

z  ; ⇒ (10 11 1200 13)
w  ; ⇒ (10 11 12 13)

(setf (cdr (cddr z)) nil)

z  ; ⇒ (10 11 1200)
w  ; ⇒ (10 11 12 13)

Since vectors are primitive data type unlike lists, a vector cannot share structure only partially with another vector. Two vectors are either the same object or completely distinct: there is no in-between. The need to watch out for shared structure between lists is a consequence of lists being an illusion. Another consequence is the need to assign back to the variable holding the list sometimes, which is the subject of the next two sections.

In the appendix is some tips for figuring out the correct diagram given a snippet where you are not sure which list share structure with which.

10. what it means to modify a list

The previous section gave examples of modifying a list and its effects on another list that happens to share structure with the former list. Here is an example of NOT modifying a list.

(setq aa (list 0 11 22))

aa  ; ⇒ (0 11 22)

;; Insert 10000 to aa.
(push 10000 aa)

aa  ; ⇒ (10000 0 11 22)

;; Insert 20000 to aa.
(push 20000 aa)

aa  ; ⇒ (20000 10000 0 11 22)

;; Pop out the first element from aa.
(pop aa) ; ⇒ 20000

aa  ; ⇒ (10000 0 11 22)

push and pop are treating aa like a stack structure. Nevertheless, no list is being modified in the above code. In fact, what is happening is analogous to the use of incf in the previous article. push and pop are macros and not functions, and the code (push 10000 aa) just expands to the code (setq aa (cons 10000 aa)), and the code (pop aa) expands to something like:

(let ((val (car aa)))
  (setq aa (cdr aa))
  val)

In the same sense that incf does NOT modify any numbers (which are immutable anyway), the following code does NOT modify any list.

;; Let's start with a literal list this time.
(setq aa '(0 11 22))

aa  ; ⇒ (0 11 22)

(setq aa (cons 10000 aa))

aa  ; ⇒ (10000 0 11 22)

(setq aa (cons 20000 aa))

aa  ; ⇒ (20000 10000 0 11 22)

(setq aa (cdr aa))

aa  ; ⇒ (10000 0 11 22)

Nevertheless, we say that we are adding 10000 to the list stored in aa by running (setq aa (cons 10000 aa)). Here is the docstring of push:

push is a Lisp macro in `subr.el'.

(push NEWELT PLACE)

Add NEWELT to the list stored in the generalized variable PLACE.
This is morally equivalent to (setf PLACE (cons NEWELT PLACE)),

Notice the phrases “generalized variable”, “place” being used in the docstring. Use of such phrases implies that the push macro, just like incf and setf, can be used with not just variables (like aa) but also with places like so:

(setq vv (vector (list 0 0 0)
                 (list 1 1 1)))

;; Push 9 into the PLACE (elt vv 0).
(push 9 (elt vv 0))

vv   ; ⇒ [(9 0 0 0) (1 1 1)]

The code (push 9 (elt vv 0)) just expands to an appropriate code. I mention this because this is why a push form CAN expand to some code that modifies a list, and the same is true with pop. For example, the following code shows that a pop form can expand to some code that modifies a list.

(setq aa (list 0 11 22))

(pop (cdr aa))  ; ⇒ 11

aa   ; ⇒ (0 22)

(pop (cdr aa)) expands to some code that mutates the first cons cell in an appropriate way. This is why push and pop are convenient: that pop form just magically expands to the right code and you don’t have to write the right code yourself. On the other hand, this also means that just because you are using pop and push does not mean that no list is being modified.

(setq aa (list 0 11 22))
(setq bb aa)

;; The following line will NOT affect bb, as it does not modify any list.
(push 10000 aa)

aa  ; ⇒ (10000 0 11 22)
bb  ; ⇒ (0 11 22)

;; Remember that the list aa and the list bb still share structure.
;; That is why the following line WILL affect bb.
(setf (elt aa 2) 9)

aa  ; ⇒ (10000 0 9 22)
bb  ; ⇒ (0 9 22)

Now you how to add an element to a list using push and what that actually means. Just in case you are still not sure what’s going on, here is the diagram representing the state right BEFORE running the line (push 10000 aa) in the above code:

                      +-----+-----+   +-----+-----+   +-----+-----+
                      |     |     |   |     |     |   |     |     |
aa  -------------->   |  0  |   ----> | 11  |   ----> | 22  | nil |
                      |     |     |   |     |     |   |     |     |
                      +-----+-----+   +-----+-----+   +-----+-----+

                            ^
                            |
bb -------------------------+

And here is the diagram for the state right after running that line:

      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+
      |     |     |   |     |     |   |     |     |   |     |     |
aa -> |10000|   ----> |  0  |   ----> | 11  |   ----> | 22  | nil |
      |     |     |   |     |     |   |     |     |   |     |     |
      +-----+-----+   +-----+-----+   +-----+-----+   +-----+-----+

                            ^
                            |
bb -------------------------+

Emacs users might raise a question: what is the difference between add-to-list and push and how can the function (not a macro) add-to-list do what it does? That will be the subject of another post which I will link as soon as I write it.

11. destructive functions and non-destructive functions

In this section, I will explain the notion of destructive functions, how to use them and when to use them.

Many functions for manipulating lists come in pairs of destructive function and non-destructive function. For example, the functions reverse and nreverse are such a pair, and they are both for reversing a list. reverse is the non-destructive one and nreverse is the destructive one. The following table shows many of such pairs. Functions in the last two rows don’t come in pairs but I included them in the table as a note that those functions are destructive.

non-destructive destructive what the pair is for
reverse nreverse reverse a list
append nconc concatenate two lists or more
remove delete remove elements from a list
cl-substitute cl-nsubstitute replace elements in a list
N/A sort sort a list
N/A cl-merge merge two lists

I believe the pair remove and delete is the most general example, hence the best example to illustrate my points, so let’s start with that pair.

Example use of remove (obtaining a list with the number 5 removed):

(setq arg (list 5 1 2 5))
arg  ; ⇒ (5 1 2 5)
(setq result (remove 5 arg))
result ; ⇒ (1 2)
arg    ; ⇒ (5 1 2 5)

Example use of delete:

(setq arg (list 5 1 2 5))
arg  ; ⇒ (5 1 2 5)
(setq result (delete 5 arg))
result ; ⇒ (1 2)

That is how these two functions are supposed to be used. None of the two functions make it so that the variable arg refers to the result you want. Instead, the return value is the result. Let’s call it Property P1.

(Property P1) Its return value is the result list.

Property P1 is shared by both functions. The following property on the other hand is only guaranteed for the non-destructive function remove and not for the destructive function delete:

(Property P2) The original list that is passed to the function (as its argument) is NOT modified.

That is why remove is called a non-destructive function: it preserves the list structure. No list gets modified when you use remove.

The function delete on the other hand may modify the original list. If you use this function, any other list that happens to share structure with the original list may get modified as well. Hence the term “destructive”. Sometimes we say “the function delete is allowed to destroy the structure of the original list.” and that just means that it may modify the list, and the reason we can say the strong word “destroy” instead of “modify” in that sentence is that the original list as we knew it before is no more.

So the destructive function delete is a dangerous function in some sense, but it’s more efficient than the non-destructive function remove. In general, it’s best to use remove in your code, and if profiling shows you need to optimize, that is when you replace the use of remove with the destructive delete but only if you are sure that the argument list isn’t referenced from anywhere else.

Finally, there is something to watch out for even in the case of the non-destructive remove: the following property is NOT guaranteed for remove or delete.

(Property P3) The result list (the return value) does NOT share structure with the list passed to the function.

In other words, the return value of remove may share structure with its argument.

We’ve mentioned that the following property does NOT hold for remove or delete.

(Property P4) It makes arg become the result.

A common mistake is to assume that P4 holds for the destructive delete or to assume that P3 holds for the non-destructive remove.

Now that delete and remove are explained, I’d add that there are also functions cl-remove-if, cl-remove-if-not, cl-delete-if, cl-delete-if-not.

Here are some basic things to know about the functions mentioned in our table:

  • Every function in the table has Property P1, regardless of whether it is destructive or not.
  • Every function in the first column has Property P2. None of the functions in the second column has that property. A common mistake is to assume that the function sort is non-destructive and has Property P2.
  • Most functions in the table (even the ones in the first column) do NOT guarantee Property P3. The only exception I know is reverse.
  • It is impossible to write a version of nconc, delete, or cl-merge in a way that guarantees P4 (for the same reason that incf cannot be a function.)
  • For most functions in the second column, what happens to the argument is not specified, so you shouldn’t rely on its effects on the argument. (Practical Common Lisp points out two exceptions: nconc and nsubstitute)

Some care need to be taken when you use an operation that result in a list sharing structure and another operation that is destructive. For example

(setq aa (list 5 3 1))
(setq bb (list 4 2 0))

;; Concatenate.
(setq cc (append aa bb))

;; Sort cc.
(setq cc (sort cc '<))

cc  ; ⇒ (0 1 2 3 4 5)

;; Let's see if there was any nasty side effect.
aa  ; ⇒ (5 3 1)
bb  ; ⇒ (4 5)  <----- YMMV

To see why bb got destroyed, recall:

  • The function append is on our table. Its return value may share structure with some of its arguments (in particular, its last argument).3
  • The function sort is on the second column. It is destructive.

If you don’t want bb to be destroyed, you should use sort with copy-sequence like this:

(setq cc (sort (copy-sequence cc) '<))

Think of the idiom (sort (copy-sequence ...) ...) as the non-destructive counterpart to sort. For example, if you want to print all paths in the elisp global variable load-path in a sorted way into the messages buffer, you run:

(dolist (path (sort (copy-sequence load-path) 'string<))
  (message "%s" path))

But if you use sort directly without copy-sequence, the load-path will get messed up.

Some popular examples of safe use of destructive functions in the second column is included in the appendix.

Actually the phrase “this operator is destructive” can have broader meaning than this post touched upon. In fact, from the Common Lisp HyperSpect glossary:
“destructive adj. (of an operator) capable of modifying some program-visible aspect of one or more objects that are either explicit arguments to the operator or that can be obtained directly or indirectly from the global environment by the operator. ”

So setq is a destructive operator in that sense. For this reason, functions on the second column are sometimes called recycling functions (because they are allowed to recycle cons cells of the given list).4

12. Do this post apply to alists and trees as well?

There are other useful illusion data types other than lists. Two of them are:

  • alist (association list)
  • tree

And they are also made of cons cells.

Here is an example of an alist. It is just a list of cons cells, to be interpreted as storing information about which names are male names and which names are not (0 indicates female name, and 1 indicates male name.).

(setq my-alist (list (cons "Alice" 0)
                     (cons "Bob" 1)
                     (cons "Dan" 1)))

my-alist ; ⇒ (("Alice" . 0) ("Bob" . 1) ("Dan" . 1))

When we say the alist structure of my-alist, we mean the six cons cells making up that alist. But when we say the list structure of my-list, we mean the three cons cells making up the list structure. So when we say the alist my-alist share structure with some other alist, we mean that some of the six cons cells are shared with the other alist. But when we say that the list my-alist share structure with some other list, we mean the list structure (made of three cons cells) being shared.

Let’s see the docstring of copy-alist

(copy-alist ALIST)

Return a copy of ALIST.
This is an alist which represents the same mapping from objects to objects,
but does not share the alist structure with ALIST.
The objects mapped (cars and cdrs of elements of the alist)
are shared, however.

So (copy-alist my-alist) will return an alist composed of six new cons cells, while (copy-sequence my-alist) or (cl-copy-list my-alist) will produce only three new cons cells, while the other three cons cells are shared because they are considered as elements of my-alist as as a list. Elements of my-alist as a list are the three cons cells being used for associations, but elements of my-alist as an alist are the three strings and the associated numbers. So “copy of an alist” and “copy of a list” end up meaning different things.

We’ve only covered functions for manipulating lists, but there are also functions for manipulating alists, and some of them come in pairs. Now that you know the meaning of “alist structure”, “modifying an alist”, “alist structure being shared”, you will be able to use those functions safely as well.

Finally, when we say “tree structure”, we mean all cons cells reachable from following car or cdr. So when you have a list of two elements and if the first element is another list with three numbers, and the second element is yet another list with ten strings, then you have a nested list, and the tree structure consists of 2 + 3 + 10 cons cells. You will be able to guess what the function copy-tree does.

13. Common Lisp note

To translate elisp snippets in this post into Common Lisp snippets, change some function names: cl-foo into foo, and copy-sequence into copy-seq. You might also want to wrap the snipets with appropriate let forms (so that aa and bb and so on are local variables), or use defparameter and the earmuff convention.

14. Appendix

One example. This example from Practical Common Lisp uses nreverse in a safe way.

(defun upto (max)
  (let ((result nil))
    (dotimes (i max)
      (push i result))
    (nreverse result)))

(upto 10)   ; ==> (0 1 2 3 4 5 6 7 8 9)

Here is a function for testing shared structure between lists.

(defun my-list-share-structure-p (list1 list2)
  "Return non-nil if the two lists share structure, otherwise nil. Assume non-cyclic."
  (and (consp list1)
       (consp list2)
       (eq (last list1)
           (last list2))))

Test:

(let ((lst (list (list 3 1) (list 4 1) (list 5 9))))
  (list (my-list-share-structure-p lst lst)
        (my-list-share-structure-p lst (cdr lst))
        (my-list-share-structure-p lst (copy-sequence lst))
        (my-list-share-structure-p lst nil)
        (my-list-share-structure-p lst (cons 99 lst))
        (my-list-share-structure-p lst (append (list 99 99) lst))))

Given a snippet, if you are not sure what diagram it should produce, you can use eq to test whether two given cons cells are the same cons cells. If you want to see the diagram at once, use the variable print-circle (for Common Lisp it’s *PRINT-CIRCLE*):

(setq aa (list 0 11 22))
(setq bb (list 1 2))
(setq cc (append aa bb))

(let ((print-circle t)
      (print-level nil)
      (print-length nil))
  (print (vector aa bb cc)))

It would be good to have something like Online Python Tutor – Visualize program execution for Lisp.

Footnotes:

1

Recall from the previous article that container objects refer to other objects

2

Yes this is equivalent to (setf (cdr (cdr (cdr x))) something) and also to (setf (cl-cdddr x) something)

3

As long as the last argument is not nil, sharing structure with the last argument is guaranteed. And only the last argument.

4

Started by Practical Common Lisp

Posted in Emacs, Lisp | Tagged , | Leave a comment

Names and things (reference semantics) vs. boxes and things (value semantics)

The goal of this post is to explain the “names and things” semantics (reference semantics) that is shared by many dynamic languages (including Python and Lisp in particular that this post singles out for illustration) and its implications.

There are many great posts about this topic written by others. What this post is trying to achieve then is to contribute an article that:

  • (1) illustrates with many examples of Python code and Lisp code side by side (numbered for easy reference);
  • (2) tries to start with minimal common prior background and finish with explanations for many related side questions and gotchas;
  • (3) lots of ASCII art diagrams

Note: You don’t need to be familiar with Lisp to understand this post (part of (2)). You may skip Lisp examples (but I want to add that it’s not hard to read Lisp code.) You don’t need to be familiar with Python either.

Unless stated otherwise, Python examples are written in Python 3 and Lisp examples in Elisp (Emacs Lisp). (There will be a Common Lisp note at the end.)

Before we begin, I’d like to thank /r/learnpython (and /u/zahlman, /u/sqeekypotato, /u/wololongong) for their feedback on a smaller post that this post is based on. Feedback on this post is welcome too.

1. Motivation

Some questions on stackoverflow.com indicate that there are some beginners who find the way Python variables work mysterious, resulting in the question “Is Python pass by reference? Is it pass by value?” and other similar questions popping up in their mind. Questions that came to my mind as well long time ago. Similarly for elisp beginners.

Understanding the way reference semantics shows itself in the two languages is essential for understanding the following things:

  • immutable objects vs. mutable objects, and what is object identity, and shallow copy vs. deep copy.
  • why swapping operation (on integers) in general cannot be written as a function (in Python and in Lisp) (which contrasts with C++).
  • why Lisp macros setq, setf, incf and push cannot be written as functions.
  • how to manipulate lists in Lisp

Let’s start with a litmus test. The following is a Python snippet (and Lisp snippet) that first assigns the vector [10,11,12] to the variable aa, then defines a function, then calls the function on aa and then prints it. (The array-like data structure is called lists in Python, and vectors in Emacs Lisp. They will always be called vectors in this post to simplify this post.1) The function foo has parameter bb (then bb is a local variable within the function body), it assigns the sum of 90 and 9 to be the first element of bb, and then it assigns 9999 to the local variable bb.

Snippet 10: (first in Python)

aa = [10, 11, 12]

def foo(bb):
    bb[0] = 90 + 9
    bb = 9999

foo(aa)

print(aa)  # => [99, 11, 12]

(now in Lisp)

(setq aa (vector 10 11 12))

(defun foo (bb)
  (setf (elt bb 0) (+ 90 9))
  (setq bb 9999))

(foo aa)

(print aa)  ; => [99 11 12]

Notice that the output is neither 9999 nor [10, 11, 12]. This post is mainly for those who find this output mysterious. Before explaining it, I’d like to point out two things. First, this “mysterious” behavior isn’t unique to just function calls. In fact, if you inline the function call manually, you get the same result, and this means that the thing goes deeper than the issue of “call by value” or “call by reference”. The following is like Snippet 10, except the function called is inlined away.

Snippet 20:

aa = [10, 11, 12]

bb = aa

bb[0] = 90 + 9
bb = 9999

print(aa)  # => [99, 11, 12]

(setq aa (vector 10 11 12))

(setq bb aa)

(setf (elt bb 0) (+ 90 9))
(setq bb 9999)

(print aa)  ; => [99 11 12]

Second, this behavior isn’t unique to Python and Lisp, in fact, this behavior is shared by many other (but maybe not all) dynamic languages, which is a good news because once you get an understanding of this behavior, you can reuse that understanding in those other languages as well. Compile your understanding once, run anywhere. For example, in my case, when I was learning elisp, I remember reading an article about Python variables and it helped me with my confusion about Lisp variables at that time.

2. Alfie Bobbie thought experiment

Suppose you are an art teacher and you are to give a series of homework assignments to a student, say John, over a week. On Monday, you ask John to make a penguin doll and then name it Alfie. What does it mean to name a doll Alfie? It means that for example when you say “Where is Alfie?” to John, John will point his finger at that penguin doll.

Next day, you give John another task, which is “Name Alfie Bobby.” In other words, give Alfie yet another name: Bobby. So that doll now has two nicknames: Alfie and Bobby.

Next day, you give John the third task: make Bobby three-eyed. Then next day, John’s task is to make another doll, this time, a dolphin doll, and then name it Bobby, in other words, the nickname Bobby is now assigned to the dolphin doll and not to the now three-eyed penguin doll. The penguin doll no longer has Bobby as one of its nicknames. Next day, you ask John to describe Alfie. How will John describe Alfie, out of the following three options?

(1) “Alfie is a dolphin doll.”

(2) “Alfie is a three-eyed penguin doll.”

(3) “Alfie is a regular penguin doll with two eyes”

Of course, only (2) is correct. The point of this thought experiment is that you don’t find the outcome of this experiment mysterious at all, you knew almost instantly that (2) is the right answer, but did you realize that this thought experiment is actually analogous to Snippet 20?

3. The aa bb experiment (back to Snippet 20)

Here is a way of relating the thought experiment to Snippet 20. (I’ll also explain Snippet 20 with diagrams later)

In Python, there are names (i.e. nicknames like aa or bb) and there are things (i.e. objects2 such as integers, vectors3, strings, and so on.), and names refer to things.

Imagine you are to give orders (by coding) to a Python interpreter, which we’ll call John. Let’s say your code asks John to make a vector and then name it aa. This corresponds to the first line aa = [10, 11, 12] in Snippet 20. The right hand side in that line is asking “make a vector with these elements” and the line itself is asking “then name it aa“. Next, you give John another task, which is “name aa bb”, which corresponds to the line bb = aa. In other words, give aa yet another name: bb. In other words, give the object with nickname aa yet another nickname: bb. So that vector object now has two names: aa and bb.

Next, you give John the third task: make bb three-eyed. This is the line bb[0] = 90 + 9, if you define three-eyed vectors to be those vectors whose first element is 99. Next, you ask John to make some other thing (this time an object that happens to be an integer) and then name it bb. This is the line bb = 9999. The name bb is now assigned to the number 9999 and not to the now three-eyed vector. The vector no longer has bb as one of its nicknames. Next, you ask John to describe aa. This is the line print(aa). How will John describe aa?

(1) 9999

(2) [99, 11, 12]

(3) [10, 11, 12]

You can see that the answer has to be and is (2). This “names referring to things” semantics contrasts with the “boxes containing things” semantics of the C language where there are boxes (unmovable boxes) and things, and boxes contain things. In the former semantics, the assignment operator simply assigns a nickname (which could already be a nickname of some object) to a Python object (which could already have a nickname or even many nicknames) that is the result of the computation of the expression on the right hand side. (The result of the computation can be a newly created object or an already existing object, depending on what the expression on the right side is asking for.) In the latter semantics, the assignment operator simply copies the thing (that is the result of the computation of the expression on the right hand side) into the specified box (specified by the left hand side). We’ll get into more details of “boxes containing things” semantics later, but for now we focus on “names referring to things” semantics of Python and Lisp.

Now that the litmus snippet is explained, to complete the picture, I’d like to also add that the “names referring to things” paradigm extends to the behavior of container objects, local variables, function calls in a consistent and simple way: Container objects (such as vectors) refer to the objects they conceptually contain. In other words, they refer to other objects instead of physically containing them. (In some of the next sections, we’ll go through another litmus snippet testing this point.) With local variables and function arguments, you have locally effective names referring to things.

In the “Boxes containing things” semantics on the other hand, container objects (such as structs in C) physically contain boxes that physically contain other objects. With local variables and function arguments, you have temporary boxes containing things.

4. Boxes containing object references

There is another way of looking at the “names referring to things” semantics, which explains the term “reference semantics”. One can think of the “names and things” semantics as a special variation on the “boxes and things” semantics where the things that get stored in boxes are always pointers/references to other objects, rather than objects themselves.4 So in this point of view, one can say that the assignment operator in reference semantics just copies references to objects, rather than objects themselves, and to complete the picture, one can add that container objects contain references to other objects, rather than the objects themselves.

Make sure that you can convert between the following two (effectively equivalent) point of views on reference semantics:

  • “There are nicknames and there are objects. Nicknames refer to objects.”
  • “There are boxes and there are objects. Boxes contain object references, i.e., references to objects.”

In the rest of the article, we will only use the terms “reference semantics” and “value semantics”.

Let’s summarize our observations thus far into two sentences: In reference semantics, objects are shared by default. In value semantics, objects are copied by default.

While I am invoking a particular specific interpretation of the phrases “value semantics” and “reference semantics”, it is possible that there are other possible interpretations of these phrases you might have in mind. In that case, just replace occurrences of “value semantics” and “reference semantics” in this post with “the C-like semantics” and “the Python-or-Lisp-like semantics” while reading this post, but nevertheless feel free to share other possible interpretations in comments.

5. Two different meanings of change

The litmus example demonstrates how the phrase “change something” can have two very different meanings. In one of Disney movies, a teenage female protagonist goes to a wizard. The protagonist says “I wish my mother would change”. The wizard says “All right. I’ll make that happen.” and the next day the protagonist sees her mother transform into a bear. In another movie, the protagonist, who is again a teenage daughter to follow the “Alice in wonderland” trope, finds a portal to another world. In the other world, she finds a flat identical to her own, and two people who look like her parents, except they have black buttons where their eyes should be. The protagonist’s initial reaction to this other world is “I like this new place. I like these new parents. I am living here from now on.”

In both movies, one could say that the protagonist’s mother changes. In the former movie, change was in the sense of transformation. In the latter movie, change was in the sense of new designation.

When you see a car with a bumper sticker that says “change your mayor” and if it was near an election day, it means change in the sense of new designation. The bumper sticker is saying “let’s vote for someone else this time”. But if it wasn’t near any election day, and if the driver was an influential lobbyist or adviser and if the bumper sticker said “I change your mayor”, it means “I change the mayor’s policies. I make him change his mind..” That’s change of mayor in the sense of transformation.

When you change your mayor in the sense of transformation, a lot of people can say “My mayor changed.” and it affects a lot of people. But when you change your mayor in the sense of new designation, by moving to another city for example, that does not affect a lot of people. No side effect. Recall Snippet 10. We change bb first in the sense of transformation, and that affects aa, and then we change bb in the sense of new designation, and that does not affect aa.

If you want to unambiguously say that some code changes an object in the sense of transformation, you can say the code mutates the object, you can also say the code changes the state of the object.

If you want to unambiguously say that some code changes something in the sense of new designation, you can use the verb “set”. Example:

  • “this code sets the element at index 2 of this vector to …”
  • “this code sets aa[2] to …”
  • “CDR is SETFable.”

The two different meanings are sometimes also phrased in the following ways, depending on literature:

  • “There is a difference between changing the binding and changing the object”
  • “There is a difference between modifying the variable and modifying the object”
  • “changing the variable vs changing the object”
  • “modifying the place vs modifying the object”

6. Pass by value or pass by reference

Now that we have clear idea on what is going on in Snippet 10, let’s get back to the questions that mention phrases like:

  • pass by value
  • pass by reference
  • call by sharing
  • pass by object
  • or any of the above phrases where “pass” is replaced with “call” or vice versa

Answers to these questions depend on what the askers meant by these phrases5. I wish to point out just two things:

  1. What we see in Snippet 10’s outcome is nothing like “C++’s pass by reference”.
  2. When someone says “Python is pass-by-value except values are object references”, “values” in that sentence means contents of boxes (rather than states of Python objects), and with that interpretation, the sentence is correct. This is good to know because “Language X is pass-by-value except …” is a popular saying.
  3. Wikipedia article on evaluation strategy seems to suggest that “call by sharing” is the term for this that is used in the Python community. Now you know what is meant when someone says “Language X uses call by sharing.”

7. With vectors

Let’s start drawing diagrams, with simple examples first.

In the following code, the first line creates a vector that holds three integers. 10 is its first element, 11 is its second element and 12 is its third element. The last three lines in the code show how to access the three elements.

Snippet 30:

aa = [10, 11, 12]
bb = aa
a0 = aa[0]
a1 = aa[1]
a2 = aa[2]
(setq aa (vector 10 11 12))
(setq bb aa)
(setq a0 (elt aa 0))
(setq a1 (elt aa 1))
(setq a2 (elt aa 2))

Execution of the second line results in the names aa and bb referring to or pointing to the same vector, or in another point of view, aa and bb are references to the same object. Aftermath of execution of the snippet is shown in the following arrow diagram.

aa: ---+
       |  
       |  
       v

   +--------------------+
   |       0:   1:   2: |
   |                    |
   |       |    |    |  |
   +-------|----|----|--+
           |    |    | 
       ^   v    v    v 
       |               
       |   10   11   12
bb: ---+               
           ^    ^    ^ 
           |    |    | 
           |    |    | 

          a0:  a1:  a2:

The rectangle in the diagram represents the vector object. The diagram is showing one vector object, three integer objects, five names and their relations. The vector now has two names aa and bb. The vector says “10 is my first element, 11 is my second element, 12 is my third element” or rather “10 is my element at index 0, 11 is my element at index 1, 12 is my element at index 2.” In the diagram, I just wrote 0:, 1:, and 2: instead of writing “element at index 0″ and so on. The last three lines in the code give names a0, a1, a2 to the three elements. See how the integer 10 has two arrows pointing to it because there are at least two different ways to refer to it, for example, you can refer to it by saying a0, and also by saying “first element of aa”.

The fact that the variables aa and bb refer to just one vector and not two vectors is usually phrased in many ways, such as:

  • “aa and bb are the same object
  • “… are the same vector
  • “… are same under object identity

Now you know what object identity is. You can test object identity with the function eq in Lisp, or with the is operator in Python. If you run (eq aa bb) in Lisp (after running of Snippet 30), it will return t (which means true), which confirms that aa and bb are the same object. If you run aa is bb in Python, it will return True, which confirms the same.

Now suppose we run the following code right after we ran Snippet 30.

Snippet 40:

bb[0] = 90 + 9
bb = 9999
print(aa)
(setf (elt bb 0) (+ 90 9))
(setq bb 9999)
(print aa)

The first two lines do something to bb, and the last line prints aa, not bb. What is the value of aa now? A vector of three elements: 99, 11, 12. Did two things to bb, but only one affected the value of aa. Let’s see the aftermath in diagram:

aa:---+
      |  
      |  
      v

  +--------------------+
  |       0:   1:   2: |
  |                    |
  |       |    |    |  |
  +-------|----|----|--+
          |    |    |
   99 <---+    |    |
               v    v   

          10   11   12  
bb:--+ 
     |    ^    ^    ^
     |    |    |    |
     v    |    |    |

   9999  a0:  a1:  a2:

The first line in the code mutated or modified the vector by reassigning something different to its first element field. When the vector is thinking “my element at index 0…”, that now refers to a different number, which is 99. The name a0 still refers to the original number, which is 10. So a vector can be mutated. We say that a vector is a mutable object because its state can change, as demonstrated here. Integers are immutable objects: their states cannot change.

The second line in the code reassigns the name bb to a different thing, which is 9999. The name aa still refers to the original thing, which is the vector.

We did two things to bb: the first line did mutation of bb, the second line did reassignment of the name bb. Result of mutating bb is visible through the variable aa as well and that’s because aa and bb at that time were the same object. On the other hand, reassignment of the name bb did not have any effect on what you see through the variable aa and that’s not surprising. These points may seem unremarkable but what if we try another example code:

Snippet 50:

aa = [10, 11, 12]

def dothings(bb):
    bb[0] = 90 + 9
    bb = 9999

dothings(aa)

print(aa)
(setq aa (vector 10 11 12))

(defun do-things (bb)
  (setf (elt bb 0) (+ 90 9))
  (setq bb 9999))

(do-things aa)

(print aa)

Now what is the value of aa? A vector of three numbers: 99, 11, 12. By now, this result shouldn’t be surprising. This is actually the same as Snippet 10.

8. With integers, simplest immutable objects

Before we move on to diagrams involving nested vectors, let’s talk about integers. Integers are immutable objects and so when you have the object that is 10, that is, when you have the integer object whose state is “I represent the integer 10″, the object just stays that way, you can’t cause the object to become 11 or 15 or anything else.

The following computes the sum of 1 and 1, assigns the result to aa, then assigns it to bb.

Snippet 60:

aa = 1 + 1
bb = aa
(setq aa (+ 1 1))
(setq bb aa)

Aftermath of execution of the code in diagram follows. (We’ll soon get to the minor issue of possibility of this diagram and next diagrams not always being correct)

aa: ---+
       |
       |
       v

       2

       ^
       |
       |
bb: ---+

First line in the code connects the name aa to 2, then the second line connects the name bb to the same object.

Now suppose we run the following code right after we ran the previous snippet:

Snippet 70:

bb = bb + 10
print(aa)
print(bb)
(setq bb (+ bb 10))
(print aa)
(print bb)

What get printed for values of aa and bb? 2 and 12. The name bb now refers to a new number that was the result of evaluating the expression bb + 10 or (+ bb 10). (Recall what the assignment operator in reference semantics does.) In diagram:

aa: ---+
       |
       |
       v

       2     12

             ^
             |
             |
bb: ---------+

The name aa is still pointing to the original number that aa and bb together used to point to.

Can one write a function that takes a number and then adds 10 to that number? To test, what would happen if we run the following?

Snippet 80:

aa = 1 + 1
def dosomething(bb):
    bb = bb + 10
dosomething(aa)
print(aa)
(setq aa (+ 1 1))
(defun do-something (bb)
  (setq bb (+ bb 10)))
(do-something aa)
(print aa)

What gets printed? The result is 2 and it’s not 12. This example just repeats the previous snippet but with a function call. When you call the function dosomething by passing aa, the interpreter first makes the local variable bb refer to whatever aa was referring to at the time, then the first line in the function body makes bb refer to a new number. Value of aa after that is still 2.

Now some might say “What about the += operator in Python or incf in Lisp? How can it do what it does?” I’ll demonstrate what it does, how it does what it does, and what it doesn’t do, with the following code.

Snippet 90:

aa = 1 + 1
bb = aa
bb += 10
print(aa)
print(bb)
(require 'cl-lib)

(setq aa (+ 1 1))
(setq bb aa)
(cl-incf bb 10)
(print aa)
(print bb)

Output is 2 and 12.

What happens is that whenever Python interpreter encounters this statement

bb += 10

it replaces the statement with the new statement bb = bb + 10 and runs that new statement instead.6 Likewise, whenever Emacs encounters the expression (cl-incf bb 10), it replaces the expression with the new expression (setq bb (+ bb 10)) and evaluates that new expression instead. This is possible because cl-incf is a Lisp macro, rather than a Lisp function.

Whatever you do to bb with your code, it does not affect aa or any other variables that refer to integers or other immutable objects. That’s the thing with immutable objects. So you can rely on “numbers next to names” diagrams instead for convenience, by which I mean diagrams that doesn’t have arrows pointing to numbers, you just draw numbers next to the variable names. For example, you can draw this diagram

aa: 2    



bb: 2

rather than the diagram where aa and bb points to the same 2, and you can also rely on this diagram

aa: 2


bb: 12

rather than on the diagram where aa points to 2 and bb points to 12. They don’t look that much convenient for now because these diagrams are very simple. We will meet complicated diagrams soon.

Notice that for immutable objects, there can be no external difference between reference semantics and value semantics. That is unless you decide to check object identity between immutable things, but you wouldn’t do that (see the post on identities of immutable objects). Because it is assumed that we wouldn’t do that, some interpreters might choose to use value semantics for some small immutable objects for performance reasons, much like how we can rely on “numbers next to names” diagrams.

9. Nested structures, shallow copy, deep copy

I’ll demonstrate what shallow copies are, using diagrams. The following code creates a vector of three elements, and the first element happens to be a vector. So we have a nested structure (a vector containing a vector). The second line in the code makes a shallow copy of aa.

Snippet 100:

aa = [[0, 0], 11, 12]
cc = aa[:]
(setq aa (vector (vector 0 0) 11 12))
(setq cc (copy-sequence aa))

Now aa and cc are different objects, not the same object. But aa and cc share their elements. Structure is copied, but elements are not copied, they are just shared.

          +------------+
aa: --->  | 0:  1:  2: |
          | |   |   |  |
          +-|---|---|--+
            |   v   v   
+------+    |           
|0: 1: | <--+  11   12  
||  |  | <---+        
+|--|--+     |  ^   ^ 
 v  v        |  |   | 
 0  0     +--|--|---|--+
          |  |  |   |  |
cc: --->  | 0:  1:  2: |
          +------------+

That’s what a shallow copy is, at least in Python. Let me quote a paragraph from S.Lott, from a stackoverflow.com thread: “Shallow copies duplicate as little as possible. A shallow copy of a collection is a copy of the collection structure, not the elements. With a shallow copy, two collections now share the individual elements.”

So the vectors aa and cc are different under object identity. The first element of aa and the first element of cc are same under object identity. The second element of aa and the… wait, let’s recall that we are not supposed to check object identity between integers.

This relation between aa and cc is usually expressed in many ways, such as:

  • “cc is a shallow copy of aa”
  • “cc is a copy of aa, except their elements are shared.”
  • “copy-sequence returns a copy of a vector, but their elements are not copied.”

Diagrams are getting complex, so let’s invoke the “draw numbers next to names” trick, and redraw the previous diagram like this:

          +------------+
aa: --->  | 0:  1:  2: |
          | |   11  12 |
          +-|----------+
+------+    |           
|0: 1: | <--+           
|0  0  | <---+        
+------+     |        
          +--|---------+
          |  |  11  12 |
cc: --->  | 0:  1:  2: |
          +------------+

Now what happens if we run the following code right after we run the previous code? In the following code, we do something to the first element of cc, and then we print the first element of aa, not of cc:

Snippet 110:

cc[0][1] = 90 + 9
cc[0] = 9999
print(aa[0])
(setf (elt (elt cc 0) 1) (+ 90 9))
(setf (elt cc 0) 9999)
(print (elt aa 0))

The first element of aa is now a vector of 0 and 99. The aftermath in diagram:

          +------------+
aa: --->  | 0:  1:  2: |
          | |   11  12 |
          +-|----------+
+------+    |           
|0: 1: | <--+           
|0  99 |     +--> 9999 
+------+     |        
          +--|---------+
          |  |  11  12 |
cc: --->  | 0:  1:  2: |
          +------------+

We did two things to the first element of cc. The first line of code did mutation of first element of cc. Mutating that element had an effect on first element of aa and that’s not surprising because first element of aa and that of cc were the same object at that time. On the other hand, the second line, the assignment to the first element of cc had no effect on the first element of aa, and that’s not surprising either. Yes, this is our first litmus example in disguise. To relate to it, notice that the expression cc[0] in the last snippet is playing the same role as the expression bb in Snippet 10 (the litmus example).

As for difference between shallow copy and deep copy, Here is a relevant quote from the Python documentation:

The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances):

  • A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
  • A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original

The following shows another way of making a shallow copy.

Snippet 115:

aa = [[0, 0], 11, 12]
cc = [0, 0, 0]

cc[0] = aa[0]
cc[1] = aa[1]
cc[2] = aa[2]
(setq aa (vector (vector 0 0) 11 12))
(setq cc (vector 0 0 0))

(setf (elt cc 0) (elt aa 0))
(setf (elt cc 1) (elt aa 1))
(setf (elt cc 2) (elt aa 2))

And here is yet another way of making a shallow copy.

Snippet 116:

aa = [[0, 0], 11, 12]
cc = [aa[0], aa[1], aa[2]]
(setq aa (vector (vector 0 0) 11 12))
(setq cc (vector (elt aa 0)
                 (elt aa 1)
                 (elt aa 2)))

These two alternative ways of making a shallow copy are more verbose. I only included them here for illustration.

As for making a deep copy, you have to define a recursive function or borrow one from a relevant library.

10. assignment to an expression

This section is about something that is already demonstrated implicitly in some of previous sections. In this section, I want to make it explicit. Have I mentioned what is supposed to happen when we run something like the following code?

Snippet 117:

xx[i+j][0][k+1] = f(a, b, c) + g(e, f)
(setf (elt (elt (elt xx (+ i j)) 0)
           (1+ k))
      (+ (f a b c)
         (g e f)))

What is guaranteed to happen is that the expression on the right hand side (f(a, b, c) + g(e, f) in the Python snippet or (+ (f a b c) (g e f)) in the Lisp snippet) is evaluated. Let’s call the result of the evaluation the r-object, for lack of a better term.

What technically does NOT happen is the evaluation of the expression on the left hand side (xx[i+j][0][k+1] in the Python snippet or (elt (elt (elt xx (+ i j)) 0) (1+ k)) in the Lisp snippet). Nevertheless, let’s call the result of the evaluation of the left hand side the l-object.

As demonstrated many times in earlier sections, the change of the l-object is not in the sense of transformation, but in the sense of new designation. The l-object is never mutated.

In terms of arrow diagrams, what happens is that some arrow pointing to the l-object changes its direction to point to the r-object. There could have been many arrows pointing to the l-object if you drew the diagram in a more full context encompassing more code, but only one arrow changes its direction.

While the l-object is not mutated, the parent object (the object you get by evaluating the sub-expression xx[i+j][0]) IS mutated.

11. value semantics, C, elephants

Since we talked a lot about reference semantics, let’s talk a little bit about value semantics. The C language uses value semantics. This section does NOT assume familiarity with C, so this section might work as a tiny intro to C as well, and this is the only way I can explain value semantics in more detail. Recall that in value semantics, there are boxes and there are things and boxes contain things. C is a compiled language and is statistically typed, so when you declare an integer variable like this

int aa;

the variable aa can only hold integers. You can think of this as being there a box labeled aa, and that box can contain only values of int type.

Now consider this C code.

Snippet 120:

int i = 123;
float f = 3.14;
i = i + i;
i = f + f;

Above code declares an int variable i and a float variable f. Initially the value of i is 123 and the value of f is 3.14. The third line computes the expression i + i and the result int value 246 is copied into the box labeled i.

The last line is interesting. It is specifying an int variable on the left side, but the expression on the right hand side gives a float, and remember that an int variable can only hold an int value. So what happens? When you try to assign a value to a variable and the types are different, one of the following things happens:

  • the compiler gives an error or warning, or
  • the program comes up with a new value of right type (related to the original value) and assigns it instead.

Now consider this C code.

Snippet 130:

#include <stdio.h>

struct Point {
   int x;
   int y;
};


int main(void) {
  struct Point aa;
  aa.x = 10;
  aa.y = 11;
  struct Point bb = aa;

  bb.x = 90 + 9;

  printf("aa.x equals %d\n", aa.x);
  printf("bb.x equals %d\n", bb.x);

  return 0;
}

The line struct Point aa; declares a variable of type Point. This data type is composed of two int fields. So you can imagine a box labeled aa, which is composed of two smaller boxes, and the two small boxes can only hold int values, while the big box can only hold Point values. The next two lines initializes aa to have 10 and 11 as its elements, and the next line struct Point bb = aa;, which declares bb but also assigns aa to bb, copies contents of aa and writes into the box labeled bb, because that’s what the assignment operator does in value semantics. So you get:

      aa
+------+-------+
|      |       |
|  10  |  11   |
|      |       |
+------+-------+


      bb    
+------+-------+
|      |       |
|  10  |  11   |
|      |       |
|      |       |
+------+-------+

and then the line bb.x = 90 + 9; does something to bb and you get:

      aa
+------+-------+
|      |       |
|  10  |  11   |
|      |       |
+------+-------+


      bb    
+------+-------+
|      |       |
|  99  |  11   |
|      |       |
|      |       |
+------+-------+

Notice how that did not affect aa. This contrasts with results of Snippet 10 or Snippet 20 in reference semantics.

In languages with value semantics, the word “variable” usually means “box” in our metaphor, and “value” usually means contents of a box. So, while this post keeps saying “box”, you should say “variable” if you are posting a question in a C forum.

If you are new to C, you are going to have a question at this point. These boxes have fixed sizes according to their types, but then how does one work with something like vector objects which can contain arbitrary many other objects of arbitrary types? What is the size of a box for a vector object? What if the vector has so many elements that it cannot be contained within that box?

It all comes down to this: how do you store an elephant in a freezer when you are given a freezer that is smaller than that elephant? You don’t. You work around. You don’t store a big object into a box of small size. Instead, you store some kind of reference to that big object into a box that is designed to store references. This is where pointer variables in C come in. They store memory addresses. The question “how do I do vectors stuff in C” is a very good question because its full answer teaches you not only about pointers, but also about double pointers, but I will not go into that. I only wanted to assure you that value semantics is not less powerful.

That’s it for value semantics, but since you are now being introduced to C, I feel I need to say more about C. In C, a box is just some contiguous block of memory. The metaphor “box” (or rather the abstract notion of “variable”) is in the compiler’s mind. A box can be on the stack or on the heap. What are the stack and the heap? C snippets in this post so far involve only the stack. Construction and destruction of boxes on the stack are done automatically so that you don’t have to manually write C code saying something like “destroy this box” for this. The stack is the memory set aside for this automatic business. It’s called the stack because if you try to guess how this automatic business is done, something resembling the stack data structure would be the first thing that comes to your mind. For elephants stuff, you need to write C code saying something like “I order a box of size n bytes” or “I’m done using that box. Now destroy the box, i.e., release that block of memory”, and the heap is the memory set aside for elephants that you might create. This is good to know even if you don’t plan to learn C, because some useful articles on Python invoke the concept of the heap and stack frames: recall that reference semantics can be considered a special case of value semantics.

12. For those coming from C

Make sure you can convert between the following two (effectively equivalent) pictures:

  • “Everything is a pointer in Python”
  • “There is no pointer in Python.”

You know what each of the two statements means. The second statement at first can make you feel restrained (not being able to work with addresses directly), but many of the tasks that you usually do with double pointers in C can be done in a language with reference semantics.

Languages that use reference semantics usually use a garbage collector so that objects live as long as necessary. Remembering this and the fact that there are no boxes can be helpful when trying to understand closures later (if the language supports closures).

Many times you may have to recall that integers are immutable objects (in Python and Lisp). Any data type that is not a container type is highly likely to be an immutable type in these languages.

13. copying vs. sharing

Now you know that copying is possible in reference semantics and sharing is possible in value semantics, you might be wondering when to copy vs when to share.

The main benefit of copying is that copies are independent from each other. When cc is a deep copy of aa, whatever you do to cc does not affect aa. When reasoning about aa, you don’t have to worry about what other’s code might do to cc. (This is also the benefit of immutable objects.)

The main benefit of sharing, other than less memory usage, is the fact that when cc and aa refer to the same object, mutation of cc is automatically announced to user of aa instantly without you manually having to notify to the user of aa. Similarly, when cc[0] and aa[0] refer to the same object (which is the case if cc is a shallow copy of aa), mutation of cc[0] is automatically visible to aa without you having to do some update operation on aa. When a new mayor is elected, it takes newspapers and tv and internet to get everyone notified of that fact. But when an adviser changes the mayor, every resident instantly has an updated mayor. without having to be notified of anything. And that is the convenience of mutating a shared mutable object.

14. Next sections

Some of next sections are specific to Python, and some are specific to Lisp. If you are reading this post as part of learning Lisp, feel free to skip the sections specific to Python. And vice versa.

15. Python specific

15.1. The plus equal operator in Python

There is some gotchas related to the augmented assignment operator += in Python, which can be summarized as:

  • The plus equal operator does different things depending on whether it’s working on immutable objects or mutable objects.
  • Strings are immutable in Python (unlike in Emacs Lisp).

Result of the following code is not surprising now.

Snippet 140:

ii = 100
jj = ii

jj += 50

print(ii)  # => 100
print(jj)  # => 150

But the result of the following code may be surprising.

Snippet 150:

mm = [7, 8]
nn = mm

nn += [4, 1]

print(mm)  # => [7, 8, 4, 1]
print(nn)  # => [7, 8, 4, 1]

The twist is that while the statement jj += 50 expands to jj = jj + 50, the statement nn += [4, 1] does NOT expand to nn = nn + [4,1], but to some function call that takes nn and mutates it. So the += operator treats mutable targets and immutable targets differently. For more on this, look into iadd in Python documentation.

15.2. Python tuples, immutable or mutable?

In the following code, we assign a tuple object, which is a kind of container object, to the variable tt and then we try to assign something as its first element.

Snippet 160:

tt = (10, 11, 12)
tt[0] = 90 + 9

But running that code results in the following error:

TypeError: 'tuple' object does not support item assignment

Such operation is just not allowed with tuple objects. On the other hand, the following code runs fine.

Snippet 170:

tt = ([0, 0], 11, 12)
tt[0][1] = 90 + 9

print(tt)  # => ([0, 99], 11, 12)

So we are allowed to change elements of a tuple in the sense of transformation, but changing its elements in the sense of new designation is not allowed. This raises a question, what category do tuple objects belong to, among immutable vs mutable? Python documentation’s position is that tuple objects should be considered immutable objects. They are platypuses of immutable objects. The general statement “mammals don’t lay eggs” has exceptions, and some of my statements about immutable objects in this post may have exceptions.

15.3. Mutable default arguments gotcha in Python

Since two of the three main gotchas in Python are already mentioned, I’ll mention the third one too: Default arguments in Python and Lisp.

16. Lisp specific

16.1. setf vs. setq

If you are new to Lisp, reading some of Lisp snippets in this post, a question might have come to your mind: what is the difference between setq and setf? When to use which?

The main difference is that

(setf (elt bb 0) (+ 90 9))

works but

(setq (elt bb 0) (+ 90 9))

does not work.

On the other hand

(setq aa (+ 1 1))

works and

(setf aa (+ 1 1))

also works.

You can always use setf in place of setq. Why am I using setq in some places then? Habit and/or tradition.

16.2. Lisp symbols

It is important to know that the “names referring to objects” picture is an approximation, in case of Lisp. The real picture is “symbols referring to objects”. Also, symbols are objects too in Lisp. When you run the following code, the symbol aa gets associated with 99.

(setq aa (+ 90 9))

So, symbol objects can play the role of Lisp variables, but they can be used for other roles as well which I will not mention. They are multipurpose.

The following docstring for setq makes clear that symbol objects are really being used for the role of variables:

setq is a special form in `C source code'.

(setq [SYM VAL]...)

Set each SYM to the value of its VAL.
The symbols SYM are variables; they are literal (not evaluated).
The values VAL are expressions; they are evaluated.
Thus, (setq x (1+ y)) sets `x' to the value of `(1+ y)'.

One usually says that the print name of the symbol aa is “aa”, or simply that the name of the symbol is “aa”.

Usually, there is only one symbol object of the same print name. But that’s not always the case. There are exceptions. You don’t need to know about these exceptions until you learn to define Lisp macros, which is when you get to use functions like make-symbol or gensym to create these exceptional symbol objects (i.e. uninterned symbols) which are guaranteed to be distinct from whatever symbols you happen to use in your Lisp code.

Common Lisp note: In case of Common Lisp, there is only one symbol object of the same print name in the same package.

And one more thing to complete the picture. Emacs Lisp and Common Lisp are Lisp-2. Being a Lisp-2 just means, to quote Wikipedia, “The namespace for function names is separate from the namespace for data variables.”, or to quote from Peter McLain in a stackoverflow thread , “Lisp2 has (at least) two namespaces (symbols have a slot for their a function value and one for a regular value).”

16.3. Do not modify a constant

Speaking of immutable objects and mutable objects, I need to mention that in Lisp there are certain cases where you should not modify some objects even though those objects are mutable. In other words, you should treat them as if they are immutable. What are these cases? First, we need to know what literal objects are.

CLHS Glossary defines “literal” (as in “literal objects”) as follows:

adj. (of an object) referenced directly in a program rather than being computed by the program; that is, appearing as data in a quote form, or, if the object is a self-evaluating object, appearing as unquoted data.

Snippet 180:

;;; a literal vector
(setq vv [10 20 30])

;;; a computed vector
(setq vv2 (vector 10 20 30))

;;; a literal string
(setq ss "hello world")

;;; a literal list (also a quote form)
(setq ll '(10 20 30))

;;; a computed list
(setq ll2 (list 10 20 30))

The variables vv, ss, ll refer to literal objects, but vv2 and ll2 don’t.

So the rule is that you should not modify literal objects. In other words, you should not mutate any part of literal data. Instead, you should treat them as constant data for your program. Hence the phrase “do not modify a constant”. Since it is understood that we would treat them as constants, the compiler (or even the interpreter) may sometimes try to optimize code for example by collapsing constants (or parts of them) appearing in code if they are structurally equal (i.e. if they are same under the function equal). And that in turn is why you should not modify a literal object. It’s a back and forth relation.

You know you are not supposed to check object identity on immutable objects. And you need to treat literal objects as if they are immutable objects. As a result, you are not supposed to check object identity between parts of literal objects.

16.4. Common Lisp note

Everything in this post, unless stated otherwise, applies to Common Lisp as well, but you might need to change some function names in some elisp snippets in this post to make them Common Lisp code. For example cl-incf to incf, and copy-sequence to copy-seq.

Do NOT setf or setq to undeclared variables in Common Lisp. Lisp snippets in this post are violating that advice, if thought of as stand alone code.

If you are interested in more differences, see differences between Common Lisp and Emacs Lisp

16.5. Further reading for Emacs users and Lisp beginners

This post is part of the Living with Emacs Lisp series. The next post to read the post on how to manipulate lists in Lisp.

Footnotes:

1

The reason the simplification is done this way rather than the other way around is because of Lisp lists which are different creatures altogether

2

In Python, even integers are objects. This is due to how the Python documentation defines the term “object”. This post adopts that usage of that term, for simplicity.

3

Recall that we (incorrectly) call Python lists “vectors” for simplicity in this post.

4

plus enabling of some garbarge collector

5

except for phrases like “call by sharing” and “pass by object references” which seems to have only one possible interpretation

6

I am simplifying a bit here. This will be addressed later.

Posted in Emacs, Lisp, Python | Tagged , , | 1 Comment