immutable objects and object identity

The goal of this post is to explain the following rule:

“A user of a programming language is not supposed to think about object identity of an immutable object (in most cases)”

To do that, I should start by explaining the words “objects” and “object identity”.

1. Meaning of object in Python, Lisp, JavaScript

One day, an Elisp beginner or a Python beginner starts to wonder: “How do Python (resp. Lisp) functions pass things around? Is it pass by value?” After some googling, he or she soon starts asking other questions: What is an object? What is a value? Are strings objects? Are numbers objects? Are integers values? The answers to these questions depend on what one means by the words “object” and “value”.

You may be reading a book on Common Lisp or Emacs Lisp and encounter the following sentence: “numbers are objects and they …”. You may be reading a book on JavaScript and encounter this sentence: “Numbers are not objects. Instead, they are …” You might then be tempted to imagine some fundamental difference in how Lisp and JavaScript treat numbers, but it may simply be that the authors of the two books are using different definitions of “object”.

Let’s go through how the specs for Common Lisp, JavaScript, and Python use the words “object” (and “value”) in different ways.

1.1. in JavaScript

In the JavaScript spec, one does not call 123 an object, but you can call it a primitive value. What can you call an object? Anything that is a member of the type Object.

aa = {b: 2, c: 3}; // this is an object.
console.log(aa);
console.log(aa instanceof Object); // -> true

aa = 123; // the spec would not call this an object.
console.log(aa);
console.log(aa instanceof Object); // -> false

123 is a number value. On the other hand, you can create an object that is a wrapper around a number value. Such an object is called a number object.

aa = new Number(123); // this is a wrapper around a number value
console.log(123 === aa.valueOf()); // -> true
console.log(123 === aa); // -> false
console.log(aa instanceof Object); // -> true
console.log(typeof aa == typeof ({a:2, b:3})); // -> true

aa = Number(123); // just to see what happens when I forget to write 'new'.
console.log(typeof 123 === typeof aa); // -> true

To check the JavaScript definitions of the terms “primitive value”, “object”, “number value”, and “number object”, see the ECMAScript Language Specification.

1.2. in Python

In the Python Language Reference, even strings and numbers are called objects. Everything is an object. Strings and numbers are immutable objects.
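Both claims are easy to check for yourself. The following is only a minimal sketch, written for Python 3 and using nothing but built-ins; the variable names text and mutated are made up for the illustration.

```python
# Numbers and strings are instances of `object`, like everything else.
print(isinstance(123, object))     # -> True
print(isinstance("yeah", object))  # -> True

# Strings are immutable: assigning to a character position is rejected.
text = "yeah"
try:
    text[0] = "Y"
    mutated = True
except TypeError:
    mutated = False

print(mutated)  # -> False
print(text)     # -> yeah
```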

Also, the Python Language Reference uses the word “value” in a particular way. To explain that, let’s look at a Python snippet:

aa = [2, "yeah"]
bb = aa
bb[0] = 4
print(aa, bb)  # -> [4, 'yeah'] [4, 'yeah']
bb = [7, 7]
print(aa, bb)  # -> [4, 'yeah'] [7, 7]

Execution of the first line of the code above makes the name aa refer to an object that is a Python list. The second line makes the name bb refer to that same object. The third line mutates the object. One says that the third line changes the value of the object. So in the Python Language Reference, “value” is the word used for the state of a (mutable) object.

The 5th line, bb = [7, 7], makes the name bb refer to another object.

To verify the ways the words “object” and “value” are used in the Python Language Reference, see its “Data model” chapter.
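The Data model chapter describes every object as having an identity, a type, and a value. The sketch below (Python 3) is one way to watch the three separately: id() is the built-in that returns an object's identity as an integer, and the names identity_before, type_before, and old are just labels chosen for this illustration.

```python
aa = [2, "yeah"]
identity_before = id(aa)
type_before = type(aa)

aa[0] = 4  # mutation: the value (state) of the object changes

print(aa)                         # -> [4, 'yeah']
print(id(aa) == identity_before)  # -> True, still the same object
print(type(aa) is type_before)    # -> True, still a list

old = aa     # keep the first list alive under another name
aa = [7, 7]  # rebinding: the name aa now refers to a different object

print(id(aa) == identity_before)   # -> False
print(id(old) == identity_before)  # -> True
```

Mutation changes the value of an object; assignment changes which object a name refers to.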

1.3. In Common Lisp

As in Python, everything is called an object. Numbers are immutable objects. Strings, this time, are mutable objects.

The spec provides the following example sentence: “The function cons creates an object which refers to two other objects”.

(setq aa (cons 2 "yeah"))
(print aa) ;; -> (2 . "yeah")

Evaluation (i.e. execution) of the first line makes the name aa refer to an object that is returned by the cons function call. The function created a container object that refers to two objects (one of them is a number and the other is a string) and the name aa now refers to that container object simply because that’s what the function returned.

123 is an object. In particular, it is a number. So one can say that 123 is a numeric kind of object, or simply, a number object. Contrast this with how the phrase “number object” is used in the JavaScript spec. The same contrast applies to the phrase “string object”.

What is the meaning of “value” in Lisp speak?

;; sum of 1 and 1
(+ 1 1) ;; -> 2

One says that the value of the expression (+ 1 1) is 2. Also, one can say that the value of the expression (cons 2 "yeah") is (2 . "yeah"). One can also say that the value of aa is (2 . "yeah"). This is how the word “value” is used.

reference: CLHS glossary

In case you are curious, here is Lisp code roughly equivalent to the Python example:

(setq aa (cons 2 "yeah"))
(setq bb aa)
(setf (car bb) 4) ; car rather than elt: (2 . "yeah") is a dotted pair, not a proper list
(print (list aa bb)) ; -> ((4 . "yeah") (4 . "yeah"))
(setq bb (cons 7 7))
(print (list aa bb)) ; -> ((4 . "yeah") (7 . 7))

1.4. in Emacs Lisp

The meanings of “object” and “value” in Emacs Lisp are essentially the same as in Common Lisp. One can check this by following the entries “object” and “value of expression” in the index of the Emacs Lisp reference manual.

1.5. value

Don’t be too concerned about sticking to the one spec-approved formal definition of “value” when you are writing an article about, for example, Python. If readers can clearly see what you mean by “value” in a sentence, and there is no way for them to misinterpret it, then all is well.

1.6. JS objects

On the other hand, let’s not call JavaScript strings immutable objects. They are immutable, but don’t call them objects. Stick with the spec on this. When a JavaScript book says that a JavaScript array is an object, it is saying that an array really is a member of the type Object, and that has some surprising consequences, but that’s another story.

2. immutable objects and object identity

2.1. what is object identity?

Consider the following Python code and Lisp code:

aa = [None, None]
bb = [None, None]
print(aa == bb)  # -> True
print(aa is bb)  # -> False
cc = bb
print(cc is bb)  # -> True

(setq aa (cons nil nil))
(setq bb (cons nil nil))
(print (equal aa bb)) ;; true
(print (eq aa bb)) ;; false
(setq cc bb)
(print (eq cc bb)) ;; true

Here, the names aa and bb refer to two distinct objects; you could say that the two objects (with the same contents, of course) have different addresses in memory. The names cc and bb, on the other hand, refer to the same object, i.e., to just one object. You can verify this by mutating bb and then printing aa and cc. The is operator in Python and the eq function in Lisp tell you the same thing, and that is the whole point of is and eq. One says that aa and bb have different (object) identities, and that bb and cc have the same identity.

Clark Kent is Superman. Superman is Clark Kent. Clark Kent and Superman have the same identity. The name “Superman” refers to the same person the name “Clark Kent” refers to. The name cc refers to the same object the name bb refers to.

There are many situations where you have to keep object identity in mind when it comes to mutable objects. When Clark Kent changes his nationality to Chinese, Superman becomes a citizen of China, but Clone Kent (a clone of Clark Kent created by a scientist some time ago) remains American. Likewise, when you mutate bb (for example, change its first element), cc changes, but aa doesn’t.
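The Clark Kent story can be replayed in Python 3 with the three names from the example. This is only a small sketch; the string "changed" is an arbitrary marker value.

```python
aa = [None, None]  # Clone Kent: same contents, but a separate object
bb = [None, None]  # Clark Kent
cc = bb            # Superman: another name for the same object as bb

bb[0] = "changed"  # mutate the object that bb refers to

print(aa)  # -> [None, None]
print(bb)  # -> ['changed', None]
print(cc)  # -> ['changed', None]
```

The change made through bb is visible through cc because there is only one object behind both names, while aa refers to a different object and is unaffected.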

If this part is confusing, the following link contains a more comprehensive explanation

2.2. object identity and immutable objects

On the other hand, in everyday coding, there is no need to think about the object identity of anything immutable. Immutable objects don’t change, so an alias can never surprise you. For example, can you imagine a situation where a programmer has to use the is operator or the eq function on immutable objects such as numbers (rather than the usual equality operator, like == in Python or equal in Lisp)? In everyday code there is no such situation. Some implementations exploit this fact to improve performance by taking liberties with immutable objects: they may sometimes make copies of an immutable object behind the scenes (i.e. create more immutable objects with the same state) and use them instead, even though your code didn’t say to copy it, and they may sometimes reuse the same immutable object over and over behind the scenes, even though your code didn’t say to reuse it. (The latter is related to something called interning.) This means that if you actually try to use is or eq on immutable objects, you may see some surprising behavior:

Python

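aa = 123
bb = 120 + 3
print(aa is bb)  # -> True

aa = 100000000000 + 1
bb = 100000000000 + 1
print(aa is bb)  # -> False

# These are the results on my system. It may work differently for you,
# depending on the implementation, its version, and how the code is run.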

Common Lisp

(let ((aa 123)
      (bb (+ 120 3)))
  (eq aa bb))
;; That can be true or false depending on implementation.

(let ((x 5))
  (eq x x))
;; Even that can be true or false depending on implementation.
;; http://www.lispworks.com/documentation/lw51/CLHS/Body/f_eq.htm

There is no point in trying to make sense of the outputs of the code above. A user of a language is just not supposed to gaze into the identities of immutable objects. Some might even say that not having to do so is the entire point of having immutable objects.
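Equality on immutable objects, by contrast, is something you can rely on. Here is a minimal Python 3 sketch. The helper make_number is made up for this illustration; it builds the integer from text at run time, so that the two results are produced by two separate calls rather than written as literals.

```python
def make_number():
    # Hypothetical helper: parse the integer from text at run time.
    return int("100000000001")


aa = make_number()
bb = make_number()
print(aa == bb)  # -> True

# The same holds for strings built in different ways.
left = "".join(["ye", "ah"])
right = "yeah"
print(left == right)  # -> True

# Whether `aa is bb` or `left is right` is True depends on the
# implementation, so no program logic should depend on those tests.
```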

3. optional reading

3.1. in terms of pointers/references

For those who know C or a similar language.

This section repeats the previous section’s point in a different way. Let’s start with a question. Say Bob asks you this: Are immutable objects in Python or Lisp passed by value or by reference? Are they references?

Bob’s question leads to an interesting thought experiment. But first, recall how some books about C visualize variables as boxes.

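int main(void)
{
    int num1;  /* a box that holds a C integer */
    int num2;  /* another such box */
    int *ip;   /* a box that holds a memory address */

    num1 = 20;    /* store 20 in the num1 box */
    num2 = num1;  /* copy the contents of num1 into num2 */
    ip = &num1;   /* store the address of the num1 box in ip */

    return 0;
}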

The C int variable num1 is like a box of a certain size, just big enough to store a C integer. The variable num2 is another box, also able to store a C integer. We store 20 in the first box. Then we copy the contents of the num1 box and paste them into the num2 box. The pointer variable ip is also like a box of a certain size, this time just big enough to store a memory address. We store the address of the num1 box in the ip box.

If one were to come up with a simple way to implement Python or Lisp variables in terms of C variables, it would follow this phrase: “pass by value, but values are object references.” In other words, the values that go into the boxes (on the stack) should be references to objects that are on the heap. Put another way, the values put into the boxes are pointers, and they point to things on the heap. As a result, addresses (references) are what gets passed around by the assignment operator (and hence also by function argument passing). This implementation can replicate the behavior you would expect for Python and Lisp variables, at least when it comes to mutable objects. Let’s call this Implementation 1.
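The behavior that “pass by value, but values are object references” predicts for mutable objects can be observed from Python itself. The following Python 3 sketch uses two made-up functions, mutate and rebind, to separate the two cases.

```python
def mutate(items):
    # `items` holds a copy of the caller's reference, so it refers to
    # the same list object, and the caller sees the mutation.
    items.append(4)


def rebind(items):
    # Assignment only changes what the local name refers to; the
    # caller's name still refers to the original list.
    items = [7, 7]
    return items


aa = [1, 2, 3]
mutate(aa)
print(aa)  # -> [1, 2, 3, 4]

result = rebind(aa)
print(aa)      # -> [1, 2, 3, 4]
print(result)  # -> [7, 7]
```

The reference is copied into the function's box, so a mutation made through it is shared, while storing a new reference in that box affects only the function's own name.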

As for immutable objects, you could deviate from Implementation 1 and decide to store them right there in the boxes, rather than storing their addresses. Let’s call this Implementation 2. (You can’t really push a very large bignum into a box of fixed size, but let’s forget that technicality for the sake of simplicity.)

In Implementation 1, both mutable and immutable objects are passed around by reference. In Implementation 2, mutable objects are passed around by reference while immutable objects are passed around by value (in the sense of values put into the boxes).

Here is a thought experiment. Alice collects some code snippets and then runs them on each of the two implementations. Will Alice get different results depending on the implementation?

Each of the two implementations has a conceptual simplicity to it. The only difference a user gets to see is in how the identities of immutable objects are handled, but the user is not supposed to test the object identity of immutable things anyway, so both implementations are acceptable. They are practically interchangeable. A real implementation is likely to mix Implementation 1 and Implementation 2 in whatever way maximizes performance. In fact, the Python code example in the previous section shows that Python on my system is indeed such a mix.
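Alice’s experiment can be acted out with a toy model. The Python 3 sketch below is not how any real Python or Lisp is implemented; the classes Implementation1 and Implementation2, their methods, and the function alices_snippet are all invented for this illustration. A “box” is modeled as a small tuple, and the “heap” as a dictionary from addresses to payloads.

```python
import itertools


class Implementation1:
    """Every object lives on the heap; a box holds only an address."""

    def __init__(self):
        self.heap = {}
        self._addresses = itertools.count(1)

    def new(self, payload):
        address = next(self._addresses)
        self.heap[address] = payload
        return ("ref", address)

    def value(self, box):
        return self.heap[box[1]]

    def same_identity(self, box_a, box_b):
        # Identity test: do the two boxes hold the same contents?
        return box_a == box_b


class Implementation2(Implementation1):
    """Like Implementation 1, but integers are stored in the box itself."""

    def new(self, payload):
        if isinstance(payload, int):
            return ("imm", payload)
        return super().new(payload)

    def value(self, box):
        if box[0] == "imm":
            return box[1]
        return super().value(box)


def alices_snippet(impl):
    a = impl.new(123)
    b = impl.new(123)
    shared = impl.new([0])
    alias = shared            # assignment copies the contents of the box
    impl.value(alias)[0] = 4  # mutate the list through the alias
    return {
        "a equals b": impl.value(a) == impl.value(b),
        "list seen through shared": impl.value(shared),
        "a is b": impl.same_identity(a, b),
    }


for impl in (Implementation1(), Implementation2()):
    print(type(impl).__name__, alices_snippet(impl))
# -> Implementation1 {'a equals b': True, 'list seen through shared': [4], 'a is b': False}
# -> Implementation2 {'a equals b': True, 'list seen through shared': [4], 'a is b': True}
```

In this model, the equality test and the mutation give the same results on both implementations. Only the identity test on the two integers differs, and that is exactly the test a user is not supposed to rely on.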
