default argument in Python and Lisp

The goal of this post is to go through:

  • how to give default arguments in Emacs Lisp (and Common Lisp)
  • why Python has the rule “do not use mutable objects as default arguments”
  • the difference in how the Python and Lisp interpreters treat default arguments

Every Lisp code snippet in this post is for Emacs Lisp, unless stated otherwise.

1. the Python mutable default argument gotcha

Here we define a Python function foo that takes two optional arguments, aa and bb, each with its own default (the list [0,0] in both cases). We then call foo twice without arguments:

def foo(aa = [0,0], bb = [0,0]):
    print aa, bb
    aa[0] = 7
    bb = [7,7]

foo()   # prints [0, 0] [0, 0] 
foo()   # prints [7, 0] [0, 0]

Does that output make sense? This is the gotcha.

Before we dive into what’s going on, since I want to compare this with Lisp, let’s see how one specifies default arguments for optional parameters in Lisp functions.

2. how to give default arguments in Lisp

Here is how in Emacs Lisp.

(require 'cl-lib) ; for cl-defun macro

(cl-defun my-hello (aa bb &optional (cc 100) (dd (+ 100 100)))
  (list aa bb cc dd))

(my-hello 3 5)      ; ⇒ (3 5 100 200)
(my-hello 3 5 7)    ; ⇒ (3 5 7 200)
(my-hello 3 5 7 9)  ; ⇒ (3 5 7 9)

In other words, we use the cl-defun macro to define function my-hello which takes two mandatory arguments and two optional arguments, and we specify the default for cc to be 100, and the default for dd to be the sum of 100 and 100. (In Common Lisp, the usual defun macro can be used in this way.)

3. comparing Lisp and Python

We will define function bar and then call it twice. The function bar takes one optional argument and when we write a definition of this function, we will write an expression (for default argument) that calls another function cow. We will see when the function cow is called.

Python code:

def cow():
    print 'moo'
    return 'cow'

print 'defining bar'

def bar(cc = cow()):
    print 'bar'

print 'calling bar'
bar()
bar()

Lisp code:

(defun cow ()
  (print "moo")
  "cow")

(print "defining bar")

(cl-defun bar (&optional
               (cc (cow)))
  (print "bar"))

(print "calling bar")
(bar)
(bar)

Python output:

defining bar
moo
calling bar
bar
bar

Emacs Lisp output:

"defining bar"
"calling bar"
"moo"
"bar"
"moo"
"bar"

In the case of Python, the expression cow() was evaluated once, when the function bar was defined, not when bar was called. In the case of Lisp, it’s the other way around: the expression (cow) was evaluated every time bar was called without an argument. The Lisp behavior is the expected one; the Python behavior is the surprising one, and it does surprise many Python beginners. As far as I know, this is a Python-only gotcha. (On the other hand, the rule “don’t modify literal data” that Lisp users follow is related to a Lisp-only gotcha that is somewhat similar, but that’s another story.)

So, this is why some Python users say you should not use mutable objects as default arguments (unless you know what you are doing).

4. the None trick and the nil trick

So in Python, some people use the None trick: specifying None as default arguments and then calculating the intended defaults when the function is called. Example:

def hello(aa, bb, cc = None, dd = None):
    if cc is None:
        cc = 100
    if dd is None:
        dd = 100 + 100
    print aa, bb, cc, dd

hello(3, 5)       # prints 3 5 100 200
hello(3, 5, 7)    # prints 3 5 7 200
hello(3, 5, 7, 9) # prints 3 5 7 9

hello(3, 5, None) # prints 3 5 100 200
hello(3, 5, None, 9) # 3 5 100 9

That example is a bit silly in that it uses the None trick for numbers, which are immutable; nonetheless, you get to see how the None trick is used. Whether hello(3, 5, None) printing 3 5 100 200 rather than 3 5 None 200 is what you want depends on the situation.
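For a case where the None trick actually matters, here is a minimal sketch with a list default. The function names append_shared and append_fresh are made up for illustration, and the code is written for Python 3 (so print is a function, unlike in the Python 2 snippets elsewhere in this post).

```python
def append_shared(item, bucket=[]):
    # The list [] is created once, when the def statement runs,
    # and every call without an explicit bucket reuses it.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # The None trick: build the default when the function is called.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_shared(1))  # [1]
print(append_shared(2))  # [1, 2]
print(append_fresh(1))   # [1]
print(append_fresh(2))   # [2]
```

The first function keeps accumulating into the same list across calls; the second starts from an empty list each time.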

Similarly in Emacs Lisp, the nil trick:

(defun my-hello (aa bb &optional cc dd)
  (if (eq cc nil)
      (setq cc 100))
  (if (eq dd nil)
      (setq dd (+ 100 100)))
  (list aa bb cc dd))

(my-hello 3 5)      ; ⇒ (3 5 100 200)
(my-hello 3 5 7)    ; ⇒ (3 5 7 200)
(my-hello 3 5 7 9)  ; ⇒ (3 5 7 9)

(my-hello 3 5 nil)   ; ⇒ (3 5 100 200)
(my-hello 3 5 nil 9) ; ⇒ (3 5 100 9)

Why would someone want to use the nil trick when Lisp has no default argument gotcha? Maybe you don’t want to depend on the cl-lib library. It’s good to be aware of the nil trick anyway because you might need to read code that uses it. As with Python, the expression (my-hello 3 5 nil) returning (3 5 100 200) rather than (3 5 nil 200) may or may not be what you want.

This expression

(if (eq cc nil)
    (setq cc 100))

can be alternatively written as

(if (null cc)
    (setq cc 100))

and because nil is the only false value in Emacs Lisp (and also in Common Lisp), that in turn can be alternatively written as

(unless cc
  (setq cc 100))

which then in turn can be written as

(setq cc (or cc 100))

and in fact, the last expression is probably what you would encounter most often in uses of the nil trick. (The equivalence of these four expressions applies to Common Lisp as well.)

On the other hand, None is not the only false value in Python, so you might want to think twice before you decide to use

if not cc:
    cc = 100

or this

cc = cc or 100

when you are using the None trick.
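Here is a small sketch of what can go wrong, assuming a caller who legitimately passes 0 or an empty list. The names hello_or and hello_is_none are hypothetical, and the code is written for Python 3.

```python
def hello_or(cc=None):
    # Shortcut version: replaces *any* false value, not just None.
    cc = cc or 100
    return cc

def hello_is_none(cc=None):
    # Explicit version: replaces only None.
    if cc is None:
        cc = 100
    return cc

print(hello_or(0))        # 100 (the caller's 0 is silently discarded)
print(hello_is_none(0))   # 0
print(hello_or([]))       # 100
print(hello_is_none([]))  # []
```

Both versions agree when the argument is omitted; they differ only for false values other than None, such as 0, the empty string, and empty containers.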

5. further reading

6. optional reading

6.1. more on the Python gotcha

Let’s recall the Python code we started with.

def foo(aa = [0,0], bb = [0,0]):
    print aa, bb
    aa[0] = 7
    bb = [7,7]

foo()   # prints [0, 0] [0, 0] 
foo()   # prints [7, 0] [0, 0]

Here’s Lisp code for comparison in case you are curious. (You may see something surprising if you write [0 0] in place of (vector 0 0) in the following code, but as I said earlier, that’s another story about literal data in Lisp.)

(cl-defun foo (&optional
               (aa (vector 0 0))
               (bb (vector 0 0)))
  (print (vector aa bb))
  (setf (elt aa 0) 7)
  (setf bb (vector 7 7)))

(foo)   ;; prints [[0 0] [0 0]]
(foo)   ;; prints [[0 0] [0 0]]

Let’s get back to Python. What happened when we called foo two times in Python code? When the Python interpreter was running the definition of foo, it calculated the first expression [0,0] (resulting in a Python list object), and it calculated the second expression [0,0] resulting in another list object. These two list objects are not the same object. In what sense? In the sense that your twin brother is not you. Let’s give these two list objects some nicknames. Let’s call the first list object Alice and the second object Bob.

When we called foo the first time, the name aa was bound to Alice and bb to Bob, and then aa and bb were printed. Then the statement aa[0] = 7 changed Alice’s state (because aa is Alice), and the statement bb = [7,7] rebound the name bb to another list object, leaving Bob untouched. The names aa and bb went away when the function returned.

When we called foo the second time, the name aa was bound to Alice and bb to Bob as usual, and then aa and bb were printed, but this time Alice was in a different state.
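One way to meet Alice and Bob directly is to look at the function’s __defaults__ attribute, which holds the default objects created when the def statement ran. The sketch below is written for Python 3 and drops the print statement from foo so that only the inspection output shows; the variable names alice and bob are just the nicknames from the text.

```python
def foo(aa=[0, 0], bb=[0, 0]):
    aa[0] = 7
    bb = [7, 7]

alice, bob = foo.__defaults__   # the two default objects, created at def time
print(alice is bob)             # False: two distinct list objects
print(foo.__defaults__)         # ([0, 0], [0, 0])

foo()
print(foo.__defaults__)         # ([7, 0], [0, 0]): Alice was mutated, Bob was not
print(foo.__defaults__[0] is alice)  # True: still the same Alice
```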

6.2. the gotcha will get you

You might think that as long as you don’t modify aa, for example, you can get away with using mutable defaults in Python. Maybe, but as your code grows, you may end up modifying the object that the name aa refers to through other names. For example, take a look at the class method example in this article.
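As a rough illustration of modifying a default through another name (this is my own made-up sketch in Python 3, not the example from the linked article), consider a hypothetical Basket class that stores its default list as an attribute:

```python
class Basket:
    def __init__(self, items=[]):
        # self.items is just another name for the shared default list.
        self.items = items

    def add(self, item):
        # Looks harmless, but this mutates the default object.
        self.items.append(item)

first = Basket()
first.add("apple")
second = Basket()
print(second.items)                 # ['apple']
print(first.items is second.items)  # True
```

The __init__ method never modifies items itself, yet the default list still changes, because add reaches the same object through self.items.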

6.3. it will get you again

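Even with immutable objects, you need to be aware of the gotcha. In the Python function fizz below, random.random() is evaluated once, when fizz is defined, so every call fizz() prints the same number. In the Emacs Lisp function my-fizz, (cl-random 1.0) is evaluated on each call, so (my-fizz) prints a fresh number every time.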

import random

def fizz(x = random.random()):
    print x

(require 'cl-lib)

(cl-defun my-fizz (&optional (x (cl-random 1.0)))
  (print x))
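To watch the Python half of this without depending on random numbers, here is a sketch that swaps random.random() for a counter, so the results are predictable. The names ticket and fizz_late are hypothetical, and the code is written for Python 3.

```python
import itertools

ticket = itertools.count(1)  # stand-in for random.random()

def fizz(x=next(ticket)):
    # next(ticket) ran once, when this def statement was executed.
    return x

def fizz_late(x=None):
    # None trick: next(ticket) runs on every call that omits x.
    if x is None:
        x = next(ticket)
    return x

print(fizz(), fizz())            # 1 1
print(fizz_late(), fizz_late())  # 2 3
```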



Posted in Emacs, Lisp, Python

emacs lisp and static variable

This post is part of the Living with Emacs Lisp series and is about the following question.

How does one emulate static variables (as in C), in elisp?

1. for a toggle function

If the reason you want something like a static variable is for a toggle command that turns something on and off, then you might want to look into defining a minor mode using define-minor-mode.

2. simply using global variables

Alternatively, you can just use a global variable of the same name as the function. In fact, this is how define-minor-mode does it.

As an example, we define a Fibonacci number generator.

(require 'cl-lib) ; for cl-psetf
(defvar my-fibo-gen (cons 0 1))
(defun my-fibo-gen ()
  (cl-psetf (car my-fibo-gen) (cdr my-fibo-gen)
            (cdr my-fibo-gen) (+ (car my-fibo-gen)
                                 (cdr my-fibo-gen)))
  (car my-fibo-gen))

(my-fibo-gen) ; ⇒ 1
(my-fibo-gen) ; ⇒ 1
(my-fibo-gen) ; ⇒ 2
(my-fibo-gen) ; ⇒ 3
(my-fibo-gen) ; ⇒ 5
(my-fibo-gen) ; ⇒ 8

It can also be OK to use more than one global variable. For example, buffer-face-mode uses two: buffer-face-mode and buffer-face-mode-face.

3. let over lambda

If you enable lexical scope first, then you can use let over lambda like so.

(let ((aa 0)
      (bb 1))
  (defun my-fibo-gen ()
    (cl-psetf aa bb
              bb (+ aa bb))
    aa))

(my-fibo-gen) ; ⇒ 1
(my-fibo-gen) ; ⇒ 1
(my-fibo-gen) ; ⇒ 2
(my-fibo-gen) ; ⇒ 3
(my-fibo-gen) ; ⇒ 5
(my-fibo-gen) ; ⇒ 8
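For readers coming from Python, the same let over lambda idea can be sketched as a closure. This is only an analogy written in Python 3, not Emacs Lisp, and make_fibo_gen is a hypothetical name.

```python
def make_fibo_gen():
    aa, bb = 0, 1  # private state, like the let-bound variables

    def fibo_gen():
        nonlocal aa, bb
        aa, bb = bb, aa + bb  # parallel assignment, like cl-psetf
        return aa

    return fibo_gen

my_fibo_gen = make_fibo_gen()
print([my_fibo_gen() for _ in range(6)])  # [1, 1, 2, 3, 5, 8]
```

Each call to make_fibo_gen creates a fresh pair of variables, so two generators made this way do not interfere with each other.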

4. optional reading

4.1. symbol properties

Alternatively, some people use symbol properties (of the symbol for the function name) instead. One such example is the function cycle-font in this link. If you are worried about possible name clashes from using simple names for symbol properties, you can add a prefix to the names.

Posted in Emacs, Lisp

immutable objects and object identity

The goal of this post is to explain the following rule:

“A user of a programming language is not supposed to think about object identity of an immutable object (in most cases)”

To do that, I should start by explaining the words “objects” and “object identity”.

1. Meaning of object in Python, Lisp, JavaScript

One day, an Elisp beginner or a Python beginner starts to wonder: “How do Python (resp. Lisp) functions pass things around? Is it pass by value?” After some googling, he/she soon starts asking other questions: what is an object? What is a value? Are strings objects? Are numbers objects? Are integers values? Answers to these questions depend on what one means by the words “object” and “value”.

You may be reading a book on Common Lisp or Emacs Lisp and encounter the following sentence: “numbers are objects and they …”. You may be reading a book on JavaScript and encounter this sentence: “Numbers are not objects. Instead, they are …” You might then be tempted to imagine some fundamental difference in how Lisp and JavaScript treat numbers, but actually it might be that the authors of the two books are using different definitions of “object”.

Let’s go through how the specs for Common Lisp, JavaScript, and Python use the words “object” (and “value”) in different ways.

1.1. in JavaScript

In the JavaScript spec, one does not call 123 an object, but you can call it a primitive value. What can you call an object? Anything that is a member of the type Object.

aa = {b: 2, c: 3}; // this is an object.
console.log(aa instanceof Object); // -> true

aa = 123; // the spec would not call this an object.
console.log(aa instanceof Object); // -> false

123 is a number value. On the other hand, you can create an object that is a wrapper around a number value. Such an object is called a number object.

aa = new Number(123); // this is a wrapper around a number value
console.log(123 === aa.valueOf()); // -> true
console.log(123 === aa); // -> false
console.log(aa instanceof Object); // -> true
console.log(typeof aa == typeof ({a:2, b:3})); // -> true

aa = Number(123); // just to see what happens when I forget to write 'new'.
console.log(typeof 123 === typeof aa); // -> true

To check out JavaScript definitions of the words “primitive value”, “object”, “number value”, “number object”, see ECMAScript Language specification.

1.2. in Python

In Python Language Reference, even strings and numbers are called objects. Everything is an object. Strings and numbers are immutable objects.

Also, Python Language Reference uses the word “value” in some special way. To explain that, let’s see some Python snippet:

aa = [2, "yeah"]
bb = aa
bb[0] = 4
print aa, bb # -> [4, 'yeah'] [4, 'yeah']
bb = [7, 7]
print aa, bb # -> [4, 'yeah'] [7, 7]

Execution of the first line of the code above makes the name aa mean or refer to an object that is a Python list. The second line makes the name bb refer to that same object. The third line mutates the object. One says that the third line changes the value of the object. So “value” is a word one uses to mean the state of a (mutable) object, in Python Language Reference.

The 5th line, bb = [7,7], makes the name bb refer to another object.

To verify the ways the words “object” and “value” are used in Python Language Reference, see Python data model.

1.3. In Common Lisp

As with Python, one calls everything an object. Numbers are immutable objects. Strings are mutable objects this time.

The spec provides the following example sentence: “The function cons creates an object which refers to two other objects”.

(setq aa (cons 2 "yeah"))
(print aa) ;; -> (2 . "yeah")

Evaluation (i.e. execution) of the first line makes the name aa refer to an object that is returned by the cons function call. The function created a container object that refers to two objects (one of them is a number and the other is a string) and the name aa now refers to that container object simply because that’s what the function returned.

123 is an object. In particular, it is a number. So one can say that 123 is a numeric kind of object, or simply, a number object. Contrast this with how the phrase “number object” is used in the JavaScript spec. The same contrast applies to another phrase, “string object”.

What is the meaning of “value” in Lisp speak?

;; sum of 1 and 1
(+ 1 1) ;; -> 2

One says that the value of the expression (+ 1 1) is 2. Also, one can say that the value of the expression (cons 2 "yeah") is (2 . "yeah"). One can also say that the value of aa is (2 . "yeah"). This is how the word “value” is used.

reference: CLHS glossary

Just in case you are curious about what’s the equivalent Lisp code for the Python code example:

(setq aa (cons 2 "yeah"))
(setq bb aa)
(setf (car bb) 4) ; bb is a dotted pair, not a proper sequence, so use car rather than elt
(print (list aa bb)) ; -> ((4 . "yeah") (4 . "yeah"))
(setq bb (cons 7 7))
(print (list aa bb)) ; -> ((4 . "yeah") (7 . 7))

1.4. in Emacs Lisp

Meanings of “object” and “value” in Emacs Lisp are essentially the same as in Common Lisp. One can check this by following the entries “object” and “value of expression” from elisp reference index.

1.5. value

Perhaps don’t be too concerned about sticking with just the one spec-approved formal definition of “value” when you are writing an article about for example Python. If readers can clearly see what you meant by “value” in a sentence you wrote in an article, and if there is no way for readers to misinterpret the sentence, then all is well.

1.6. JS objects

On the other hand, let’s not call JavaScript strings immutable objects. They are immutable, but don’t call them objects. Stick with the spec on this. When a JavaScript book is saying that a JavaScript array is an object, it is saying that an array is really a member of type Object and that has some surprising consequences, but that’s another story.

2. immutable objects and object identity

2.1. what is an object identity?

Consider the following Python code and Lisp code:

aa = [None, None]
bb = [None, None]
print aa == bb # -> True
print aa is bb # -> False
cc = bb
print cc is bb # -> True

(setq aa (cons nil nil))
(setq bb (cons nil nil))
(print (equal aa bb)) ;; true
(print (eq aa bb)) ;; false
(setq cc bb)
(print (eq cc bb)) ;; true

Here, the names aa and bb refer to two distinct objects, i.e., two objects (with the same contents, of course) at different addresses in memory. On the other hand, the names cc and bb refer to the same object, i.e., to just one object. You can verify this by mutating bb and then printing aa and cc. The is operator in Python and the eq function in Lisp tell you the same thing, and that is the whole point of is and eq. One says that aa and bb have different (object) identities and that bb and cc have the same identity. Clark Kent is Superman. Superman is Clark Kent. Clark Kent and Superman have the same identity. The name “Superman” refers to the same person the name “Clark Kent” refers to. The name cc refers to the same object the name bb refers to.

There are many situations where you have to keep object identity in mind when it comes to mutable objects. When Clark Kent changes his nationality to Chinese, Superman becomes a citizen of China, but Clone Kent (a clone of Clark Kent created by a scientist some time ago) remains American. Likewise, when you mutate bb (for example, change its first element), cc changes, but aa doesn’t.
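The verification by mutation can be sketched as follows (Python 3 syntax; the string "changed" is an arbitrary marker value chosen for illustration):

```python
aa = [None, None]
bb = [None, None]
cc = bb

bb[0] = "changed"  # mutate the object that both bb and cc refer to

print(aa)  # [None, None]
print(bb)  # ['changed', None]
print(cc)  # ['changed', None]
```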

2.2. object identity and immutable objects

On the other hand, in everyday coding, there is no need to think about object identity of anything immutable. Immutable objects don’t change, so no need. For example, can you imagine a situation where a programmer has to use the is operator or the eq function on immutable objects such as numbers (rather than using the usual equality operator like the == in Python or the equal in Lisp)? There is no such situation. Some implementations exploit this fact to improve performance by doing some things to immutable objects: they may sometimes make copies of an immutable object (i.e. creating more immutable objects with same state) and use them instead behind the scenes when your code didn’t say to copy it, and sometimes they may reuse the same immutable object over and over behind the scenes when your code didn’t say to reuse it. (The latter is related to something called interning.) This means that if you actually try to use is or eq over immutable objects, you may see some surprising behavior:


Python

aa = 123
bb = 120 + 3
print aa is bb # -> True

aa = 100000000000 + 1
bb = 100000000000 + 1
print aa is bb # -> False

# it may work differently for you

Common Lisp

(let ((aa 123)
      (bb (+ 120 3)))
  (eq aa bb))
;; That can be true or false depending on implementation.

(let ((x 5))
  (eq x x))
;; Even that can be true or false depending on implementation.

There is no point in trying to make sense of the outputs of the code above. A user of a language is just not supposed to gaze into the identities of immutable objects. Some might even say that not having to do so is the entire point of having immutable objects.
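In practice this means comparing immutable objects by value. The sketch below (Python 3; same_number is a hypothetical helper) builds two equal integers at run time and deliberately gives no expected result for the identity test, since that depends on the implementation.

```python
aa = int("100000000001")
bb = int("100000000001")

print(aa == bb)  # True on every implementation

same_object = aa is bb  # implementation-dependent; do not rely on it
print("identity happened to match" if same_object else "identity did not match")

def same_number(x, y):
    # Compare immutable objects by value, never by identity.
    return x == y
```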

3. optional reading

3.1. in terms of pointers/references

For those who know C or a similar language.

This section repeats the previous section’s point in a different way. Let’s start with a question. Let’s say Bob asks you this: are immutable objects in Python or Lisp passed by value or passed by reference? Are they references?

Bob’s question leads to an interesting thought experiment. But first, recall how variables in C are usually visualized as boxes.

int num1;
int num2;
int *ip;

num1 = 20;
num2 = num1;
ip = &num1;

The C int variable num1 is like a box of certain size just big enough to store a C integer in it. The variable num2 is another box also able to store a C integer. We store 20 into the first box. Then we copy from the num1 box and paste into the num2 box. The pointer variable ip is also like a box of certain size, this time just big enough to store a memory address. We store the address of the num1 box into the ip box.

If one were to come up with a simple way to implement Python or Lisp variables in terms of C variables, it would be like this phrase: “pass by value, but values are object references.” In other words, values that go into the boxes (on the stack) should be references to objects that are on the heap. Put in another way, values (that are put into the boxes) are pointers and they point to things on the heap. As a result, addresses (references) are what gets passed around by the assignment operator (and hence also by function argument passing). This implementation can replicate the behavior you would expect for Python and Lisp variables at least when it comes to mutable objects. Let’s call this Implementation 1.

As for immutable objects, you could deviate from Implementation 1, and decide to just store them right there in the boxes, rather than storing their addresses. Let’s call this Implementation 2. (You can’t really push a very large bignum into a box of fixed size, but let’s just assume we can forget that technicality for the sake of simplicity.)

In Implementation 1, mutable objects and immutable objects are both passed around by reference. In Implementation 2, mutable objects are passed around by reference while immutable objects are passed around by value (in the sense of values put into the boxes).

Here is a thought experiment. Alice collects some code snippets and then runs them using each of the two implementations. Will Alice get different results depending on the implementation?

Each of the two implementations has a conceptual simplicity to it. The only difference a user of the implementation gets to see is in how identities of immutable objects are handled, but then the user is not supposed to test the object identity of immutable things anyway, so both implementations are acceptable. They are practically interchangeable. A real implementation is likely to be a mix of Implementation 1 and Implementation 2, combined in a way that maximizes performance. In fact, the Python code example in the previous section shows that Python on my system is indeed a mix.
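Alice’s experiment can be imitated with a toy model. This is a loose sketch in Python 3, not how any real interpreter works: the hypothetical functions pass_impl1 and pass_impl2 stand for “putting an object into a box” under each implementation, and tuples stand in for immutable objects in general.

```python
def pass_impl1(obj):
    # Implementation 1: every object is passed as a reference.
    return obj

def pass_impl2(obj):
    # Implementation 2: immutable objects (tuples, in this toy model)
    # are copied into the box; mutable objects are still passed by reference.
    if isinstance(obj, tuple):
        return tuple(list(obj))  # rebuild an equal tuple
    return obj

def snippet(pass_arg):
    # A program that never tests the identity of immutable objects.
    point = (1, 2)
    log = []
    received_point = pass_arg(point)
    received_log = pass_arg(log)
    received_log.append(received_point == point)  # equality only
    return log

print(snippet(pass_impl1))  # [True]
print(snippet(pass_impl2))  # [True]
```

The snippet mutates a list through the passed reference and compares a tuple by value, and it cannot tell the two models apart. Only a test such as received_point is point could distinguish them.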

Posted in JavaScript, Lisp, Python

Using Emacs with Windows 8 touch keyboard

At the end of this post is some code I use to enter Ctrl, Meta, Shift keys or combinations when I am using Emacs with Windows 8 touch keyboard.

As for why anyone would want to do that: I have a Windows tablet (a hybrid laptop), and I wanted to edit org-mode files on it even when I have left its physical keyboard at home. By touch keyboard, I mean this: Windows 8 touch keyboard. Windows 8 provides at least three kinds of touch keyboards: the default one, a thumb keyboard, and a full keyboard which has Alt, Esc, Fn, and so on. The full keyboard is disabled by default; you can enable it, but it is not optimized for typing sentences the way the default one is. So I wanted some way to enter modifier keys while using the default keyboard, without switching to the full keyboard.

1. some preliminary

With English US touch keyboard (default keyboard), you can type letters like ö or œ easily and quickly (so these are the letters I will map to Meta, Ctrl, etc). For example, to type ö quickly, do the following steps quickly:

  • press o (with your finger)
  • (while your finger is still touching the screen) move the finger toward the up-right direction.
  • release

To see what other letters can be typed with this kind of action, do the steps above slowly.

2. some observations

Windows 8 touch keyboard provides a word correction feature, but it does not seem to work with the current version of Emacs. Probably related: you also cannot use Windows Speech Recognition to enter text into Emacs by voice (but Tavis Rudd figured out another way to use Emacs by voice).

3. the code

With the following code, œ corresponds to Meta, which means that when you type œx, Emacs recognizes it as M-x. ō corresponds to Control Meta, õ corresponds to Control, ó corresponds to Control Shift, ö corresponds to Shift, ò corresponds to Meta Shift. Also, ĝ and ĥ correspond to up and down. Additionally for convenience, ê, ē, é correspond to M-x, C-x, C-c, respectively.

One downside: this code takes over letters such as ö, õ, and é, so anyone who needs to type those letters (many European users, for example) should not use it.

This code builds upon code by Al Petrofsky, which he wrote for a different purpose.

(defun my-read-function-mapped-event ()
  "Read an event or function key.
Like `read-event', but input is first translated according to
`function-key-map' and `key-translation-map', so that a function key
event may be composed."
  (let ((event (read-event)))
    (if (consp event)
        ;; Don't touch mouse events.
        event
      ;; Otherwise, block out the maps that are used after
      ;; key-translation-map, and call read-key-sequence.
      (push event unread-command-events)
      (let ((overriding-local-map (make-sparse-keymap))
            (global (current-global-map)))
        ;; Restore the global map even if reading is interrupted.
        (unwind-protect
            (progn (use-global-map (make-sparse-keymap))
                   (let ((vec (read-key-sequence-vector nil)))
                     (if (> (length vec) 1)
                         (setq unread-command-events
                               (cdr (append vec unread-command-events))))
                     (aref vec 0)))
          (use-global-map global))))))

;; These functions -- which are not commands -- each add one modifier
;; to the following event.

(defun my-event-apply-alt-modifier (_ignore-prompt)
  "Add the Alt modifier to the following event.
For example, type \\[my-event-apply-alt-modifier] & to enter Alt-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'alt)])
(defun my-event-apply-super-modifier (_ignore-prompt)
  "Add the Super modifier to the following event.
For example, type \\[my-event-apply-super-modifier] & to enter Super-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'super)])
(defun my-event-apply-hyper-modifier (_ignore-prompt)
  "Add the Hyper modifier to the following event.
For example, type \\[my-event-apply-hyper-modifier] & to enter Hyper-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'hyper)])
(defun my-event-apply-shift-modifier (_ignore-prompt)
  "Add the Shift modifier to the following event.
For example, type \\[my-event-apply-shift-modifier] & to enter Shift-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'shift)])
(defun my-event-apply-control-modifier (_ignore-prompt)
  "Add the Control modifier to the following event.
For example, type \\[my-event-apply-control-modifier] & to enter Control-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'control)])
(defun my-event-apply-meta-modifier (_ignore-prompt)
  "Add the Meta modifier to the following event.
For example, type \\[my-event-apply-meta-modifier] & to enter Meta-&."
  `[,(my-event-apply-modifier (my-read-function-mapped-event) 'meta)])

(defun my-event-apply-control-meta-modifier (_ignore-prompt)
  `[,(my-event-apply-modifier (my-event-apply-modifier (my-read-function-mapped-event) 'control) 'meta)])
(defun my-event-apply-control-shift-modifier (_ignore-prompt)
  `[,(my-event-apply-modifier (my-event-apply-modifier (my-read-function-mapped-event) 'control) 'shift)])
(defun my-event-apply-meta-shift-modifier (_ignore-prompt)
  `[,(my-event-apply-modifier (my-event-apply-modifier (my-read-function-mapped-event) 'meta) 'shift)])

(defun my-event-apply-modifier (event modifier)
  "Apply a modifier flag to event EVENT.
MODIFIER is the name of the modifier, as a symbol."
  (let ((modified (event-convert-list `(,modifier
                                        ,@(delq 'click (event-modifiers event))
                                        ,(event-basic-type event)))))
    (if (consp event)
        (cons modified (cdr event))
      modified)))

(require 'cl-lib)
(cl-loop for (ks def ok) in (list
                             (list "œ" 'my-event-apply-meta-modifier t)
                             (list "ō" 'my-event-apply-control-meta-modifier t)
                             (list "õ" 'my-event-apply-control-modifier t)
                             (list "ó" 'my-event-apply-control-shift-modifier t)
                             (list "ö" 'my-event-apply-shift-modifier t)
                             (list "ò" 'my-event-apply-meta-shift-modifier t)

                             ;; for up and down
                             (list "ĝ" (kbd "<up>") t)
                             (list "ĥ" (kbd "<down>") t)

                             ;; for quicker access to M-x, C-x, C-c
                             (list "ê" (kbd "M-x") t)
                             (list "ē" (kbd "C-x") t)
                             (list "é" (kbd "C-c") t))
         for key = (kbd ks)
         for bound = (key-binding key)
         do (progn (and bound
                        (not ok)
                        (warn "key %s is already bound to %s" ks bound))
                   (define-key key-translation-map key def)))

Posted in Emacs

using org-mode export feature to publish to wordpress with math equations

If you are using the latest official version of Emacs and the latest version of org-mode (which is not the same as the version of org-mode shipped with Emacs), then here is a way to publish a blog post with math equations to WordPress using org-mode.

1. some preliminary

In an org-mode buffer, one uses dollar signs (the same way as in LaTeX documents) to surround mathematical expressions, but without extra spaces. That means:

$a^2 + b^2$ is good
but $ a^2 + b^2 $ is not good.

When you run the command org-export-dispatch on an org-mode buffer, you can choose to export the contents to HTML. When you do that, something like

$a^2 + b^2$

is exported to something like

\(a^2 + b^2\)

When you paste the exported HTML text into the textbox of the WordPress post editor and press Preview, you will see that the generated preview does not have math equation images.

2. search and replace

The WordPress manual mentions that writing something like

$.latex a^2 + b^2$

(without the dot) is the way to have WordPress generate equation images.

So you just have to search for all occurrences of something like

\(a^2 + b^2\)

in the exported HTML text, and then replace them appropriately before pasting the contents to the WordPress post editor.

Emacs certainly has “search and replace” feature and you can even save such a task as a keyboard macro, but let me give you a command for the task:

(defun my-parens-to-wordpress-math ()
  "Replace \\(...\\) with WordPress latex markup in current buffer."
  (interactive)
  (save-excursion
    (save-match-data
      (goto-char (point-min))
      (let ((case-fold-search nil)
            (re (rx "\\("
                    (group (+? not-newline))
                    "\\)")))
        (while (re-search-forward re nil t)
          (replace-match (concat "$" "latex \\1$") t))))))

I wrote that command to publish my first math-heavy article on this blog: article on information entropy
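If you would rather post-process the exported HTML outside Emacs, the same search and replace can be sketched in Python 3 with the standard re module. This is an alternative I am adding for illustration, not part of the workflow above; the names INLINE_MATH and parens_to_wordpress_math are hypothetical.

```python
import re

# Matches \( ... \) lazily, with no newline inside.
INLINE_MATH = re.compile(r"\\\((.+?)\\\)")

def parens_to_wordpress_math(html):
    """Return html with each \\(...\\) rewritten as WordPress LaTeX markup."""
    # A function is used as the replacement so that backslashes in the
    # matched math are copied verbatim.
    return INLINE_MATH.sub(lambda m: "$" + "latex " + m.group(1) + "$", html)

print(parens_to_wordpress_math(r"so \(a^2 + b^2\) holds"))
# so $latex a^2 + b^2$ holds
```

As in the Emacs command, the match is non-greedy and does not cross line breaks, so two formulas on the same line are converted separately.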

3. some idea on how to write a math blog post using org-mode, AUCTeX, latex-preview

My workflow for now is like this:

  1. Write an outline (looking like a TOC) for a blog post in an org-mode buffer
  2. Instead of continuing in an org-mode buffer, I switch to an AUCTeX mode buffer, and write one section there (with the visual help of preview-latex), and then paste it back into a section in the original org-mode buffer.
  3. Go on with the next section.
  4. and so on

AUCTeX does have an outline feature and org-mode does have LaTeX preview feature, but I am more at home with the outline feature of org-mode and the preview feature of AUCTeX for now.

Posted in Emacs, Mathematics

Shannon’s entropy of random variables and partitions

This post has two goals:

  • use a hypothetical town with curious residents to give some intuitive understanding of the notion of entropy (of finite partitions or of discrete random variables).
  • give some examples of using well known intuitive properties of entropy to prove some less trivial facts about entropy rather than directly using the definition.

Also, while this post is not about ergodic theory, and neither ergodic theory nor measure theory is required for most of it, I include a section to help students of ergodic theory see how intuitions from probability can guide them.

This is not about entropy in thermodynamics but about entropy in information theory (though the two are related).

The notions of the metric entropy of a dynamical system and the differential entropy of a continuous probability distribution are also briefly discussed.

1. entropy for dummies

1.1. definition of entropy of r.v. and joint entropy

You have probably heard that the entropy of a discrete random variable X with n distinct outcomes \{x_1, \cdots, x_n\} is defined as

H(X) = - \sum_{i=1}^n p_i \log p_i

where p_i is the probability of the event X = x_i. The RHS is sometimes denoted as H_n(p_1, \cdots, p_n). You might be wondering how to make sense of that entropy formula intuitively. You might have heard that it quantifies the amount of uncertainty about the outcome of X, and might be wondering why it is quantified in this specific way among other possible ways.

Also, given two discrete random variables X and Y, the joint entropy of X and Y together is defined similarly as

H(X, Y) = - \sum_{i=1}^n p_i \log p_i

where p_i is the probability of the event (X, Y) = z_i and provided that \{z_1, \cdots, z_n\} is the range of values that (X, Y) can take, i.e., \{z_1, \cdots, z_n\} is simply the Cartesian product of the range of X and the range of Y. In other words, the joint entropy of X and Y is just the entropy of the (joint) random variable (X, Y). The joint entropy of more than two random variables is defined similarly.
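To make the two definitions concrete, here is a small Python sketch. The helper names `entropy` and `distribution` are mine, not standard library functions. It computes the entropy of one fair coin and the joint entropy of two independent fair coins, simply by listing the probabilities of the values the pair can take.

```python
import math
from collections import Counter
from itertools import product

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def distribution(outcomes):
    """Probabilities of the distinct values in a list of equally likely outcomes."""
    counts = Counter(outcomes)
    return [n / len(outcomes) for n in counts.values()]

# One fair coin: two outcomes with probability 1/2 each.
h_single = entropy([0.5, 0.5])

# Two fair coins: the pair takes four values with probability 1/4 each.
pairs = list(product([0, 1], repeat=2))
h_joint = entropy(distribution(pairs))

print(h_single / math.log(2))  # entropy of one coin, in units of log 2
print(h_joint / math.log(2))   # joint entropy of the pair, in units of log 2
```

The joint entropy is computed with the very same formula; only the list of probabilities changes.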

1.2. the town

Now imagine a town where very curious residents live. Each month, the mayor of this town throws ten fair coins to generate a random binary sequence of length 10. So we have 10 binary random variables: X_1, \ldots, X_{10}. The value of X_i is 1 if the i-th coin landed on heads, and 0 otherwise. We also have other interesting random variables like X_1 + X_2 or (X_2, X_3) or X_1 X_2 X_3, etc.

Each month, the mayor generates a random sequence of length 10 and then reveals the outcome to Bob, who is an employee of the town’s local government. Alice, another employee, sells various tickets for accessing total or partial information about the sequence. Here is how these tickets are supposed to be used: if you are a resident of this town and you want to know the value of X_1 + X_2 this month (because you are curious), you can buy from Alice a ticket that has the following statement written on it: “If you give this ticket to Bob, Bob is required to tell you the value of X_1 + X_2.” Then you visit Bob’s office and give him the ticket and you get the information in return. Now what is a good price for this ticket? How much would you pay to know the value of X_1 + X_2?

Suppose the price of a ticket for X_1 is one dollar, i.e., you can pay one dollar (to Alice or others) to get a ticket that has the following statement on it: “If you give this ticket to Bob, Bob is required to tell you the value of X_1”. It’s pretty obvious that the price of a ticket for X_2 should be one dollar as well. It is also obvious that the price of a ticket for knowing (X_1, \ldots, X_{5}) should be five dollars.

Now how about the following random variable? We define the random variable Z to have value 5 if X_1 = 0, and 6 if (X_1, X_2) = (1,0), and 7 if (X_1, X_2) = (1,1). So Z has probability 1/2 of being 5, and probability 1/4 of being 6, and 1/4 of being 7. What is the right price of a ticket for Z? The right price is 1+\frac{1}{2} dollars. That is because there’s this alternative way of knowing Z: you can first buy a ticket for X_1 and then give the ticket to Bob to be informed of the value of X_1, and if you learn that X_1 is 1, you then buy a ticket for X_2 and visit Bob again. This alternative way of knowing Z costs 1 + \frac{1}{2} dollars on average. Now you might say “you convinced me that the price for a ticket for Z should be \le 1 + \frac{1}{2}. Now convince me that it should be \ge 1 + \frac{1}{2}”. I have to say that right now I don’t have any satisfying argument to convince you of that, except to say that the alternative way of knowing Z seems efficient and it feels like it cannot be further improved to reduce the average cost.
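Looking ahead to section 1.3, where the price of a ticket is identified with entropy divided by \log 2, here is a quick Python check that the two numbers agree for Z. The variable names are only for illustration.

```python
import math

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

# Z is 5, 6, 7 with probabilities 1/2, 1/4, 1/4.
price_from_entropy = entropy([0.5, 0.25, 0.25]) / math.log(2)

# Two-step strategy: always buy the ticket for X_1 (1 dollar); with
# probability 1/2 (when X_1 = 1) also buy the ticket for X_2 (1 more dollar).
average_cost = 1 + 0.5 * 1

print(price_from_entropy, average_cost)
```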

What about the cost of a ticket for (X_1 + X_2, X_2 + X_3)? If you buy a ticket for X_1 + X_2, which we will assume to cost a dollars, and another ticket for X_2 + X_3 (also costing a dollars), then you get to learn the value of (X_1 + X_2, X_2 + X_3) after visiting Bob. Therefore the price of a ticket for (X_1 + X_2, X_2 + X_3) should be \le 2a (otherwise, you could create a business where each month you buy one ticket for X_1 + X_2 and one ticket for X_2 + X_3 from others, hand the two tickets to Bob to learn the value of (X_1 + X_2, X_2 + X_3), and then sell that information to others at the price of a ticket for (X_1 + X_2, X_2 + X_3), and profit. Let’s assume that the residents of the town are law-abiding citizens and there is a law that bans selling any information (about the binary sequence) twice. Maybe residents are required to erase from their memory any information they sell to or hand over to others. Maybe there are other patch-ups needed, but let’s stop for now.)
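If we again take for granted the identification of price with entropy divided by \log 2 (section 1.3), the numbers can be worked out by listing the eight equally likely values of (X_1, X_2, X_3). This is a sketch, and `price_of` is a hypothetical helper name.

```python
import math
from collections import Counter
from itertools import product

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def price_of(f):
    """Entropy, in units of log 2, of f(x1, x2, x3) for three fair coins."""
    outcomes = [f(*bits) for bits in product([0, 1], repeat=3)]
    counts = Counter(outcomes)
    return entropy(n / len(outcomes) for n in counts.values()) / math.log(2)

a = price_of(lambda x1, x2, x3: x1 + x2)
pair = price_of(lambda x1, x2, x3: (x1 + x2, x2 + x3))

print(a, pair, 2 * a)  # price of one sum, price of the pair, and the bound 2a
```

The pair comes out strictly cheaper than 2a because the two sums share the coin X_2 and so are not independent.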

1.3. intuition for entropy and its properties

For a (discrete) random variable X, think of H(X) as proportional to the ideal price for a ticket for X. The conversion factor is \log 2 per dollar. We normalize things so that the entropy of a fair coin is \log 2 (i.e. one dollar). I can’t go into a proof of this, but it is known that there is only one formula for H_n(p_1, \ldots, p_n) that guarantees the following properties:

(decomposition property) H_n(p_1, \ldots, p_n) + p_1 H_m(q_1, \ldots, q_m) = H_{n+m-1}(p_1 q_1, \ldots, p_1 q_m, \, p_2, \ldots, p_n)

(triangle inequality for entropy) H(X, Y) \le H(X) + H(Y) for any two discrete random variables X, Y

(positivity) H(X) \ge 0

(normalization) H_2(\frac{1}{2}, \frac{1}{2}) = \log 2

(continuity) each H_n is a continuous function

Remember the ticket which cost 1+ \frac12 dollars? The argument for its price gives an intuitive meaning to the decomposition property. Remember the ticket that cost \le 2a dollars? That gives some intuitive sense for the triangle inequality for entropy. The entropy formula also guarantees the following additional properties:

(entropy of equidistribution) H_n(\frac1n, \ldots, \frac1n) = \log n

(upper bound) If X can have n outcomes, then H(X) \le \log n with equality if and only if X is equidistributed.

(w.r.t. independence) H(X, Y) = H(X) + H(Y) if and only if X and Y are independent.

(monotonicity) H(X) \ge H(Y) if the value of X determines the value of Y, i.e., if Y is a function of X. (Note that this is the same as saying H(X,Y) \ge H(X) for all discrete random variables X and Y.)

Having listed all these properties, some understanding emerges: H(X) is best thought of as the additive amount of uncertainty in X.
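As a worked instance of the decomposition property, take the ticket for Z from section 1.2. Here n = m = 2, the p_i are the probabilities for X_1 (with p_1 being the probability of X_1 = 1), and the q_j are the probabilities for X_2:

```latex
\begin{align*}
H_3\left(\tfrac14, \tfrac14, \tfrac12\right)
  &= H_2\left(\tfrac12, \tfrac12\right) + \tfrac12 \, H_2\left(\tfrac12, \tfrac12\right) \\
  &= \log 2 + \tfrac12 \log 2 = \tfrac32 \log 2 .
\end{align*}
```

With the conversion factor of \log 2 per dollar, that is the price of 1 + \frac12 dollars.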

1.4. intuition for the notion of surprisal

Another (and perhaps more direct) way to have some intuitive sense of the entropy formula for H(X) is to think of it as the average amount of information you gain upon hearing the value of X. For that, we need to quantify the amount of information you gain for hearing, for example, that the value of X turned out to be 1, i.e., that the event X=1 occurred.

If A is an event, the surprisal of A is defined as I(A) = - \log P(A) where P(A) is the probability of the event A. This quantifies the amount of information (and the amount of surprise) for knowing that A occurred. This quantity has a nice additive property, but to see that, we need to define conditional surprisal first. If A, B are events, the conditional surprisal of A given B is defined as I(A | B) = - \log P(A | B) where P(A | B) is the conditional probability of A given B. This quantifies the amount of information that someone who already knows that B happened gains upon hearing that A happened too.

The additive property is this: at first, you knew nothing about the value of the binary sequence (hence, you had zero amount of information as to what the value of the random binary sequence was), and then you heard that some event B occurred. For example, the event B could be something like X_1 + X_2 = 1. By hearing that news, you gained I(B) bits of information (about the sequence). And then you heard another piece of news A (another event). Upon hearing this news, you gained I(A|B) bits of information. In total you must have gained I(A \cap B) bits of information. Does the following equality actually hold?

I(A \cap B) = I(B) + I(A|B)

Yes, it does. That in turn justifies the formula for surprisal.
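The check is one line of algebra, using only P(A \cap B) = P(B) P(A | B):

```latex
\begin{align*}
I(A \cap B) &= -\log P(A \cap B) \\
            &= -\log \bigl( P(B) \, P(A | B) \bigr) \\
            &= -\log P(B) - \log P(A | B) \\
            &= I(B) + I(A | B).
\end{align*}
```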

Back to the definition of H(X). It is easy to work out that the entropy formula defines H(X) to be the average amount of information you gain upon hearing the value of X.
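Spelled out, the average of the surprisal of the event X = x_i, weighted by the probability p_i of that event, is:

```latex
\begin{align*}
\sum_{i=1}^n P(X = x_i) \, I(X = x_i)
  &= \sum_{i=1}^n p_i \, (-\log p_i) \\
  &= -\sum_{i=1}^n p_i \log p_i = H(X).
\end{align*}
```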

2. intuition for conditional entropy and its definition and properties

For two (discrete) random variables X and Y, the conditional entropy of X given Y is denoted by H(X|Y) and is defined to be the average conditional surprisal for hearing the value of X when one already knows the value of Y. This quantifies the amount of uncertainty that someone knowing the value of Y has about X. Conditional entropy satisfies the following properties:

(positivity) H(X|Y) \ge 0, with equality if and only if X is a function of Y

(upper bound from unconditioning) H(X|Y) \le H(X), with equality if and only if X and Y are independent (you can also think of this inequality as giving a lower bound for H(X))

(chain rule) H(X, Y) = H(X) + H(Y|X)

The intuition for the chain rule is that learning the value of (X,Y) can be done in two steps: learn the value of X, and then learn the value of Y. Compare this rule with the additive property of surprisal.

The chain rule can be thought of as a way to express conditional entropy in terms of unconditional entropy. There is another way to do that: H(Y|X) = \sum_i P(X=x_i) H(Y| X=x_i) where the sum runs over the range of X and H(Y | X=x_i) (sometimes called specific conditional entropy) is the entropy of the conditional probability distribution of Y given the event X=x_i. With this in mind, now note that the chain rule can be thought of as an encapsulation of the repeated application of the decomposition property of entropy.
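Here is a small numerical check of the chain rule in Python, with H(Y|X) computed as the weighted average of specific conditional entropies. As an example I take X = X_1 + X_2 and Y = X_2 + X_3 from the town; the helper names are mine.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def distribution(values):
    """Probabilities of the distinct values in a list of equally likely outcomes."""
    counts = Counter(values)
    return [n / len(values) for n in counts.values()]

# Three fair coins; each sample is a value of (X, Y) = (X_1 + X_2, X_2 + X_3).
samples = [(x1 + x2, x2 + x3) for x1, x2, x3 in product([0, 1], repeat=3)]

h_xy = entropy(distribution(samples))
h_x = entropy(distribution([x for x, _ in samples]))

# H(Y|X) as the weighted average of the entropies H(Y | X = x).
ys_given_x = defaultdict(list)
for x, y in samples:
    ys_given_x[x].append(y)
h_y_given_x = sum(
    len(ys) / len(samples) * entropy(distribution(ys))
    for ys in ys_given_x.values()
)

print(h_xy, h_x + h_y_given_x)  # the two sides of the chain rule
```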

Other properties:

(chain rule 2) H(X, Y | Z) = H(X | Z) + H(Y|X, Z) (where H(Y|X, Z) is defined to be H(Y|(X, Z)))

(monotonicity) H(X, W | Y) \ge H(X | Y, Z) (Try to work out equality conditions for H(X, W | Y) \ge H(X | Y) and H(X | Y) \ge H(X | Y, Z) and you will get two more interesting properties.)

3. entropy of continuous probability distribution

Visit the Wikipedia article on differential entropy to see its definition. Here is something that may hook your interest in differential entropy: many probability distributions you may know, such as the uniform distribution, normal distribution, exponential distribution, and geometric distribution, can be characterized as distributions that maximize entropy under certain constraints. For the math-savvy, “Probability distributions and maximum entropy” by Keith Conrad is a good start. As for why and how maximum entropy distributions tend to show up in nature (for example, the Maxwell–Boltzmann distribution), perhaps statistical mechanics textbooks have some answers, but that is not my area of expertise.

3.1. some intuition for differential entropy

Its translation invariance embodies the intuition that knowing that some value is between 1.0 and 1.1 is the same amount of information as knowing that some value is between 10.0 and 10.1.

The way differential entropy responds to scaling of continuous random variables embodies the intuition that knowing that some value is between 0 and 0.1 is one bit more information than knowing that some value is between 0 and 0.2.
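Both intuitions can be checked on uniform distributions. In the sketch below, h denotes differential entropy and U_a denotes a random variable uniformly distributed on [0, a]; both are my notation for this example.

```latex
\begin{align*}
h(U_a) &= -\int_0^a \frac{1}{a} \log \frac{1}{a} \, dx = \log a, \\
h(U_{0.2}) - h(U_{0.1}) &= \log 0.2 - \log 0.1 = \log 2 .
\end{align*}
```

The same computation for the uniform distribution on [c, c + a] gives \log a again, whatever c is, which is the translation invariance.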

4. for students of ergodic theory

4.1. entropy of a partition

If you are learning ergodic theory, you will at some point learn (or have already learned) that when \alpha is a (measurable, finite) partition (of a probability space (\Omega, \mathcal F, \mu)), its entropy H(\alpha) is defined to be H_k(\mu(A_1), \ldots, \mu(A_k)) where A_i are the elements of \alpha. If we use \hat \alpha to denote the random variable defined as \hat \alpha(\omega) = i iff \omega \in A_i, then we have H(\alpha) = H(\hat \alpha). This is a trivial equality, but it really helps to think of the entropy of a partition as the entropy of its corresponding random variable, so that you are encouraged to use probabilistic intuitions when you are, for example, solving some relevant exercises from your ergodic theory textbook.

Side note: Another good way of defining \hat \alpha is to define \hat \alpha(\omega) to be the element of \alpha containing \omega. This way has the benefit of not having to specify an ordering of elements of \alpha. Another benefit is you feel more confident in using the useful abuse of notation that is writing \alpha(\omega) in place of \hat\alpha(\omega).

Further observations on correspondence between partitions and random variables: If \alpha, \beta are two partitions, \alpha \vee \beta corresponds to the random variable (\hat\alpha, \hat\beta). The partition \alpha is finer than \beta if and only if \hat\beta is a function of \hat\alpha. Given a map T : \Omega \to \Omega and \alpha = \{A_1, \ldots, A_k\}, the shifted partition \{T^{-1}A_1, \ldots, T^{-1}A_k\} corresponds to the shifted random variable \hat\alpha \circ T. These trivial observations combine together and lead to the following non-trivial insight: The pairing of a dynamical system and a partition \alpha gives rise to the stationary statistical process \hat\alpha, \hat\alpha\circ T, \hat\alpha\circ T^2, \ldots. Conversely, any (discrete-valued) stationary statistical process gives rise to an invariant measure on a shift space in the obvious way.

Entropy of the join of two partitions corresponds to joint entropy of random variables: H(\alpha \vee \beta) = H(\hat\alpha, \hat\beta). Metric entropy of a (measure theoretic) dynamical system w.r.t a partition corresponds to average entropy for its corresponding statistical process.

Moving between the world of dynamical systems and the world of statistical processes can take some getting used to.

4.2. comparison between metric entropy and differential entropy

For those who are learning about metric entropy or differential entropy, let’s list some differences for reference.

While differential entropy concerns continuous probability distributions (on the n-dimensional space), metric entropy concerns fractal-like probability distributions from statistical processes or invariant measures of a dynamical system. Differential entropy can be negative while metric entropy cannot be. Metric entropy can be described as a conditional entropy, but I am not sure if differential entropy can be too.

As for how they react to forming a convex combination of two probability distributions, the map from continuous probability distributions to their differential entropy is a concave map, but the map from stationary statistical processes or invariant measures (on a system) to their metric entropy is an affine map.

5. using properties of entropy in proofs

5.1. if two random variables are close to each other, then how much are their entropies close to each other?

Let’s say X, Y are two random variables with range \{1, \ldots, k\} and that we have P(X \neq Y) < \delta. Now we want to find a good enough upper bound for | H(X) - H(Y) |, i.e., we want to prove an inequality of the form | H(X) - H(Y) | \le F(\delta, k). We will choose what F(\delta, k) is later, but for now notice that in order to prove such an inequality, we only need to show H(X) \le H(Y) + F(\delta, k) (E10) (and H(Y) \le H(X) + F(\delta, k), which follows by symmetry). As I said earlier, we would like to prove this without directly using the definition of entropy.

To prove E10, we only need to build a random variable Z with H(X) \le H(Y, Z) (E20) and H(Z) \le F(\delta, k), because then H(X) \le H(Y, Z) \le H(Y) + H(Z) by the triangle inequality. To establish E20, it is enough to build Z so that knowing Y and Z together enables one to know X. So we want Z to provide just enough missing information to guess X from Y. A way to build Z now emerges. We define Z as Z = X on the event X \neq Y, and Z = * on the event X = Y, where * is any fixed value outside the range of X. It is easy to see that the value of X can be determined from the values of Y and Z, hence E20 holds. Now it only remains to give a reasonable upper bound on H(Z) in terms of \delta and k. Let Z' be the binary random variable that records whether X = Y or not. Then H(Z) = H(Z') + H(Z | Z') is less than or equal to H_2(\delta, 1 - \delta) + \delta \log k (at least when \delta \le \frac12, where H_2(\delta, 1 - \delta) is increasing in \delta), which we now decide to be our F(\delta, k).
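As a sanity check of the final bound, here is a Python sketch that tries it on random joint distributions of (X, Y). It uses the exact value e = P(X \neq Y) in place of \delta, for which the same argument goes through without any restriction on e. The names `gap_and_bound` and `random_joint` are hypothetical helpers for this post.

```python
import math
import random

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def gap_and_bound(joint):
    """joint[i][j] = P(X = i, Y = j) on a k-by-k grid.

    Returns (|H(X) - H(Y)|, H_2(e, 1 - e) + e log k) with e = P(X != Y).
    """
    k = len(joint)
    p_x = [sum(row) for row in joint]
    p_y = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    # Clamp to [0, 1] to guard against floating point round-off.
    e = min(1.0, max(0.0, 1.0 - sum(joint[i][i] for i in range(k))))
    bound = entropy([e, 1.0 - e]) + e * math.log(k)
    return abs(entropy(p_x) - entropy(p_y)), bound

def random_joint(k, rng):
    """A random joint distribution with extra weight on the diagonal X = Y."""
    weights = [[rng.random() * (10.0 if i == j else 1.0) for j in range(k)]
               for i in range(k)]
    total = sum(map(sum, weights))
    return [[w / total for w in row] for row in weights]

rng = random.Random(0)
results = [gap_and_bound(random_joint(4, rng)) for _ in range(1000)]
print(all(gap <= bound + 1e-9 for gap, bound in results))
```

A random search like this proves nothing, of course, but it is a cheap way to catch a wrong constant in F(\delta, k).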

5.2. entropy with respect to convex combination

Given two invariant probability measures \mu_1, \mu_2 on \Omega and their convex combination p_1 \mu_1 + p_2 \mu_2 (where p_1 + p_2 = 1, p_1 \ge 0, p_2 \ge 0), we wish to prove the following two facts.

(Fact 1) H_{p_1 \mu_1 + p_2 \mu_2}(\alpha) \ge \sum_i p_i H_{\mu_i}(\alpha), i.e., H is concave with respect to measures.

(Fact 2) (if you know metric entropy) h_{p_1 \mu_1 + p_2 \mu_2}(\alpha) = \sum_i p_i h_{\mu_i}(\alpha), i.e., the metric entropy map is an affine map (with respect to measures).

The above two facts follow easily from this Fact 3: the difference H_{p_1 \mu_1 + p_2 \mu_2}(\alpha) - \sum_i p_i H_{\mu_i}(\alpha) (E30) is in [0, H_2(p_1, p_2)].

We want to prove Fact 3 using only basic properties of entropy. How can we employ probabilistic intuition to guide us to find such a proof? The expression p_1 \mu_1 + p_2 \mu_2 hints at conditioning on some event with probability p_1 to get to \mu_1 and on the complement event (with probability p_2) to get to \mu_2. The probability space where this conditioning happens will have to be bigger than \Omega because it does not make sense to equate a subset of \Omega with the event. The expression E30 involves three terms with three different measures and one shared partition. To rephrase this expression in a probabilistic context, remember that the final expression will have to consist of terms with different random variables (different partitions) living on some shared sample space (one shared measure).

The shared sample space will be a weighted disjoint union of (\Omega, \mu_1, \alpha) and (\Omega, \mu_2, \alpha) (weights being p_1, p_2). To form the disjoint union, we form a copy (\Omega'_i, \mu_i', \alpha'_i) (for each i=1,2) of (\Omega, \mu_i, \alpha), and then form the union (\bar{\Omega} = \cup \Omega'_i, \bar{\mu} = \sum p_i \mu_i', \bar\alpha) where the partition \bar\alpha is defined to be the one that corresponds to the random variable (also denoted \bar\alpha to abuse notation) defined by \bar\alpha(x) = \alpha'_{\gamma(x)}(x) where \gamma is the partition of \bar \Omega corresponding to the binary random variable assigning i to \Omega'_i and where \alpha'_i as a random variable has the same range as \alpha.

E30 is then equal to H_{\bar\mu}(\bar\alpha) - \sum p_i H_{\mu_i'}(\alpha'_i) which is equal to H_{\bar\mu}(\bar\alpha) - H_{\bar\mu}(\bar\alpha | \gamma) which is in [0, H_{\bar\mu}(\gamma)] which is equal to [0, H_2(p_1, p_2)].
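Fact 3 is easy to test numerically, since for a fixed partition \alpha only the vectors of cell measures matter. Below is a Python sketch with random cell measures for a partition with five cells; `mixture_gap` and `random_distribution` are hypothetical helper names.

```python
import math
import random

def entropy(probs):
    """Return -sum p_i log p_i (natural log), treating 0 log 0 as 0."""
    return sum(-p * math.log(p) for p in probs if p > 0)

def mixture_gap(mu1, mu2, p1):
    """The difference E30 for cell measures mu1, mu2 and weights p1, 1 - p1."""
    p2 = 1.0 - p1
    mixed = [p1 * a + p2 * b for a, b in zip(mu1, mu2)]
    return entropy(mixed) - (p1 * entropy(mu1) + p2 * entropy(mu2))

def random_distribution(k, rng):
    """A random probability vector of length k."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(1)
in_range = True
for _ in range(1000):
    p1 = rng.random()
    gap = mixture_gap(random_distribution(5, rng), random_distribution(5, rng), p1)
    # Fact 3: 0 <= gap <= H_2(p1, p2), up to floating point round-off.
    in_range = in_range and -1e-9 <= gap <= entropy([p1, 1.0 - p1]) + 1e-9
print(in_range)
```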


How to use Unicode in LaTeX (by LuaTeX or XeTeX)

The goal of this post is to gradually build up minimal examples for making Unicode text work in LaTeX documents by using LuaTeX or XeTeX. In the end, we will have produced PDF files containing CJK text (Chinese, Japanese, Korean or hangul text).

1. before we begin

To make things simple, I will assume your goal is to be able to write LaTeX documents with Japanese or Korean text in them.

Before we begin, there are four things you must make sure of:

  1. Make sure that your web browser is able to display Korean text.
  2. Make sure you know or learn a way to type Korean text on your computer.
  3. Make sure your text editor can display Korean text.
  4. Make sure you know how to save your TeX document as a UTF-8 text file.

in that order.

For 1 and 2, googling will help you. For 3 and 4, see the manual for your editor. Whatever you did to make sure of 1 may have a side effect of automatically making sure of 3.

2. LuaLaTeX vs XeLaTeX and what are they?

LuaTeX and XeTeX are alternative TeX engines and both are designed to work with Unicode text. Invoking LuaLaTeX or XeLaTeX just means invoking LuaTeX or XeTeX with LaTeX format. This post contains examples to be tried in both, but if you want to be persuaded to choose just one, see Why choose LuaLaTeX over XeLaTeX.

3. first example

Write the following LaTeX document:


\documentclass{article}
\begin{document}
\section{ASCII English}
Hello world.
¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu
          Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა
(Chinese) 你好, 早晨, (Japanese)こんにちは, (Korean, hangul) 안녕하세요
\end{document}


Save it as first-example.tex and make sure it is saved in UTF-8 encoding. Then compile it with XeLaTeX or LuaLaTeX. The document contains many ways of saying hello in different languages which I copied from the HELLO file that is shipped with GNU Emacs.

Compilation should finish fine without errors and it should produce a PDF file. (If you are using TeXworks & MiKTeX on MS Windows, compiling a document with XeLaTeX is as simple as choosing XeLaTeX from the engine list (next to the compile button on the TeXworks window) and then pressing the compile button. Compiling with LuaLaTeX is similar.)

The produced PDF output may have some missing letters. In particular, CJK texts are not displayed in the output.

Now add the following line to the preamble (you know what a preamble in a LaTeX document is):

\usepackage{fontspec}

Then save the document and compile with XeLaTeX or LuaLaTeX again. Compilation should finish fine without errors. Now the produced output will have fewer missing letters, but CJK texts are still not shown. According to my test, with XeLaTeX, broken letters are displayed as spaces, but with LuaLaTeX, broken letters are just gone.

Now add the following line to the preamble and try again (we are specifying a font now):

\setmainfont{Times New Roman}

The produced output will have even fewer missing letters. CJK texts are still not shown. According to my test (done on MS Windows), with XeLaTeX, broken letters are now displayed as boxes. The Times New Roman font does not support CJK text.

Now we change that line to the following line (assuming that you have the Batang font on your computer, which will already be the case if you enable Korean language support on MS Windows):

\setmainfont{Batang}

Now the output shows most of the CJK characters. While the Chinese hello text is still missing one character, Japanese hello and Korean hello are displayed completely now.

The final document:


\documentclass{article}
\usepackage{fontspec}
\setmainfont{Batang}
\begin{document}
\section{ASCII English}
Hello world.
¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu
          Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა
(Chinese) 你好, 早晨, (Japanese)こんにちは, (Korean, hangul) 안녕하세요
\end{document}


We loaded the fontspec package and then used the \setmainfont command (from the fontspec package) to choose Batang as the main font.

4. how to experiment with adjusting font features

Now you know how to include Korean text in LaTeX documents. There are still two problems left to solve, but before I mention them, let’s see how to change the font locally. For example, you can write this:

{\fontspec{Batang} hello world}

and that applies Batang font to just that text of hello world. That means that if you write:

one {\fontspec{Batang} two three} four

“two three” will show in Batang font, while “one” and “four” will show in whatever is the main font that is set by the \setmainfont command in the preamble. Now I can demonstrate the first of the two problems: what would the following do?

{\fontspec{Batang} When he goes---``How are you, Alice'', she replies-``I am fine and you?''}

You might expect to see em-dash and double quotes in the produced output but that does not happen. To make that happen, you can either just use Unicode em-dash and Unicode double quotes or you can add a font feature like this:

{\fontspec[Ligatures=TeX]{Batang} When he goes---``How are you, Alice'', she replies-``I am fine and you?''}

If you want to apply the font feature Ligatures=TeX to the whole document, you can use

\setmainfont[Ligatures=TeX]{Batang}

instead of

\setmainfont{Batang}

The second problem: what would the following do?

{\fontspec{Batang} 월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)}}

You might expect the emph part to be displayed in some italic font, but that does not happen. The emph part is not even displayed in a different shape, let alone an italic shape. Even the English ASCII portion of the emph part is not in italic. Bold face is not working either:

{\fontspec{Batang} 월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)} \textbf{월남쌈 (goi cuon)}}

How can we solve this problem? Let’s experiment further with \fontspec command. What if we try a different Korean font? How about this:

{\fontspec{Malgun Gothic} 월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)} \textbf{월남쌈 (goi cuon)}}

With Malgun Gothic, now bold face works but emph is still not distinguished. To make emph look distinguished, you can add a font feature like this:

{\fontspec[ItalicFont={Malgun Gothic Bold}]{Malgun Gothic} 월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)} \textbf{월남쌈 (goi cuon)}}

That makes both the emph part and the textbf part display in bold face, but there will be no visual distinction between the emph part and the textbf part. To fix that, one can add a font feature specifying that the italic face be displayed with a different Korean font, for example, the Gungsuh font:

{\fontspec[ItalicFont={Gungsuh}]{Malgun Gothic} 월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)} \textbf{월남쌈 (goi cuon)}}

So we now have ideas on what font features to add because of our experiments using the \fontspec command. To add these font features together and apply them to the whole document, you can use

\setmainfont[ItalicFont={Gungsuh},Ligatures=TeX]{Malgun Gothic}

5. in summary

Assuming you are writing a document with mostly Korean text, we found that something like the following template is a good start:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[ItalicFont={Gungsuh},Ligatures=TeX]{Malgun Gothic}

\begin{document}

월남쌈 (goi cuon) \emph{월남쌈 (goi cuon)} \textbf{월남쌈 (goi cuon)}

\section{English conversation}
When he goes---``How are you, Alice'', she replies-``I am fine and you?''

이게 바로 내가 찾던 남자 돌침대\\
이게 바로 내가 찾던 남자 돌침대\\
이게 바로 내가 찾던 남자 돌침대\\

\[ a^2 + b^2 = c^2 \]

\end{document}


For Vietnamese food, goi cuon is a good start. You all should try it some day.

6. something about LuaLaTeX

Something I noticed about LuaLaTeX is that compiling the above document with LuaLaTeX takes about twice as much time as compiling with XeLaTeX, and that only XeLaTeX can take font names in Korean, for example, \fontspec{맑은 고딕} will not work with LuaLaTeX. So it is good to know English names of Korean fonts: 맑은 고딕 = Malgun Gothic, 바탕 = Batang, 궁서 = Gungsuh.
