## How to make prettify-symbols-mode work with AUCTeX

prettify-symbols-mode is a recent feature of Emacs and it’s very nice. It looks like it could replace TeX-fold-mode in the future. At the time of writing, however, prettify-symbols-mode does not seem to work well with AUCTeX unless you apply two workarounds together.

Tested versions:

• GNU Emacs 25.1

## Usual way to enable pretty symbols

If you want to enable it for elisp buffers, you can add:

```elisp
(add-hook 'emacs-lisp-mode-hook 'prettify-symbols-mode)
```

Then something like (lambda () (blah)) in elisp buffers should display as (λ () (blah)).

If you want to enable it also for other Lisp buffers, Scheme mode buffers, etc., you can adapt the following code:

```elisp
(dolist (mode '(scheme emacs-lisp lisp clojure))
  (let ((here (intern (concat (symbol-name mode) "-mode-hook"))))
    (add-hook here 'prettify-symbols-mode)))
```

If you want to enable it for all buffers, you can add:

```elisp
(global-prettify-symbols-mode 1)
```

And then, for the major modes of interest, you may want to adjust the buffer-local variable prettify-symbols-alist accordingly, following the simple example code in the documentation for prettify-symbols-mode.
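For example, to add a couple of substitutions of your own in Python buffers (the pairs below are just an illustration, adapted from the example in the documentation):

```elisp
;; Illustrative only: push two extra pairs onto the buffer-local
;; prettify-symbols-alist whenever python-mode starts.
(add-hook 'python-mode-hook
          (lambda ()
            (push '("lambda" . ?λ) prettify-symbols-alist)
            (push '(">=" . ?≥) prettify-symbols-alist)
            (prettify-symbols-mode 1)))
```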

## Expected way to use with AUCTeX

The following code might be expected to work:

```elisp
(add-hook 'TeX-mode-hook 'prettify-symbols-mode)
```

If it works, then \alpha, \beta, \leftarrow and so on should display as α, β, ←, … in TeX file buffers. I do not doubt that it will work fine in future versions of AUCTeX, and if you are reading this as an old article, simply upgrading your AUCTeX package may be enough to make that line work as expected. If it doesn’t work, try making the following two changes.

## First change

```elisp
(defun my-delayed-prettify ()
  (run-with-idle-timer 0 nil (lambda () (prettify-symbols-mode 1))))
(add-hook 'TeX-mode-hook 'my-delayed-prettify)
```

This way, (prettify-symbols-mode 1) is guaranteed to run after the style hooks, not before. I don’t know exactly what style hooks do, but it looks like they may reset or erase the font-lock setup you have made.

If pretty symbols still don’t show up in AUCTeX buffers, then try the following change as well.

## Second change

Remove the following line from your dotemacs, if it is there:

```elisp
(require 'tex)
```

tex.el will load anyway when you visit a TeX file in Emacs, so this is a strange change to have to make, indeed.

You should also remove the following line, which is commonly used by MiKTeX users:

```elisp
(require 'tex-mik)
```

tex-mik.el is a good small library, but it loads tex.el. Feel free to copy the parts of tex-mik.el you need into your dotemacs.

You can ensure you have removed every call of (require 'tex) from your dotemacs by appending the following line to the end of your dotemacs and restarting Emacs to see whether the warning message shows up:

```elisp
(if (featurep 'tex) (warn "(require 'tex) is still somewhere!"))
```

If some code in your dotemacs relied on (require 'tex) having been called before it, then you have to wrap that code with the with-eval-after-load macro, like this:

```elisp
(with-eval-after-load 'tex
  ;; the body here is an example: registering a viewer entry
  (add-to-list 'TeX-view-program-list
               '("SumatraPDF" my--sumatrapdf)))
```

## When OneDrive is stuck at syncing

Purely a OneDrive problem, but it may also concern LaTeX users, Emacs users, etc.

## Problem description

OneDrive is a feature of MS Windows that many people, including me, love and use, but you might have encountered issues with syncing. These become more visible if you make the OneDrive tray icon always appear on the taskbar, which you can do by going to “Taskbar settings” and then following the link “Select which icons appear on the taskbar”.

Sometimes OneDrive gets stuck in the middle of syncing and never finishes. You can spot this when the tray icon stays indefinitely in the “processing changes” or “uploading …” state. A short way to describe the problem is to say that OneDrive suddenly behaves as if Internet access were off, even though you can browse the Internet fine at the time.

## What tends to cause it

There seem to be certain types of tasks that tend to mess with OneDrive temporarily:

• compiling some code inside a folder in OneDrive (e.g. while working on a LaTeX document or writing a program)
• editing a text file in OneDrive using an editor that may generate or maintain auxiliary files alongside the text file (e.g. Emacs lock files)

Whether those tasks will actually cause it seems quite random. So it’s understandable that it might take a while for the folks at MS to fix this bug, if this is a bug.

## What to do when it happens

The shortest way I know is: Long press or right click on the OneDrive icon to open up its context menu and select “pause syncing”. Then open the context menu again and select “Resume syncing”. You might need to update Windows 10 if you don’t see the “pause syncing” button.

Another way is to exit OneDrive (again using its context menu) and then restart it.

The syncing will get unstuck, but when you do one of those aforementioned tasks again, the syncing may get stuck again.

There seems to be no permanent solution available yet and we will all have to wait until a fix is released.

## Introduction to Wasserstein metric (earth mover’s distance)

This is a short introduction to the notion of the Wasserstein metric, presented in a way that explicitly highlights the earth-mover intuition behind the definition.

## 1. the cost of moving one particle

Throughout the post, we fix a metric space (X,d). Suppose a particle of mass $w \ge 0$ is placed at position $x \in X$. The cost of moving this particle to a new position $y \in X$ is defined to be $w d(x,y)$. With this definition, you see:

• The cost of moving a particle from place x to the same place x is zero.
• The cost of moving a particle is proportional to its mass. (This makes sense: a particle of mass $2w$ at position x is and should be cost-wise equivalent to two particles each of mass $w$ at the same position x.)
• The cost of moving a particle of unit mass from x to y is $d(x,y)$.

## 2. The cost of moving a finite collection of particles of total mass 1.

Imagine a configuration of three particles of total mass 1 placed in X. For each i = 1,2,3, particle i has mass $a_i \ge 0$ and is placed at $x_i \in X$. Since the total mass is 1, we require $\sum_{i=1}^3 a_i = 1$. The mass distribution of this configuration may be written as $\mu := \sum_{i=1}^3 a_i \delta_{x_i}$, where $\delta_{x_i}$ denotes the Dirac delta function centered at $x_i$. (Alternatively, you could interpret the sum $\sum_{i=1}^3 a_i \delta_{x_i}$ as a formal sum, or use Dirac measures instead.)

Now consider another configuration but this time of four particles of total mass 1, placed in the same space X. In this second configuration, for each $j=1,2,3,4$, there is a particle of mass $b_j \ge 0$ at position $y_j \in X$. The mass distribution of this configuration may be written as $\nu = \sum_{j=1}^4 b_j \delta_{y_j}$.

Given the first configuration (three particles), we can split some or all of the three particles into smaller particles of various smaller mass and move them in a way that results in the second configuration. There is more than one way to do it.

Start with the first configuration. Suppose, for each $i=1,2,3$, we split the particle of mass $a_i$ at position $x_i$ into four smaller particles of mass $w_{i1}, w_{i2}, w_{i3}, w_{i4} \ge 0$. The total number of split particles is now $12 = 3 \cdot 4$ and $a_i = \sum_{j=1}^4 w_{ij}$ for each $i=1,2,3$. Move the split particle of mass $w_{ij}$ from the old position $x_i$ to the new position $y_j$. The total mass of all particles at the new position $y_j$ is $\sum_{i=1}^3 w_{ij}$ and we require this to be equal to $b_j$, so that they merge to become a particle of mass $b_j$ at position $y_j$.

Any $3 \times 4$ matrix $(w_{ij})$ of nonnegative numbers such that $a_i = \sum_{j=1}^4 w_{ij}$ for all i and $b_j = \sum_{i=1}^3 w_{ij}$ for all j specifies a split-move-merge procedure that transforms the first configuration into the second configuration.

The cost of a split-move-merge procedure is defined to be its total cost of moving: moving the particle of mass $w_{ij}$ from $x_i$ to $y_j$ costs $w_{ij} d(x_i, y_j)$ and so the total cost is the sum $\sum_{ij} w_{ij} d(x_i, y_j)$.

We define the cost of transforming the mass distribution $\mu = \sum_{i=1}^3 a_i \delta_{x_i}$ into $\nu = \sum_{j=1}^4 b_j \delta_{y_j}$ to be the minimum possible cost over all split-move-merge procedures from the first configuration to the second.

It is easy to check that this definition of distance between $\mu$ and $\nu$ does not depend on how you write $\mu$ and $\nu$. (In other words, whether you see a particle of mass $\frac12$ at position x or two particles each of mass $\frac14$ at the same position x does not affect costs.)
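To make the minimization concrete, here is a small sketch (mine, not from the original post) that computes this cost for point masses on the real line by solving the linear program over matrices $(w_{ij})$ with scipy; the function name `w1_discrete` is my own.

```python
import numpy as np
from scipy.optimize import linprog

def w1_discrete(a, x, b, y):
    """Cost of transforming sum_i a_i δ_{x_i} into sum_j b_j δ_{y_j}
    for point masses on the real line, where d(x, y) = |x - y|."""
    m, n = len(a), len(b)
    # cost of moving one unit of mass from x_i to y_j, flattened row-major
    cost = np.abs(np.subtract.outer(np.asarray(x, float),
                                    np.asarray(y, float))).ravel()
    # equality constraints: row sums of (w_ij) equal a_i, column sums equal b_j
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([a, b])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

# moving mass 1 at 0 to masses 1/2 at 1 and 1/2 at 3 costs 0.5*1 + 0.5*3 = 2
print(w1_discrete([1.0], [0.0], [0.5, 0.5], [1.0, 3.0]))  # ≈ 2.0
```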

## 3. coupling

Each split-move-merge procedure $(w_{ij})$ defines a distribution $\lambda = \sum_{ij} w_{ij} \delta_{(x_i,y_j)}$ on $X^2$. This distribution $\lambda$ is a coupling of $\mu$ and $\nu$ in the sense that it projects to $\mu$ and $\nu$. Indeed, if you project it to the first coordinate, you get $\sum_{ij} w_{ij} \delta_{x_i} = \sum_i \big( \sum_j w_{ij} \big) \delta_{x_i} = \sum_i a_i \delta_{x_i}$.

You could think of the coupling $\lambda$ as a distribution on the space $X^2$ of arrows, where you think of each $(x,y) \in X^2$ as an abstract arrow from x to y.

The integral of the function d(x,y) (whose domain is the space $X^2$ of arrows) with respect to the distribution $\lambda$ works out to be the same as the cost $\sum_{ij} w_{ij} d(x_i, y_j)$.

Conversely, each coupling $\lambda$ of $\mu$ and $\nu$ gives rise to a split-move-merge procedure that transforms $\mu$ into $\nu$ whose cost equals the integral of d(x,y) with respect to $\lambda$.

So the cost of transforming $\mu$ to $\nu$ is the same as the minimum possible expected value of the length $d(x,y)$ of arrows (x,y) over all possible couplings of $\mu$ and $\nu$.

## 4. composing

The distance between $\mu$ and $\nu$ is a metric. The triangle inequality for $d(\cdot,\cdot)$ transfers to the triangle inequality for the distance between measures because you can compose a transformation from $\mu$ to $\nu$ with another from $\nu$ to $\rho$ to get a third transformation from $\mu$ to $\rho$ whose cost is at most the sum of the costs of the two. This is more easily seen if you think of $\mu$ as piles of dirt (of total unit mass). In fact, suppose $\mu$ is three piles of dirt: pile i of mass $a_i$ at position $x_i$. Suppose pile 1 is red dirt, pile 2 is yellow dirt, and pile 3 is green dirt. Move these three piles of dirt to form $\nu$ using some transformation $(w_{ij})$. Each of the resulting piles may be a mixture of different colors. Now use some other transformation $(w'_{jk})$ to move the piles again to form $\rho$. The net effect is a transformation of $\mu$ into $\rho$, and the corresponding composition $(w''_{ik})$ can be obtained by examining the percentages of different colors of dirt in each of the final piles. (The output $(w''_{ik})$ is actually not unique here, but there is an obvious canonical choice.)

## 5. definition of the Wasserstein distance

What we have defined is known as the 1st Wasserstein distance between $\mu$ and $\nu$, usually denoted $W_1(\mu,\nu)$. For general non-discrete probability measures $\mu$ and $\nu$ on $X$, the definition is made using couplings of $\mu$ and $\nu$, but you have to take an infimum instead of a minimum, and the distance can be infinite.

This distance can be thought of as a probabilistic analogue of the Hausdorff distance between two sets.

## 6. Moving a collection of particles of equal mass

Suppose we are to move a collection of n particles, each of mass $\frac1n$. In the first configuration, the n particles are at the n places $x_1, \cdots, x_n$ (some of which may coincide). In the second configuration, the new n places are $y_1, \cdots, y_n$. It is known that in this case there is always a split-move-merge procedure (between the two configurations) that minimizes the cost while also skipping the split and merge steps. The procedures without the split and merge steps are specified by permutations $\pi$ of $\{1,\cdots,n\}$: such a $\pi$ specifies the procedure in which you move the old particle of mass $\frac1n$ at position $x_i$ to the new position $y_{\pi(i)}$ (at a cost of $\frac1n d(x_i, y_{\pi(i)})$) for $i=1,2,\cdots,n$.

So the cost of transforming $\mu = \frac1n \sum_{i=1}^n \delta_{x_i}$ to $\nu = \frac1n \sum_{j=1}^n \delta_{y_j}$ in this case is the same as the minimum of the average $\frac1n \sum_{i=1}^n d(x_i, y_{\pi(i)})$ over all permutations $\pi$ on $\{1,\cdots, n\}$.
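As a sketch (mine, not from the original post), this minimum over permutations can be computed directly with scipy's assignment-problem routine:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# two configurations of n = 3 particles of mass 1/3 each, on the real line
x = np.array([0.0, 1.0, 5.0])
y = np.array([1.0, 2.0, 4.0])

cost = np.abs(np.subtract.outer(x, y))    # cost[i, j] = d(x_i, y_j)
rows, cols = linear_sum_assignment(cost)  # cost-minimizing permutation pi
w1 = cost[rows, cols].mean()              # (1/n) * sum_i d(x_i, y_pi(i))
print(w1)  # 1.0 here: match 0 -> 1, 1 -> 2, 5 -> 4
```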

The fact that the split and merge steps can be discarded is known as the Birkhoff–von Neumann theorem, which states that any $n\times n$ doubly stochastic matrix (in our case, $n (w_{ij})$ is doubly stochastic) can be written as a convex combination of permutation matrices.

It is known that, in a metric space that is nice enough, any probability measure $\mu$ on it can be approximated by discrete distributions of the form $\frac1n \sum_{i=1}^n \delta_{x_i}$ in an appropriate sense.

## 7. Some properties

(some lower bounds) If two probability measures $\mu$ and $\nu$ are supported on subsets $A, B$ respectively and if $d(A,B) \ge c > 0$, then the distance between $\mu$ and $\nu$ is also $\ge c$. If we relax to $\mu(A) = 1 - O(\epsilon)$ and $\nu(B) = 1 - O(\epsilon)$, then we still have the approximate $W_1(\mu,\nu) \ge c (1 - O(\epsilon))$.

(upper bounds) If $\nu$ is the pushforward image of $\mu$ under some $f: X \to X$ and $d(x,f(x)) \le \epsilon$ for all x, then $W_1(\mu,\nu) \le \epsilon$. If we relax this to $d(x, f(x)) = O(\epsilon)$ uniformly on some big set $A$ with $\mu(A) = 1 - O(\epsilon)$, and the diameter of X is O(1), then we still obtain the approximate conclusion $W_1(\mu,\nu) = O(\epsilon)$.

## How to close the sudden Camera app from Windows 8

The readers are assumed to be familiar with names of basic Windows 8 touch gestures (for which, see http://windows.microsoft.com/en-us/windows-8/touch-swipe-tap-beyond )

## 1. introduction

With some Windows 8 tablets (or laptops), you may come across the sudden appearance of the Camera app, or the sudden activation of the webcam, when you are trying to turn on your device (more precisely, when you are trying to sign in to your PC from the lock screen). Since the screen makes it clear that the webcam is on, you know you have unintentionally launched some Camera app. Surprisingly, the usual ways of closing or exiting the app or switching to other apps do not work. How does one get out of this and back to the usual lock screen (the sign-in screen)?

## 2. how to get out

It is actually quite simple to turn off or deactivate this thing. You simply tap or click Unlock which should be found on the left side of the app commands at the bottom of the screen and you are back to the lock screen.

## 3. how did I end up there

You usually swipe up on the lock screen to sign in to your PC, but when you swipe down on it instead, that launches the Camera app. This is a useful feature of Windows 8 that you can turn on or off. The Camera app launched in this way seems to show a different interface from the one launched in the usual way. Maybe it is not the same Camera app? I am not sure.

## 4. how to turn off the feature

This quick access to the camera from the lock screen is a useful feature, but if you want to turn it off for some reason, type the following in Search:

lock screen camera


and by navigating from there, you will arrive at an option that says

Swipe down on the lock screen to use the camera


## Intuitions on problems from Elements of Information Theory Chapter 2

This post is not about sharing solutions to the problems in the book. As the title indicates, it is rather about sharing intuitions or interpretations of some results mentioned or alluded to in the problems listed at the end of Chapter 2 of the book “Elements of Information Theory”, second edition. The chapter deals with the notions of entropy and mutual information.

Some of my colleagues and I were reading a rather large paper which draws on basic ideas from many fields, one of which was the notion of mutual information. We only had a grasp of its special case, the entropy of a random variable, and since the general notion was new to us, I decided to read at least the first few chapters of “Elements of Information Theory”.

## 1. meaning of conditional mutual information

Given three (discrete) random variables X, Y, Z, the conditional mutual information I(X; Y | Z) intuitively means how much X says about Y (or equivalently how much Y says about X) under the assumption that the value of Z is already known. Try to use this intuition to make sense of the chain rule for mutual information and the data-processing inequality.

As for intuitions for its special case, conditional entropy, see the (already linked) post Shannon’s entropy of random variables and partitions.

## 2. comments on some selected problems from Chapter 2

### 2.1. Problem 2.6

The problem asks for examples where mutual information increases or decreases when conditioned on a third random variable. This points out that I(X;Y | Z) is not monotone in Z. One should contrast this with the following two facts:

• Conditional entropy H(X | Y) is monotone in the conditioning Y.
• Mutual information I(X;Y | Z) is monotone in X and Y.
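A standard example of the increase (my own illustration, not the book’s solution): let X and Y be independent fair bits and Z = X xor Y. Then I(X;Y) = 0 but I(X;Y|Z) = 1 bit. A quick numerical check:

```python
import numpy as np

def H(p):
    """Shannon entropy in bits; zero entries are ignored."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# joint pmf p[x, y, z] with X, Y independent fair bits and Z = X xor Y
p = np.zeros((2, 2, 2))
for x in (0, 1):
    for y in (0, 1):
        p[x, y, x ^ y] = 0.25

pxy = p.sum(axis=2)
# I(X;Y) = H(X) + H(Y) - H(X,Y)
i_xy = H(pxy.sum(axis=1)) + H(pxy.sum(axis=0)) - H(pxy)
# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
i_xy_given_z = (H(p.sum(axis=1)) + H(p.sum(axis=0))
                - H(p.sum(axis=(0, 1))) - H(p))
print(i_xy, i_xy_given_z)  # 0.0 and 1.0
```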

### 2.2. Problem 2.14

Given two independent (discrete) random variables X and Y, the entropy of the sum X+Y is at least $\max\{H(X),H(Y)\}$. This is perhaps related to why the central limit theorem works. If you keep adding i.i.d. random variables one by one, the entropy of the sum never decreases, so aside from special cases, we can expect the entropy to get higher and higher. High entropy of the sum alone may not tell you much about its probability distribution, but the variance and the mean of the sum are also known, and IIRC the normal distribution maximizes entropy under constraints on the mean and variance. This does not amount to a proof of the central limit theorem, but I think it illustrates the relation between large degrees of freedom (in statistical mechanics) and probability distributions with maximal entropy.

Once you prove that the entropy of the independent sum X+Y is at least $\max\{H(X),H(Y)\}$, you can also see that the entropy of the independent sum taken modulo something (6, for example) is at least $\max\{H(X),H(Y)\}$ as well. If you have a biased die, you may rightfully expect that you can simulate a less biased die by throwing the same die several times and summing up the numbers modulo 6, and the entropy argument is one way to justify this expectation.
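Here is a quick numerical illustration of that expectation (mine, not from the book), with a made-up biased die:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero entries are ignored."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

# a made-up biased six-sided die
p = np.array([0.5, 0.1, 0.1, 0.1, 0.1, 0.1])

# distribution of (X + Y) mod 6 for two independent throws
# (circular convolution of p with itself)
q = np.zeros(6)
for i in range(6):
    for j in range(6):
        q[(i + j) % 6] += p[i] * p[j]

print(entropy(p), entropy(q))  # the summed die has strictly higher entropy here
```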

### 2.3. Problem 2.23

The problem deals with $n$ binary random variables $X_1, X_2, \dots, X_n$ related by one linear equation $X_1 + \dots + X_n \equiv 0 \pmod 2$. The given joint probability distribution is in some sense the most fair distribution under the constraint given by the linear equation.

The results are consistent with our intuition in that:

• knowing $X_1$ tells you nothing about $X_2$
• and given that you already know $X_1$, knowing $X_2$ tells you nothing about $X_3$
• and at the last step when you know $X_1$ to $X_{n-2}$, knowing $X_{n-1}$ tells you everything about $X_n$ (because of the linear equation).

The results can be different for other probability distributions satisfying the same linear equation. For example, with some biased joint probability distribution, it is possible that knowing $X_1$ tells you something about $X_2$, in the sense that $I(X_1; X_2) > 0$. In fact, you can even choose a distribution so that knowing $X_1$ tells you everything about $X_2$, in the sense that $I(X_1 ; X_2) = H(X_2) = 1$. That is the case when we choose the most fair distribution under the constraint $X_1 + X_2 = 1, X_3 = X_4 = \dots = X_n = 0$.
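For the fair parity distribution with, say, n = 4, those bullet points are easy to verify numerically; a small sketch of mine (not the book’s solution):

```python
import numpy as np
from itertools import product

def H(p):
    """Shannon entropy in bits of an array of probabilities."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

n = 4
# uniform distribution over binary strings with X1 + ... + Xn even
states = [s for s in product((0, 1), repeat=n) if sum(s) % 2 == 0]
prob = 1.0 / len(states)

def marginal(indices):
    """Joint pmf of the coordinates listed in `indices`."""
    pmf = {}
    for s in states:
        key = tuple(s[i] for i in indices)
        pmf[key] = pmf.get(key, 0.0) + prob
    return np.array(list(pmf.values()))

# knowing X1 tells you nothing about X2: I(X1; X2) = 0
i12 = H(marginal([0])) + H(marginal([1])) - H(marginal([0, 1]))

# given X1..X_{n-2}, knowing X_{n-1} tells you everything about X_n:
# I(X_{n-1}; X_n | X_1..X_{n-2}) = 1 bit
first = list(range(n - 2))
i_last = (H(marginal(first + [n - 2])) + H(marginal(first + [n - 1]))
          - H(marginal(first)) - H(marginal(list(range(n)))))
print(i12, i_last)  # 0.0 and 1.0
```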

### 2.4. Problem 2.27

It asks to prove the grouping rule for entropy. It can be verified with a simple calculation, but it may be instructive to solve it instead by introducing auxiliary random variables and using the equalities established thus far.

### 2.5. Problem 2.41

One wishes to identify a random object X. Two i.i.d. random questions $Q_1, Q_2$ about the object are asked, eliciting answers $A_1 = A(X, Q_1), A_2 = A(X, Q_2)$. The problem asks to show that the two questions are less valuable than twice a single question.

At first, one might incorrectly expect the two questions to be worth twice a single question, but consider asking millions of questions about the outcome of a single die roll. One million questions about the outcome is worth almost the same as two million, since by the time you ask the millionth, you would be almost certain of the outcome, and the next million questions would not contribute much. The amount of information to be extracted with questions cannot grow linearly with the number of questions indefinitely; in fact there is an upper bound, namely $H(X)$.

Some more thought reveals that it is because of independence (rather than in spite of it) that two i.i.d. questions are less valuable than twice a single question. Suppose X is the random position of a specific chess piece. If the first (random) question asked which row, then the next optimal question would be to ask which column. Making subsequent questions optimal requires depending on previous questions and answers, so independent questions are less efficient than dependent ones, as anyone who has played Twenty Questions can attest.

## flood zones in Seoul

While looking for a place to live in Seoul, it occurred to me that I should check which parts of Seoul tend to get flooded, and here is what I found. Let’s hope this post becomes obsolete in the future, as the local government is taking some measures.

Several news articles in 2014 reported that the Seoul government named five special regions to protect against floods; these are the regions near:

• 강남역
• 사당역
• 광화문
• 도림천
• 한강로

도림천 (a small river) and 한강로 (a road) can be located as red lines in Naver Map. Locating the rest (강남역, 사당역, 광화문) should be obvious.

A news article in 2013 named the top 4 regions that get flooded again and again; these are the regions near:

• 강남역
• 광화문
• 안양천
• 사당역

There is also an interactive map showing the often-flooded regions in Seoul, but it requires you to select a district first: 풍수해 정보지도 (flood damage information map). It can show affected areas in each of the years 2010, 2011, 2012, and 2013.

## Putting a bar or a tilde over a letter in LaTeX

As you are aware, there are commands to put a bar or a tilde over a symbol in math mode in LaTeX. Sometimes, the output doesn’t come out the way some of us might expect or want.

Fortunately, there are alternative commands that do the same task differently that we can try and there are also other ways of using the same commands.

To put a tilde over a letter, we can use either \tilde or \widetilde. As for which one to use in which situation, compiling a document containing the following can help with the comparison.

```latex
$\tilde{A}$ vs $\widetilde{A}$

$\tilde{\mathcal A}$ vs $\widetilde{\mathcal A}$

$\tilde{ABC}$ vs $\widetilde{ABC}$
```


To put a bar over a letter, we can use either \bar or \overline. It seems that \bar is to \overline what \tilde is to \widetilde. There don’t seem to be \overtilde or \widebar.

```latex
$\bar{A}$ vs $\overline{A}$

$\bar{\mathcal A}$ vs $\overline{\mathcal A}$

$\bar{ABC}$ vs $\overline{ABC}$
```


Over a symbol with a subscript or superscript index, one can imagine four ways of putting a bar or tilde over it. One or two of them may look right, but that depends. I can’t comment on which ones look good; you just have to try each and decide with your colleagues.

```latex
$\tilde{A_2}$ vs $\widetilde{A_2}$ vs $\tilde{A}_2$ vs $\widetilde{A}_2$

$\bar{A_2}$ vs $\overline{A_2}$ vs $\bar{A}_2$ vs $\overline{A}_2$
```
`