Tested versions:

- AUCTeX 11.89.7 (latest version from GNU ELPA)
- GNU Emacs 25.1

If you want to enable prettify-symbols-mode for elisp buffers, you can add:

(add-hook 'emacs-lisp-mode-hook 'prettify-symbols-mode)

Then something like (lambda () (blah)) in elisp buffers should display as (λ () (blah)).

If you also want to enable it for other Lisp buffers (Scheme mode buffers, etc.), you can adapt the following code:

(dolist (mode '(scheme emacs-lisp lisp clojure))
  (let ((here (intern (concat (symbol-name mode) "-mode-hook"))))
    ;; (add-hook here 'paredit-mode)
    (add-hook here 'prettify-symbols-mode)))

If you want to enable it for all buffers, you can add:

(global-prettify-symbols-mode 1)

Then, for the major modes you care about, you may want to adjust the buffer-local prettify-symbols-alist accordingly, following the simple example code in the documentation for prettify-symbols-mode.

The following code may be expected to work:

(add-hook 'TeX-mode-hook 'prettify-symbols-mode)

If it works, then \alpha, \beta, \leftarrow and so on should display as α, β, ←, … for TeX file buffers. I do not doubt that it will just work fine in future versions of AUCTeX, and if you are reading this as an old article, it is possible that just upgrading your AUCTeX package may be enough to make that line work as you expected. If it doesn’t work, then try making the following two changes.

Instead of adding to the hook directly, try adding a delayed version, like so:

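(defun my-delayed-prettify ()
  ;; Pass the buffer along: another one may be current when the timer fires.
  (run-with-idle-timer
   0 nil
   (lambda (buf)
     (when (buffer-live-p buf)
       (with-current-buffer buf (prettify-symbols-mode 1))))
   (current-buffer)))
(add-hook 'TeX-mode-hook 'my-delayed-prettify)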

This way, (prettify-symbols-mode 1) is guaranteed to run after the style hooks and not before. I don’t know what style hooks do, but it looks like they may reset/erase font-lock stuff you have set up.

If pretty symbols still don’t show up in AUCTeX buffers, then try adding the following change, in addition to the above change.

This one isn’t really about adding something. It is about removing. Remove the following line from your dotemacs if it is there:

(require 'tex)

tex.el will load anyway if you visit a TeX file with Emacs. This is a strange change to make, indeed.

You should also remove the following line, which is commonly used by MiKTeX users:

(require 'tex-mik)

tex-mik.el is a good small library, but it loads tex.el itself. Feel free to copy parts of tex-mik.el and paste them into your dotemacs if you want.

You can check that you have removed every call of (require 'tex) from your dotemacs by appending the following line to the end of your dotemacs and then restarting Emacs to see whether the warning message shows up:

(if (featurep 'tex) (warn "(require 'tex) is still somewhere!"))

If there was some code in your dotemacs that relied on (require 'tex) having been called earlier, then you have to wrap that code with the with-eval-after-load macro, like this:

(with-eval-after-load 'tex
  (add-to-list 'TeX-view-program-selection '(output-pdf "SumatraPDF"))
  (add-to-list 'TeX-view-program-list `("SumatraPDF" my--sumatrapdf)))


OneDrive is a great feature of MS Windows that I and many others love and use, but you might have run into some issues with syncing. These become more visible when the OneDrive tray icon always appears on the taskbar, which you can set up by going to “Taskbar settings” and then clicking the link “Select which icons appear on the taskbar”.

Sometimes OneDrive is stuck in the middle of syncing and never finishes syncing. You can spot it from the tray icon indefinitely staying at the “processing changes” or “uploading …” state. Maybe a short way to describe the problem is to say that OneDrive suddenly behaves like Internet access is off, despite the fact that you can browse the Internet fine at that time.

There seem to be certain types of tasks that tend to mess with OneDrive temporarily:

- Compiling some code inside a folder in OneDrive (e.g. when working on a LaTeX document or writing a program)
- Editing a text file in OneDrive using an editor that may generate or maintain auxiliary files alongside the text file (e.g. Emacs lock files)

Whether those tasks will actually cause it seems quite random. So it’s understandable that it might take a while for the folks at MS to fix this bug, if this is a bug.

The quickest workaround I know is this: long press or right click on the OneDrive icon to open its context menu and select “Pause syncing”. Then open the context menu again and select “Resume syncing”. You might need to update Windows 10 if you don’t see the “Pause syncing” option.

Another way is to exit OneDrive (again using its context menu) and then restart it.

The syncing will get unstuck, but when you do one of those aforementioned tasks again, the syncing may get stuck again.

There seems to be no permanent solution available yet and we will all have to wait until a fix is released.


This is a short introduction to the notion of the Wasserstein metric, presented in a way that explicitly highlights the earth mover intuition behind the definition.

Throughout the post, we fix a metric space (X,d). Suppose a particle of mass $m$ is placed at position $x$. The cost of moving this particle to a new position $y$ is defined to be $m\, d(x, y)$. With this definition, you see:

- The cost of moving a particle from place x to the same place x is zero.
- The cost of moving a particle is proportional to its mass. (This makes sense: a particle of mass $2m$ at position x is, and should be, cost-wise equivalent to two particles each of mass $m$ at the same position x.)
- The cost of moving a particle of unit mass from x to y is $d(x, y)$.

Imagine a configuration of three particles of total mass 1 placed in X. For each i = 1,2,3, particle i has mass $p_i$ and is placed at $x_i$. Since the total mass is 1, we require $p_1 + p_2 + p_3 = 1$. The mass distribution of this configuration may be written as $\mu = \sum_{i=1}^{3} p_i \delta_{x_i}$, where $\delta_x$ denotes the Dirac delta function centered at $x$. (Alternatively, you could interpret the sum as a formal sum, or use Dirac measures instead.)

Now consider another configuration, this time of four particles of total mass 1, placed in the same space X. In this second configuration, for each $j = 1,2,3,4$, there is a particle of mass $q_j$ at position $y_j$. The mass distribution of this configuration may be written as $\nu = \sum_{j=1}^{4} q_j \delta_{y_j}$.

Given the first configuration (three particles), we can split some or all of the three particles into smaller particles of various smaller mass and move them in a way that results in the second configuration. There is more than one way to do it.

Start with the first configuration. Suppose, for each $i$, we split the particle of mass $p_i$ at position $x_i$ into four smaller particles of masses $c_{i1}, c_{i2}, c_{i3}, c_{i4} \ge 0$. The total number of split particles is now $3 \times 4 = 12$, and $\sum_{j=1}^{4} c_{ij} = p_i$ for each $i$. Move the split particle of mass $c_{ij}$ from the old position $x_i$ to the new position $y_j$. The total mass of all particles at the new position $y_j$ is $\sum_{i=1}^{3} c_{ij}$, and we require this to be equal to $q_j$, so that they merge to become a particle of mass $q_j$ at position $y_j$.

Any $3 \times 4$ matrix $(c_{ij})$ of nonnegative numbers such that $\sum_j c_{ij} = p_i$ for all i and $\sum_i c_{ij} = q_j$ for all j specifies a split-move-merge procedure that transforms the first configuration into the second configuration.

The cost of a split-move-merge procedure is defined to be its total cost of moving: moving the particle of mass $c_{ij}$ from $x_i$ to $y_j$ costs $c_{ij}\, d(x_i, y_j)$, and so the total cost is the sum $\sum_{i,j} c_{ij}\, d(x_i, y_j)$.
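To make the bookkeeping concrete, here is a small Python sketch. It checks whether a matrix $(c_{ij})$ is a valid split-move-merge procedure, meaning nonnegative entries, row sums $p_i$ and column sums $q_j$, and it computes the cost of that procedure. The positions, masses and the plan itself are made-up numbers on the real line with $d(x, y) = |x - y|$. The function names are illustrative only.

```python
from math import isclose

def is_plan(c, p, q, tol=1e-12):
    """True if the nonnegative matrix c splits each mass p[i] (row sums)
    and merges into each mass q[j] (column sums)."""
    if len(c) != len(p) or any(len(row) != len(q) for row in c):
        return False
    nonneg = all(cij >= 0 for row in c for cij in row)
    rows_ok = all(isclose(sum(row), pi, abs_tol=tol) for row, pi in zip(c, p))
    cols_ok = all(isclose(sum(row[j] for row in c), qj, abs_tol=tol)
                  for j, qj in enumerate(q))
    return nonneg and rows_ok and cols_ok

def plan_cost(c, xs, ys, d):
    """Total cost of moving: the sum over i, j of c[i][j] * d(xs[i], ys[j])."""
    return sum(c[i][j] * d(xs[i], ys[j])
               for i in range(len(xs)) for j in range(len(ys)))

def dist(x, y):
    """The metric d on the real line."""
    return abs(x - y)

# Hypothetical configurations: three particles -> four particles.
xs, p = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2]
ys, q = [0.0, 0.5, 1.5, 2.5], [0.25, 0.25, 0.25, 0.25]
c = [[0.25, 0.25, 0.00, 0.00],
     [0.00, 0.00, 0.25, 0.05],
     [0.00, 0.00, 0.00, 0.20]]

print(is_plan(c, p, q))            # True
print(plan_cost(c, xs, ys, dist))  # approximately 0.425
```

The distance itself is the minimum of `plan_cost` over all matrices that pass `is_plan`. This sketch only evaluates one given plan.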

We define the cost of transforming the first mass distribution $\mu$ into the second mass distribution $\nu$ to be the minimum possible cost of split-move-merge procedures from the first configuration to the second configuration.

It is easy to check that this definition of distance between $\mu$ and $\nu$ does not depend on how you write $\mu$ and $\nu$. (In other words, whether you see a particle of mass $2m$ at position x or two particles each of mass $m$ at the same position x does not affect costs.)

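Each split-move-merge procedure $(c_{ij})$ defines a distribution $\gamma = \sum_{i,j} c_{ij}\, \delta_{(x_i, y_j)}$ on $X \times X$. This distribution is a coupling of $\mu$ and $\nu$ in the sense that it projects to $\mu$ and $\nu$. Indeed, if you project it to the first coordinate, you get $\sum_{i,j} c_{ij}\, \delta_{x_i} = \sum_i p_i \delta_{x_i} = \mu$.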

You could think of the coupling $\gamma$ as a distribution on the space of arrows, where you think of each $(x, y) \in X \times X$ as an abstract arrow from x to y.

The integral of the function d(x,y) (whose domain is the space of arrows) with respect to the distribution $\gamma$ works out to be the same as the cost $\sum_{i,j} c_{ij}\, d(x_i, y_j)$.

Conversely, each coupling $\gamma$ of $\mu$ and $\nu$ gives rise to a split-move-merge procedure that transforms $\mu$ into $\nu$, and the cost of that procedure equals the integral of d(x,y) with respect to $\gamma$.

So the cost of transforming $\mu$ to $\nu$ is the same as the minimum possible expected value of the length $d(x, y)$ of arrows $(x, y)$ over all possible couplings $\gamma$ of $\mu$ and $\nu$.

The distance between $\mu$ and $\nu$ is a metric. The triangle inequality on $X$ transfers to the triangle inequality for the distance between distributions. The reason is that you can compose a transformation from $\mu_1$ to $\mu_2$ with another from $\mu_2$ to $\mu_3$ to get a third transformation from $\mu_1$ to $\mu_3$, and its cost is at most the sum of the costs of the two former transformations: each bit of mass that travels from x via y to z pays only $d(x, z) \le d(x, y) + d(y, z)$ for the direct trip. This property is easier to see if you think of $\mu_1, \mu_2, \mu_3$ as piles of dirt (of unit total mass). Suppose $\mu_1$ is three piles of dirt, pile i having mass $p_i$ and sitting at position $x_i$. Suppose pile 1 is red dirt, pile 2 is yellow dirt, and pile 3 is green dirt. Move these three piles of dirt to form $\mu_2$ using some transformation $\gamma_{12}$. Each of the resulting piles may be a mixture of different colors. Now use some other transformation $\gamma_{23}$ to move the piles again to form $\mu_3$. The net effect is a transformation of $\mu_1$ into $\mu_3$. The corresponding coupling $\gamma_{13}$ can be obtained by examining the percentages of different colors of dirt in each of the final piles. (The result is actually not unique here, but there is an obvious canonical choice.)
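The canonical choice can be sketched in Python under one assumption: each intermediate pile is perfectly mixed. Then a fraction $\gamma_{12}(i,j)/q_j$ of pile $j$ of $\mu_2$ (mass $q_j$) has color $i$, and that pile sends a fraction $\gamma_{23}(j,k)/q_j$ of itself to position $k$. This gives $\gamma_{13}(i,k) = \sum_j \gamma_{12}(i,j)\,\gamma_{23}(j,k)/q_j$. The piles below are hypothetical points on the real line.

```python
def compose(g12, g23, q):
    """Glue a plan g12 (mu1 -> mu2) and a plan g23 (mu2 -> mu3) into mu1 -> mu3.

    Pile j of mu2 has mass q[j]. If its colors are perfectly mixed, a fraction
    g12[i][j] / q[j] of it has color i, and the pile sends a fraction
    g23[j][k] / q[j] of itself to position k of mu3."""
    n1, n3 = len(g12), len(g23[0])
    g13 = [[0.0] * n3 for _ in range(n1)]
    for i in range(n1):
        for j, qj in enumerate(q):
            if qj == 0:
                continue  # an empty pile carries no dirt
            for k in range(n3):
                g13[i][k] += g12[i][j] * g23[j][k] / qj
    return g13

def cost(plan, src, dst):
    """Cost of a plan on the real line, with d(x, y) = |x - y|."""
    return sum(plan[i][j] * abs(src[i] - dst[j])
               for i in range(len(src)) for j in range(len(dst)))

# Hypothetical piles on the real line: mu1 -> mu2 -> mu3.
x1, x2, x3 = [0.0, 2.0], [1.0, 3.0], [0.0, 4.0]
q = [0.75, 0.25]                     # pile masses of mu2
g12 = [[0.50, 0.00], [0.25, 0.25]]   # rows: mu1 = (0.5, 0.5); columns: q
g23 = [[0.50, 0.25], [0.00, 0.25]]   # rows: q; columns: mu3 = (0.5, 0.5)
g13 = compose(g12, g23, q)

print(g13)
print(cost(g13, x1, x3), "<=", cost(g12, x1, x2) + cost(g23, x2, x3))
```

The composed plan has the right marginals, and its cost (5/3 here) stays below the sum of the two separate costs (2.5 here), as the triangle inequality requires.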

What we have defined is known as the 1st Wasserstein distance between $\mu$ and $\nu$ and is usually denoted $W_1(\mu, \nu)$. For general non-discrete probability measures $\mu$ and $\nu$ on $X$, the definition uses couplings $\gamma$ of $\mu$ and $\nu$, namely $W_1(\mu, \nu) = \inf_\gamma \int d(x, y)\, d\gamma(x, y)$. In that setting you have to take inf instead of min, and the distance can be infinite.

This distance can be thought of as a probabilistic analogue of the Hausdorff distance between two sets.

Suppose we are to move a collection of n particles, each of mass $1/n$. In the first configuration, the n particles are at the n places $x_1, \dots, x_n$ (some of which may coincide). In the second configuration, the new n places are $y_1, \dots, y_n$. It is known that in this case there is always a split-move-merge procedure (between the two configurations) that minimizes the cost and also skips the split and the merge steps. The procedures without the split and merge steps are specified by bijections (permutations) from $\{1, \dots, n\}$ to itself. A permutation $\sigma$ specifies the procedure that moves, for $i = 1, \dots, n$, the old particle of mass $1/n$ at the old position $x_i$ to the new position $y_{\sigma(i)}$, at a cost of $\frac{1}{n} d(x_i, y_{\sigma(i)})$.

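So the cost of transforming $\mu = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}$ to $\nu = \frac{1}{n}\sum_{i=1}^{n} \delta_{y_i}$ in this case is the same as the minimum of the average $\frac{1}{n}\sum_{i=1}^{n} d(x_i, y_{\sigma(i)})$ over all permutations $\sigma$ of $\{1, \dots, n\}$.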

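The fact that the split and merge steps can be discarded follows from the Birkhoff-von Neumann theorem. The theorem states that any doubly stochastic matrix can be written as a convex combination of permutation matrices. In our case the doubly stochastic matrix is $n$ times the matrix $(c_{ij})$ of a split-move-merge procedure. The cost is a linear function of $(c_{ij})$, so its minimum over this convex set is attained at one of the permutation matrices.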
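For tiny $n$ the minimum over permutations can simply be brute-forced. The Python sketch below does that. As a cross-check on the real line with $d(x, y) = |x - y|$, it also matches the points in sorted order, which is a known optimal choice for this particular cost on the line. The point sets are hypothetical.

```python
from itertools import permutations

def uniform_w1_bruteforce(xs, ys, d):
    """Minimum over permutations s of (1/n) * sum_i d(xs[i], ys[s[i]]).
    Tries all n! permutations, so it is only usable for very small n."""
    n = len(xs)
    return min(sum(d(xs[i], ys[s[i]]) for i in range(n)) / n
               for s in permutations(range(n)))

def uniform_w1_line(xs, ys):
    """Shortcut on the real line with d(x, y) = |x - y|: match points in sorted order."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def dist(a, b):
    return abs(a - b)

xs, ys = [3.0, 0.0, 1.0], [2.0, 5.0, 1.0]   # hypothetical positions
print(uniform_w1_bruteforce(xs, ys, dist))  # 4/3
print(uniform_w1_line(xs, ys))              # 4/3
```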

It is known that, in a metric space that is nice enough, any probability measure on it can be approximated by discrete distributions of the form $\frac{1}{n}\sum_{i=1}^{n} \delta_{x_i}$ in an appropriate sense.

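(some lower bounds) Suppose two probability measures $\mu$ and $\nu$ are supported on subsets $A, B \subseteq X$ respectively, and $d(a, b) \ge r$ for all $a \in A$ and $b \in B$. Then the distance between $\mu$ and $\nu$ is also at least $r$. If we relax the hypothesis to $\mu(A) \ge 1 - \epsilon$ and $\nu(B) \ge 1 - \epsilon$, then we still have the approximate bound $W_1(\mu, \nu) \ge (1 - 2\epsilon)\, r$.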

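(upper bounds) Suppose $\nu$ is the pushforward image of $\mu$ under some measurable $T: X \to X$, and $T$ satisfies $d(x, T(x)) \le \epsilon$ for all x. Then $W_1(\mu, \nu) \le \epsilon$. If we relax this to just $d(x, T(x)) \le \epsilon$ for all $x$ (uniformly) in some big set $A$ (say $\mu(A) \ge 1 - \epsilon$), and the diameter of X is O(1), then we still obtain the approximate conclusion $W_1(\mu, \nu) = O(\epsilon)$.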


Readers are assumed to be familiar with the names of basic Windows 8 touch gestures (for which, see http://windows.microsoft.com/en-us/windows-8/touch-swipe-tap-beyond).

With some Windows 8 tablets (or laptops), you may come across a sudden appearance of the Camera app, or a sudden activation of the webcam, when you are trying to turn on your laptop (or, more precisely, when you are trying to sign in to your PC from the lock screen). Since the screen makes it clear that the webcam is on, you know you have unintentionally launched some Camera app. Surprisingly, the usual ways of closing or exiting the app or switching to other apps do not work. How should one get out of this and get back to the usual lock screen (the sign-in screen)?

It is actually quite simple to turn off or deactivate this thing. You simply tap or click **Unlock** which should be found on the left side of the app commands at the bottom of the screen and you are back to the lock screen.

You usually swipe up on the lock screen to sign in to your PC, but when you swipe down on it instead, that launches the Camera app. This is a useful feature of Windows 8 that you can turn on or off. The Camera app launched in this way seems to show a different interface from the one launched in the usual way. Maybe it is not the same Camera app? I am not sure.

This quick access to the camera from the lock screen is a useful feature, but if you want to turn it off for some reason, type the following in Search:

lock screen camera

and by navigating from there, you will arrive at an option that says

Swipe down on the lock screen to use the camera


This post is **not** about sharing solutions to those problems in the book. As the title indicates, this post is rather about sharing intuitions or interpretations of some results mentioned or alluded to in some problems listed at the end of Chapter 2 of the book “Elements of Information Theory”, second edition. The chapter deals with the notions of entropy and mutual information.

Some of my colleagues and I were reading a rather long paper that draws on basic ideas from many fields, and one of them was the notion of mutual information. We only had a grasp of its special case, the notion of entropy of a random variable. Since the general notion was new to us, I decided to read at least the first few chapters of “Elements of Information Theory”.

Given three (discrete) random variables X, Y, Z, the conditional mutual information I(X; Y | Z) intuitively means how much X says about Y (or equivalently how much Y says about X) under the assumption that the value of Z is already known. Try to use this intuition to make sense of the chain rule for mutual information and the data-processing inequality.
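The chain rule $I(X; Y, Z) = I(X; Z) + I(X; Y \mid Z)$ and the data-processing inequality can both be checked numerically from a joint pmf. This Python sketch computes entropies of marginals and uses the identity $I(A; B \mid C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)$. The Markov chain $X \to Y \to Z$ of noisy copies of a fair bit is a made-up example.

```python
from math import log2

def H(pmf, coords):
    """Entropy in bits of the marginal of a joint pmf on the given coordinates."""
    marginal = {}
    for outcome, pr in pmf.items():
        key = tuple(outcome[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + pr
    return -sum(pr * log2(pr) for pr in marginal.values() if pr > 0)

def I(pmf, a, b, given=()):
    """Conditional mutual information I(A; B | C) in bits; a, b, given are
    tuples of coordinate indices (given=() gives plain I(A; B))."""
    c = tuple(given)
    return H(pmf, a + c) + H(pmf, b + c) - H(pmf, a + b + c) - H(pmf, c)

def bsc(bit, eps):
    """Output pmf of a binary symmetric channel that flips `bit` with probability eps."""
    return {bit: 1 - eps, 1 - bit: eps}

# Hypothetical Markov chain X -> Y -> Z: X is a fair bit, Y flips X with
# probability 0.1, and Z flips Y with probability 0.2.
X, Y, Z = (0,), (1,), (2,)
chain = {}
for x in (0, 1):
    for y, py in bsc(x, 0.1).items():
        for z, pz in bsc(y, 0.2).items():
            chain[(x, y, z)] = 0.5 * py * pz

# Chain rule: I(X; Y, Z) = I(X; Z) + I(X; Y | Z).
print(I(chain, X, Y + Z), I(chain, X, Z) + I(chain, X, Y, Z))
# Data processing: Z is computed from Y alone, so it cannot know more about X.
print(I(chain, X, Z), "<=", I(chain, X, Y))
```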

For intuitions about its special case, conditional entropy, see Shannon’s entropy of random variables and partitions (already linked previously).

The problem asks for examples where mutual information increases or decreases when conditioned on a third random variable. This shows that I(X;Y | Z) is **not** monotone in Z. One should contrast this with the following two facts:

- Conditional entropy H(X | Y) **is** monotone in the conditioning Y: H(X | Y, W) ≤ H(X | Y).
- Mutual information I(X;Y | Z) **is** monotone in X and Y: for example, I(X, W; Y | Z) ≥ I(X; Y | Z).
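Two standard examples show both directions. These particular distributions are just one convenient choice. If X and Y are independent fair bits and $Z = X \oplus Y$, then $I(X;Y) = 0$ but $I(X;Y \mid Z) = 1$ bit. If $X = Y = Z$ is a single fair bit, then $I(X;Y) = 1$ bit but $I(X;Y \mid Z) = 0$. A minimal Python check using entropies of marginals:

```python
from math import log2

def H(pmf, coords):
    """Entropy in bits of the marginal of a joint pmf on the given coordinates."""
    marginal = {}
    for outcome, pr in pmf.items():
        key = tuple(outcome[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + pr
    return -sum(pr * log2(pr) for pr in marginal.values() if pr > 0)

def I(pmf, a, b, given=()):
    """Conditional mutual information I(A; B | C) in bits (coordinates as tuples)."""
    c = tuple(given)
    return H(pmf, a + c) + H(pmf, b + c) - H(pmf, a + b + c) - H(pmf, c)

X, Y, Z = (0,), (1,), (2,)
# X, Y independent fair bits, Z = X xor Y: conditioning on Z raises I(X;Y).
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
# One fair bit copied three times: conditioning on Z lowers I(X;Y).
copies = {(x, x, x): 0.5 for x in (0, 1)}

for name, pmf in (("xor", xor), ("copies", copies)):
    print(name, I(pmf, X, Y), I(pmf, X, Y, Z))
```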

Given two **independent** (discrete) random variables X and Y, the entropy of the sum X+Y is at least $\max(H(X), H(Y))$. This is perhaps related to why the central limit theorem works. If you keep adding i.i.d. random variables one by one, the entropy of the sum never decreases. So, aside from special cases, we can expect the entropy to get higher and higher. High entropy of the sum alone may not tell you much about the probability distribution of the sum. However, the mean and the variance of the sum are also known, and IIRC the normal distribution maximizes entropy under constraints on the mean and variance. This does not amount to a proof of the central limit theorem, but I think it illustrates the relation between large degrees of freedom (in statistical mechanics) and probability distributions with maximal entropy.
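Here is a quick numerical illustration in Python. The pmf of an independent sum is a discrete convolution, and its entropy can be compared with $\max(H(X), H(Y))$. The distributions are arbitrary examples. The second part adds i.i.d. copies of a biased coin one at a time and records the entropy after each step.

```python
from math import log2

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def independent_sum(px, py):
    """pmf of X + Y for independent X ~ px and Y ~ py (a discrete convolution)."""
    out = {}
    for x, p in px.items():
        for y, q in py.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

# Hypothetical example distributions.
px = {0: 0.7, 1: 0.2, 5: 0.1}
py = {0: 0.5, 2: 0.5}
print(entropy(independent_sum(px, py)), ">=", max(entropy(px), entropy(py)))

# Add i.i.d. copies of a biased coin one at a time and record the entropy.
coin = {0: 0.9, 1: 0.1}
total, history = coin, [entropy(coin)]
for _ in range(9):
    total = independent_sum(total, coin)
    history.append(entropy(total))
print([round(h, 3) for h in history])  # never decreases
```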

Once you prove that the entropy of the independent sum X+Y is at least $\max(H(X), H(Y))$, you can also see that the entropy of the independent sum taken modulo something like 6, i.e. (X+Y) mod 6, is also at least $\max(H(X), H(Y))$. This holds when X and Y themselves take values modulo 6, as die rolls do. If you have a biased die, you may rightfully expect that you can simulate a less biased die by throwing the same die several times and summing up the numbers modulo 6. The entropy argument may be one way to justify this expectation.
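Here is a sketch of the biased-die idea. It assumes a hypothetical die that shows residue 0 (the face 6) with probability 0.5 and each other face with probability 0.1. Repeatedly convolving modulo 6 and tracking the entropy shows the entropy climbing toward the maximum $\log_2 6$.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def add_mod(pa, pb, m):
    """pmf, as a list indexed by residue, of (A + B) mod m for independent A, B."""
    out = [0.0] * m
    for a, p in enumerate(pa):
        for b, q in enumerate(pb):
            out[(a + b) % m] += p * q
    return out

# Hypothetical biased die, faces taken mod 6 (so the face 6 is residue 0).
die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
total, history = die, [entropy(die)]
for _ in range(9):
    total = add_mod(total, die, 6)
    history.append(entropy(total))
print([round(h, 4) for h in history], "maximum:", round(log2(6), 4))
```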

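The problem deals with binary random variables $X_1, \dots, X_n$ related by one linear equation, $X_1 + X_2 + \cdots + X_n \equiv 0 \pmod 2$. The given joint probability distribution, uniform on all sequences satisfying the equation, is in some sense the most fair distribution under the constraint given by the linear equation.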

The results are consistent with our intuition in that:

- knowing $X_1$ tells you nothing about $X_2$,
- and given that you already know $X_1$, knowing $X_2$ tells you nothing about $X_3$,
- …
- and at the last step, when you know $X_1$ to $X_{n-2}$, knowing $X_{n-1}$ tells you everything about $X_n$ (because of the linear equation).

The results can be different for other probability distributions satisfying the same linear equation. For example, with some biased joint probability distribution, knowing $X_1$ may tell you something about $X_2$, in the sense that $I(X_1; X_2) > 0$. You can even choose a distribution so that knowing $X_1$ tells you everything about $X_2$, in the sense that $I(X_1; X_2) = H(X_2)$. That is the case when we choose the most fair distribution under the additional constraint $X_1 = X_2$.
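The values can be checked by brute force for small $n$. This Python sketch makes three assumptions: the linear equation is the parity constraint $X_1 + \cdots + X_n \equiv 0 \pmod 2$, $n = 4$, and the distribution is uniform on even-parity sequences. It should reproduce the pattern described above: nothing, nothing, then everything (0, 0, 1 bit). The biased example keeps the same parity constraint but adds $X_1 = X_2$.

```python
from itertools import product
from math import log2

def H(pmf, coords):
    """Entropy in bits of the marginal of a joint pmf on the given coordinates."""
    marginal = {}
    for outcome, pr in pmf.items():
        key = tuple(outcome[c] for c in coords)
        marginal[key] = marginal.get(key, 0.0) + pr
    return -sum(pr * log2(pr) for pr in marginal.values() if pr > 0)

def I(pmf, a, b, given=()):
    """Conditional mutual information I(A; B | C) in bits (coordinates as tuples)."""
    c = tuple(given)
    return H(pmf, a + c) + H(pmf, b + c) - H(pmf, a + b + c) - H(pmf, c)

def uniform_on(vectors):
    """Uniform pmf on a finite collection of outcome tuples."""
    vectors = list(vectors)
    return {v: 1 / len(vectors) for v in vectors}

n = 4
even = [v for v in product((0, 1), repeat=n) if sum(v) % 2 == 0]
fair = uniform_on(even)  # each even-parity sequence has probability 2^-(n-1)

# I(X_{k+1}; X_{k+2} | X_1, ..., X_k) for k = 0, ..., n - 2 (coordinates are 0-based)
steps = [I(fair, (k,), (k + 1,), tuple(range(k))) for k in range(n - 1)]
print(steps)

# Same parity equation, but uniform only over the sequences with X_1 = X_2.
biased = uniform_on(v for v in even if v[0] == v[1])
print(I(biased, (0,), (1,)), H(biased, (1,)))
```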

The problem asks to prove the grouping rule for entropy. It can be verified with a simple calculation, but it may be more instructive to solve it by introducing auxiliary random variables and using the equalities established so far.
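The identity can at least be sanity-checked numerically. Write $G$ for the variable that merges a chosen group of outcomes into one; it is a function of X. The chain rule then gives $H(X) = H(X, G) = H(G) + H(X \mid G)$, and only the merged outcome contributes to $H(X \mid G)$. The Python sketch below compares both sides for a made-up distribution. It groups an arbitrary set of outcomes, and merging the last two is a special case.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def grouping_sides(p, group):
    """Both sides of the grouping rule when the outcomes with indices in `group`
    (total mass s > 0) are merged into a single outcome:
        H(p) = H(merged) + s * H(p restricted to the group, divided by s)."""
    s = sum(p[i] for i in group)
    merged = [pi for i, pi in enumerate(p) if i not in group] + [s]
    inside = [p[i] / s for i in sorted(group)]
    return entropy(p), entropy(merged) + s * entropy(inside)

p = [0.1, 0.2, 0.3, 0.4]             # hypothetical distribution
print(grouping_sides(p, {2, 3}))     # merge the last two outcomes
print(grouping_sides(p, {0, 1, 2}))  # merge the first three
```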

One wishes to identify a random object X. Two i.i.d. random questions $Q_1, Q_2$ about the object are asked, eliciting answers $A_1, A_2$. The problem asks to show that the two questions are less valuable than twice a single question, in the sense that $I(X; Q_1, A_1, Q_2, A_2) \le 2\, I(X; Q_1, A_1)$.

At first, one might incorrectly expect that two questions are worth twice a single question. But what about asking millions of questions about the outcome of a single die roll? One million questions about the outcome are worth almost the same as two million questions. By the time you ask the millionth, you would be almost certain of the outcome, and the next million questions would not contribute much. The amount of information extracted by questions cannot grow linearly with the number of questions indefinitely. In fact there is an upper bound, namely $H(X)$.
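The saturation can be seen in a small computation. For illustration, assume X is a fair die roll and each question is a uniformly random subset of the six faces, asking “is X in this subset?”. The problem itself allows any question distribution. The computation uses $I(X; Q_1, A_1, \dots, Q_k, A_k) = H(A_1, \dots, A_k \mid Q_1, \dots, Q_k)$, which holds because the questions are independent of X and the answers are functions of X and the questions. The exhaustive Python sketch below is only practical for small k.

```python
from collections import Counter
from itertools import product
from math import log2

def info_from_questions(n_faces, k):
    """I(X; Q_1, A_1, ..., Q_k, A_k) in bits for X uniform on n_faces outcomes and
    k i.i.d. questions, each a uniformly random subset of the faces (a bitmask),
    with answer A_i = 1 exactly when X lies in subset Q_i.
    Computed as the average over question tuples of H(A_1, ..., A_k | Q's = qs)."""
    n_subsets = 2 ** n_faces
    total = 0.0
    for qs in product(range(n_subsets), repeat=k):
        # Each face x yields one answer vector; X uniform gives each face weight 1/n_faces.
        counts = Counter(tuple((q >> x) & 1 for q in qs) for x in range(n_faces))
        total -= sum(c / n_faces * log2(c / n_faces) for c in counts.values())
    return total / n_subsets ** k

one, two = info_from_questions(6, 1), info_from_questions(6, 2)
print(one, two, 2 * one, log2(6))  # two exceeds one but falls short of 2 * one
```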

Some more thought reveals that it is because of independence (rather than in spite of it) that two i.i.d. questions are less valuable than twice a single question. Suppose X is the random position of a specific chess piece. If the first (random) question asked which row, then the optimal next question would be to ask which column. Choosing subsequent optimal questions requires dependence on previous questions and answers. Therefore independent questions are less efficient than dependent questions, as anyone who has played Twenty Questions can attest.


Several news articles in 2014 reported that the Seoul government had designated five special areas for flood prevention, namely the areas near:

- 강남역 (Gangnam Station)
- 사당역 (Sadang Station)
- 광화문 (Gwanghwamun)
- 도림천 (Dorimcheon)
- 한강로 (Hangang-ro)

도림천 (a small river) and 한강로 (a road) can be located on Naver Map, where they show up as red lines. Locating the rest (강남역, 사당역, 광화문) should be obvious.

A news article in 2013 named the top four areas that get flooded again and again, namely the areas near:

- 강남역 (Gangnam Station)
- 광화문 (Gwanghwamun)
- 안양천 (Anyangcheon)
- 사당역 (Sadang Station)

There is also an interactive map showing the frequently flooded areas in Seoul, 풍수해 정보지도 (wind and flood damage information map), but it requires you to select a district first. It can show the affected areas for each of the years 2010, 2011, 2012 and 2013.


Fortunately, there are alternative commands that do the same task differently, and there are also other ways of using the same commands.

To put a tilde over a letter, we can use either `\tilde` or `\widetilde`. To decide which one to use in which situation, it can help to compile a document that contains the following lines and compare the results.

$\tilde{A}$ vs $\widetilde{A}$

$\tilde{\mathcal A}$ vs $\widetilde{\mathcal A}$

$\tilde{ABC}$ vs $\widetilde{ABC}$

To put a bar over a letter, we can use either `\bar` or `\overline`. It seems that `\bar` is to `\overline` what `\tilde` is to `\widetilde`. There don’t seem to be `\overtilde` or `\widebar` commands, though.

$\bar{A}$ vs $\overline{A}$

$\bar{\mathcal A}$ vs $\overline{\mathcal A}$

$\bar{ABC}$ vs $\overline{ABC}$

Over a symbol with a subscript or superscript index, one can imagine four ways of putting a bar or tilde. Among the four, one could say that one or two of them look right, but that may depend on taste. I can’t comment on which ones look good; you just have to try each and decide with your colleagues.

$\tilde{A_2}$ vs $\widetilde{A_2}$ vs $\tilde{A}_2$ vs $\widetilde{A}_2$

$\bar{A_2}$ vs $\overline{A_2}$ vs $\bar{A}_2$ vs $\overline{A}_2$


The goal of this post is to convince you that you should learn elisp. Readers are assumed to be beginning users of Emacs. (Of course, he who does not use Emacs does not need to learn elisp.)

So you are a beginning user of Emacs and let me raise a relevant question: suppose someone asked you “Why did you choose Emacs?” The longer question would be “What is the point of using Emacs in particular when there are other great text editors out there that are also cross-platform, customizable and extensible?”

You probably have your answer to the question. If you want to hear my answer, I’d say “I choose Emacs because Emacs is customizable and extensible *to an extreme degree*. I believe that is the defining character of Emacs. That may not mean that everybody should use Emacs but that is certainly why I will continue to use Emacs.”

I think my customization of Emacs is only mild, and I may never actually customize it to an extreme degree. However, because I know that Emacs allows extreme customization, I can be sure that it allows any kind of mild customization I might want later. Think of a columnist who has seen organizations like the ACLU defend people with extreme opinions. He can feel sure that they would come to defend his right to give mildly controversial speeches, if needed.

Consider the extent to which Emacs can be extended to suit your needs. You can do a lot with Emacs *without* learning elisp: you can define keyboard macros, you can install and use Emacs packages for your needs, and you can use the Customize facility to customize Emacs and its packages to some extent. That is, without learning elisp, you can already experience Emacs as one of those editors that are cross-platform, customizable and extensible. As you learn elisp and start to use hooks in interesting ways, make your own small minor modes, or even make your own Emacs package, you start to experience Emacs as an editor that is customizable and extensible to an extreme level. Think of that as a kind of premium feature of Emacs, which you unlock by paying. You pay not with dollars but with time: time devoted to learning elisp.

On the other hand, there is an old saying: “Time is money”. Of course we can’t and shouldn’t devote all day to learning elisp. And we wouldn’t expect that one hour of learning elisp would unlock all the power. It’s a gradual thing. A learning curve is there, but not as steep as you might expect. Do you have something in mind that you always wanted to have in your text editor? Your mission, should you choose to accept it, is to learn enough elisp to make that something happen.


Writing LaTeX documents seems to require many presses of the backslash key, and on some keyboards that key sits in an awkward place and is hard to reach. For users of Emacs AUCTeX, there are some ways to ease this pain.

With this elisp code added to your init file, whenever you type a slash in a LaTeX buffer, a backslash is inserted instead. The slash key is easier to reach.

(defun my-make-slash-backslash ()
  (local-set-key (kbd "/") "\\"))
(add-hook 'TeX-mode-hook 'my-make-slash-backslash)

When you have to enter a slash itself, which is not often, use `C-q /`. When you have to enter many slashes, for example when typing a URL, I believe copy and paste is the simplest workaround. It is also possible to extend the code so that typing a backslash inserts a slash, which is another workaround. I find that disorienting, so I don’t use it, but you may want to.

If you want to be able to turn on and off “slash turns into backslash” on the fly, you can write a minor mode for that.

Another way to enter backslashes with ease is to use `C-c C-m` to enter a TeX macro (things like `\section{...}` and `\frac{..}{..}`) and `C-c C-e` to enter a LaTeX environment (things like `\begin{..} .. \end{..}`) in LaTeX buffers.


Let f, g be measurable functions from a probability space $(\Omega, \mathcal F, P)$ to measurable spaces $(X, \mathcal X)$ and $(Y, \mathcal Y)$ respectively. Consider the following three conditions:

(1) $\sigma(g) \subseteq \sigma(f) \pmod P$. In other words, each element of $\sigma(g)$ is equivalent, up to null sets, to some element of $\sigma(f)$. (The notation “mod $P$” is used when an author wants to make explicit the practice of treating sub-sigma-algebras while ignoring null sets.)

(2) There is a measurable $H: X \to Y$ with $g = H \circ f$ a.e.

(3) There is a measure-preserving map $H: (X, \mu) \to (Y, \nu)$ with $g = H \circ f$ a.e., where $\mu = f_* P$ and $\nu = g_* P$ are the pushforward measures obtained from $P$.

(2) implies (3) trivially. (3) implies (2) trivially (even if the measure preserving map is only a.e. defined.)

(2) implies (1) trivially. The Doob-Dynkin lemma for probability spaces states that (1) implies (2) if $(Y, \mathcal Y)$ is a standard Borel space. In this post, we present some proofs of this lemma, some corollaries, and other related results. Compare with the Doob-Dynkin lemma for measurable spaces (the previous article). As we’ll see, the Doob-Dynkin lemma for probability spaces is a special case of that for measurable spaces. The version for probability spaces suffices for most applications in probability theory and ergodic theory.

In this proof, we simply show that the Doob-Dynkin lemma for probability spaces follows from that for measurable spaces. It is enough to show that the condition $\sigma(g) \subseteq \sigma(f) \pmod P$ can be improved to $\sigma(g) \subseteq \sigma(f)$ after discarding all points in some null set from $\Omega$.

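For each $B \in \mathcal Y$, there is $A_B \in \mathcal X$ such that the symmetric difference $g^{-1}(B) \,\triangle\, f^{-1}(A_B)$ is a $P$-null set. It would be nice to discard all points in the union $\bigcup_{B \in \mathcal Y} g^{-1}(B) \,\triangle\, f^{-1}(A_B)$, but is this a null set? Perhaps not, since the union is in general uncountable.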

**Exercise 10.** Show that it is enough to discard all points in the countable union $\bigcup_n g^{-1}(B_n) \,\triangle\, f^{-1}(A_{B_n})$, where $B_1, B_2, \dots$ is a sequence that generates $\mathcal Y$. (Existence of such a sequence is guaranteed by the assumption that $(Y, \mathcal Y)$ is a standard Borel space.)

In this proof, we adapt an argument from the previous article to our setting of ignoring null sets. We divide the proof into two cases.

Case I: when the image of g is countable

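Write the image of g as $\{y_1, y_2, \dots\}$. In this case, for each i there is $A_i \in \mathcal X$ such that $g^{-1}(\{y_i\}) = f^{-1}(A_i)$ up to a $P$-null set.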

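Notice that the sets $f^{-1}(A_i)$ form a countable partition of $\Omega$ modulo $P$. In other words, the intersection of $f^{-1}(A_i)$ and $f^{-1}(A_j)$ is a null set when $i \neq j$, and their union over all i has measure 1. Equivalently, for a.e. $\omega \in \Omega$ there exists a unique i for which $\omega \in f^{-1}(A_i)$.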

**Exercise 20.** Show that the sets $A_i$ form a countable partition of X mod $\mu$. (The general idea is that the measure algebra homomorphism $A \mapsto f^{-1}(A)$ behaves like an inclusion map.)

Let H be any (everywhere-defined) measurable function from X to Y such that for $\mu$-a.e. x and for each i, $x \in A_i$ implies $H(x) = y_i$. It is easy to define such an H.

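Now it only remains to show $g = H \circ f$ a.e. For a.e. $\omega \in \Omega$, we have:

there is i for which $f(\omega)$ is in $A_i$;

for that i, $H(f(\omega)) = y_i$ (by the property of H);

but also $g(\omega) = y_i$ (by how $A_i$ was chosen);

so $g(\omega) = H(f(\omega))$ for this $\omega$.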
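For intuition only, here is a finite toy version of Case I in Python. It assumes a finite $\Omega$ in which every point has positive probability. Then “a.e.” means “everywhere”, and condition (1) says that g is constant on each level set of f. The map H is then just a lookup table on the image of f, and it can be extended arbitrarily to the rest of X. The spaces and maps below are hypothetical.

```python
def factor_through(omega, f, g):
    """Toy finite Case I: return a dict H with g(w) == H[f(w)] for every w in omega,
    or None if some level set of f meets two different level sets of g."""
    H = {}
    for w in omega:
        if H.setdefault(f(w), g(w)) != g(w):
            return None  # sigma(g) is not contained in sigma(f)
    return H

def roll(w):      # hypothetical f: a die-like value on a 12-point space
    return w % 6

def parity(w):    # hypothetical g: determined by f
    return w % 2

def mod4(w):      # hypothetical g: NOT determined by f (compare w = 0 and w = 6)
    return w % 4

omega = range(12)
H = factor_through(omega, roll, parity)
print(H)                                   # {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
print(factor_through(omega, roll, mod4))   # None
```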

Case II: when the image of g is not countable.

WLOG assume $Y$ is the unit interval $[0, 1]$ (an uncountable standard Borel space is Borel isomorphic to it).

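Let $g_n = 2^{-n} \lfloor 2^n g \rfloor$. Since $g_n$ has countable image and $\sigma(g_n) \subseteq \sigma(g)$, Case I gives a measurable $H_n: X \to Y$ with $g_n = H_n \circ f$ a.e.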

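Notice that $g_n \to g$ holds everywhere (indeed $|g - g_n| \le 2^{-n}$). It would be nice to take $H = \lim_n H_n$. However, we only know that $H_n$ converges on $f(\Omega_0)$ for some subset $\Omega_0 \subseteq \Omega$ of measure 1, and we don’t know whether this image is measurable.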

A workaround is to let H be any (everywhere-defined) measurable map $H: X \to Y$ such that $H(x) = \lim_n H_n(x)$ for all x for which the limit exists. Such an H exists, and with this H we can proceed to show $g = H \circ f$ a.e.

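Another workaround is to notice that the subset $E = \{x \in X : \lim_n H_n(x) \text{ exists}\}$ is measurable. (Alternatively, E is the set of x for which $(H_n(x))_n$ is Cauchy.) Since $f^{-1}(E) \supseteq \Omega_0$, the $\mu$-measure of E is 1. Then we set H to be a $\mu$-a.e. limit of $H_n$, so that $H \circ f$ is a $P$-a.e. limit of $H_n \circ f$. So we can proceed with this H too.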

In this proof, we show that (1) implies (3) using two facts: $(Y, \nu)$ is a Lebesgue space (which follows because $(Y, \mathcal Y)$ is a standard Borel space), and a Lebesgue space can be approximated by a sequence of partitions. This proof is probably equivalent to the previous one, but let’s do it for illustration.

The idea in this proof is that points in X (resp. Y) should correspond, more or less, to appropriate nested sequences of elements of $\mathcal X$ (resp. $\mathcal Y$). So to build H, we only need a procedure that turns an appropriate nested sequence of elements of $\mathcal X$ into one of $\mathcal Y$. That comes for free from the observation that, with abuse of notation, we may safely pretend that $\mathcal Y \subseteq \mathcal X$ (if we identify $\mathcal X$ with $\sigma(f)$ and $\mathcal Y$ with $\sigma(g)$). We’ll proceed without abuse of notation.

Notice that WLOG we may replace $(X, \mu)$ and $(Y, \nu)$ with any other almost-isomorphic probability spaces, at the minor cost of losing everywhere-definedness of f and g.

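WLOG there is a non-decreasing (i.e. successively refining) sequence $\beta_1 \preceq \beta_2 \preceq \cdots$ of countable partitions of Y with two properties. First, for any non-increasing choice $B_1 \supseteq B_2 \supseteq \cdots$ with $B_n \in \beta_n$, the intersection $\bigcap_n B_n$ is a single point. Second, $\bigcup_n \beta_n$ generates $\mathcal Y$. This is possible because we may assume that the probability space Y is a disjoint union of the Cantor space and atoms. That in turn follows from the result that any Lebesgue space is almost-isomorphic to a disjoint union of an interval and atoms.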

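Now we want to pick a good sequence of countable partitions $\alpha_n$ of X such that for all n we have $f^{-1}(\alpha_n) = g^{-1}(\beta_n) \pmod P$ element by element. That is, for each $B \in \beta_n$, pick $A \in \mathcal X$ with $f^{-1}(A) = g^{-1}(B) \pmod P$ and put it in $\alpha_n$. We can assume that for each n all elements of $\beta_n$ (and hence also those of $g^{-1}(\beta_n)$) have positive measure. So we can ensure that all elements of $\alpha_n$ have positive measure. For each n, $\alpha_n$ is a countable partition modulo $\mu$, and the sequence $(\alpha_n)$ is non-decreasing modulo $\mu$. After discarding only countably many $\mu$-null sets from X, we can ensure that $(\alpha_n)$ is a non-decreasing sequence of countable partitions. We make sure that $\beta_n$ and $\alpha_n$ only have elements of positive measure because we want a natural bijection between $\beta_n$ and $\alpha_n$.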

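For each $x \in X$, let $A_n(x)$ be the unique element of $\alpha_n$ containing x. Let $B_n(x)$ be the unique element of $\beta_n$ that corresponds to $A_n(x)$, in the sense that $f^{-1}(A_n(x)) = g^{-1}(B_n(x)) \pmod P$. Define H by $\{H(x)\} = \bigcap_n B_n(x)$. This is a well-defined map from X to Y, because $(B_n(x))_n$ is non-increasing for a fixed x. In turn, $(B_n(x))_n$ is non-increasing because its counterpart $(g^{-1}(B_n(x)))_n$ in $\Omega$ is non-increasing up to null sets, which holds because its counterpart $(A_n(x))_n$ in X is non-increasing.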

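(Measurability of H) For each $B \in \beta_n$, it is easy to show that $H^{-1}(B)$ is measurable and, moreover, equal to $A$ (where $A \in \alpha_n$ is the element corresponding to $B$). Since $\bigcup_n \beta_n$ generates $\mathcal Y$, H is measurable.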

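(On $g = H \circ f$ a.e.) Notice that H is measurable and everywhere defined, so $H \circ f$ is an almost-everywhere-defined measurable function. So we are comparing two a.e.-defined measurable functions from $\Omega$ to $Y$. For any $B \in \beta_n$ with corresponding $A \in \alpha_n$, we have $(H \circ f)^{-1}(B) = f^{-1}(A) = g^{-1}(B) \pmod P$. So the inverse images of B under the two functions are equal up to null sets. The partitions $\beta_n$ separate points of Y, so the set $\{\omega : g(\omega) \neq H(f(\omega))\}$ is a subset of $\bigcup_n \bigcup_{B \in \beta_n} g^{-1}(B) \,\triangle\, (H \circ f)^{-1}(B)$. The terms in this countable union are null sets by the previous observation. We have shown $g = H \circ f$ a.e.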

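(On measure preserving) It is easy to show that H is measure preserving from the fact that $g = H \circ f$ a.e. Alternatively, it also follows from $\mu(H^{-1}(B)) = \mu(A) = P(f^{-1}(A)) = P(g^{-1}(B)) = \nu(B)$ for corresponding $A \in \alpha_n$ and $B \in \beta_n$.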

If H and H’ both satisfy (2), then $H = H'$ holds $\mu$-a.e. This follows from the following two observations:

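- $H \circ f = H' \circ f$ holds $P$-a.e.
- $\{x \in X : H(x) = H'(x)\}$ is measurable (because $(Y, \mathcal Y)$ is a standard Borel space), so its $\mu$-measure is $P(H \circ f = H' \circ f) = 1$.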

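**Exercise 25.** Let f, g be measurable functions from a probability space $(\Omega, \mathcal F, P)$ to measurable spaces $(X, \mathcal X)$ and $(Y, \mathcal Y)$ respectively. Show that $\sigma(f) = \sigma(g) \pmod P$ if and only if there are $X_0 \in \mathcal X$ with $\mu(X_0) = 1$, $Y_0 \in \mathcal Y$ with $\nu(Y_0) = 1$, and a bi-measurable bijection $H: X_0 \to Y_0$ such that $g = H \circ f$ a.e.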

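**Exercise 30.** Consider two measure-theoretic dynamical systems $(X, \mu, T)$ and $(Y, \nu, S)$ (measure-preserving transformations on Lebesgue spaces). Show that the latter is a factor of the former if and only if there is a joining $\lambda$ (a $T \times S$-invariant probability measure on $X \times Y$ whose marginals are $\mu$ and $\nu$) such that $\mathcal Y \subseteq \mathcal X \pmod \lambda$ (with abuse of notation).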
