## How to close the Camera app that suddenly appears in Windows 8

Readers are assumed to be familiar with the names of the basic Windows 8 touch gestures (for which, see http://windows.microsoft.com/en-us/windows-8/touch-swipe-tap-beyond ).

## 1. introduction

With some Windows 8 tablets (or laptops), you may come across a sudden appearance of the Camera app, or a sudden activation of the webcam, when you are trying to turn on your laptop (or, more precisely, when you are trying to sign in to your PC from the lock screen). Since the screen makes it clear that the webcam is on, you know you have unintentionally launched some Camera app. Surprisingly, the usual ways of closing or exiting the app or switching to other apps do not work. How should one get out of this and get back to the usual lock screen (the login screen)?

## 2. how to get out

It is actually quite simple to turn off or deactivate this thing. Simply tap or click Unlock, which should be found on the left side of the app commands at the bottom of the screen, and you are back at the lock screen.

## 3. how did I end up there

You usually swipe up on the lock screen to sign in to your PC, but when you swipe down on it instead, that launches the Camera app. This is a useful feature of Windows 8 that you can turn on or off. The Camera app launched in this way seems to show a different interface from the one launched in the usual way. Maybe it is not the same Camera app? I am not sure.

## 4. how to turn off the feature

This quick access to the camera from the lock screen is a useful feature, but if you want to turn it off for some reason, type the following in Search:

lock screen camera


and by navigating from there, you will arrive at an option that says

Swipe down on the lock screen to use the camera


## Intuitions on problems from Elements of Information Theory Chapter 2

This post is not about sharing solutions to those problems in the book. As the title indicates, this post is rather about sharing intuitions or interpretations of some results mentioned or alluded to in some problems listed at the end of Chapter 2 of the book “Elements of Information Theory”, second edition. The chapter deals with the notions of entropy and mutual information.

Some of my colleagues and I were reading a rather huge paper which draws on many basic ideas from many fields, and one of them was the notion of mutual information. We only had a grasp of its special case, the notion of entropy of a random variable, and since the general notion was new to us, I decided to read at least the first few chapters of “Elements of Information Theory”.

## 1. meaning of conditional mutual information

Given three (discrete) random variables X, Y, Z, the conditional mutual information I(X; Y | Z) intuitively means how much X says about Y (or equivalently how much Y says about X) under the assumption that the value of Z is already known. Try to use this intuition to make sense of the chain rule for mutual information and the data-processing inequality.
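For reference, here are the two facts mentioned above written out as equations. This is the standard statement of the chain rule, not a derivation taken from this post.

```latex
\begin{align*}
I(X; Y, Z) &= I(X; Z) + I(X; Y \mid Z) \\
           &= I(X; Y) + I(X; Z \mid Y).
\end{align*}
% If X -> Y -> Z is a Markov chain, then I(X; Z \mid Y) = 0, so
\[
I(X; Y) = I(X; Z) + I(X; Y \mid Z) \ge I(X; Z).
\]
```

Read with the intuition above: what X says about the pair (Y, Z) is what X says about Z, plus what X additionally says about Y once Z is known. In the Markov chain case, Z cannot say anything about X that Y has not already said.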

As for its special case, the intuitions for conditional entropy, see (already linked previously) Shannon’s entropy of random variables and partitions.

## 2. comments on some selected problems from Chapter 2

### 2.1. Problem 2.6

The problem asks to find examples where mutual information increases or decreases when conditioned on a third random variable. This points out that I(X;Y | Z) is not monotone in Z. One should contrast this with the following two facts:

• Conditional entropy H(X | Y) is monotone in the conditioning Y.
• Mutual information I(X;Y | Z) is monotone in X and Y.
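A minimal numerical sketch of both directions, assuming two toy distributions that are my own choice and not necessarily the ones the book has in mind: in the first, X and Y are independent fair bits and Z = X xor Y; in the second, X = Y = Z is a single fair bit. The helper names `marginal`, `entropy` and `cond_mutual_info` are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import log2

def marginal(joint, idx):
    """Marginalize a joint pmf {outcome tuple: prob} onto the coordinates in idx."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def entropy(joint, idx):
    """Shannon entropy (in bits) of the coordinates in idx."""
    return -sum(p * log2(p) for p in marginal(joint, idx).values() if p > 0)

def cond_mutual_info(joint, a, b, c=()):
    """I(A; B | C) = H(A,C) + H(B,C) - H(A,B,C) - H(C)."""
    a, b, c = tuple(a), tuple(b), tuple(c)
    return (entropy(joint, a + c) + entropy(joint, b + c)
            - entropy(joint, a + b + c) - entropy(joint, c))

# Increase under conditioning: X, Y independent fair bits, Z = X xor Y.
xor_joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
# Decrease under conditioning: X = Y = Z, a single fair bit.
copy_joint = {(b, b, b): 0.5 for b in (0, 1)}

xor_plain = cond_mutual_info(xor_joint, (0,), (1,))
xor_given_z = cond_mutual_info(xor_joint, (0,), (1,), (2,))
copy_plain = cond_mutual_info(copy_joint, (0,), (1,))
copy_given_z = cond_mutual_info(copy_joint, (0,), (1,), (2,))

print(xor_plain, xor_given_z, copy_plain, copy_given_z)
```

In the xor example, I(X;Y) is 0 bits while I(X;Y | Z) is 1 bit; in the copy example it is the other way around.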

### 2.2. Problem 2.14

Given two independent (discrete) random variables X and Y, the entropy of the sum X+Y is at least $\max\{H(X),H(Y)\}$. This is perhaps related to why the central limit theorem works. If you keep adding i.i.d. random variables one by one, the entropy of the sum never decreases. So aside from special cases, we can expect that the entropy would get higher and higher. High entropy of the sum alone may not tell you much about the probability distribution of the sum, but the variance and the mean of the sum are also known, and IIRC the normal distribution maximizes entropy under the constraints on the mean and variance. It does not make a proof of the central limit theorem but I think it illustrates the relation between large degrees of freedom (in statistical mechanics) and probability distributions with maximal entropy.

Once you prove that the entropy of the independent sum X+Y is at least $\max\{H(X),H(Y)\}$, you would also be able to see that the entropy of the independent modular sum, X+Y modulo 6 for example, is also at least $\max\{H(X),H(Y)\}$, provided that X and Y take values in a set of residues such as $\{1, \dots, 6\}$ (so that adding a fixed value modulo 6 is a bijection). If you have a biased die, you may rightfully expect that you can simulate a less biased die by simply throwing the same die several times and summing up the numbers modulo 6, and the entropy argument may be one way to justify this expectation.
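The biased-die expectation can be checked numerically. The following is a rough sketch, assuming a hypothetical biased die whose faces are relabelled 0 to 5 (relabelling does not change entropy); the function names `entropy` and `add_mod` are my own.

```python
from math import log2

def entropy(pmf):
    """Shannon entropy (in bits) of a pmf given as a list of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)

def add_mod(p, q):
    """pmf of (X + Y) mod m for independent X ~ p and Y ~ q on {0, ..., m-1}."""
    m = len(p)
    out = [0.0] * m
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[(i + j) % m] += pi * qj
    return out

# Hypothetical biased die: one face comes up half of the time.
biased = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

entropies = []          # entropies[k] = entropy of the sum of k+1 throws mod 6
current = biased
for _ in range(5):
    entropies.append(entropy(current))
    current = add_mod(current, biased)

print(entropies, log2(6))
```

The printed entropies never decrease and stay below $\log_2 6$, the entropy of a fair die.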

### 2.3. Problem 2.23

The problem deals with $n$ binary random variables $X_1, X_2, \dots, X_n$ related by one linear equation $X_1 + \dots + X_n = 0 \pmod 2$. The given joint probability distribution is in some sense the most fair distribution under the constraint given by the linear equation.

The results are consistent with our intuition in that:

• knowing $X_1$ tells you nothing about $X_2$
• and given that you already know $X_1$, knowing $X_2$ tells you nothing about $X_3$
• and at the last step when you know $X_1$ to $X_{n-2}$, knowing $X_{n-1}$ tells you everything about $X_n$ (because of the linear equation).

The results can be different for other probability distributions while satisfying the same linear equation. For example, if we give some biased joint probability distribution, then it’s possible that knowing $X_1$ might tell you something about $X_2$ in the sense that $I(X_1; X_2) > 0$. In fact, you can even choose a distribution so that knowing $X_1$ tells you everything about $X_2$ in the sense that $I(X_1 ; X_2) = H(X_2) = 1$. That is the case when we choose the most fair distribution under the constraint $X_1 + X_2 = 0 \pmod 2, X_3 = X_4 = \dots = X_n = 0$ (so that $X_1 = X_2$ is a fair bit and the original linear equation still holds).
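Both situations can be checked numerically for a small case. This sketch assumes $n = 4$, takes the "most fair" distribution to be the uniform distribution on even-parity strings, and uses a biased alternative in which $X_1 = X_2$ is a fair bit and the remaining variables are 0, which also satisfies the parity equation. The helper names `H` and `I` are hypothetical, and coordinates are 0-based, so index 0 stands for $X_1$.

```python
from collections import defaultdict
from itertools import product
from math import log2

def H(joint, idx):
    """Entropy in bits of the coordinates idx under the joint pmf."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

def I(joint, a, b, c=()):
    """I(X_a; X_b | X_c) for single coordinates a, b and a tuple of coordinates c."""
    c = tuple(c)
    return H(joint, (a,) + c) + H(joint, (b,) + c) - H(joint, (a, b) + c) - H(joint, c)

n = 4
even = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]
fair = {x: 1 / len(even) for x in even}   # uniform on even-parity strings

# steps[k] = I(X_{k+1}; X_{k+2} | X_1, ..., X_k)
steps = [I(fair, k, k + 1, range(k)) for k in range(n - 1)]

# Biased alternative: X_1 = X_2 is a fair bit, all other variables are 0.
tied = {(b, b) + (0,) * (n - 2): 0.5 for b in (0, 1)}
tied_info = I(tied, 0, 1)

print(steps, tied_info)
```

For the fair distribution the conditional mutual informations are 0, 0 and 1 bit, matching the three bullet points; for the tied distribution $I(X_1; X_2)$ is 1 bit.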

### 2.4. Problem 2.27

It asks to prove the grouping rule for entropy. It can be verified with a simple calculation, but maybe it is more instructive to solve it by using auxiliary random variables and the equalities established so far.

### 2.5. Problem 2.41

One wishes to identify a random object X. Two i.i.d. random questions $Q_1, Q_2$ about the object are asked, eliciting answers $A_1 = A(X, Q_1), A_2 = A(X, Q_2)$. The problem asks to show that the two questions are less valuable than twice a single question.

At first, one might incorrectly expect that the two questions are worth twice a single question, but what about asking millions of questions about the outcome of a single die roll? One million questions about the outcome are worth almost the same as two million questions, since by the time you ask the millionth, you would be almost certain of what the outcome of the die roll was and the next million questions would not contribute much. The amount of information to be extracted with questions cannot grow linearly with the number of questions indefinitely and in fact there is an upper bound, namely, $H(X)$.

Some more thought reveals that it is because of independence (rather than in spite of independence) that two i.i.d. questions are less valuable than twice a single question. Suppose X is the random position of a specific chess piece. If the first (random) question asked which row, then the next optimal question would be to ask which column. Asking subsequent optimal questions requires dependence on previous questions and answers, and therefore independent questions are less efficient than dependent questions, as anyone who has played Twenty Questions can attest.

## flood zones in Seoul

While looking for a place to live in Seoul, it came to my mind that I should check out which parts of Seoul tend to get flooded, and here is what I found. Let’s hope that this post becomes obsolete in the future as the local government is taking some measures.

Several news articles in 2014 reported that the Seoul government named five special regions to take care of against floods and these are regions near:

• 강남역
• 사당역
• 광화문
• 도림천
• 한강로

도림천 (a small river) and 한강로 (a road) can be located as red lines in Naver Map. Locating the rest (강남역, 사당역, 광화문) should be obvious.

A news article in 2013 named the top four regions that get flooded again and again and these are regions near:

• 강남역
• 광화문
• 안양천
• 사당역

There is also an interactive map showing the often flooded regions in Seoul, but it requires you to select a district first: 풍수해 정보지도. It can show affected areas in each of the years 2010, 2011, 2012, 2013.

## Putting a bar or a tilde over a letter in LaTeX

As you are aware, there are commands to put a bar or a tilde over a symbol in math mode in LaTeX. Sometimes, the output doesn’t come out the way some of us might expect or want.

Fortunately, there are alternative commands that do the same task differently that we can try and there are also other ways of using the same commands.

To put a tilde over a letter, we can use either \tilde or \widetilde. As for which one to use in which situation, compiling a document with the following as its part can help comparison.

$\tilde{A}$ vs $\widetilde{A}$

$\tilde{\mathcal A}$ vs $\widetilde{\mathcal A}$

$\tilde{ABC}$ vs $\widetilde{ABC}$


To put a bar over a letter, we can use either \bar or \overline. It seems that \bar is to \overline what \tilde is to \widetilde. There don’t seem to be \overtilde or \widebar.

$\bar{A}$ vs $\overline{A}$

$\bar{\mathcal A}$ vs $\overline{\mathcal A}$

$\bar{ABC}$ vs $\overline{ABC}$


Over a symbol with a subscript or superscript index, one can imagine four ways of putting a bar or tilde over it. Among the four ways, one can say that one or two of them look right, but that may depend on the situation. I can’t comment on which ones look good. You just have to try each and decide with your colleagues.

$\tilde{A_2}$ vs $\widetilde{A_2}$ vs $\tilde{A}_2$ vs $\widetilde{A}_2$

$\bar{A_2}$ vs $\overline{A_2}$ vs $\bar{A}_2$ vs $\overline{A}_2$


## why learn elisp (Emacs Lisp)

This post is the intro module of Living with Emacs Lisp.

The goal of this post is to convince you that you should learn elisp. Readers are assumed to be beginning users of Emacs. (Of course, someone who does not use Emacs does not need to learn elisp.)

So you are a beginning user of Emacs and let me raise a relevant question: suppose someone asked you “Why did you choose Emacs?” The longer question would be “What is the point of using Emacs in particular when there are other great text editors out there that are also cross-platform, customizable and extensible?”

You probably have your answer to the question. If you want to hear my answer, I’d say “I choose Emacs because Emacs is customizable and extensible to an extreme degree. I believe that is the defining character of Emacs. That may not mean that everybody should use Emacs but that is certainly why I will continue to use Emacs.”

I think my customization of Emacs is only to a mild degree and I may never actually customize it to an extreme degree. However, because I know that Emacs allows extreme customization, I can be sure that Emacs allows any kind of mild customization I might want later. It’s like how some columnist would feel sure that organizations like ACLU will come defend his right to say his mildly controversial speeches if needed after he has seen ACLU defend those with extreme opinions.

Learning elisp changes the extent to which Emacs can be extended to suit your needs. You can do a lot with Emacs without learning elisp: you can define keyboard macros, you can install and use Emacs packages for your needs, you can use the Customize facility to customize your Emacs and packages to some extent. That is, without learning elisp, you can already experience Emacs to be one of those editors that are cross-platform, customizable and extensible. As you learn elisp and as you start to use hooks in interesting ways, or make your own small minor modes, or even make your own Emacs package, you start to experience Emacs to be an editor that is customizable and extensible to an extreme level. Think of that as a kind of premium feature of Emacs which you unlock by paying, and you pay not with dollars, but with time: time devoted to learning elisp.

On the other hand, there is an old saying: “Time is money”. Of course we can’t and shouldn’t devote all day to learning elisp. And we wouldn’t expect that one hour of learning elisp would unlock all the power. It’s a gradual thing. A learning curve is there, but not as steep as you might expect. Do you have something in mind that you always wanted to have in your text editor? Your mission, should you choose to accept it, is to learn enough elisp to make that something happen.


## Type backslash easily in Emacs AUCTeX

Writing LaTeX documents seems to require many presses of the backslash key, which is located at an awkward place and is hard to type on some types of keyboards. For users of Emacs AUCTeX, there are some ways to ease this pain.

## 1. type slash to type backslash

With this elisp code added to your init file, whenever you type slash in a LaTeX buffer, backslash is inserted instead. The slash key is easier to reach.

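(defun my-make-slash-backslash ()
  "Make \"/\" insert a backslash in the current buffer."
  (local-set-key (kbd "/") "\\"))
(add-hook 'LaTeX-mode-hook #'my-make-slash-backslash) ; assumed hook: AUCTeX's LaTeX mode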



When you have to enter a slash itself, which is not often, type C-q followed by the slash. When you have to enter many slashes, for example when typing a URL, I believe copy and paste is the simplest workaround. It is also possible to extend the code so that typing backslash inserts a slash, which is another workaround; I find that disorienting and don’t use it, but you may want to.

If you want to be able to turn on and off “slash turns into backslash” on the fly, you can write a minor mode for that.

## 2. use the two commands

Another way to enter backslashes with ease is to use C-c C-m to enter a TeX macro (things like \section{...} and \frac{..}{..}) and C-c C-e to enter a LaTeX environment (things like \begin{..} .. \end{..}) in LaTeX buffers.

## Doob-Dynkin Lemma for probability space

Let f, g be measurable functions from a probability space $(\Omega, \mathcal F, \mathbb P)$ to measurable spaces $(X, \mathcal A)$ and $(Y, \mathcal B)$ respectively. Consider the following three conditions:
(1) $\sigma(f) \supset \sigma(g) \bmod \mathbb P$ (in other words, each element in $\sigma(g)$ is equivalent (up to null sets) to some element in $\sigma(f)$. The notation $\bmod \mathbb P$ is used when an author wants to make explicit the practice of treating sub-sigma algebras ignoring null sets.)
(2) There is a measurable $H: (X, \mathcal A) \to (Y, \mathcal B)$ with $g = H \circ f$ a.e.
(3) There is a measure preserving map $H: (X, \mathcal A, \mu) \to (Y, \mathcal B, \nu)$ with $g = H \circ f$ a.e. where $\mu, \nu$ are the pushforward measures obtained from $\mathbb P$.

(2) implies (3) trivially. (3) implies (2) trivially (even if the measure preserving map is only a.e. defined.)

(2) implies (1) trivially. The Doob-Dynkin lemma for probability spaces states that (1) implies (2) if $(Y, \mathcal B)$ is a standard Borel space. In this post, we present some proofs of this lemma, some corollaries and other remarks. Compare with the Doob-Dynkin lemma for measurable spaces (the previous article). As we’ll see, the Doob-Dynkin lemma for probability spaces is a special case of that for measurable spaces. The one for probability spaces suffices for most applications in probability theory and ergodic theory.

## 1. Proof 1

In this proof, we simply show that the Doob-Dynkin lemma for probability space follows from that for measurable spaces. It is enough to show that the condition $\sigma(f) \supset \sigma(g) \bmod \mathbb P$ can be improved to $\sigma(f) \supset \sigma(g)$ after discarding all points in some null set from $\Omega$.

For each $B \in \mathcal B$, there is $A_B \in \mathcal A$ such that the symmetric difference $E_B := g^{-1}B \,\triangle\, f^{-1}A_B$ is a $\mathbb P$-null set. It would be nice to be able to discard all points in the union $\bigcup_{B \in \mathcal B} E_B$ but is this a null set? Perhaps not.

Exercise 10. Show that it is enough to discard all points in the countable union $\bigcup_{n} E_{B_n}$ where $(B_n)_n$ is a sequence that generates $\mathcal B$ (Existence of such a sequence is guaranteed by the assumption that $(Y, \mathcal B)$ is a standard Borel space).

## 2. Proof 2

In this proof, we adapt an argument in the previous article to our setting of ignoring null sets. We divide into two cases.

Case I: when the image of g is countable

In this case, enumerate the image of g as $y_1, y_2, \dots$. For each i, there is $A_i \in \mathcal A$ such that $g^{-1}y_i = f^{-1}A_i$ a.e.

Notice that $f^{-1}A_i$ form a countable partition modulo $\mathbb P$, in other words, the intersection of $f^{-1}A_i$ and $f^{-1}A_j$ is a null set when $i \ne j$ and their union over all i has measure 1, or equivalently that for a.e. $\omega$, there exists unique i for which $\omega \in f^{-1}A_i$.

Exercise 20. Show that $A_i$ form a countable partition mod $\mu$. (the general idea is that the measure algebra homomorphism $A \mapsto f^{-1}A$ behaves like an inclusion map)

Let H be any (everywhere-defined) measurable function from X to Y such that for $\mu$-a.e. x and for each i we have $x \in A_i \implies H(x) = y_i$. It is easy to define such an H.

Now it only remains to show $g = H \circ f$ a.e. For a.e. $\omega \in \Omega$, we have:
there is i for which $f(\omega)$ is in $A_i$;
for that i, $H(f(\omega)) = y_i$ (by property of H);
but $y_i = g(\omega)$ (by how $A_i$ was chosen);
so $H\circ f = g$ on this $\omega$.

Case II: when the image of g is not countable.

WLOG assume $(Y, \mathcal B)$ is the unit interval.

Let $g_n := \lfloor 2^n g \rfloor / 2^n$. Since each $g_n$ has a countable image, Case I gives a measurable $H_n: (X, \mathcal A) \to (Y, \mathcal B)$ with $g_n = H_n \circ f$ a.e.

Notice that $g_n \nearrow g$ holds everywhere. It would be nice to take $H = \lim H_n$ but we only know $H_n$ converges at least on $f(\Omega_0)$ for some subset $\Omega_0$ of measure 1, and we don’t know if this image is measurable.

One workaround is to let H be any (everywhere-defined) measurable function such that $H(x) = \lim H_n(x)$ for all x for which the limit exists. Such an H exists, and with this H we can proceed to show $g = H \circ f$ a.e.

Another workaround is to notice that the subset $[H_n \nearrow]$ (alternatively also $[\exists \lim H_n]$) is measurable, and so one can show that its measure is 1. Then we set H to be a $\mu$-a.e. limit of $H_n$. Then $H \circ f$ is a $\mathbb P$-a.e. limit of $H_n \circ f$. So we can proceed with this H too.
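The dyadic approximation $g_n = \lfloor 2^n g \rfloor / 2^n$ used in Case II can be illustrated numerically at a single point. This is only a sketch: the sample value 0.7 for $g(\omega)$ and the function name `dyadic_floor` are my own choices.

```python
from math import floor

def dyadic_floor(value, n):
    """g_n = floor(2^n * g) / 2^n for a value g in the unit interval."""
    return floor(2 ** n * value) / 2 ** n

g = 0.7   # hypothetical sample value of g(omega)
approximations = [dyadic_floor(g, n) for n in range(1, 11)]

for n, g_n in enumerate(approximations, start=1):
    # 0 <= g - g_n < 2^-n, so g_n increases to g as n grows
    print(n, g_n, g - g_n)
```

Each $g_n$ takes at most $2^n + 1$ values in the unit interval, which is what lets Case I be applied to it, and the error $g - g_n$ is always in $[0, 2^{-n})$.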

## 3. Proof 3

In this proof, we show that (1) implies (3) using the fact that $(Y, \mathcal B, \nu)$ is a Lebesgue space (which follows because $(Y, \mathcal B)$ is a standard Borel space) and the fact that a Lebesgue space can be approximated by a sequence of partitions. This proof is probably equivalent to the previous one, but let’s do it for illustration.

The idea in this proof is that points in X (resp. Y) more or less should correspond to appropriate nested sequence of elements in $\mathcal A$ (resp. $\mathcal B$), so in order to build H, we only need establish an appropriate procedure to transform an appropriate nested sequence of elements in $\mathcal A$ to that of $\mathcal B$, but that is done for free by observing that with abuse of notation we may safely pretend that $\mathcal B \subset \mathcal A \subset \mathcal F \bmod \mathbb P$ (if we identify $\sigma(f), \sigma(g)$ with $\mathcal A, \mathcal B$). We’ll proceed without abuse of notation.

Notice that WLOG we may replace $(X, \mathcal A, \mu)$ and $(Y, \mathcal B, \nu)$ with any other almost-isomorphic probability spaces at the minor cost of losing everywhere definedness of f, g.

WLOG there is a non-decreasing sequence of countable partitions $\beta_n \subset \mathcal B$ such that for any non-increasing choice $B_n \in \beta_n$ the intersection $\cap_n B_n$ is a single point and that $(\beta_n)_n$ generates $\mathcal B$. This is possible because we may assume that the probability space Y is a disjoint union of the Cantor space and atoms, which follows from the result that any Lebesgue space is almost-isomorphic to a disjoint union of an interval and atoms.

Now we want to pick a good sequence $\alpha_n$ such that for all n we have $g^{-1}(\beta_n) = f^{-1}(\alpha_n) \bmod \mathbb P$. We can assume that for each n all elements of $\beta_n$ (and hence also those of $g^{-1}(\beta_n)$ too) have positive measure. So we can ensure that all elements of $\alpha_n$ have positive measure. For each n, $\alpha_n$ is a countable partition modulo $\mu$ and the sequence $(\alpha_n)_n$ is non-decreasing modulo $\mu$. After discarding only countably many $\mu$-null sets from X, we can ensure that $\alpha_n$ form a non-decreasing sequence of countable partitions. The reason we ensure elements of $\alpha_n$ and $\beta_n$ only have elements of positive measure is because we want to ensure the natural bijection between $\alpha_n$ and $\beta_n$.

For each $x \in X$, let $\alpha_n(x)$ be the unique element in $\alpha_n$ containing x, and let $\beta_n(x)$ be the unique element in $\beta_n$ which corresponds to $\alpha_n(x)$ in the sense that $g^{-1}(\beta_n(x)) = f^{-1}(\alpha_n(x)) \bmod \mathbb P$. Define H by $\{H(x)\} = \cap_n \beta_n(x)$, which is a well defined map from X to Y (for a fixed x, $\beta_n(x)$ is non-increasing in n, because its counterpart in $\Omega$ is non-increasing up to null sets, because its counterpart $\alpha_n(x)$ in X is non-increasing).

(Measurability of H) For each $B_n \in \beta_n$, it is easy to show that $H^{-1}(B_n)$ is measurable, moreover, equal to $A_n$ (where $A_n \in \alpha_n$ is the element corresponding to $B_n$).

(On $g = H \circ f$ a.e.) Notice that H is measurable and everywhere defined, so $g' := H\circ f$ is an almost-everywhere defined measurable function. So we are comparing two a.e. defined measurable functions from $(\Omega, \mathcal F, \mathbb P)$ to $(Y, \mathcal B)$. Inverse images of $B_n \in \beta_n$ under the two functions are equal up to null sets because of $H^{-1}(B_n) = A_n$. Now notice that the set $[g \neq g']$ is a subset of $\bigcup_{n, B, B': B, B' \in \beta_n, B \neq B'} (g^{-1}B \cap g'^{-1}B')$ where the terms in the latter are null sets because of the previous sentence. We have shown $g = g'$ a.e.

(On measure preserving) It is easy to show that H is measure preserving from the fact that $g = g'$ a.e. Alternatively, it also follows from $H^{-1}(B_n) = A_n$.

## 4. Uniqueness of H

If H and $H'$ both satisfy (2), then $H = H'$ holds $\mu$-a.e. This follows from the following two observations:

• $H \circ f = H' \circ f$ holds $\mathbb P$-a.e.
• $[H = H']$ is measurable (because $(Y, \mathcal B)$ is a standard Borel space).
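Spelled out, the two observations combine as follows. This is a sketch that only uses the definition of $\mu$ as the pushforward of $\mathbb P$ under f.

```latex
\[
\mu\bigl([H \neq H']\bigr)
  = \mathbb P\bigl(f^{-1}[H \neq H']\bigr)
  = \mathbb P\bigl([H \circ f \neq H' \circ f]\bigr)
  = 0 .
\]
```

The first equality needs $[H \neq H']$ to be measurable (the second observation), and the last equality is the first observation, which holds because both $H \circ f$ and $H' \circ f$ agree with g $\mathbb P$-a.e.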

## 5. applications

Exercise 25. Let f, g be measurable functions from a probability space $(\Omega, \mathcal F, \mathbb P)$ to measurable spaces $(X, \mathcal A)$ and $(Y, \mathcal B)$ respectively. Show that $\sigma(f) = \sigma(g) \bmod \mathbb P$ if and only if there are $X_0 \in \mathcal A$ with $\mu(X_0) = 1$ and $Y_0 \in \mathcal B$ with $\nu(Y_0) = 1$ and a bi-measurable bijection $H: X_0 \to Y_0$ such that $g = H \circ f$ a.e.

Exercise 30. Given two measure-theoretic dynamical systems $(X, \mathcal A, \mu, T)$ and $(Y, \mathcal B, \nu, S)$ (measure preserving transformations on Lebesgue spaces), show that the latter is a factor of the former if and only if there is a joining $\lambda$ (a $T\times S$-invariant probability measure on $(X \times Y)$) such that $\mathcal B \subset \mathcal A$ mod $\lambda$ (with abuse of notation).