Understanding Cell Phone Technology: Noise

Today we’re going to talk about why noise on a communications channel is bad. Wait a minute, you say, I thought anything that increases uncertainty, including noise, increases information, and that’s a good thing. Who’s right?

The nuance here is that we want the uncertainty we choose, not the uncertainty introduced by noise. The mathematician Warren Weaver first explained this idea when he said that the word “information” means a measure “of freedom of choice and hence [the] uncertainty as to what choice has been made. It is therefore possible for the word information to have either good or bad connotations. Uncertainty which arises by virtue of freedom of choice on the part of the sender is desirable uncertainty. Uncertainty which arises because of errors or because of the influence of noise is undesirable uncertainty.”

So how do we describe this mathematically? The set X of transmittable symbols has an entropy H(X); each member of the set has an associated probability p(x). Noise on the communications channel can affect the entropy and probabilities associated with the set Y of receivable symbols. The probabilities and entropies of the sets X and Y are related. For example, we can calculate the probability of a specific y occurring given that a specific x was sent, p(y|x). We can also define a conditional entropy H(Y|X):

H(Y|X)  = -∑x  p(x) ∑y p(y|x) log p(y|x)

where the symbol ∑x means take the sum over all values of x.

This brings us to the mutual information I(X;Y), which measures how much of the uncertainty in the received signal comes from the sender’s choices rather than from noise, that is, the total uncertainty H(Y) minus the noise uncertainty H(Y|X):

I(X;Y)  = H(Y) –  H(Y|X)

Then the channel capacity C of a (noisy) channel is given by taking the maximum of I(X;Y) over all choices of p(x):

C = max I(X;Y)
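The three definitions above can be turned into a few lines of code. What follows is only a minimal sketch, assuming a channel with two input symbols, base-2 logarithms (so results are in bits per symbol), and a simple grid search standing in for the maximization. The function names and the example channel are made up for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(px, channel):
    """H(Y|X) = -sum_x p(x) sum_y p(y|x) log2 p(y|x), with channel[x][y] = p(y|x)."""
    return sum(px[x] * entropy(channel[x]) for x in range(len(px)))

def mutual_information(px, channel):
    """I(X;Y) = H(Y) - H(Y|X)."""
    n_out = len(channel[0])
    # p(y) is the sum over x of p(x) p(y|x).
    py = [sum(px[x] * channel[x][y] for x in range(len(px))) for y in range(n_out)]
    return entropy(py) - conditional_entropy(px, channel)

def binary_capacity(channel, steps=1000):
    """Approximate C = max over p(x) of I(X;Y) for a two-input channel by grid search.

    Returns (capacity in bits per symbol, maximizing probability of sending symbol 0).
    """
    return max(
        (mutual_information([k / steps, 1 - k / steps], channel), k / steps)
        for k in range(steps + 1)
    )

# Hypothetical channel that flips each transmitted bit 10% of the time.
noisy = [[0.9, 0.1], [0.1, 0.9]]
capacity, best_p0 = binary_capacity(noisy)
print(round(capacity, 3), best_p0)  # → 0.531 0.5
```

A grid search is a crude way to maximize over p(x), but for a two-symbol input there is only one free parameter, so it is enough to see where the maximum sits.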

These ideas all come from Claude Shannon’s 1948 paper on communication theory, which rigorously proves that this C is the limit on how quickly information can be reliably transmitted over a channel. Today, channel capacity drives all communications system designs, including those of our cell phones and their networks.

Let’s work through two examples. Consider first a binary channel without noise, where a 0 is transmitted with probability p and a 1 is transmitted with a probability q = 1-p.

In the table below, p(x) is the probability of x, p(y|x) is the probability of y given x, and p(y) is the probability of y.

p(x)         x     p(y|x)     y     p(y)
p            0     1          0     p
p            0     0          1     1-p
q = 1-p      1     0          0     p
q = 1-p      1     1          1     1-p


With these values, we can calculate the mutual information I(X;Y):

I(X;Y)   = ∑y (-p(y) log p(y)) + ∑x  p(x)∑y p(y|x) log p(y|x)

= -p log p – (1-p) log (1-p) + p log 1 + (1-p) log 1 = -p log p – (1-p) log (1-p)

(The terms with p(y|x) = 0 drop out, because 0 log 0 is taken to be 0, and log 1 = 0.)

The channel capacity C is the maximum value of I(X;Y) achieved over all choices of p(x). Not surprisingly, the maximum is achieved for p = q = 1/2, which maximizes the entropy of the transmitted symbols and gives C = 1 bit per symbol when the logarithm is taken base 2. If you know a little calculus, you can derive this result by setting the first derivative of the above expression with respect to p equal to zero.
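For readers who want to see the calculus step, here is one way it can be written out. This sketch uses the natural logarithm for the differentiation; switching to base 2 only rescales the expression by a constant, so it does not move the maximum.

```latex
\begin{align*}
I(p) &= -p \log p - (1-p)\log(1-p) \\
\frac{dI}{dp} &= -\log p - 1 + \log(1-p) + 1 = \log\frac{1-p}{p} \\
\frac{dI}{dp} = 0 &\iff \frac{1-p}{p} = 1 \iff p = \tfrac{1}{2} \\
\frac{d^2 I}{dp^2} &= -\frac{1}{p} - \frac{1}{1-p} < 0 \quad (0 < p < 1)
\end{align*}
```

The negative second derivative confirms that p = 1/2 is a maximum rather than a minimum.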

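Next consider a binary symmetric channel with noise, where a 0 is transmitted with probability r and each transmitted symbol is flipped with probability p (note that p now describes the noise, not the probability of sending a 0). This gives the values: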


p(x)     x     p(y|x)     y     p(y)
r        0     1-p        0     q
r        0     p          1     1-q
1-r      1     p          0     q
1-r      1     1-p        1     1-q


First we’re going to find the mutual information, which we write as:

I(X;Y)   = ∑y (-p(y) log p(y)) + ∑x  p(x) ∑y p(y|x) log p(y|x)

= -q log q – (1-q) log (1-q) + r[(1-p) log (1-p) + p log p] + (1-r)[p log p + (1-p) log (1-p)]

= -q log q – (1-q) log (1-q) + p log p + (1-p) log (1-p)

Since p(y) = ∑x p(x)p(y|x), it follows that q = r(1-p) + (1-r)p = r – 2rp + p. This allows us to reduce the three unknowns p, q, r down to two unknowns, just p and r. In fact, p is just a measure of the noise in the channel being used to transmit our information, so we have little control over it. Now we can rewrite I(X;Y) as

-(r – 2rp + p) log (r – 2rp + p) – [1- (r – 2rp + p)] log [1- (r – 2rp + p)] + p log p + (1-p) log (1-p)

We want to maximize this expression with respect to r in order to find the channel capacity. With a little calculus, we again find that the maximum is achieved when r = ½, which makes q = ½ as well. Plugging in r = ½, we find that the channel capacity depends only on the noise measure p:

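C = -½ log ½ – ½ log ½ + p log p + (1-p) log (1-p) = 1 – H(p)

where H(p) = -p log p – (1-p) log (1-p) is the entropy of the noise and the logarithms are taken base 2, so C is measured in bits per transmitted symbol.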
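To get a feel for what this formula says, we can evaluate it for a few noise levels. The snippet below is a rough sketch rather than a definitive implementation: the function names are invented for this post, logarithms are base 2, and a coarse grid search over r is used only to confirm that the maximum sits at r = ½.

```python
import math

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(r, p):
    """I(X;Y) for the binary symmetric channel: H(q) - H(p), with q = r - 2rp + p."""
    q = r - 2 * r * p + p
    return binary_entropy(q) - binary_entropy(p)

def bsc_capacity(p):
    """Closed form C = 1 - H(p), in bits per transmitted symbol."""
    return 1.0 - binary_entropy(p)

for p in (0.0, 0.01, 0.1, 0.25):
    # Grid search over r = 0.00, 0.01, ..., 1.00 for the best input distribution.
    best_r = max((k / 100 for k in range(101)),
                 key=lambda r: bsc_mutual_information(r, p))
    print(p, best_r, round(bsc_capacity(p), 4))
```

Two limiting cases are worth noticing. With no noise (p = 0) the capacity is a full bit per symbol, matching the first example. With p = ½ the capacity drops to zero, because the received symbol no longer depends on what was sent.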

This post contains a lot of complicated ideas, so here’s the bottom-line takeaway:

1.     A transmitted symbol X has a certain entropy H(X).

2.     The corresponding received symbol Y has a certain entropy H(Y).

3.     These entropies are related, and can be used to define the mutual information I(X;Y), the part of the received uncertainty H(Y) that is not due to noise.

4.     By maximizing I(X;Y), we can find the channel capacity C, “the maximum rate (in bits per second) at which useful information (i.e., total uncertainty minus noise uncertainty) can be transmitted over the channel.”

Next week, we’re going to start talking about how you design (data compression and error correction) codes for your digital source so you can maximize channel transmission rates.