Monday, November 5, 2012

Jensen's Inequality



Jensen's inequality finds widespread application in mathematical proofs. Yet an intuitive explanation of it, one that I have come across a few times, doesn't seem to be nearly as well known. The purpose of this post is to present this mostly visual explanation in brief.

I am not sure when this argument originated, or with whom, but Google turns up this paper*. Even if it is not the original source of the argument, it is a good reference.

This post doesn't come anywhere close to a comprehensive discussion of the inequality. It's deliberately sketchy wrt a lot of mathematical details - in fact, I formally state the inequality only near the end of this post. It attempts to provide a feel of the inequality, and no more.

Center of mass

If you know what a center of mass (CM) is, you can skip ahead to the next section.

For our purposes, we can think of the center of mass of an object as the one point at which you could balance it. In Fig 1, the finger is placed at the center of mass of the toy bird. Fig 2 shows a baseball bat held up at its center of mass.

Fig 1 (Source)

Fig 2 (Source)

We won't be looking at the CMs of bodies like the toy bird or the baseball bat, but at those of groups of disconnected masses, as in Fig 3. Here, a mass is represented by a black circle, with a larger mass represented by a larger circle. If the disconnectedness of the masses seems confusing, you could assume that the masses are affixed to a Plexiglas sheet of negligible weight, shown in gray in Fig 4.

Fig 3

Fig 4

Now, look at Figures 5, 6 and 7 - or just Fig 5 if you're in a hurry :). The green dot approximately marks the CM of the masses. The red dots are placed at points where the CM cannot be. These diagrams are intended to exercise your intuition about the CM: make sure that where you think the CM should be is close to the green dot. In other words, you could hold up the mass-Plexiglas arrangement, much like the toy bird from Fig 1, by placing it on a finger at the green dot. Also, verify that the red dots look like extremely unlikely locations for the CM, i.e. if you held up the system at a red dot, you would have excess weight on one side and the system would topple.

Fig 5

Fig 6

Fig 7

A red dot outside the mass-Plexiglas system invites you to consider whether the CM could lie outside the system altogether.

Let's briefly walk through the example in Fig 5. The red dot at the center could have been the CM if the masses were equal; since they are not, the CM is shifted slightly toward the larger mass. The red dots at the edge of the Plexiglas and on the medium-sized mass clearly can't work: there is simply too much weight on one side, so if you held up the arrangement there it would definitely topple over. The red dot outside can't possibly work either: all the mass is on one side of it, so there is no way you could keep the whole thing in balance at this point (assuming there was a way to hold up the whole thing at this point, maybe via an extension of the Plexiglas sheet).

This should be the basic takeaway from the exercise:
  1. The CM occurs inside the area enclosed by the masses i.e. inside the shaded region. 
  2. It's closer to relatively larger masses. 

Before we move on, we need to look at the expression for the coordinates of the CM - let's refer to it as \((x_{cm}, y_{cm})\) - in terms of the various masses and their locations. We need this so that we can write down the final result in its mathematical form. We do not discuss how this equation is derived, though.

If our masses are \(m_1, m_2, ..., m_n\) and they are placed at coordinates \((x_1, y_1), (x_2, y_2), ... (x_n, y_n)\) respectively, then:
$$x_{cm} = \frac{\sum_{i=1}^{i=n} m_i x_i}{\sum_{i=1}^{i=n}m_i }$$
$$y_{cm} = \frac{\sum_{i=1}^{i=n} m_i y_i}{\sum_{i=1}^{i=n}m_i }$$
Essentially, the coordinates are the weighted average of the coordinates of the masses.
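
As a quick illustration of the weighted-average formula, here is a minimal Python sketch. The function name `center_of_mass` and the sample masses and positions are made up for this example; they are not taken from any of the figures.

```python
def center_of_mass(masses, points):
    """Return (x_cm, y_cm) for point masses placed at the given (x, y) points."""
    total = sum(masses)
    # Each coordinate of the CM is the mass-weighted average of that coordinate.
    x_cm = sum(m * x for m, (x, _) in zip(masses, points)) / total
    y_cm = sum(m * y for m, (_, y) in zip(masses, points)) / total
    return x_cm, y_cm


# Hypothetical arrangement: three masses, the heaviest one at (4, 0).
masses = [1.0, 1.0, 2.0]
points = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0)]

x_cm, y_cm = center_of_mass(masses, points)
print(x_cm, y_cm)  # prints 2.0 1.0
```

The result, (2, 1), lies inside the triangle formed by the three masses and is pulled toward the heaviest one, in line with the two takeaways above.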



Jensen's inequality

Fig 8

Take a look at the curve of the function f(x) in Fig 8. This is a concave function: were you to stand on the x-axis and look up at it, you would see a concave curve (a convex function may be described in an analogous manner). Assume that masses are placed at points A, B, C and D on the curve, and that these weigh \(m_1\), \(m_2\), \(m_3\) and \(m_4\) respectively. Where is the CM of the system? Quite likely somewhere near P; at any rate, somewhere in the shaded region which, importantly, is bounded from above by the function f(x). The coordinates of the CM, \((x_{cm}, y_{cm})\), are marked in the figure.

Now, what can we say about the value \(f(x_{cm})\)? It lies on the curve f(x), as shown by point Q. But since the curve f(x) bounds the shaded region from above, it is clear that \(f(x_{cm}) > y_{cm}\): this is just another way of saying that Q is higher than P. This is the heart of the inequality; it is pretty much what Jensen's inequality states.

Thus, in our interpretation, Jensen's inequality makes explicit a simple relationship between the y-coordinate of the CM and the value of a function at its x-coordinate.
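
To make the "Q is higher than P" picture concrete, here is a small numeric sketch in Python. The choice of the square root as the concave function, and the four positions and masses, are assumptions made purely for illustration; they are not the values behind Fig 8.

```python
import math


def f(x):
    """An example concave function (an assumption; any concave f would do)."""
    return math.sqrt(x)


# Hypothetical masses placed ON the curve, i.e. at the points (x_i, f(x_i)).
xs = [1.0, 4.0, 9.0, 16.0]
masses = [1.0, 2.0, 3.0, 4.0]

total = sum(masses)
x_cm = sum(m * x for m, x in zip(masses, xs)) / total     # x-coordinate of P
y_cm = sum(m * f(x) for m, x in zip(masses, xs)) / total  # y-coordinate of P (height of the CM)
q_height = f(x_cm)                                        # height of Q, directly above P on the curve

print(x_cm, y_cm, q_height)
print(q_height > y_cm)  # prints True
```

Here P is at (10, 3), while Q sits at a height of about 3.16 on the curve, so Q is indeed above P.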

Beyond this point, all steps merely take us to a standard representation.



Standard form

So far, we have \(f(x_{cm}) > y_{cm}\).

Here are the coordinates of the CM:
$$x_{cm} = \frac{m_1 x_1 +  m_2 x_2 + m_3 x_3  +  m_4 x_4}{m_1 + m_2 + m_3 + m_4}$$
$$y_{cm} = \frac{m_1 y_1 +  m_2 y_2 +  m_3 y_3 +  m_4 y_4}{m_1 + m_2 + m_3 + m_4}$$
Substituting the values for \(x_{cm}\) and \(y_{cm}\) we obtain:
$$f\left(\frac{m_1 x_1 + m_2 x_2 + m_3 x_3 + m_4 x_4}{m_1 + m_2 + m_3 + m_4}\right) > \frac{m_1 y_1 + m_2 y_2 + m_3 y_3 + m_4 y_4}{m_1 + m_2 + m_3 + m_4}$$
Next, since each mass sits on the curve, we have \(y_i = f(x_i)\); replacing every \(y_i\) in the RHS with the corresponding \(f(x_i)\) gives us the inequality for our case:
$$f\left(\frac{m_1 x_1 + m_2 x_2 + m_3 x_3 + m_4 x_4}{m_1 + m_2 + m_3 + m_4}\right) > \frac{m_1 f(x_1) + m_2 f(x_2) + m_3 f(x_3) + m_4 f(x_4)}{m_1 + m_2 + m_3 + m_4}$$
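In the general case, for a concave curve, this is what the inequality looks like (with \(\geq\) in place of \(>\), since equality is possible, for instance when all the mass sits at a single point):
$$f\left(\frac{\sum_{i=1}^{n} m_i x_i}{\sum_{i=1}^{n} m_i}\right) \geq \frac{\sum_{i=1}^{n} m_i f(x_i)}{\sum_{i=1}^{n} m_i}$$
where \(m_i \geq 0\) for all \(i\) and \(\sum_{i=1}^{n} m_i > 0\).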

Often, we are interested in cases where \(\sum_{i=1}^{i=n}m_i=1\). The inequality, then, simplifies to:
$$f\left(\sum_{i=1}^{n} m_i x_i\right) \geq \sum_{i=1}^{n} m_i f(x_i)$$
where \(m_i \geq 0\) for all \(i\) and \(\sum_{i=1}^{n} m_i = 1\).
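
The normalized form is easy to spot-check numerically. The sketch below is only an illustration, not a proof: the helper name `jensen_gap`, the use of the natural logarithm as the concave function, and the randomly drawn points and weights are all assumptions of this example.

```python
import math
import random


def jensen_gap(f, xs, weights):
    """Return f(sum w_i x_i) - sum w_i f(x_i), for weights that sum to 1."""
    mean_x = sum(w * x for w, x in zip(weights, xs))
    mean_fx = sum(w * f(x) for w, x in zip(weights, xs))
    return f(mean_x) - mean_fx


rng = random.Random(0)  # fixed seed so that runs are repeatable
gaps = []
for _ in range(1000):
    xs = [rng.uniform(0.1, 10.0) for _ in range(5)]
    raw = [rng.random() for _ in range(5)]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalize so the weights sum to 1
    gaps.append(jensen_gap(math.log, xs, weights))

# For a concave f, every gap should be non-negative, up to floating-point rounding.
print(min(gaps) >= -1e-12)
```

The small tolerance only guards against rounding error; mathematically, each gap is at least zero.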

Similar arguments can be made for convex curves: the curve now bounds the shaded region from below, so the "\(\geq\)" in the inequality is reversed to "\(\leq\)".
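
As a small worked example of the convex case, take \(f(x) = x^2\) with two equal weights \(m_1 = m_2 = \frac{1}{2}\) (both choices are assumptions made purely for illustration). The inequality then reads:
$$\left(\frac{x_1 + x_2}{2}\right)^2 \leq \frac{x_1^2 + x_2^2}{2}$$
This can be checked directly, since the difference between the two sides is a square:
$$\frac{x_1^2 + x_2^2}{2} - \left(\frac{x_1 + x_2}{2}\right)^2 = \frac{(x_1 - x_2)^2}{4} \geq 0$$
Equality holds only when \(x_1 = x_2\), i.e. when both masses sit at the same point.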

Done. Really.



* The author of the paper, Tristan Needham, has the favorably reviewed book "Visual Complex Analysis" to his credit. As with other books on my To-Read list I hope to get to it someday.