Connected by Poincaré Inequality
Published:
While studying two seemingly irrelevant subjects, probability theory and partial differential equations (PDEs), I ran into a somewhat surprising overlap: the Poincaré inequality. On one hand, it is not out of the ordinary for analysis based subjects to share inequalities such as Cauchy-Schwarz and Hölder; on the other hand, the two forms of Poincaré inequality have quite different applications.
In this blog post, I hope to put together some excellent content I studied recently, specifically from:
- Concentration Inequalities by Boucheron, Lugosi, and Massart (2013)
- Partial Differential Equations by Evans (2010)
A Simple Inequality
We first state a very simple version of the inequality:
Theorem (A Simple Poincaré Inequality) Let Ω⊂Rn be open and bounded, and let f∈C1c(Ω) (differentiable with compact support). Then there exists a constant C that depends only on Ω such that:
‖f‖L2(Ω)≤C‖∇f‖L2(Ω)
Quick aside: we say a function f has compact support if the set S={x∈Ω:f(x)≠0} has compact closure. This implies f(x)=0 near the boundary.
Observe that the inequality simply bounds the L2-norm of a function in terms of the L2-norm of its gradient instead. Note the compact support here is an important assumption when we are integrating with respect to the Lebesgue measure. Consider for example a constant function, then this inequality would fail as the gradient is zero. The reader may be comforted that a general form will require much fewer assumptions, and can be generalized to all Lp norms.
The reason we start with this inequality is because the proof is quite straightforward:
proof (of the Simple Poincaré Inequality):
Without loss of generality, we let Ω⊂[0,M]n for some large M>0, and by the Cauchy-Schwarz inequality we have
|f(x)|2≤|∫x10∂∂xif(y1,x2,…)dy1|2≤[∫M012dy1][∫M0|∂f∂x1|2dy1]
Summing over all n possible derivatives, and integrating over Ω we have
n∫Ω|f(x)|2≤∫Ωn∑i=1M∫M0|∂f∂xi|2dyi=n∑i=1M‖∂f∂xi‖2L2(Ω)
where in the last step we exchanged the order of integration, and used the fact that the L2 norm is a constant. Rewriting the above we get the desired result
‖f‖L2(Ω)≤M√n‖∇f‖L2(Ω)
As a Concentration Inequality
We now state the inequality in a form most useful for probability theory, see Theorem 3.20 from Boucheron, Lugosi, Massart (2013):
Theorem (Gaussian-Poincaré Inequality) Let X=(X1,…,Xn) be a vector of i.i.d. standard Gaussian random variables. Let f:Rn→R be any continuously differentiable function. Then
Var[f(X)]≤E[|∇f(X)|2]
Observe that the inequality is slightly different. Firstly this time the norm is centered, although centering in this case is not an issue since Var[X]≤EX2. Secondly due to the measure being a probability measure, we have a much smaller constant on the inequality C=1. In combination, we were also able to drop the compact support assumption.
An immediate consequence is to consider f Lipschitz with coefficient 1, i.e. |f(x)−f(y)|≤‖x−y‖, then we have
Var[f(X)]≤1
In other words, we just found a constant bound on the variance for a huge class of random functions! In general, we can consider f to be a smooth estimator based on a dataset with noise X. The Poincaré inequality will provide a very useful bound on estimation error.
To prove this inequality, we will use a famous result from 1981 (Theorem 3.1 in Boucheron, Lugosi, Massart (2013)):
Theorem (Efron-Stein Inequality) Let X=(X1,…,Xn) be a vector of i.i.d. random variables and let Z=f(X) be a square-integrable function of X. Then
Var(Z)≤n∑i=1E[(Z−E(i)Z)2]
where E(i)Z=∫f(X1,…,Xi−1,xi,Xi+1,…)dμi(xi), i.e. the expectation over Xi only.
The Efron-Stein inequality can be proved by decomposing the variance as a sum of telescoping differences of conditional expectations, and applying Jensen’s inequality to the individual terms. While we omit the proof here, we should remark that the simple Efron-Stein inequality has wide ranging applications; we will only look at one such use for the proof of the Poincaré inequality, taken from Theorem 3.20 in Boucheron, Lugosi, Massart (2013):
proof (of Gaussian-Poincaré Inequality):
First we observe that a direct application of the Efron-Stein inequality can reduce the problem down to n=1, i.e. it is sufficient to show
E(i)[(Z−E(i)Z)2]≤E(i)∂f∂xi(X)2
From here we assume without loss of generality n=1. Then we notice that it is sufficient to prove this inequality for compactly supported, twice differentiable functions, i.e. f∈C2c(R), since otherwise we can just take a limit to the original function.
Here we let ϵ1,…,ϵn be i.i.d. Rademacher random variables, i.e. P[ϵj=1]=P[ϵj=−1]=12∀j∈{1,2,…,n}, and we define
Sn=n−1/2n∑j=1ϵj
Observe that for every i we have
Var(i)[f(Sn)]=14[f(Sn+1−ϵi√n)−f(Sn+1+ϵi√n)]2
Applying the Efron-Stein inequality, we get
Var[f(Sn)]≤14n∑i=1E[(f(Sn+1−ϵi√n)−f(Sn+1+ϵi√n))2]
Let K=supx|f″(x)|, then we have that
|f(Sn+1−ϵi√n)−f(Sn+1+ϵi√n)|≤2√n|f′(Sn)|+2Kn
which implies
n4(f(Sn+1−ϵi√n)−f(Sn+1+ϵi√n))2≤f′(Sn)2+2K√n|f′(Sn)|+K2n
Finally the central limit theorem then imply the desired result
lim supn→∞14n∑i=1E[(f(Sn+1−ϵi√n)−f(Sn+1+ϵi√n))2]=E[f′(X)2]
Remark There are also Poincaré type inequalities for non-Gaussian random variables, for example if X∼Poisson(μ):
Var[f(X)]≤μE[(f(X+1)−f(X))2]
Or if X is double exponential i.e. with density 12e−|x|, then we have:
Var[f(X)]≤4E[(f′(X))2]
An Application to PDEs
To do proper justice for the theory of PDEs, we will need a significant background in functional analysis. In this section, we will try to side-step the technical details and focus on one single application, that is showing the existence and uniqueness of a weak solution for Poisson’s equation:
−Δu=f in Ω u=g in ∂Ω
where Ω⊂Rn is open bounded with smooth boundaries ∂Ω. By weak solution, we meant there exists a u∈C1(Ω) such that ∀v∈C1c(Ω) we have
B[u,v]:=∫Ω∇u⋅∇v=∫Ωfv
Note if u is a solution to the (original) Poisson’s equation, then we have the above weak equation by Green’s identity. The main tool we will use to prove existence and uniqueness is the following result:
Theorem (Lax-Milgram) Let H be a Hilbert space, B:H2→R be a continuous, coersive, bilinear form. Then ∀φ∈H∗, there exists a unique u∈H such that
B(u,v)=<φ,v>∀v∈H
where <φ,v> is the linear functional φ applied to v.
Remark Before going into the definitions and technical details, we observe that Lax-Milgram Theorem gives us exactly what we want - the existence and uniqueness! Now we just have to fill in the blanks:
- define a Hilbert space H where the solution lives in
- show that B(u,v) is continuous and coersive (we will define later)
- let <φ,v>=∫Ωfv
Step 1 To define our Hilbert space, we will consider the following inner product:
(u,v):=∫Ω[uv+∇u⋅∇v]
which corresponds to the following Sobolev norm:
‖u‖H1(Ω):=(u,u)1/2=[‖u‖2L2(Ω)+‖∇u‖2L2(Ω)]1/2
By equipping the space C1c(Ω) with the above inner product, we almost have a Hilbert space! Here we will simply take the completion of C1c(Ω) with respect to the Sobolev norm, i.e. add all the limit points to the space. We call this (completed) Hilbert space H10(Ω).
Step 2 We now turn our attention to B(u,v), the bilinear form (fancy term for separately linear in both inputs). Then we say B is continuous if
∃C1>0:∀u,v∈H,|B(u,v)|≤C1‖u‖H1(Ω)‖v‖H1(Ω)
Note this is an immediate consequence of Cauchy-Schwarz inequality
|B(u,v)|≤‖∇u‖L2(Ω)‖∇v‖L2(Ω)≤‖u‖H1(Ω)‖v‖H1(Ω)
We say B is coersive if
∃C2>0:∀u∈H,B(u,u)≥C2‖u‖2H1(Ω)
We notice this is the only non-trivial condition left to check, and to prove this we will finally use Poincaré inequality! Start by rewriting
B(u,u)=∫Ω∇u⋅∇u=‖∇u‖2L2(Ω)
Applying the Poincaré inequality on half of the norm we have
12‖∇u‖2L2(Ω)≥12C‖u‖2L2(Ω)
Therefore
B(u,u)≥12‖∇u‖2L2(Ω)+12C‖u‖2L2(Ω)≥min(12,12C)‖u‖2H1(Ω)
And voilà, we have existence and uniqueness! A rigorous and careful reader may notice that u does not necessarily have compact support - this is correct. However every u∈H10(Ω) is a limit of compactly supported functions, therefore we just need to take a limit to get our result!
Remark In fact, we can use similar Lax-Milgram based methods to show existence and uniqueness for a large subset of elliptical PDEs. We should note that the fact we can “convert” between ‖u‖ and ‖∇u‖ is highly useful for studying Sobolev norms. We refer curious readers to Evans (2010) for an excellent chapter on Sobolev spaces and related inequalities.
Final Words
I have a weak spot for connections between different fields, probably because it’s always surprising, and surprises are intriguing in math! I hope to have to presented a readable introduction to the inequality and its applications in both topics, without drowning readers in technical details. On this note, I should remark that to study Sobolev spaces rigorously, the reader will need to go through all the details carefully!
As this is my first blog post, any constructive feedback or suggestions on future topics will be appreciated!
References
- Boucheron, S., Lugosi, G., & Massart, P. (2013). Concentration inequalities: A nonasymptotic theory of independence. Oxford university press.
- Evans, L. C. (2010). Partial differential equations. Providence, R.I.: American Mathematical Society.