The Auffinger-Chen Representation
Equivalent representations contribute not only a connection between different concepts, but also a new set of proof techniques. Indeed, stochastic analysis has offered alternative proofs for many problems, and occasionally the proof simplifies drastically. In this post, we will discuss a particularly elegant application by Auffinger and Chen (2015) to an otherwise very difficult problem in spin glass theory.
The Problem Statement
For the sake of writing a self-contained blog post, we will not attempt to provide a description of spin glass models. Instead, we will state the problem in the most mathematically interesting form, without explaining where the quantities came from.
Let be twice differentiable and strictly increasing and strictly convex (i.e. ), also let be a cumulative distribution function (CDF). We will consider the Parisi partial differential equation (PDE) defined as follows
where the time derivative is defined by the right limit for when is discontinuous.
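For concreteness, one standard way to write this terminal-value problem is sketched below. The notation is an assumption borrowed from the usual conventions around Auffinger and Chen (2015): \xi for the convex function, \mu for the CDF on [0,1], and \Phi_\mu for the solution.

```latex
\partial_t \Phi_\mu(t,x)
  = -\frac{\xi''(t)}{2}
    \Big( \partial_{xx}\Phi_\mu(t,x)
        + \mu(t)\,\big(\partial_x \Phi_\mu(t,x)\big)^2 \Big),
\qquad (t,x) \in [0,1) \times \mathbb{R},
\qquad
\Phi_\mu(1,x) = \log\cosh(x).
```

The nonlinearity enters only through the term \mu(t)(\partial_x\Phi_\mu)^2, which is why the equation becomes linear after an exponential change of variables on any interval where \mu is constant.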
It is well known that we can solve this PDE backwards in time using a Hopf-Cole transformation; in fact, we will provide a sketch in a later section. This allows us to state an optimization objective as follows:
where we are minimizing over the set of all CDFs on for each . Finally we can state the question as follows:
Question Does there exist a unique minimizer to the optimization problem for each ?
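As a hedged reference point, the objective is commonly written as follows, up to an additive constant that does not affect the minimizer. The symbols are assumptions rather than notation fixed by this post: \mu is a CDF on [0,1], \Phi_\mu solves the Parisi PDE with terminal condition \log\cosh, \xi is the convex function, and h is a real parameter.

```latex
\mathcal{P}(\mu)
  = \Phi_\mu(0,h) - \frac{1}{2}\int_0^1 t\,\xi''(t)\,\mu(t)\,dt,
\qquad
\text{and we ask whether } \inf_{\mu} \mathcal{P}(\mu)
\text{ is attained at exactly one } \mu .
```

The second term is linear in \mu, so any convexity statement about \mathcal{P} reduces to the same statement about the map \mu \mapsto \Phi_\mu(0,h).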
The main difficulty comes from the unclear dependence on , even if we can write down a closed-form solution to the Parisi PDE. At the very least, such a solution would be extremely unpleasant and tedious to work with. Additionally, we remark that the problem is already stated in a simplified form, as opposed to the original framing in spin glass theory.
Before we jump into the main results, we observe that the existence of a minimizer is straightforward to prove. Since we are restricted to the domain , any sequence of probability measures is tight. It is then sufficient to consider any sequence of probability measures that minimizes , and tightness implies there exists a convergent subsequence such that weakly. The limit is a minimizer of , provided the objective is continuous under weak convergence.
The Auffinger-Chen Representation
To complete the proof, it is sufficient to show is strictly convex in . In this section, we will use a stochastic representation to show convexity, which is the main difficulty of the problem. Readers unfamiliar with stochastic analysis can find a brief introduction in a previous blog post; in particular, we will use Itô’s Lemma in the upcoming proofs.
We start by defining , where is a standard Brownian motion. Let be ’s canonical filtration, and then we define a collection of processes
For simplicity of notation, we will write for this section. At this point we will state the main result.
Theorem (Auffinger-Chen Representation) For all a probability distribution on , we have the following
In particular, the maximizer is unique, and is given by , where is the strong solution of the following stochastic differential equation (SDE)
Remark Before we begin the proof, we observe that ’s convexity follows directly from this representation. First, both integral terms containing are linear in . Since is convex in , the term is convex in . Next, the expectation of a sum of two convex functions remains convex. Finally, a maximum (or supremum) over convex functions remains convex, proving the desired convexity result!
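To make the remark concrete, here is a sketch of the representation and the convexity argument in assumed notation: \mu is the CDF, \xi the convex function, W a standard Brownian motion, and u ranges over a suitable class of adapted control processes. The argument relies on that class not depending on \mu.

```latex
\Phi_\mu(0,h) = \max_{u} G(u,\mu),
\qquad
G(u,\mu) := \mathbb{E}\Big[
    \log\cosh\Big( h + \int_0^1 \mu(s)\,\xi''(s)\,u_s\,ds
                     + \int_0^1 \sqrt{\xi''(s)}\,dW_s \Big)
    - \frac{1}{2}\int_0^1 \mu(s)\,\xi''(s)\,u_s^2\,ds
\Big].

% For \mu_\lambda = \lambda\mu_0 + (1-\lambda)\mu_1 with \lambda \in [0,1]:
G(u,\mu_\lambda) \le \lambda\,G(u,\mu_0) + (1-\lambda)\,G(u,\mu_1)
\quad \text{for each fixed } u
\qquad\Longrightarrow\qquad
\Phi_{\mu_\lambda}(0,h)
  \le \lambda\,\Phi_{\mu_0}(0,h) + (1-\lambda)\,\Phi_{\mu_1}(0,h).
```

The first inequality holds because the argument of \log\cosh is affine in \mu and the penalty term is linear in \mu. The implication follows by taking the maximum over u on the left and bounding each term on the right by its own maximum.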
Before we start, we will state several technical (but not difficult to prove) Lemmas. To guarantee a strong solution of the SDE, it is sufficient to have be Lipschitz in . We will omit the proof of these results as they are not important to the main goal of this blog post. Instead we will state the following Lemma containing the desired estimates.
Lemma (Derivative Estimates) For all probability distributions on , we have that
Another important result we will omit is the continuity of in .
Lemma (Lipschitz in ) For any discrete distributions , and for all , we have that
Since we can approximate any distribution in by discrete distributions, we can extend the definition of and to all distributions by continuity. Therefore it is sufficient to prove the result for finitely supported distributions only.
proof (of the Auffinger-Chen representation): The proof is a straightforward application of Itô’s Lemma, and the result follows almost directly from invoking the Parisi PDE.
We start with discrete , i.e. is a piecewise constant function. Let , and define
and let . Then we observe that
appears exactly inside the first term of the Auffinger-Chen representation.
At this point we adopt concise notation and write , and apply Itô’s Lemma to to get
Here we note that while the time derivative does not exist at finitely many points, we will eventually only use it in integral form. Using the Parisi PDE at points of continuity, we can make the following substitution
We will make the substitution and complete the square to get
Next we write this equation as an integral over , and take expectations to remove the martingale term to get
Since are continuous in , we can extend this equation to all . Furthermore, since the second integral is always nonnegative, we must have the inequality
and the inequality must be strict unless almost surely.
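The chain of steps above can be written out explicitly as follows. This is a sketch in assumed notation: X_t denotes the controlled process started at h with drift \mu\xi''u and diffusion coefficient \sqrt{\xi''}, and v_t := \partial_x\Phi_\mu(t,X_t). It also assumes the stochastic integral is a true martingale, which holds for instance when \partial_x\Phi_\mu is bounded.

```latex
\begin{aligned}
d\,\Phi_\mu(t,X_t)
 &= \Big[ \partial_t\Phi_\mu + \mu\,\xi''\,u_t\,\partial_x\Phi_\mu
        + \tfrac{1}{2}\,\xi''\,\partial_{xx}\Phi_\mu \Big](t,X_t)\,dt
    + \sqrt{\xi''(t)}\,v_t\,dW_t \\
 &= \mu(t)\,\xi''(t)\Big[ u_t v_t - \tfrac{1}{2} v_t^2 \Big]dt
    + \sqrt{\xi''(t)}\,v_t\,dW_t
    \qquad \text{(Parisi PDE)} \\
 &= \tfrac{1}{2}\,\mu(t)\,\xi''(t)\,u_t^2\,dt
    - \tfrac{1}{2}\,\mu(t)\,\xi''(t)\,(u_t - v_t)^2\,dt
    + \sqrt{\xi''(t)}\,v_t\,dW_t
    \qquad \text{(completing the square)} .
\end{aligned}

% Integrating over [0,1] and taking expectations:
\mathbb{E}\,\Phi_\mu(1,X_1)
  - \frac{1}{2}\,\mathbb{E}\int_0^1 \mu(s)\,\xi''(s)\,u_s^2\,ds
 = \Phi_\mu(0,h)
  - \frac{1}{2}\,\mathbb{E}\int_0^1 \mu(s)\,\xi''(s)\,(u_s - v_s)^2\,ds
 \;\le\; \Phi_\mu(0,h).
```

The only inequality comes from dropping the nonnegative squared term, so equality holds exactly when u agrees with v wherever \mu\xi'' is positive.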
Observe that this proves the inequality of the representation. Since , we have , hence achieving the equality in the representation.
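As a numerical sanity check, the representation can be tested in a toy case that is not taken from the post. Assume the CDF is identically 1 on [0,1] and \xi'' is identically 1, so that \Phi(t,x) = \log\cosh(x) + (1-t)/2 solves the Parisi PDE and the candidate optimal control is u = \tanh(X). The function names, the Euler-Maruyama discretisation, and the parameter choices below are all illustrative assumptions.

```python
import numpy as np


def log_cosh(x):
    # Numerically stable log(cosh(x)) for scalars or arrays.
    return np.logaddexp(x, -x) - np.log(2.0)


def control_value(h, feedback, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of E[log cosh(X_1) - 0.5 * int_0^1 u_s^2 ds],
    where dX = u dt + dW, X_0 = h and u_s = feedback(X_s).
    Uses a plain Euler-Maruyama scheme (toy case: CDF = 1, xi'' = 1)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.full(n_paths, float(h))
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        u = feedback(x)
        cost += 0.5 * u**2 * dt
        x = x + u * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return float(np.mean(log_cosh(x) - cost))


h = 0.5
exact = float(log_cosh(h) + 0.5)          # Phi(0, h) in the toy case
optimal = control_value(h, np.tanh)       # feedback control u = d/dx Phi
zero = control_value(h, np.zeros_like)    # suboptimal control u = 0

print("Phi(0,h) in closed form :", exact)
print("value with u = tanh(X)  :", optimal)
print("value with u = 0        :", zero)
```

Up to Monte Carlo and discretisation error, the feedback control should reproduce the closed-form value, while the zero control should land visibly below it, in line with the inequality and the equality case of the representation.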
Sketch of Strict Convexity
At this point, the author believes the goal of the blog post is already achieved: we have demonstrated the key technique with only very basic manipulations. That being said, to complete the story, we will provide a short sketch of how to prove strict convexity, which in turn proves that there is a unique minimizer of .
We once again start with a key technical lemma.
Lemma (Strict Convexity in ) For all a probability distribution on , and for all , we have
Here we remind the reader that strict convexity in does not directly imply strict convexity in . We could just take this result for granted, but there is a nice proof using the Hopf-Cole transform and another stochastic representation, so why not?
sketch (of Lemma): Since is continuous in , we will only consider a discrete . Then using an appropriate time change and time reversal, we can get a new PDE
with initial conditions (as opposed to terminal conditions) , and changed due to time reversal. To simplify the PDE, we use the Hopf-Cole transformation to substitute , which leads to the simplified linear PDE
with initial conditions . Using another time change, we can also remove the above.
where is a standard Brownian motion, and is constant in . At this point it is sufficient to show strict convexity for this , since we can piece together the constant intervals later. To this end, we will write
where we observe since and , we have that defines a new probability measure. In particular, we have Jensen’s inequality.
With this we can take the second derivative of to get
where we used Jensen’s inequality and the fact that .
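The Hopf-Cole step in this sketch can be checked numerically on a single interval. Assume the CDF equals a constant m > 0 there and that the time change contributes Gaussian noise of standard deviation sigma. Then the solution at the left endpoint is (1/m) log E exp(m f(x + sigma Z)), where f is the value at the right endpoint. The names `hopf_cole_step`, `m`, and `sigma` are hypothetical, and f = log cosh is chosen purely for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def log_cosh(x):
    # Numerically stable log(cosh(x)) for scalars or arrays.
    return np.logaddexp(x, -x) - np.log(2.0)


# Gauss-Hermite nodes and weights for the weight exp(-z^2 / 2),
# rescaled so the weights integrate against the N(0, 1) density.
_NODES, _WEIGHTS = hermegauss(60)
_WEIGHTS = _WEIGHTS / np.sqrt(2.0 * np.pi)


def hopf_cole_step(x, m, sigma, terminal=log_cosh):
    """Return (1/m) * log E[exp(m * terminal(x + sigma * Z))], Z ~ N(0, 1)."""
    values = np.exp(m * terminal(x + sigma * _NODES))
    return float(np.log(np.sum(_WEIGHTS * values)) / m)


# Closed-form check for m = 1: E cosh(x + sigma Z) = cosh(x) * exp(sigma^2 / 2).
x0, sigma0 = 0.3, 0.5
closed_form = float(log_cosh(x0) + 0.5 * sigma0**2)
numeric = hopf_cole_step(x0, 1.0, sigma0)

# Discrete convexity check in x for a generic m: second differences on a grid.
grid = np.linspace(-3.0, 3.0, 61)
vals = np.array([hopf_cole_step(x, 0.5, 0.7) for x in grid])
second_diff = np.diff(vals, 2)

print("closed form:", closed_form, "quadrature:", numeric)
print("smallest second difference:", second_diff.min())
```

Positive second differences on the grid are consistent with the lemma, but this is only a finite-grid check in one toy case and proves nothing on its own.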
Finally we return to strict convexity in .
sketch (of Strict Convexity in ): We will start by introducing quantities related to convexity. Let , and let for some . Recalling , and using the optimal , where is defined with respect to . Note that this is not necessarily optimal for .
Since is convex, we can write
where each is defined as
Since is strictly convex, the inequality is strict unless
almost surely. Using the Auffinger-Chen representation, we have that . Therefore to prove the convexity is strict, it is sufficient to prove a gap in the first inequality, which is equivalent to saying that
has positive variance. The variance can be computed as
While we omit the technical details, it’s not hard to believe that satisfies the following SDE (from Itô’s Lemma and differentiating the Parisi PDE)
Observing that is a martingale with independent increments, we can compute as
where the last step follows from Itô’s Isometry. Defining , we can also write . With a bit of algebra we can derive
Since , the desired result follows from the fact .
Recall the original problem of . We have shown that while the dependence structure is unclear, we are able to prove its convexity easily using a stochastic representation. The author would like to point out that most techniques used here are quite basic, which is surprising for an originally very difficult problem.
The author would also like to point to a more general variational stochastic representation by Boué and Dupuis (1998), perhaps more useful for other applications.
Finally, this post would not have been possible without an excellent graduate course on spin glasses taught by Dmitry Panchenko, who has done a much better job explaining this topic. In particular, Dmitry has written an excellent book (Panchenko, 2013) with a bonus chapter covering this topic that can be found online. The author would also highly recommend Dmitry’s notes on probability theory, which have in general been very helpful to the author’s studies and research.
- Auffinger, A., & Chen, W. K. (2015). The Parisi formula has a unique minimizer. Communications in Mathematical Physics, 335(3), 1429-1444.
- Boué, M., & Dupuis, P. (1998). A variational representation for certain functionals of Brownian motion. The Annals of Probability, 26(4), 1641-1659.
- Panchenko, D. (2013). The Sherrington-Kirkpatrick model. Springer Science & Business Media.