Stone-Weierstrass and an Alternative Proof of Itô’s Lemma

15 minute read


Much as line integrals extend the classical integral to curves, stochastic calculus extends the classical tools of calculus to stochastic processes. One of its most elegant and useful results is the change of variables formula for stochastic integrals, commonly known as Itô’s Lemma (see the end of this post for a discussion of Doeblin’s contribution). While the lemma is easy to use, its proof usually relies heavily on technical lemmas, which makes it difficult to develop intuition, especially for a first-time reader.

With this motivation in mind, it was a pleasant surprise to discover an excellent set of lecture notes by Jason Miller (2016) containing an alternative proof built on the Stone-Weierstrass Theorem. We shall see that not only is the resulting proof more interpretable, but the technique also generalizes beyond stochastic calculus. This blog post illustrates the technique in detail through Itô’s Lemma.

A Brief Background on Stochastic Calculus

We introduce (without too much rigour) some basic definitions and results to support the proofs in later sections. The reader need not analyze the technical details here carefully to follow the proofs to come, and readers familiar with stochastic calculus may skip to the next section.

First, let \((\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\geq 0}, \mathbb{P})\) be a filtered probability space (satisfying the usual conditions, to be rigorous). With this we can define several useful objects.

Definition A stochastic process \(X := \{X_t\}_{t\geq 0}\) is said to be a martingale if

(i) \(\forall t \geq 0\), \(X_t\) is measurable with respect to \(\mathcal{F}_t\), denoted \(X_t \in \mathcal{F}_t\);

(ii) \(\forall t \geq 0\), \(\mathbb{E}|X_t| < \infty\);

(iii) \(\forall 0 \leq s \leq t\), we have \(\mathbb{E}[ X_t | \mathcal{F}_s ] = X_s\) a.s.

Definition We say a random variable \(\tau:\Omega \to [0,\infty]\) is a stopping time if \(\forall t \geq 0, \{\tau \leq t \} \in \mathcal{F}_t\).

An important property of stopping times is that if \(X_t\) is a continuous martingale and \(\tau\) a stopping time, then the stopped process \(X_{t \wedge \tau}\) is also a martingale.

Definition Let the interval \([0,T]\) be partitioned using increments of \(2^{-n}\), i.e. \(\{t_k^n\}_{k=0}^{\lceil T 2^n \rceil}\), where \(t_k^n = k 2^{-n} \wedge T\). Let \(X_t\) be a continuous martingale, and \(f_t\) be a continuous adapted (possibly stochastic) process. We define the Itô integral as

\[ \int_0^T f_t \, dX_t := \lim_{n\to\infty} \sum_{k=0}^{\lfloor T 2^n \rfloor} f_{t_k^n} (X_{t_{k+1}^n} - X_{t_k^n}), \]

if the limit converges u.c.p. (to be precise, uniformly on compact time intervals, in probability).

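Remark Observe that the above definition uses a left Riemann sum, i.e. the integrand is evaluated at the left endpoint of each interval; other choices of evaluation point lead to different integrals. This is in contrast to deterministic (Riemann-Stieltjes) integrals of continuous integrands against finite variation integrators, where all choices lead to the same limit.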
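To see the remark in action, here is a minimal simulation sketch, assuming standard Brownian motion \(W\) (a continuous martingale with \([W]_t = t\)) sampled on a uniform grid; the helper names `brownian_path`, `left_sum` and `endpoint_average_sum` are hypothetical, not from the post. For \(\int_0^T W_t \, dW_t\), the left Riemann sum approaches the Itô value \((W_T^2 - T)/2\), while averaging the two endpoint values of \(W\) (the Stratonovich convention) gives exactly \(W_T^2/2\) by telescoping.

```python
# Illustrative sketch; helper names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def left_sum(w):
    """Ito-type sum: the integrand W is evaluated at the left endpoint."""
    return sum(w[k] * (w[k + 1] - w[k]) for k in range(len(w) - 1))


def endpoint_average_sum(w):
    """Stratonovich-type sum: the integrand is the average of both endpoints."""
    return sum(0.5 * (w[k] + w[k + 1]) * (w[k + 1] - w[k]) for k in range(len(w) - 1))


T = 1.0
w = brownian_path(2**16, T, seed=42)
print(left_sum(w), 0.5 * (w[-1] ** 2 - T))        # close to each other
print(endpoint_average_sum(w), 0.5 * w[-1] ** 2)  # equal up to rounding
```

The two sums differ by exactly half of \(\sum_k (\Delta W_k)^2\), which tends to \(T\) rather than \(0\); this is why the evaluation point matters for a Brownian integrator but not for a finite variation one.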

Definition Consider the same partition \(\{t_k^n\}\) as above. Let \(M,N\) be two continuous martingales, we define the quadratic covariation as

\[ [M,N]_T := \lim_{n\to\infty} [M,N]^n_T := \lim_{n\to\infty} \sum_{k=0}^{\lfloor T 2^n \rfloor} (M_{t_{k+1}^n} - M_{t_k^n}) (N_{t_{k+1}^n} - N_{t_k^n}), \]

where the limit is also u.c.p. We also define the quadratic variation as \([M]_T := [M,M]_T\).
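A quick numerical check of this definition, again as a sketch assuming simulated Brownian motion (for which \([W]_T = T\)) and, for convenience, grid spacing \(T 2^{-n}\) instead of \(2^{-n}\): the discrete sums \([W]^n_T\) approach \(T\), while the covariation of two independently simulated Brownian motions approaches \(0\). The function names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def covariation(x, y, stride=1):
    """Discrete covariation [x, y]^n on the sub-grid keeping every stride-th point."""
    xs, ys = x[::stride], y[::stride]
    return sum((xs[k + 1] - xs[k]) * (ys[k + 1] - ys[k]) for k in range(len(xs) - 1))


T, N = 2.0, 16
W = brownian_path(2**N, T, seed=1)
B = brownian_path(2**N, T, seed=2)  # simulated independently of W
for level in (4, 8, 12, 16):
    stride = 2 ** (N - level)       # 2**level intervals on [0, T]
    print(level, covariation(W, W, stride), covariation(W, B, stride))
```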

Several useful results are stated next.

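Proposition (Finite Variation) Let \(X,Y\) be continuous stochastic processes such that \(X\) has finite variation, i.e.

\[\lim_{n\to\infty} \sum_{k=0}^{\lfloor t 2^n \rfloor} | X_{t_{k+1}^n} - X_{t_k^n} | < \infty \;\text{a.s.}\]

Then we have

\[[X,Y]_t = 0 \;\text{a.s.}\]

Indeed, \(|[X,Y]^n_t| \leq \max_k |Y_{t_{k+1}^n} - Y_{t_k^n}| \sum_k |X_{t_{k+1}^n} - X_{t_k^n}|\), and the maximum tends to zero by the uniform continuity of \(Y\) on \([0,t]\).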
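The bound behind this proposition can be checked numerically. The sketch below assumes a simulated Brownian path as \(Y\) and the smooth (hence finite variation) path \(X_t = \sin(3t)\), both illustrative choices; it computes \([X,Y]^n_t\) together with the bound \(\max_k |\Delta Y_k| \sum_k |\Delta X_k|\) on successively finer dyadic grids. Helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def covariation_and_bound(x, y, stride=1):
    """Return [x, y]^n and max_k |dy_k| * sum_k |dx_k| on the sub-grid with this stride."""
    xs, ys = x[::stride], y[::stride]
    dx = [xs[k + 1] - xs[k] for k in range(len(xs) - 1)]
    dy = [ys[k + 1] - ys[k] for k in range(len(ys) - 1)]
    cov = sum(a * b for a, b in zip(dx, dy))
    bound = max(abs(b) for b in dy) * sum(abs(a) for a in dx)
    return cov, bound


T, N = 1.0, 16
grid = [T * k / 2**N for k in range(2**N + 1)]
X = [math.sin(3.0 * s) for s in grid]  # smooth path, hence finite variation
Y = brownian_path(2**N, T, seed=3)     # continuous path of infinite variation
for level in (4, 8, 12, 16):
    cov, bound = covariation_and_bound(X, Y, stride=2 ** (N - level))
    print(level, cov, bound)  # |cov| <= bound, and the bound tends to shrink
```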

Proposition (Itô’s Product Rule) Let \(X,Y\) be continuous martingales, then we have

\[X_t Y_t - X_0 Y_0 = \int_0^t X_s dY_s + \int_0^t Y_s dX_s + [X,Y]_t \,.\]
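Much of the product rule is already visible at the discrete level. For any two sequences, \(x_{k+1} y_{k+1} - x_k y_k = x_k \Delta y_k + y_k \Delta x_k + \Delta x_k \Delta y_k\) holds exactly, so summing over a partition splits \(X_t Y_t - X_0 Y_0\) into two left Riemann sums plus the discrete covariation \([X,Y]^n_t\). A short sketch checking this identity on two correlated random-walk paths (an illustrative assumption; the function name is hypothetical):

```python
# Illustrative sketch; the helper name is hypothetical.
import random


def product_rule_terms(x, y):
    """Split x[-1]*y[-1] - x[0]*y[0] into two left sums and a discrete covariation."""
    dx = [x[k + 1] - x[k] for k in range(len(x) - 1)]
    dy = [y[k + 1] - y[k] for k in range(len(y) - 1)]
    x_dy = sum(x[k] * dy[k] for k in range(len(dx)))  # ~ int X dY
    y_dx = sum(y[k] * dx[k] for k in range(len(dx)))  # ~ int Y dX
    cov = sum(a * b for a, b in zip(dx, dy))          # ~ [X, Y]
    return x_dy, y_dx, cov


rng = random.Random(0)
x, y = [0.5], [1.0]
for _ in range(1000):
    step = rng.gauss(0.0, 0.03)
    x.append(x[-1] + step)
    # correlated increments, so that [X, Y] is not zero
    y.append(y[-1] + 0.5 * step + rng.gauss(0.0, 0.03))
x_dy, y_dx, cov = product_rule_terms(x, y)
print(x[-1] * y[-1] - x[0] * y[0], x_dy + y_dx + cov)  # agree up to rounding
```

As the mesh goes to zero, the first two sums become \(\int_0^t X_s dY_s\) and \(\int_0^t Y_s dX_s\), and the third becomes \([X,Y]_t\).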

Proposition (Fundamental Theorem) Let \(X,Y,Z\) be continuous martingales, then we have

\[\int_0^t X_s d\left( \int_0^s Y_u dZ_u \right) = \int_0^t X_s Y_s dZ_s.\]

Proposition (Kunita-Watanabe Identity) Let \(X,Y,Z\) be continuous martingales, then we have

\[\left[ \int_0^{\cdot} X_s dY_s, Z \right]_t = \int_0^t X_s d[Y,Z]_s,\]

where both brackets \([\;,\;]\) denote the covariation.
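As an illustration, take \(X = Y = Z = W\) a Brownian motion, so that \(d[W,W]_s = ds\) and the identity predicts \(\left[ \int_0^{\cdot} W_s dW_s, W \right]_T = \int_0^T W_s \, ds\). A minimal sketch, assuming simulated paths and hypothetical helper names:

```python
# Illustrative sketch; helper names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def running_left_integral(h, w):
    """Running left Riemann sums I_k = sum_{j<k} h_j (w_{j+1} - w_j)."""
    integral = [0.0]
    for k in range(len(w) - 1):
        integral.append(integral[-1] + h[k] * (w[k + 1] - w[k]))
    return integral


def covariation(x, y):
    """Discrete covariation of two paths sampled on the same grid."""
    return sum((x[k + 1] - x[k]) * (y[k + 1] - y[k]) for k in range(len(x) - 1))


T, m = 1.0, 2**16
w = brownian_path(m, T, seed=7)
ito_w_dw = running_left_integral(w, w)       # t -> int_0^t W_s dW_s
lhs = covariation(ito_w_dw, w)               # [int W dW, W]_T
rhs = sum(w[k] * (T / m) for k in range(m))  # int_0^T W_s d[W]_s with [W]_s = s
print(lhs, rhs)
```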

Proposition (Itô’s Isometry)

Let \(M\) be a square-integrable continuous martingale, and \(H\) be a continuous adapted stochastic process. Then we have

\[\mathbb{E} \left[ \left( \int_0^t H_s dM_s \right)^2 \right] = \mathbb{E} \int_0^t H_s^2 d[M]_s.\]
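A Monte Carlo sketch of the isometry, assuming \(H = M = W\) a Brownian motion, for which both sides equal \(\int_0^t s \, ds = t^2/2\). The function name and the parameters (number of paths and time steps) are illustrative choices, not from the post.

```python
# Illustrative sketch; the function name and parameters are hypothetical.
import math
import random


def isometry_monte_carlo(n_paths=4000, n_steps=128, T=1.0, seed=0):
    """Estimate E[(int_0^T W dW)^2] and E[int_0^T W_s^2 ds] by simulation."""
    rng = random.Random(seed)
    dt = T / n_steps
    lhs_total, rhs_total = 0.0, 0.0
    for _ in range(n_paths):
        w, integral, energy = 0.0, 0.0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            integral += w * dw    # left-point Ito sum for int W dW
            energy += w * w * dt  # Riemann sum for int W^2 d[W], with d[W]_s = ds
            w += dw
        lhs_total += integral**2
        rhs_total += energy
    return lhs_total / n_paths, rhs_total / n_paths


lhs, rhs = isometry_monte_carlo()
print(lhs, rhs)  # both close to T**2 / 2 = 0.5
```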

The Lemma and the Classical Approach

For the purposes of this post, we only state and prove a simplified version of the lemma; the argument adapts without much difficulty to more general settings.

Theorem (Itô’s Lemma) Let \(X_t\) be a continuous martingale, and \(f \in C^2(\mathbb{R})\). Then we have

\[ f(X_t) = f(X_0) + \int_0^t \frac{\partial f}{\partial x}(X_s) dX_s + \frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2} (X_s) d[X]_s. \]
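A pathwise sanity check, as a sketch: with \(X = W\) a simulated Brownian motion (so \(d[X]_s = ds\)) and \(f = \sin\), the left Riemann sums for the right-hand side should approach \(f(W_t) - f(W_0)\) as the grid is refined. The helper name is hypothetical; the `correction` term is the discrete version of \(\frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(X_s) d[X]_s\).

```python
# Illustrative sketch; the helper name is hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def ito_formula_sides(f, df, d2f, w, T):
    """Return f(W_T) - f(W_0) and the discretized right-hand side of Ito's formula."""
    dt = T / (len(w) - 1)
    stochastic = sum(df(w[k]) * (w[k + 1] - w[k]) for k in range(len(w) - 1))
    correction = 0.5 * sum(d2f(w[k]) * dt for k in range(len(w) - 1))  # d[W]_s = ds
    return f(w[-1]) - f(w[0]), stochastic + correction


T = 1.0
w = brownian_path(2**16, T, seed=11)
lhs, rhs = ito_formula_sides(math.sin, math.cos, lambda x: -math.sin(x), w, T)
print(lhs, rhs)  # close for a fine grid
```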

Here we will sketch the proof from Karatzas and Shreve (1991).

proof sketch: We start by defining the stopping time \(\tau_r := \inf \{t \geq 0 : |X_t| + [X]_t > r\}\), and replace \(X_t\) with \(X_{t \wedge \tau_r}\). This localization technique allows us to consider \(f\) only on the interval \(B_r := [-r, r]\) (or a ball in higher dimensions), on which \(f\) and its derivatives are bounded.

Looking at the statement of the lemma, the reader may notice that the formula resembles a second-order Taylor expansion of \(f(X_t)\). Indeed we can write

\[\begin{align*} f(X_t) - f(X_0) =& \lim_{n\to\infty} \sum_{k=0}^{\lfloor t 2^n \rfloor} f(X_{t_{k+1}^n}) - f(X_{t_{k}^n}) \\ =& \lim_{n\to\infty} \sum_{k=0}^{\lfloor t 2^n \rfloor} \Big\{ \frac{\partial f}{\partial x}(X_{t_{k}^n}) [X_{t_{k+1}^n} - X_{t_{k}^n}] \\ &+ \frac{1}{2} \frac{\partial^2 f}{\partial x^2} (\eta_k^n) [X_{t_{k+1}^n} - X_{t_{k}^n}]^2 \Big\}, \end{align*}\]

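where \(\eta_k^n\) lies between \(X_{t_{k}^n}\) and \(X_{t_{k+1}^n}\) and is chosen, as in Taylor’s theorem with the Lagrange remainder, so that the above equality holds. It is not difficult to see that the first sum converges to the stochastic integral \(\int_0^t \frac{\partial f}{\partial x}(X_s) dX_s\), so it remains to show that the second sum converges to \(\frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(X_s) d[X]_s\).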

To this end, we define

\[\begin{align*} J_1^n &:= \sum_{k=0}^{\lfloor t 2^n \rfloor} \frac{\partial^2 f}{\partial x^2} (\eta_k^n) [X_{t_{k+1}^n} - X_{t_{k}^n}]^2, \\ J_2^n &:= \sum_{k=0}^{\lfloor t 2^n \rfloor} \frac{\partial^2 f}{\partial x^2} (X_{t_{k}^n}) [X_{t_{k+1}^n} - X_{t_{k}^n}]^2, \\ J_3^n &:= \sum_{k=0}^{\lfloor t 2^n \rfloor} \frac{\partial^2 f}{\partial x^2} (X_{t_{k}^n}) \{ [X]_{t_{k+1}^n} - [X]_{t_{k}^n} \}, \end{align*}\]

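Observe that \(J_3^n\) converges to the desired integral \(\int_0^t \frac{\partial^2 f}{\partial x^2}(X_s) d[X]_s\), since it is a Riemann-Stieltjes sum of a continuous integrand against the finite variation process \([X]\). Next we use the following technical inequality: if \(X\) is a martingale with \(|X_s| \leq K < \infty\) for all \(s \leq T\), then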

\[\mathbb{E} ([X]^n_T)^2 \leq 6 K^4.\]

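Without stating the details, using this inequality, the Cauchy-Schwarz inequality and the uniform continuity of \(\frac{\partial^2 f}{\partial x^2}\) on \(B_r\), we can show

\[\lim_{n\to\infty} |J_1^n - J_2^n| = 0 \; \text{in probability.}\]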

To complete the proof, we need one more technical lemma: if \(X\) is a martingale with \(|X_s| \leq K < \infty\) for all \(s \leq T\), then

\[\lim_{n\to\infty} \mathbb{E} \sum_{k=0}^{\lfloor t 2^n \rfloor} [ X_{t_{k+1}^n} - X_{t_k^n} ]^4 = 0.\]
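For Brownian motion (which is not bounded, so this only illustrates the lemma's conclusion rather than its hypothesis), each dyadic increment satisfies \(\mathbb{E}(\Delta W)^4 = 3 \cdot 2^{-2n}\), so on \([0,1]\) the expected sum of fourth powers is \(3 \cdot 2^{-n}\), which vanishes as the mesh shrinks. A sketch on one simulated path, with hypothetical helper names:

```python
# Illustrative sketch; helper names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def fourth_power_sum(w, stride=1):
    """Sum of fourth powers of the increments on the sub-grid with this stride."""
    ws = w[::stride]
    return sum((ws[k + 1] - ws[k]) ** 4 for k in range(len(ws) - 1))


N = 16
w = brownian_path(2**N, T=1.0, seed=5)
for level in (4, 8, 12, 16):
    # sample value next to its expectation 3 * 2**-level
    print(level, fourth_power_sum(w, 2 ** (N - level)), 3.0 * 2.0**-level)
```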

Then, once again omitting the details (the cross terms vanish because \(X^2 - [X]\) is a martingale), we get

\[\mathbb{E} |J_2^n - J_3^n|^2 \leq 2 \sup_{x \in B_r} \left| \frac{\partial^2 f}{\partial x^2}(x) \right|^2 \mathbb{E} \left[ \sum_{k=0}^{\lfloor t 2^n \rfloor} [ X_{t_{k+1}^n} - X_{t_k^n} ]^4 + [X]_t \max_{k} ( [X]_{t_{k+1}^n} - [X]_{t_{k}^n} ) \right].\]

Combined with the previous lemma and the bounded convergence theorem (recall that \([X]_t \leq r\) after localization), this gives the desired result

\[\lim_{n\to\infty} |J_2^n - J_3^n| = 0 \; \text{in probability.}\]

Since convergence in probability implies almost sure convergence along a subsequence, putting everything together gives the desired formula as stated.

Remark The propositions listed in the previous section are used implicitly in the two technical lemmas stated above, which is also where most of the difficulty of the proof is hidden.

Interpretation This proof naturally suggests interpreting Itô’s Lemma as a consequence of Taylor’s expansion. However, it gives no clear intuition for why the expansion must be taken to exactly second order, and it pushes the justification into complicated technical details. Perhaps the most troubling consequence is that a different integration scheme (e.g. the Stratonovich integral, which arises from a midpoint Riemann sum) leads to a different change of variables formula, so the Taylor expansion intuition can cause further confusion.
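To see the contrast concretely, here is a sketch assuming \(X = W\) a simulated Brownian motion and \(f = \sin\). Averaging \(f'\) at the two endpoints of each interval (a Stratonovich-type sum) reproduces \(f(W_T) - f(W_0)\) with no second-order term, whereas the left (Itô) sum needs the correction \(\frac{1}{2} \int_0^T f''(W_s) \, ds\). The variable names are hypothetical.

```python
# Illustrative sketch; helper and variable names are hypothetical.
import math
import random


def brownian_path(n_steps, T=1.0, seed=0):
    """Simulate standard Brownian motion on a uniform grid of [0, T]."""
    rng = random.Random(seed)
    dt = T / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


T = 1.0
w = brownian_path(2**16, T, seed=13)
dt = T / (len(w) - 1)
target = math.sin(w[-1]) - math.sin(w[0])

# Stratonovich-type sum: f'(W) = cos(W) averaged over both endpoints.
strat = sum(0.5 * (math.cos(w[k]) + math.cos(w[k + 1])) * (w[k + 1] - w[k])
            for k in range(len(w) - 1))
# Ito sum: f'(W) at the left endpoint, plus the correction 1/2 int f''(W_s) ds.
ito = sum(math.cos(w[k]) * (w[k + 1] - w[k]) for k in range(len(w) - 1))
ito_correction = 0.5 * sum(-math.sin(w[k]) * dt for k in range(len(w) - 1))

print(target, strat, ito + ito_correction)
```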

Overview of the Alternative Approach

At this point, we will first take a step back from Itô’s Lemma and look at a rough sketch of the proof technique.

Suppose we want to prove that every function in a collection (e.g. \(C^2([a,b])\)) satisfies a certain property \((P)\). We start by defining \(\mathcal{A}\) as the subset of \(C^2([a,b])\) whose elements satisfy \((P)\).

(Step 1) We identify an algebraic structure under which \(\mathcal{A}\) is closed. For example, \(\mathcal{A}\) is an algebra (over \(\mathbb{R}\)) if \(f,g \in \mathcal{A}\) and \(c \in \mathbb{R}\) imply \(cf + g \in \mathcal{A}\) and \(fg \in \mathcal{A}\); in other words, an algebra is a vector space equipped with an associative, bilinear multiplication.

(Step 2) Next, we show that \(\mathcal{A}\) contains some very simple generating functions. For example, the algebra generated by \(\{1, x\}\) is the collection of all polynomials, so an algebra containing \(1\) and \(x\) contains every polynomial.

(Step 3) At this point, we use a density argument such as Weierstrass approximation to show \(\mathcal{A}\) is dense in \(C^2([a,b])\). Specifically, \(\forall f \in C^2([a,b])\), \(\exists \{f_n\}_{n \geq 1} \subset \mathcal{A}\) such that \(f_n \to f\) with respect to some metric \(\rho\).
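Stone-Weierstrass itself is not constructive, but for the classical Weierstrass theorem on an interval one explicit approximating sequence is given by Bernstein polynomials, \(B_n f(x) = \sum_{k=0}^n f\big(a + (b-a)\tfrac{k}{n}\big) \binom{n}{k} s^k (1-s)^{n-k}\) with \(s = (x-a)/(b-a)\). A sketch (function names hypothetical) approximating the non-smooth \(f(x) = |x - 0.3|\) on \([-1,1]\) and measuring the sup-norm error on a grid:

```python
# Illustrative sketch; helper names are hypothetical.
import math


def bernstein(f, n, a=0.0, b=1.0):
    """Degree-n Bernstein polynomial of f on [a, b], returned as a callable."""
    weights = [f(a + (b - a) * k / n) * math.comb(n, k) for k in range(n + 1)]

    def p(x):
        s = (x - a) / (b - a)  # map [a, b] onto [0, 1]
        return sum(weights[k] * s**k * (1.0 - s) ** (n - k) for k in range(n + 1))

    return p


def sup_error(f, g, a, b, grid=1001):
    """Grid approximation of sup_{a <= x <= b} |f(x) - g(x)|."""
    xs = [a + (b - a) * i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - g(x)) for x in xs)


f = lambda x: abs(x - 0.3)  # continuous, but not differentiable at x = 0.3
errors = [sup_error(f, bernstein(f, n, -1.0, 1.0), -1.0, 1.0) for n in (4, 16, 64, 256)]
print(errors)  # decreasing, roughly like n ** -0.5
```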

(Step 4) Finally, it suffices to show that \(\mathcal{A}\) is closed under the metric \(\rho\). That is, if each \(f_n\) satisfies \((P)\) and \(f_n \to f\) in \(\rho\), then \(f\) also satisfies \((P)\), hence \(f \in \mathcal{A}\). Being both dense and closed, \(\mathcal{A}\) is then all of \(C^2([a,b])\).

Remark The sketch above was intentionally phrased in general terms to highlight the flexibility of the technique. In fact, it can be generalized even beyond function spaces, as long as an analogous approximation result is available.

The Proof in Detail

We start by stating the key theorem.

Theorem (Stone-Weierstrass, Real Numbers) Let \(S\) be a compact Hausdorff space, and \(\mathcal{A} \subset C(S, \mathbb{R})\) an algebra which contains a non-zero constant function. Then \(\mathcal{A}\) is dense in \(C(S, \mathbb{R})\) if and only if it separates points.

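Clearly, \(S = B_r\) is a compact Hausdorff space, and the collection of polynomials is an algebra that contains the functions \(\{1,x\}\) and separates points (since \(x\) itself takes distinct values at distinct points). Therefore the polynomials are dense in \(C(B_r, \mathbb{R})\) with respect to the sup-norm, for every \(r > 0\).

To obtain the same result for \(C^2(B_r, \mathbb{R})\), we apply this to the second derivative. Given \(f \in C^2(B_r, \mathbb{R})\) and \(\varepsilon > 0\), choose a polynomial \(q\) with \(\sup_{x \in B_r} |q(x) - \frac{\partial^2 f}{\partial x^2}(x)| \leq \varepsilon\) and integrate twice, i.e. set \(p(x) := f(0) + \frac{\partial f}{\partial x}(0)\, x + \int_0^x \int_0^y q(z)\, dz\, dy\). Then on \(B_r\) the second, first and zeroth derivatives of \(p - f\) are bounded by \(\varepsilon\), \(r\varepsilon\) and \(r^2\varepsilon/2\) respectively, so the polynomials are also dense in \(C^2(B_r, \mathbb{R})\) with respect to the norm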

\[\| f \|_{B_r} := \sup_{x \in B_r, \, m = 0,1,2} \left| \frac{\partial^m f}{\partial x^m} (x) \right|.\]
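To make this norm concrete, here is a sketch that evaluates \(\| \exp - p_N \|_{B_r}\) on a grid, where \(p_N\) is the degree-\(N\) Taylor polynomial of \(\exp\) at \(0\). This approximating sequence is a convenient choice for illustration, not the one produced by Stone-Weierstrass. Since \(p_N' = p_{N-1}\) and \(p_N'' = p_{N-2}\), the distance is governed by the second-derivative term and shrinks as \(N\) grows. Helper names are hypothetical.

```python
# Illustrative sketch; helper names are hypothetical.
import math


def taylor_exp(N):
    """Coefficients c_j = 1/j! of the degree-N Taylor polynomial of exp at 0."""
    return [1.0 / math.factorial(j) for j in range(N + 1)]


def poly_eval(coeffs, x):
    """Evaluate sum_j coeffs[j] * x**j by Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def c2_distance_to_exp(coeffs, r, grid=2001):
    """Grid approximation of sup over |x| <= r and m = 0, 1, 2 of |exp^(m)(x) - p^(m)(x)|."""
    derivatives = [coeffs, poly_deriv(coeffs), poly_deriv(poly_deriv(coeffs))]
    xs = [-r + 2.0 * r * i / (grid - 1) for i in range(grid)]
    # every derivative of exp is exp itself
    return max(abs(math.exp(x) - poly_eval(d, x)) for d in derivatives for x in xs)


r = 2.0
distances = [c2_distance_to_exp(taylor_exp(N), r) for N in (4, 8, 12, 16)]
print(distances)  # shrinks rapidly towards 0
```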

proof (of Itô’s Lemma): We will similarly use a localization argument, i.e. define \(\tau_r := \inf \{t \geq 0 : |X_t| + [X]_t > r \}\), and replace \(X_t\) with \(X_{t \wedge \tau_r}\).

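(Step 1, 2) Let \(\mathcal{A} \subset C^2(\mathbb{R})\) be the collection of functions for which Itô’s Lemma holds. The functions \(1\) and \(x\) trivially belong to \(\mathcal{A}\) (for \(f \equiv 1\) both sides are constant, and for \(f(x) = x\) the formula reduces to \(X_t = X_0 + \int_0^t dX_s\)), and \(\mathcal{A}\) is a vector space by the linearity of the integrals.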

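Next we show that \(\mathcal{A}\) is closed under multiplication, so that it forms an algebra. Suppose \(f,g \in \mathcal{A}\), and define \(F_t := f(X_t), G_t := g(X_t)\). Note that \(F\) and \(G\) are in general continuous semimartingales rather than martingales, but the propositions above extend to this setting. Using the product rule gives us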

\[F_t G_t - F_0 G_0 = \int_0^t F_s dG_s + \int_0^t G_s dF_s + [F,G]_t \,.\]

Using the Fundamental Theorem and Itô’s Lemma on \(g\), we get

\[\int_0^t F_s dG_s = \int_0^t f(X_s) \frac{\partial g}{\partial x}(X_s) dX_s + \frac{1}{2} \int_0^t f(X_s) \frac{\partial^2 g}{\partial x^2}(X_s) d[X]_s \,.\]

The same holds with the roles of \(F\) and \(G\) exchanged. Next, we apply Itô’s Lemma to \(f\) and \(g\) and expand with the Kunita-Watanabe identity to get

\[[F,G]_t = \int_0^t \frac{\partial f}{\partial x}(X_s) \frac{\partial g}{\partial x}(X_s) d[X]_s \, ,\]

where the remaining terms vanish because the covariation of a continuous process with a finite variation process is zero, i.e. \([ \,[X]\, ,Y ]_t = 0\) since \([X]\) has finite variation. Grouping the integrals by their integrators (\(dX_s\) and \(d[X]_s\)) shows that \(fg\) satisfies Itô’s Lemma, i.e. \(fg \in \mathcal{A}\).
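Written out, the grouping step adds the three displays above (the second one also with \(F\) and \(G\) exchanged) and uses \(\frac{\partial (fg)}{\partial x} = f \frac{\partial g}{\partial x} + g \frac{\partial f}{\partial x}\) and \(\frac{\partial^2 (fg)}{\partial x^2} = f \frac{\partial^2 g}{\partial x^2} + 2 \frac{\partial f}{\partial x} \frac{\partial g}{\partial x} + g \frac{\partial^2 f}{\partial x^2}\), with all functions evaluated at \(X_s\):

\[\begin{align*} F_t G_t - F_0 G_0 &= \int_0^t \Big( f \frac{\partial g}{\partial x} + g \frac{\partial f}{\partial x} \Big)(X_s) \, dX_s + \int_0^t \Big( \frac{1}{2} f \frac{\partial^2 g}{\partial x^2} + \frac{1}{2} g \frac{\partial^2 f}{\partial x^2} + \frac{\partial f}{\partial x} \frac{\partial g}{\partial x} \Big)(X_s) \, d[X]_s \\ &= \int_0^t \frac{\partial (fg)}{\partial x}(X_s) \, dX_s + \frac{1}{2} \int_0^t \frac{\partial^2 (fg)}{\partial x^2}(X_s) \, d[X]_s \,, \end{align*}\]

which is exactly Itô’s Lemma for \(fg\).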

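(Step 3) Since \(\mathcal{A}\) is an algebra containing \(1\) and \(x\), it contains all polynomials. By the Stone-Weierstrass argument above, the polynomials are dense in \(C^2(B_r)\) with respect to the norm \(\|\cdot\|_{B_r}\), hence so is \(\mathcal{A}\).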

(Step 4) It remains to show that \(\mathcal{A}\) is closed with respect to \(\|\cdot\|_{B_r}\). In particular, let \((f_n)_{n \geq 1}\) be a sequence in \(\mathcal{A}\) such that \(f_n \to f\) in \(\|\cdot\|_{B_r}\). Then we have

\[\int_0^t \left| \frac{\partial^2 f_n}{\partial x^2}(X_s) - \frac{\partial^2 f}{\partial x^2}(X_s) \right| d[X]_s \leq \|f_n - f\|_{B_r} [X]_t \, .\]

At the same time, by Itô’s Isometry we also have

\[\begin{align*} \mathbb{E} \left[ \left( \int_0^t \left( \frac{\partial f_n}{\partial x}(X_s) - \frac{\partial f}{\partial x}(X_s) \right) dX_s \right)^2 \right] &= \mathbb{E} \int_0^t \left(\frac{\partial f_n}{\partial x}(X_s) - \frac{\partial f}{\partial x}(X_s) \right)^2 d[X]_s \\ &\leq \|f_n - f\|_{B_r}^2 \, \mathbb{E} [X]_t \, . \end{align*}\]

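Since the process is localized, we have \([X]_t \leq r\), so both bounds tend to zero: the \(d[X]\) integrals converge almost surely, and the stochastic integrals converge in \(L^2\) (hence almost surely along a subsequence). As also \(f_n \to f\) uniformly on \(B_r\), we can pass to the limit in the Itô formula and get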

\[\begin{align*} f(X_t) - f(X_0) &= \lim_{n\to\infty} \big( f_n(X_t) - f_n(X_0) \big) \\ &= \lim_{n\to\infty} \left( \int_0^t \frac{\partial f_n}{\partial x}(X_s) dX_s + \frac{1}{2} \int_0^t \frac{\partial^2 f_n}{\partial x^2}(X_s) d[X]_s \right) \\ &= \int_0^t \frac{\partial f}{\partial x}(X_s) dX_s + \frac{1}{2} \int_0^t \frac{\partial^2 f}{\partial x^2}(X_s) d[X]_s \,. \end{align*}\]

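Finally, Itô’s Lemma holds for the stopped process \(X_{t \wedge \tau_r}\) for every \(r > 0\), and \(\tau_r \to \infty\) a.s. as \(r \to \infty\) (since \(X\) and \([X]\) are continuous, hence bounded on compact time intervals), so letting \(r\to\infty\) completes the proof.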


Remark The alternative proof is not necessarily easier; however, it has a couple of advantages.

Firstly, none of the steps above is very complicated, as most of them follow directly from useful (and well-known) propositions. In particular, a first-time reader will have a much easier time following the steps and seeing the bigger picture, rather than getting trapped in technical details.

Secondly, we now have an additional interpretation of the second integral in the formula: it arises as a consequence of Itô’s product rule and the Kunita-Watanabe identity. For readers who have not seen the proof of the product rule, it follows almost directly from the definitions; the covariation term is what remains after expanding the product increments with left Riemann sums, i.e. it is a direct consequence of choosing the left Riemann sum.


We have seen that the Stone-Weierstrass Theorem is not only a strong result on its own, but also leads to a powerful general technique. In particular, it yields an alternative proof of Itô’s Lemma with a much clearer interpretation. Ideally, the author would have liked to add another example, but the post is already quite long. Hopefully readers have enjoyed the post and added another proof technique to their arsenal.


An Interesting Story to Wrap Up

For a long time, the lemma was credited to Kiyosi Itô alone, following his 1950 paper. This changed in the 1990s with a resurgence of interest in the late French-German mathematician Wolfgang Doeblin, who was known to be exceptionally gifted. This interest led to a demand to open the remaining “pli cacheté” (sealed envelope) held by the French Academy of Sciences, which Doeblin had submitted shortly before his death in 1940: he burned his notes and took his own life so that German soldiers could not take advantage of his work. To everyone’s surprise, Doeblin’s letter contained significant research far ahead of its time, including a statement of the same change of variables formula! To honour his contribution, the result is sometimes referred to as the Itô-Doeblin Lemma.

For interested readers, I strongly recommend the excellent commentary by Bernard Bru and Marc Yor (2002) for further details on this topic.


  • Bru, B. & Yor, M. (2002). Comments on the life and mathematical legacy of Wolfgang Doeblin. Finance and Stochastics, 6, 3-47.
  • Karatzas, I. & Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus. Springer, New York.
  • Miller, J. (2016). Stochastic Calculus, Lent 2016 Lecture Notes. Retrieved from