Notes on Chapter 7 (Differentiation) of Walter Rudin's Real and Complex Analysis. I tried to actually section this out properly this time! Long chapter!

Preliminaries

Symmetric Derivative

Fix a dimension $k$. Let $\mu$ be a complex Borel measure on $\mathbb{R}^k$, let $m$ denote Lebesgue measure on $\mathbb{R}^k$, and let $B(x,r)$ be the open ball of radius $r$ centered at $x$. Define the quotients $(Q_r\mu)(x) = \frac{\mu(B(x,r))}{m(B(x,r))}$ and define the symmetric derivative of $\mu$ at $x$ by $(D\mu)(x)=\lim_{r\rightarrow 0}\ (Q_r\mu)(x)$ wherever this limit exists. For $\mu\geq 0$, define the maximal function $M\mu$ by $(M\mu)(x) = \sup_{0<r<\infty}\ (Q_r\mu)(x)$; for an arbitrary complex Borel measure, the maximal function is defined to be that of its total variation $|\mu|$. The maximal function is lower semicontinuous, and it will be used to study the symmetric derivative.

We can show that the maximal function of a measure cannot be large on a large set:

"Tail bound" for maximal functions of measures: If $\mu$ is a complex Borel measure on $\mathbb{R}^k$ and $\lambda$ is a positive number, then $$m\{M\mu > \lambda\}\leq 3^k\lambda^{-1}\|\mu\|.$$

Here $\|\mu\|=|\mu|(\mathbb{R}^k)$ and the left-hand side abbreviates $m(\{x\in \mathbb{R}^k:(M\mu)(x)>\lambda\})$.

This is shown by constructing a suitable covering of any compact subset of the open set $\{M\mu>\lambda\}$: a finite cover by balls, from which one selects a disjoint subfamily whose threefold dilates still cover the compact set. This is where the factor $3^k$ comes from.

Weak $L^1$

If $f\in L^1(\mathbb{R}^k)$ and $\lambda > 0$, then $m\{|f|>\lambda\} \leq \lambda^{-1}\|f\|_1$, because putting $E=\{|f|>\lambda\}$ we have $$\lambda m(E) \leq \int_E|f|\ dm \leq \int_{\mathbb{R}^k} |f|\ dm = \|f\|_1.$$

Accordingly, any measurable function $f$ for which $\lambda m\{|f|>\lambda\}$ is a bounded function of $\lambda$ on $(0, \infty)$ is said to belong to weak $L^1$. Thus weak $L^1$ contains $L^1$, and the containment is strict: $1/x$ on $(0, 1)$ is in weak $L^1$ but not in $L^1$.
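The $1/x$ example can be checked by hand, and the short sketch below just tabulates the two quantities involved. It assumes the closed forms $m\{x\in(0,1): 1/x>\lambda\}=\min(1,1/\lambda)$ and $\int_\varepsilon^1 dx/x=-\log\varepsilon$; the function names are made up for this illustration.

```python
import math

def weak_l1_quantity(lam: float) -> float:
    """lam * m{x in (0,1) : 1/x > lam} for f(x) = 1/x on (0,1)."""
    # 1/x > lam  <=>  x < 1/lam, so the set is the interval (0, min(1, 1/lam)).
    return lam * min(1.0, 1.0 / lam)

def truncated_l1_norm(eps: float) -> float:
    """Integral of 1/x over (eps, 1), which equals -log(eps)."""
    return -math.log(eps)

# The weak-L^1 quantity stays bounded by 1 (up to rounding) for every lam > 0 ...
for lam in (0.5, 1.0, 10.0, 1e6):
    print(lam, weak_l1_quantity(lam))

# ... while the truncated L^1 norms grow without bound as eps -> 0.
for eps in (1e-2, 1e-4, 1e-8):
    print(eps, truncated_l1_norm(eps))
```

So $\lambda m\{|f|>\lambda\}$ is bounded by 1, while the integrals over $(\varepsilon,1)$ diverge logarithmically.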

Analogously to measures, for each $f\in L^1(\mathbb{R}^k)$ we define its maximal function $Mf: \mathbb{R}^k\rightarrow [0,\infty]$ by setting $$(Mf)(x)=\sup_{0<r<\infty}\frac{1}{m(B_r)}\int_{B(x, r)} |f|\ dm,$$ where $m(B_r)$ denotes the measure of any ball of radius $r$.
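To get a feel for this definition, here is a rough numerical sketch for $k=1$ and $f=\chi_{[0,1]}$ (my choice of example, not Rudin's). It approximates $Mf$ by a maximum over a finite grid of radii and estimates $m\{Mf>\lambda\}$ by a Riemann sum on a window. The grid sizes, the window $[-4,5]$ and all function names are assumptions of this sketch. A hand computation gives $(Mf)(x)=1/(2x)$ for $x\geq 1$ (and symmetrically on the left), so for $\lambda=1/4$ the set should be $(-1,2)$, comfortably inside the bound $3^k\lambda^{-1}\|f\|_1 = 12$ obtained from the tail bound for measures with $d\mu = f\ dm$.

```python
def ball_average(x: float, r: float) -> float:
    """Average of the indicator of [0, 1] over the interval (x - r, x + r)."""
    overlap = max(0.0, min(1.0, x + r) - max(0.0, x - r))
    return overlap / (2.0 * r)

def maximal_function(x: float, radii) -> float:
    """Approximate (Mf)(x) by a maximum over a finite grid of radii."""
    return max(ball_average(x, r) for r in radii)

def superlevel_measure(lam: float, dx: float = 0.01, dr: float = 0.01) -> float:
    """Riemann-sum estimate of m{Mf > lam}, restricted to the window [-4, 5]."""
    radii = [dr * j for j in range(1, 1001)]   # radii in (0, 10]
    xs = [-4.0 + dx * i for i in range(901)]   # grid points in [-4, 5]
    count = sum(1 for x in xs if maximal_function(x, radii) > lam)
    return count * dx

lam = 0.25
estimate = superlevel_measure(lam)
bound = 3 / lam  # 3^k * ||f||_1 / lam with k = 1 and ||f||_1 = 1
print(estimate, bound)
```

The estimate should come out close to 3, the length of $(-1,2)$. Note that $Mf$ decays only like $1/(2|x|)$ at infinity, so $Mf\notin L^1$ even for this very nice $f$.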

Setting $d\mu = f\ dm$, we see that the definition of $Mf$ agrees with the previous one for measures. Hence the tail bound on maximal functions of measures says that the "maximal operator" $M$ sends $L^1$ to weak $L^1$, with the following tail bound:

"Tail bound" for maximal functions: For every $f\in L^1(\mathbb{R}^k)$ and every $\lambda > 0$, $$m\{Mf>\lambda\}\leq 3^k\lambda^{-1}\|f\|_1.$$

Lebesgue points

If $f\in L^1(\mathbb{R}^k)$, any $x\in \mathbb{R}^k$ for which $$\lim_{r\rightarrow 0} \frac{1}{m(B_r)} \int_{B(x, r)} |f(y) - f(x)|\ dm(y) = 0$$ is called a Lebesgue point of $f$.

Lebesgue points can be interpreted as the points where $f$ does not oscillate 'too much' on average. Surprisingly, we have:

Lebesgue points are a.e.: If $f\in L^1(\mathbb{R}^k)$, then almost every $x\in \mathbb{R}^k$ is a Lebesgue point of $f$.

The crucial step is to approximate $f\in L^1(\mathbb{R}^k)$ by a continuous $g\in C(\mathbb{R}^k)$ with $\|f - g\|_1$ sufficiently small. Every point of continuity of a function is a Lebesgue point, so $g$ contributes nothing to the limit. The remainder of the proof applies the tail bounds discussed earlier to $f-g$ and shows that the limit is zero a.e.

This theorem gives us several interesting consequences, which we will quickly discuss in the following subsections (mostly without proof).

Nicely shrinking sets

A sequence $\{E_i\}$ of Borel sets in $\mathbb{R}^k$ is said to shrink to $x\in \mathbb{R}^k$ nicely if there is a number $\alpha>0$ with the following property: there is a sequence of balls $B(x, r_i)$ with $\lim r_i = 0$ such that $E_i\subset B(x, r_i)$ and $m(E_i)\geq \alpha\, m(B(x, r_i))$ for all $i$.

Note that it is not required that $x\in E_i$, or even that $x$ lie in the closure of $E_i$. The condition says that each $E_i$ must occupy a 'substantial' portion (at least an $\alpha$-fraction) of the volume of some ball around $x$ that contains it.

Differentiation using nicely shrinking sets: Associate to each $x\in \mathbb{R}^k$ a sequence $\{E_i(x)\}$ that shrinks to $x$ nicely, and let $f\in L^1(\mathbb{R}^k)$. Then $$f(x)=\lim_{i\rightarrow\infty}\frac{1}{m(E_i(x))}\int_{E_i(x)}f\ dm$$ at every Lebesgue point of $f$, hence a.e. $[m]$.

Fundamental Theorem of Calculus (easy part): If $f\in L^1(\mathbb{R}^1)$ and $$F(x)=\int_{-\infty}^x f\ dm\qquad (-\infty < x < \infty),$$ then $F'(x) = f(x)$ at every Lebesgue point of $f$, hence a.e. $[m]$.

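The latter is a corollary of the former theorem: take a sequence $\delta_i\rightarrow 0$ of positive numbers and apply it with $E_i(x)=[x, x+\delta_i]$ for the right-hand derivative and with $E_i(x)=[x-\delta_i, x]$ for the left-hand derivative.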

Metric Density

Let $E$ be a Lebesgue measurable subset of $\mathbb{R}^k$. The metric density of $E$ at a point $x\in \mathbb{R}^k$ is defined to be $$\lim_{r\rightarrow 0}\frac{m(E\cap B(x,r))}{m(B(x,r))}$$ provided that the limit exists.

Letting $f$ be the characteristic function of $E$ and applying the nicely shrinking sets theorem, we see that the metric density of $E$ is 1 at a.e. point of $E$ and 0 at a.e. point of the complement of $E$. In particular, if $\varepsilon > 0$, there is no set $E\subset \mathbb{R}$ satisfying $$\varepsilon < \frac{m(E\cap I)}{m(I)} < 1-\varepsilon$$ for every segment $I$.
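As a sanity check on the definition, the sketch below evaluates the density quotient for the simplest possible set, $E=[0,1]\subset\mathbb{R}$ (an example chosen for this illustration; the function name is made up). Interior points have density 1, exterior points have density 0, and the two endpoints, which form a set of measure zero, have density $1/2$. This is consistent with the "a.e." in the statement.

```python
def density_ratio(x: float, r: float, a: float = 0.0, b: float = 1.0) -> float:
    """m(E & B(x, r)) / m(B(x, r)) for E = [a, b] in dimension 1."""
    overlap = max(0.0, min(b, x + r) - max(a, x - r))
    return overlap / (2.0 * r)

# Quotients for shrinking radii at an interior, an exterior and a boundary point.
for x in (0.5, 2.0, 0.0):
    print(x, [density_ratio(x, 10.0 ** -n) for n in (1, 3, 6)])
```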

Differentiation of measures

Using the Lebesgue points theorem, we can show the following quickly:

Differentiation of absolutely continuous complex Borel measures: Suppose $\mu$ is a complex Borel measure on $\mathbb{R}^k$, and $\mu\ll m$. Let $f$ be the Radon-Nikodym derivative of $\mu$ with respect to $m$. Then $D\mu = f$ a.e. $[m]$, and $$\mu(E) = \int_E (D\mu)\ dm$$ for all Borel sets $E\subset \mathbb{R}^k$.

This means the Radon-Nikodym derivative can also be obtained as a limit of the quotients in this case.

In the singular case, we have the following:

Differentiation of singular complex Borel measures: Associate to each $x\in \mathbb{R}^k$ a sequence $\{E_i(x)\}$ that shrinks to $x$ nicely. If $\mu$ is a complex Borel measure and $\mu\perp m$, then $$\lim_{i\rightarrow\infty}\frac{\mu(E_i(x))}{m(E_i(x))} = 0$$ a.e. $[m]$.

The Jordan decomposition shows that it suffices to prove this for $\mu\geq 0$. Using the nicely shrinking property, we can show that this is a consequence of the special case $(D\mu)(x) = 0$ a.e. $[m]$. This is in turn proved by considering instead the upper derivative $(\bar{D}\mu)(x)$ defined by $$(\bar{D}\mu)(x)=\lim_{n\rightarrow\infty}\left[\sup_{0<r<1/n}(Q_r\mu)(x)\right].$$

Choose $\lambda, \varepsilon > 0$. Since $\mu\perp m$, $\mu$ is concentrated on a set of Lebesgue measure 0. As $\mu$ is regular (see the second corollary to Regularity of measure on $\sigma$-compact spaces), we can pick a compact $K$ with $m(K)=0$ and $\mu(K) > \|\mu\|-\varepsilon$. Let $\mu_1$ be the restriction of $\mu$ to $K$, i.e. $\mu_1(E)=\mu(K\cap E)$, so that $\|\mu-\mu_1\|<\varepsilon$. The tail bound $m\{\bar{D}\mu > \lambda\} < 3^k\lambda^{-1}\varepsilon$ arises by showing that outside of $K$, $(\bar{D}\mu)(x) \leq (M(\mu-\mu_1))(x)$ ($M$ being the maximal operator), and then applying the tail bound for maximal functions of measures to $\mu-\mu_1$. Since $\lambda$ and $\varepsilon$ were arbitrary, $\bar{D}\mu = 0$ a.e. $[m]$, and the theorem follows.

Combining, we get the following:

Differentiation of complex Borel measures: Associate to each $x\in \mathbb{R}^k$ a sequence $\{E_i(x)\}$ that shrinks to $x$ nicely. Let $\mu$ be a complex Borel measure on $\mathbb{R}^k$, and let $d\mu = f\ dm + d\mu_s$ be the Lebesgue decomposition of $\mu$ w.r.t. $m$. Then $$\lim_{i\rightarrow\infty}\frac{\mu(E_i(x))}{m(E_i(x))} = f(x)$$ a.e. $[m]$. In particular, $\mu\perp m$ iff $(D\mu)(x)=0$ a.e. $[m]$.

In contrast, we remark that if we consider positive Borel measures, we get something quite different:

Differentiation of positive Borel measures: If $\mu$ is a positive Borel measure on $\mathbb{R}^k$ and $\mu\perp m$, then $(D\mu)(x)=\infty$ a.e. $[\mu]$.

Note the "a.e." is taken relative to $\mu$ here, not $m$. In particular, this makes sense for the zero measure because then any measurable set is also $\mu$-almost all of $\mathbb{R}^k$.

The Fundamental Theorem of Calculus

We have proven the easy part of the FTC above. The other (harder) part of the FTC states that $$f(x) - f(a) = \int_a^x f'(t)\ dt \qquad (a\leq x \leq b)$$ when $f$ is differentiable at every point of $[a,b]$ and $f'$ is continuous on $[a,b]$.

When extending the FTC to the Lebesgue setting, the question arises of whether the continuity and differentiability requirements can be relaxed or adjusted. We cover two instructive ways in which it can fail:

  • Set $f(x)=x^2\sin(x^{-2})$ if $x\neq 0$, and $f(0)=0$. Then $f$ is differentiable at every point but $$\int_0^1 |f'(t)|\ dt = \infty,$$ so $f'\notin L^1$. However, if we interpret the FTC integral (with $[0,1]$ in place of $[a,b]$) as the limit of integrals over $[\varepsilon, 1]$ as $\varepsilon\rightarrow 0$, then the FTC still holds for this $f$.
  • Q: Suppose $f$ is continuous on $[a,b]$, $f$ is differentiable at almost every point of $[a,b]$ and $f'\in L^1$ on $[a,b]$. Does this imply the FTC?
    A: No. This is demonstrated by a Cantor-type function: a continuous nondecreasing function that increases from 0 to 1 on an interval but has derivative zero almost everywhere on it, so that $\int f'\ dm = 0 \neq 1$.
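The second counterexample can be made concrete with the standard middle-thirds Cantor function on $[0,1]$. The sketch below evaluates it from the ternary expansion of $x$; the truncation depth and the function name are assumptions of this sketch, and Rudin's own construction is more general than this specific function.

```python
def cantor(x: float, depth: int = 40) -> float:
    """Approximate the middle-thirds Cantor function on [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)        # next ternary digit of the original x
        x -= digit
        if digit == 1:
            # x lies in a removed middle third, where the function is constant.
            return result + scale
        if digit == 2:
            result += scale   # ternary digit 2 becomes binary digit 1
        scale /= 2.0
    return result

# The function rises from 0 to 1 ...
print(cantor(0.0), cantor(1.0))
# ... but difference quotients inside a removed interval such as (1/3, 2/3) vanish.
print((cantor(0.55) - cantor(0.45)) / 0.1)
```

The removed middle thirds have total length 1, and the function is locally constant on each of them, so $f'=0$ a.e. while $f(1)-f(0)=1$.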

As the statements are mostly what is of interest, we state two possible generalizations of the FTC quickly and only broadly sketch the proof strategy. But first, a definition:

A complex function $f$ on an interval $I=[a,b]$ is absolutely continuous on $I$ ($f$ is AC on $I$) if for each $\varepsilon>0$ there is a $\delta>0$ s.t. $$\sum_{i=1}^n |f(\beta_i)-f(\alpha_i)| < \varepsilon$$ for any $n$ and any disjoint collection of segments $(\alpha_1, \beta_1),\ldots,(\alpha_n, \beta_n)$ in $I$ whose lengths satisfy $$\sum_{i=1}^n (\beta_i-\alpha_i) < \delta.$$

We can now state the first generalization, which allows for some a.e. differentiable functions to be integrated.

FTC for AC functions: If $f$ is a complex function that is AC on $I=[a,b]$, then $f$ is differentiable a.e. on $I$, $f'\in L^1(m)$, and $$f(x)-f(a) = \int_a^x f'(t)\ dt\qquad (a\leq x\leq b).$$

This is proven by first showing it for nondecreasing AC functions, then showing that every real AC function $f$ can be expressed as a difference of two nondecreasing AC functions, $f=\tfrac{1}{2}(F + f) - \tfrac{1}{2}(F - f)$, where $F$ is also an AC function, called the total variation function of $f$ (and is defined in a way similar to that of measures). Applying this to the real and imaginary parts of a complex AC function gives the statement.

In the process, note that we also show that $f$ maps sets of measure 0 to sets of measure 0.

The second generalization requires differentiability everywhere but not continuity of $f'$:

FTC for $f'\in L^1$: If $f:[a,b]\rightarrow \mathbb{R}$ is differentiable at every point of $[a,b]$ and $f'\in L^1$ on $[a,b]$, then $$f(x)-f(a)=\int_a^x f'(t)\ dt\qquad (a\leq x\leq b).$$

This uses the Vitali-Carathéodory Theorem to approximate $f'$ from above by a lower semicontinuous function $g$, and then proves the theorem with some manipulations.

Differentiable Transformations

In the following, $V$ is an open set in $\mathbb{R}^k$, $T$ maps $V$ into $\mathbb{R}^k$, and $A: \mathbb{R}^k\rightarrow\mathbb{R}^k$ is a linear operator.

If, for some $x\in V$, there exists a linear operator (matrix) $A$ on $\mathbb{R}^k$ such that $$\lim_{h\rightarrow 0}\frac{|T(x+h)-T(x)-Ah|}{|h|} = 0$$ then we say that $T$ is differentiable at $x$, $T'(x) = A$ is the derivative, and its determinant $J_T(x)=\det T'(x)$ is the Jacobian.

In an earlier chapter (omitted from these notes) Rudin showed that for every linear $A$ there is a number $\Delta(A) = |\det A|$ s.t. $m(A(E)) = \Delta(A)m(E)$ for every Lebesgue measurable $E$. Hence, in the general case, we would like to say that for small sets $E$ near $x$, $$\frac{m(T(E))}{m(E)} \sim \Delta(T'(x)) = |J_T(x)|.$$

This is the content of the following:

Jacobian scaling factor: If $T$ is continuous, and differentiable at some point $x\in V$, then $$\lim_{r\rightarrow 0} \frac{m(T(B(x,r)))}{m(B(x,r))}=\Delta(T'(x)).$$
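For a linear map the scaling is exact rather than a limit, which is easy to check in the plane. The sketch below maps the unit square (used here instead of a ball, purely for convenience) through a $2\times 2$ matrix and compares the area of the image parallelogram, computed with the shoelace formula, to $|\det A|$. The matrix and the helper names are arbitrary choices for this illustration.

```python
def apply(A, p):
    """Apply the 2x2 matrix A (given as nested tuples) to the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

A = ((2.0, 1.0), (0.0, 3.0))
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [apply(A, p) for p in square]   # a linear map sends the square to a parallelogram

abs_det = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
print(polygon_area(image) / polygon_area(square), abs_det)
```

Both printed numbers agree, which is the identity $m(A(E))=\Delta(A)m(E)$ for this particular $E$. The theorem says a differentiable $T$ behaves the same way in the limit of small balls.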

As in the classical setting, this theorem shows that the Jacobian measures how the volume of a tiny set changes under the transformation. In Rudin this is proven by splitting into cases, assuming without loss of generality that $x=0$ and $T(0)=0$. If the derivative $A=T'(0)$ is one-to-one, define $F(x) = A^{-1}T(x)$. It suffices to show that $$\lim_{r\rightarrow 0} \frac{m(F(B(0,r)))}{m(B(0,r))}=1,$$ as $m(T(B))=m(A(F(B)))=\Delta(A)m(F(B))$.

Then, the statement follows in this case from showing that for any $\varepsilon > 0$ there is a $\delta > 0$ such that the sandwich inclusions $$B(0,(1-\varepsilon)r) \subset F(B(0,r)) \subset B(0,(1+\varepsilon)r)$$ hold for any $0<r<\delta$. The first inclusion uses the Brouwer fixed point theorem.

In the other case, $A$ maps $\mathbb{R}^k$ into a subspace of lower dimension, which is a set of measure 0, so $\Delta(A)=0$. Fix $\varepsilon > 0$. We then use the fact that $Ax$ approximates $T(x)$ near 0 to contain $T(B(0, r))$ in a 'thickening' of $A(B(0, r))$ that has measure less than $\varepsilon r^k$ for all sufficiently small $r$. Since $m(B(0,r))$ is a constant multiple of $r^k$, the desired limit is zero.

A short lemma is stated:

Lemma (7.25): Suppose $E\subset \mathbb{R}^k$, $m(E)=0$, $T$ maps $E$ into $\mathbb{R}^k$, and $$\limsup\frac{|T(y)-T(x)|}{|y-x|}<\infty$$ for every $x\in E$ as $y\rightarrow x$ in $E$. Then $m(T(E))=0$.

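Corollary: If $T$ is differentiable at every point of a set $E\subset V$ with $m(E)=0$, then $m(T(E))=0$. That is, such a $T$ maps sets of measure 0 to sets of measure 0.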

Finally, we arrive at the change of variables theorem:

Change of variables for integration: Suppose that

  • $X\subset V\subset \mathbb{R}^k$, $V$ is open, $T:V\rightarrow\mathbb{R}^k$ is continuous;
  • $X$ is Lebesgue measurable, $T$ is one-to-one on $X$, $T$ is differentiable at every point of $X$;
  • $m(T(V-X)) = 0$.

Then we have $$\int_{T(X)}f\ dm = \int_X (f\circ T)|J_T|\ dm$$ for every measurable $f: \mathbb{R}^k\rightarrow [0,\infty]$.
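A familiar instance is polar coordinates: $T(r,\theta)=(r\cos\theta, r\sin\theta)$ with $|J_T(r,\theta)|=r$, taking $X=(0,1)\times(0,2\pi)$, so that $T(X)$ is the unit disc minus a slit of measure zero. The sketch below approximates both sides of the formula with midpoint sums for $f(x,y)=x^2+y^2$. The choice of $f$, the grid sizes and the function names are assumptions made for this illustration.

```python
import math

def f(x: float, y: float) -> float:
    return x * x + y * y

def cartesian_side(n: int = 400) -> float:
    """Midpoint sum for the integral of f over the unit disc T(X)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if x * x + y * y < 1.0:
                total += f(x, y) * h * h
    return total

def polar_side(n_r: int = 200, n_theta: int = 200) -> float:
    """Midpoint sum for the integral of (f o T) |J_T| over X = (0,1) x (0,2*pi)."""
    dr = 1.0 / n_r
    dtheta = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            x, y = r * math.cos(theta), r * math.sin(theta)
            total += f(x, y) * r * dr * dtheta   # |J_T(r, theta)| = r
    return total

print(cartesian_side(), polar_side(), math.pi / 2)
```

Both sums should land near $\pi/2$, the exact value of $\int_0^{2\pi}\int_0^1 r^3\ dr\ d\theta$. The Cartesian sum is the cruder of the two because of the jagged approximation to the boundary of the disc.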

The proof proceeds in three steps (denoting the Lebesgue measurable subsets of $\mathbb{R}^k$ by $\mathfrak{M}$):

  • If $E\in\mathfrak{M}$ and $E\subset V$, then $T(E)\in\mathfrak{M}$.

Every Lebesgue measurable set is a union of an $F_\sigma$ and a set of Lebesgue measure zero, so this is proven for the two cases individually and then combined.

  • For every $E\in\mathfrak{M}$, $$m(T(E\cap X)) = \int_X\chi_E |J_T|\ dm.$$

Let $n$ be a positive integer and put $$V_n = \{x\in V: |T(x)|<n\},\qquad X_n=X\cap V_n.$$

The first step lets us define $\mu(E) = m(T(E\cap X_n))$ and show that it is a measure on $\mathfrak{M}$. By the lemma above, $\mu\ll m$, and by the Jacobian scaling theorem its symmetric derivative is $|J_T|$ on $X_n$, so the differentiation theorem for absolutely continuous measures shows that the conclusion holds on each of the parts $E\cap X_n$. We then apply monotone convergence as $X_n$ increases to $X$.

  • For every $A\in\mathfrak{M}$, $$\int_{T(X)}\chi_A\ dm = \int_X (\chi_A\circ T)|J_T|\ dm.$$

This is proven for Borel sets and for sets of Lebesgue measure 0 separately and then combined, as every Lebesgue measurable set is a disjoint union of a Borel set and a set of measure 0. A subtlety here is that we cannot directly use the previous step by setting $E=T^{-1}(A')$ (the preimage) for an arbitrary Lebesgue measurable set $A'$; we can only do so for Borel sets. This is because the preimage of a Lebesgue measurable set under a continuous map is not necessarily Lebesgue measurable, whereas the preimage of a Borel set is always Borel.

Finally, it is now clear that the theorem holds for every nonnegative simple function $f$, so the monotone convergence theorem gives the result for every measurable $f\geq 0$. And of course, the usual single-variable change of variables follows from this.