Friday, 17 June 2011

Stein's spherical maximal theorem

If {f: {\bf R}^d \rightarrow {\bf C}} is a locally integrable function, we define the Hardy-Littlewood maximal function {Mf: {\bf R}^d \rightarrow {\bf C}} by the formula

\displaystyle Mf(x) := \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy,

where {B(x,r)} is the ball of radius {r} centred at {x}, and {|E|} denotes the measure of a set {E}. The Hardy-Littlewood maximal inequality asserts that \displaystyle |\{ x \in {\bf R}^d: Mf(x) > \lambda \}| \leq \frac{C_d}{\lambda} \|f\|_{L^1({\bf R}^d)} \ \ \ \ \ (1)

for all {f\in L^1({\bf R}^d)}, all {\lambda > 0}, and some constant {C_d > 0} depending only on {d}. By a standard density argument, this implies in particular that we have the Lebesgue differentiation theorem \displaystyle \lim_{r \rightarrow 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} f(y)\ dy = f(x)

for all {f \in L^1({\bf R}^d)} and almost every {x \in {\bf R}^d}. See for instance my lecture notes on this topic.

By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz interpolation theorem (and the trivial inequality {\|Mf\|_{L^\infty({\bf R}^d)} \leq \|f\|_{L^\infty({\bf R}^d)}}) we see that \displaystyle \|Mf\|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (2)

for all {p > 1} and {f \in L^p({\bf R}^d)}, and some constant {C_{d,p}} depending on {d} and {p}.

The exact dependence of {C_{d,p}} on {d} and {p} is still not completely understood. The standard Vitali-type covering argument used to establish (1) has an exponential dependence on dimension, giving a constant of the form {C_d = C^d} for some absolute constant {C>1}. Inserting this into the Marcinkiewicz theorem, one obtains a constant {C_{d,p}} of the form {C_{d,p} = \frac{C^d}{p-1}} for some {C>1} (and taking {p} bounded away from infinity, for simplicity). The dependence on {p} is about right, but the dependence on {d} should not be exponential.

In 1982, Stein gave an elegant argument (with full details appearing in a subsequent paper of Stein and Strömberg), based on the Calderón-Zygmund method of rotations, to eliminate the dependence on {d}:

Theorem 1 One can take {C_{d,p} = C_p} for each {p>1}, where {C_p} depends only on {p}.

The argument is based on an earlier bound of Stein from 1976 on the spherical maximal function

\displaystyle M_S f(x) := \sup_{r>0} A_r |f|(x)

where {A_r} are the spherical averaging operators \displaystyle A_r f(x) := \int_{S^{d-1}} f(x+r\omega) d\sigma^{d-1}(\omega)

and {d\sigma^{d-1}} is normalised surface measure on the sphere {S^{d-1}}. Because this is an uncountable supremum, and the averaging operators {A_r} do not have good continuity properties in {r}, it is not a priori obvious that {M_S f} is even a measurable function for, say, locally integrable {f}; but we can avoid this technical issue, at least initially, by restricting attention to continuous functions {f}. The Stein maximal theorem for the spherical maximal function then asserts that if {d \geq 3} and {p > \frac{d}{d-1}}, then we have \displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (3)

for all (continuous) {f \in L^p({\bf R}^d)}. We will sketch a proof of this theorem below the fold. (Among other things, one can use this bound to show the pointwise convergence {\lim_{r \rightarrow 0} A_r f(x) = f(x)} of the spherical averages for any {f \in L^p({\bf R}^d)} when {d \geq 3} and {p > \frac{d}{d-1}}, although we will not focus on this application here.)

The condition {p > \frac{d}{d-1}} can be seen to be necessary as follows. Take {f} to be any fixed bump function. A brief calculation then shows that {M_S f(x)} decays like {|x|^{1-d}} as {|x| \rightarrow \infty}, and hence {M_S f} does not lie in {L^p({\bf R}^d)} unless {p > \frac{d}{d-1}}. By taking {f} to be a rescaled bump function supported on a small ball, one can show that the condition {p > \frac{d}{d-1}} is necessary even if we replace {{\bf R}^d} with a compact region (and similarly restrict the radius parameter {r} to be bounded). The condition {d \geq 3} however is not quite necessary; the result is also true when {d=2}, but this turned out to be a more difficult result, obtained first by Bourgain, with a simplified proof (based on the local smoothing properties of the wave equation) later given by Mockenhaupt-Seeger-Sogge.
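
To spell out the decay calculation (a sketch, assuming for concreteness that {f} is a non-negative bump function equal to {1} on the unit ball {B(0,1)}, a normalisation not fixed in the text above): for {|x| \geq 2}, the sphere of radius {r = |x|} centred at {x} passes through the origin, and the portion of that sphere lying inside {B(0,1)} is a cap of diameter comparable to {1}, hence of normalised surface measure comparable to {|x|^{-(d-1)}}. Thus

\displaystyle M_S f(x) \geq A_{|x|} f(x) \geq c |x|^{1-d}

for some {c>0} depending on {d}, and so

\displaystyle \int_{|x| \geq 2} |M_S f(x)|^p\ dx \geq c^p \int_{|x| \geq 2} |x|^{-(d-1)p}\ dx.

The right-hand side is finite precisely when {(d-1)p > d}, that is, when {p > \frac{d}{d-1}}.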

The Hardy-Littlewood maximal operator {Mf}, which involves averaging over balls, is clearly related to the spherical maximal operator, which averages over spheres. Indeed, by using polar co-ordinates, one easily verifies the pointwise inequality

\displaystyle Mf(x) \leq M_S f(x)

for any (continuous) {f}, which intuitively reflects the fact that one can think of a ball as an average of spheres. Thus, we see that the spherical maximal inequality (3) implies the Hardy-Littlewood maximal inequality (2) with the same constant {C_{d,p}}. (This implication is initially only valid for continuous functions, but one can then extend the inequality (2) to the rest of {L^p({\bf R}^d)} by a standard limiting argument.)
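
The polar co-ordinate computation can be written out as follows (a sketch; the symbol {c_d := |B(0,1)|} is introduced here for convenience). The sphere of radius {s} has surface area {d\, c_d\, s^{d-1}}, so

\displaystyle \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\ dy = \frac{1}{c_d r^d} \int_0^r d\, c_d\, s^{d-1} A_s |f|(x)\ ds = \int_0^r A_s |f|(x)\ \frac{d\, s^{d-1}}{r^d}\ ds.

Since {\int_0^r \frac{d\, s^{d-1}}{r^d}\ ds = 1}, the right-hand side is a weighted average of the quantities {A_s |f|(x)} for {0 < s \leq r}, and is therefore at most {M_S f(x)}; taking the supremum over {r>0} gives the pointwise inequality.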

At first glance, this observation does not immediately establish Theorem 1 for two reasons. Firstly, Stein’s spherical maximal theorem is restricted to the case when {d \geq 3} and {p > \frac{d}{d-1}}; and secondly, the constant {C_{d,p}} in that theorem still depends on dimension {d}. The first objection can be easily disposed of, for if {p>1}, then the hypotheses {d \geq 3} and {p > \frac{d}{d-1}} will automatically be satisfied for {d} sufficiently large (depending on {p}); note that the case when {d} is bounded (with a bound depending on {p}) is already handled by the classical maximal inequality (2).
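
To quantify "sufficiently large" (a small worked check, not part of the original argument): for {d > 1}, the condition {p > \frac{d}{d-1}} rearranges to

\displaystyle d > \frac{p}{p-1},

so Stein's theorem applies once {d > \max(2, \frac{p}{p-1})}. For instance, when {p = 1.1} one has {\frac{p}{p-1} = 11}, so the spherical maximal theorem covers the dimensions {d \geq 12}, while the dimensions {d \leq 11} are left to the classical inequality (2).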

We still have to deal with the second objection, namely that the constant {C_{d,p}} in (3) depends on {d}. However, here we can use the method of rotations to show that the constants {C_{d,p}} can be taken to be non-increasing (and hence bounded) in {d}. The idea is to view high-dimensional spheres as an average of rotated low-dimensional spheres. We illustrate this with a demonstration that {C_{d+1,p} \leq C_{d,p}}, in the sense that any bound of the form \displaystyle \| M_S f \|_{L^p({\bf R}^d)} \leq A \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (4)

for the {d}-dimensional spherical maximal function, implies the same bound \displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (5)

for the {d+1}-dimensional spherical maximal function, with exactly the same constant {A}. For any direction {\omega_0 \in S^d \subset {\bf R}^{d+1}}, consider the maximal operators

\displaystyle M_S^{\omega_0} f(x) := \sup_{r>0} A_r^{\omega_0} |f|(x)

for any continuous {f: {\bf R}^{d+1} \rightarrow {\bf C}}, where \displaystyle A_r^{\omega_0} f(x) := \int_{S^{d-1}} f( x + r U_{\omega_0} \omega)\ d\sigma^{d-1}(\omega)

where {U_{\omega_0}} is some orthogonal transformation mapping the sphere {S^{d-1}} to the sphere {S^{d-1,\omega_0} := \{ \omega \in S^d: \omega \perp \omega_0\}}; the exact choice of orthogonal transformation {U_{\omega_0}} is irrelevant due to the rotation-invariance of surface measure {d\sigma^{d-1}} on the sphere {S^{d-1}}. A simple application of Fubini’s theorem (after first rotating {\omega_0} to be, say, the standard unit vector {e_d}) using (4) then shows that \displaystyle \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})} \leq A \|f\|_{L^p({\bf R}^{d+1})} \ \ \ \ \ (6)

uniformly in {\omega_0}. On the other hand, by viewing the {d}-dimensional sphere {S^d} as an average of the spheres {S^{d-1,\omega_0}}, we have the identity \displaystyle A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\ d\sigma^d(\omega_0);

indeed, one can deduce this from the uniqueness of Haar measure by noting that both the left-hand side and right-hand side are invariant means of {f} on the sphere {\{ y \in {\bf R}^{d+1}: |y-x|=r\}}. This implies that \displaystyle M_S f(x) \leq \int_{S^d} M_S^{\omega_0} f(x)\ d\sigma^d(\omega_0)

and thus by Minkowski’s inequality for integrals, we may deduce (5) from (6).
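
For the reader who wants these two steps in more detail, here is a sketch for {p < \infty} (the notation {x = (x',t)} and {f_t} is introduced only for this illustration). After rotating so that {\omega_0} is a coordinate direction, write {x = (x',t)} with {x' \in {\bf R}^d} and {t \in {\bf R}} the coordinate along {\omega_0}, and set {f_t(x') := f(x',t)}. Then {M_S^{\omega_0} f(x',t) = M_S f_t(x')}, where {M_S} on the right is the {d}-dimensional spherical maximal function, so Fubini's theorem and (4) give

\displaystyle \int_{\bf R} \int_{{\bf R}^d} |M_S f_t(x')|^p\ dx' dt \leq A^p \int_{\bf R} \|f_t\|_{L^p({\bf R}^d)}^p\ dt = A^p \|f\|_{L^p({\bf R}^{d+1})}^p,

which is (6). Since {\sigma^d} is a probability measure, Minkowski's inequality for integrals and (6) then yield

\displaystyle \| M_S f \|_{L^p({\bf R}^{d+1})} \leq \int_{S^d} \| M_S^{\omega_0} f \|_{L^p({\bf R}^{d+1})}\ d\sigma^d(\omega_0) \leq A \|f\|_{L^p({\bf R}^{d+1})},

which is (5).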

Remark 1 Unfortunately, the method of rotations does not work to show that the constant {C_d} for the weak {(1,1)} inequality (1) is independent of dimension, as the weak {L^1} quasinorm {\| \cdot \|_{L^{1,\infty}}} is not a genuine norm and does not obey the Minkowski inequality for integrals. Indeed, the question of whether {C_d} in (1) can be taken to be independent of dimension remains open. The best known positive result is due to Stein and Strömberg, who showed that one can take {C_d = Cd} for some absolute constant {C}, by comparing the Hardy-Littlewood maximal function with the heat kernel maximal function \displaystyle \sup_{t > 0} e^{t\Delta} |f|(x).

The abstract semigroup maximal inequality of Dunford and Schwartz (discussed for instance in these lecture notes of mine) shows that the heat kernel maximal function is of weak-type {(1,1)} with a constant of {1}, and this can be used, together with a comparison argument, to give the Stein-Strömberg bound. In the converse direction, it is a recent result of Aldaz that if one replaces the balls {B(x,r)} with cubes, then the weak {(1,1)} constant {C_d} must go to infinity as {d \rightarrow \infty}.

— 1. Proof of spherical maximal inequality

We now sketch the proof of Stein’s spherical maximal inequality (3) for {d \geq 3}, {p > \frac{d}{d-1}}, and {f \in L^p({\bf R}^d)} continuous. To motivate the argument, let us first establish the simpler estimate

\displaystyle \| M_S^1 f \|_{L^p({\bf R}^d)} \leq C_{d,p} \|f\|_{L^p({\bf R}^d)}

where {M_S^1} is the spherical maximal function restricted to unit scales: \displaystyle M_S^1 f(x) := \sup_{1 \leq r \leq 2} A_r |f|(x).

For the rest of these notes, we suppress the dependence of constants on {d} and {p}, using {X \lesssim Y} as short-hand for {X \leq C_{d,p} Y}.

It will of course suffice to establish the estimate \displaystyle \| \sup_{1 \leq r \leq 2} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (7)

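for all continuous {f \in L^p({\bf R}^d)}, as the original claim follows by replacing {f} with {|f|}. Also, since the bound is trivially true for {p=\infty}, and we crucially have {\frac{d}{d-1} < 2} when {d \geq 3}, we may restrict attention (by interpolation) to the regime {p \leq 2}.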

We establish this bound using a Littlewood-Paley decomposition

\displaystyle f = \sum_N P_N f

where {N} ranges over dyadic numbers {2^k}, {k \in {\bf Z}}, and {P_N} is a smooth Fourier projection to frequencies {|\xi| \sim N}; a bit more formally, we have \displaystyle \widehat{P_N f}(\xi) = \psi(\frac{\xi}{N}) \hat f(\xi)

where {\psi} is a bump function supported on the annulus {\{ \xi \in {\bf R}^d: 1/2 \leq |\xi| \leq 2\}} such that {\sum_N \psi(\frac{\xi}{N}) = 1} for all non-zero {\xi}. Actually, for the purposes of proving (7), it is more convenient to use the decomposition \displaystyle f = P_{\leq 1} f + \sum_{N>1} P_N f

where {P_{\leq 1} = \sum_{N \leq 1} P_N} is the projection to frequencies {|\xi| \lesssim 1}. By the triangle inequality, it then suffices to show the bounds \displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_{\leq 1} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (8)

and \displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (9)

for all {N \geq 1} and some {\epsilon>0} depending only on {p,d}.

To prove the low-frequency bound (8), observe that {P_{\leq 1}} is a convolution operator with a bump function, and from this and the radius restriction {1 \leq r \leq 2} we see that {A_r P_{\leq 1}} is a convolution operator with a function of uniformly bounded size and support. From this we obtain the pointwise bound \displaystyle A_r P_{\leq 1} f(x) \lesssim Mf(x) \ \ \ \ \ (10)

and the claim (8) follows from (2).

Now we turn to the more interesting high-frequency bound (9). Here, {P_N} is a convolution operator with an approximation to the identity at scale {\sim 1/N}, and so {A_r P_N} is a convolution operator with a function of magnitude {O(N)} concentrated on an annulus of thickness {O(1/N)} around the sphere of radius {r}. This can be used to give the pointwise bound \displaystyle A_r P_N f(x) \lesssim N Mf(x), \ \ \ \ \ (11)

which by (2) gives the bound \displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^q({\bf R}^d)} \lesssim_q N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (12)

for any {q > 1}. This is not directly strong enough to prove (9), due to the “loss of one derivative” as manifested by the factor {N}. On the other hand, this bound (12) holds for all {q>1}, and not just in the range {p > \frac{d}{d-1}}.

To counterbalance this loss of one derivative, we turn to {L^2} estimates. A standard stationary phase computation (or Bessel function computation) shows that {A_r} is a Fourier multiplier whose symbol decays like {|\xi|^{-(d-1)/2}}. As such, Plancherel’s theorem yields the {L^2} bound

\displaystyle \| A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

uniformly in {1 \leq r \leq 2}. But we still have to take the supremum over {r}. This is an uncountable supremum, so one cannot just apply a union bound argument. However, from the uncertainty principle, we expect {P_N f} to be “blurred out” at spatial scale {1/N}, which suggests that the averages {A_r P_N f} do not vary much when {r} is restricted to an interval of size {1/N}. Heuristically, this then suggests that \displaystyle \sup_{1 \leq r \leq 2} |A_r P_N f| \sim \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|.

Estimating the discrete supremum on the right-hand side somewhat crudely by the square-function, \displaystyle \sup_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f| \leq (\sum_{1 \leq r \leq 2: r \in \frac{1}{N} {\bf Z}} |A_r P_N f|^2)^{1/2},

and taking {L^2} norms, one is then led to the heuristic prediction that \displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}. \ \ \ \ \ (13)

One can make this heuristic precise using the one-dimensional Sobolev embedding inequality adapted to scale {1/N}, namely that \displaystyle \sup_{1 \leq r \leq 2} |g(r)| \lesssim N^{1/2} (\int_1^2 |g(r)|^2\ dr)^{1/2} + N^{-1/2} (\int_1^2 |g'(r)|^2\ dr)^{1/2}.

A routine computation shows that \displaystyle \| \frac{d}{dr} A_r P_N f \|_{L^2({\bf R}^d)} \lesssim N \times N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

(which formalises the heuristic that {A_r P_N f} is roughly constant at {r}-scales {1/N}), and this soon leads to a rigorous proof of (13).
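
In more detail, one can apply the Sobolev inequality to {g(r) := A_r P_N f(x)} for each fixed {x} and then take {L^2} norms in {x} (a sketch, using the triangle inequality and Fubini's theorem to interchange the {r} and {x} integrations):

\displaystyle \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} (\int_1^2 \| A_r P_N f \|_{L^2({\bf R}^d)}^2\ dr)^{1/2} + N^{-1/2} (\int_1^2 \| \frac{d}{dr} A_r P_N f \|_{L^2({\bf R}^d)}^2\ dr)^{1/2}.

Inserting the two {L^2} bounds above, the first term is {\lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}} and the second is {\lesssim N^{-1/2} \times N \times N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}}, which is the same quantity; this is (13).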

An interpolation between (12) and (13) (for {q} sufficiently close to {1}) then gives (9) for some {\epsilon > 0} (here we crucially use that {p > \frac{d}{d-1}} and {p \leq 2}).

Now we turn to the full maximal function {M_S f}. It suffices to show that

\displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)},

where {R} ranges over dyadic numbers.

For any fixed {R}, the natural spatial scale is {R}, and the natural frequency scale is thus {1/R}. We therefore split

\displaystyle f = P_{\leq 1/R} f + \sum_{N > 1} P_{N/R} f,

and aim to establish the bounds \displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{\leq 1/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (14)

and \displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^p({\bf R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p({\bf R}^d)} \ \ \ \ \ (15)

for each {N > 1} and some {\epsilon>0} depending only on {d} and {p}, similarly to before.
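
The rescalings used in the remainder of the argument can be made explicit (a sketch; the dilated function {f_R(y) := f(Ry)} is notation introduced here). A change of variables on the spatial and Fourier sides gives

\displaystyle A_r P_{N/R} f(x) = (A_{r/R} P_N f_R)(x/R), \quad Mf(x) = (M f_R)(x/R),

so that the range {R \leq r \leq 2R} for {f} corresponds to the unit range {1 \leq r/R \leq 2} for {f_R}. Since {\|f_R\|_{L^q({\bf R}^d)} = R^{-d/q} \|f\|_{L^q({\bf R}^d)}}, while the dilation {x \mapsto x/R} on the left-hand side contributes the compensating factor {R^{d/q}}, each unit-scale estimate transfers to scale {R} with the same constant (and with each {r}-derivative picking up a factor of {1/R}).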

A rescaled version of the derivation of (10) gives

\displaystyle A_r P_{\leq 1/R} f(x) \lesssim Mf(x)

for all {R \leq r \leq 2R}, which already lets us deduce (14). As for (15), a rescaling of (11) gives \displaystyle A_r P_{N/R} f(x) \lesssim N Mf(x),

for all {R \leq r \leq 2R}, and thus \displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^q({\bf R}^d)} \lesssim N \|f\|_{L^q({\bf R}^d)} \ \ \ \ \ (16)

for all {q>1}. Meanwhile, at the {L^2} level, we have \displaystyle \| A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

and \displaystyle \| \frac{d}{dr} A_r P_{N/R} f \|_{L^2({\bf R}^d)} \lesssim \frac{N}{R} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

and so \displaystyle \| (\frac{1}{R} \int_R^{2R} |A_r P_{N/R} f|^2\ dr)^{1/2} + (\frac{R}{N^2} \int_R^{2R} |\frac{d}{dr} A_r P_{N/R} f|^2\ dr)^{1/2} \|_{L^2({\bf R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}

which implies by rescaled Sobolev embedding that \displaystyle \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.

In fact, by writing {P_{N/R} f = P_{N/R} \tilde P_{N/R} f}, where {\tilde P_{N/R}} is a slight widening of {P_{N/R}}, we have \displaystyle \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|\tilde P_{N/R} f\|_{L^2({\bf R}^d)};

square summing this (and bounding a supremum by a square function) and using Plancherel we obtain \displaystyle \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2({\bf R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2({\bf R}^d)}.

Interpolating this against (16) as before we obtain (15) as required.
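
The interpolation arithmetic, here and for (9), can be checked as follows (a sketch; {\theta} is the interpolation parameter, introduced here, and constants depending on {p,q,d} are suppressed). Interpolating the {L^q} bound with constant {N} against the {L^2} bound with constant {N^{1/2} N^{-(d-1)/2} = N^{1-d/2}}, with {\frac{1}{p} = \frac{1-\theta}{q} + \frac{\theta}{2}}, gives an {L^p} bound with constant

\displaystyle N^{1-\theta} \times N^{\theta(1-d/2)} = N^{1 - \theta d/2}.

This exponent is negative exactly when {\theta > \frac{2}{d}}. Letting {q \rightarrow 1} gives {\theta \rightarrow 2(1-\frac{1}{p})}, and {2(1-\frac{1}{p}) > \frac{2}{d}} is equivalent to {p > \frac{d}{d-1}}; the requirement {\theta \leq 1} is the condition {p \leq 2}. For example, with {d=3} and {p=2} one can take {\theta = 1} and obtains the decay {N^{-1/2}}.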
