A Cluttered Mind

Quick Proof of Cauchy Integral Formula

2024-10-11T00:00:00+00:00

$\newcommand\C{\mathbb{C}}$

I came up with the proof below while preparing a lecture for my complex analysis course. I learned from Sinan Gunturk that Peter Lax showed him the same proof back in 2007.

Let $O\subset\C$ be open and convex. Let $z_0 \in O$ and $c: [0,1] \rightarrow O$ be a closed piecewise $C^1$ curve.

Recall that the winding number of $c$ around $z_0$ is defined to be \[ W(c,z_0) = \frac{1}{2\pi i}\int_c \frac{dz}{z-z_0}. \]

We want to prove the following: Given a holomorphic $ f: O \rightarrow \C $, \[ \int_c \frac{f(z)}{z-z_0}\,dz = 2\pi i W(c,z_0)f(z_0). \]

The following easily proved facts will be used: Given any closed piecewise $C^1$ curve $c: [0,1] \rightarrow O$ and holomorphic $f: O \rightarrow \C$, \begin{align*} \frac{d}{dt}(f(c(t))) &= f'(c(t))c'(t)\\ \int_c f'(z)\,dz &= 0 \end{align*}

Proof. It suffices to prove this for $z_0 = 0$. Since $O$ is convex and contains $z_0 = 0$, for each $0 \le s \le 1$, the rescaled curve $sc$ is also in $O$. Let \begin{align*} I(s) &= \int_{sc}\frac{f(z)}{z}\,dz = \int_{t=0}^{t=1}\frac{f(s c(t))}{s c(t)}s c'(t)\,dt = \int_{t=0}^{t=1}f(s c(t))\frac{c'(t)}{c(t)}\,dt. \end{align*} Since \begin{align*} I(1) &= \int_{c}\frac{f(z)}{z}\,dz\\ I(0) &= \int_{t=0}^{t=1}f(0)\frac{c'(t)}{c(t)}\,dt = f(0)\int_c\frac{dz}{z} = f(0)2\pi i W(c,0), \end{align*} it suffices to prove that $I$ is a constant function.

By elementary analysis, $I(s)$ is differentiable. If $s \in (0,1]$, \begin{align*} I'(s) &= \int_{t=0}^{t=1} f'(sc(t))c(t)\frac{c'(t)}{c(t)}\,dt = \frac{1}{s}\int_{t=0}^{t=1} f'(s c(t))(s c)'(t)\,dt = \frac{1}{s}\int_{s c} f'(z)\,dz = 0. \end{align*} Since $I$ is continuous on $[0,1]$ and $I' = 0$ on $(0,1]$, it follows that $I$ is constant.

Q.E.D
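As a numerical sanity check of the argument above, here is a minimal sketch that approximates $I(s)$ for several values of $s$ and compares each with $2\pi i f(0)$. The circle used for $c$, the choice $f = \exp$, and the midpoint rule are illustrative assumptions, not part of the proof. The circle encloses $0$ once, so $W(c,0) = 1$.

```python
import cmath
import math

def circle(t, center=0.3 + 0.2j, radius=1.0):
    """Closed C^1 curve c: [0,1] -> C, a circle traversed once counterclockwise."""
    return center + radius * cmath.exp(2j * math.pi * t)

def circle_prime(t, radius=1.0):
    """Derivative c'(t) of the circle above."""
    return 2j * math.pi * radius * cmath.exp(2j * math.pi * t)

def I(s, f, n=2000):
    """Midpoint-rule approximation of I(s) = int_0^1 f(s c(t)) c'(t)/c(t) dt."""
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        total += f(s * circle(t)) * circle_prime(t) / circle(t)
    return total / n

f = cmath.exp  # sample holomorphic function (assumption for illustration)
values = [I(s, f) for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
expected = 2j * math.pi * f(0)  # 2*pi*i * W(c,0) * f(0) with W(c,0) = 1
```

Every entry of `values` should agree with `expected` up to quadrature error, which reflects the fact that $I$ is constant in $s$.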

Homotopy invariance of Contour Integral

2024-10-11T00:00:00+00:00

$\newcommand\C{\mathbb{C}}$

This is a follow-up to Quick Proof of Cauchy Integral Formula. That proof is based on the standard proof, given below, of the homotopy invariance of the contour integral of a holomorphic function.

Let $O\subset\C$ be open.

Two continuous curves $c_0: [0,1]\rightarrow O$ and $c_1: [0,1] \rightarrow O$ are homotopic if there exists a continuous map $C: [0,1]\times[0,1] \rightarrow O$ such that \begin{align*} C(0,\cdot) &= c_0\\ C(1,\cdot) &= c_1. \end{align*} If the curves $c_0$, $c_1$ and the map $C$ are all $C^2$, we say that $c_0$ and $c_1$ are smoothly homotopic.

Suppose the two curves have the same endpoints, i.e., $$ c_0(0) = c_1(0)\text{ and }c_0(1)=c_1(1). $$ They are smoothly homotopic with fixed endpoints if there exists a smooth homotopy $C: [0,1]\times[0,1]\rightarrow O$ such that for each $s \in [0,1]$, $$ C(s,0) = c_0(0)\text{ and }C(s,1) = c_0(1). $$

Recall that the winding number of a closed curve $c: [0,1] \rightarrow \C$ around $z_0$ is defined to be \[ W(c,z_0) = \frac{1}{2\pi i}\int_c \frac{dz}{z-z_0}. \]

Let $f: O \rightarrow \C$ be holomorphic. We want to prove the following two theorems:

Theorem 1. If $c_0$ and $c_1$ are smoothly homotopic closed $C^2$ curves, then $$ \int_{c_0}f(z)\,dz = \int_{c_1}f(z)\,dz. $$

Theorem 2. If $c_0$ and $c_1$ are $C^2$ curves with the same endpoints and are smoothly homotopic with fixed endpoints, then $$ \int_{c_0}f(z)\,dz = \int_{c_1}f(z)\,dz. $$

Proof of theorems. Let $c_0: [0,1] \rightarrow O$ and $c_1: [0,1] \rightarrow O$ be $C^2$ curves, and let $C: [0,1]\times [0,1] \rightarrow O$ be a smooth homotopy. For each $s \in [0,1]$, let \begin{align*} I(s) &= \int_{C(s,\cdot)} f(z)\,dz\\ &= \int_{t=0}^{t=1} f(C(s,t))\partial_tC(s,t)\,dt. \end{align*} $I$ is a differentiable function, and \begin{align*} I'(s) &= \int_{t=0}^{t=1} \partial_s(f(C(s,t))\partial_tC(s,t))\,dt. \end{align*} The crucial calculation is the following: \begin{align*} \partial_s(f(C(s,t))\partial_tC(s,t)) &= f'(C(s,t))\partial_sC(s,t)\partial_tC(s,t) + f(C(s,t))\partial^2_{st}C(s,t)\\ &= \partial_t(f(C(s,t))\partial_sC(s,t)). \end{align*} Therefore, by the Fundamental Theorem of Calculus, \begin{align*} I'(s) &= \int_{t=0}^{t=1} \partial_t(f(C(s,t))\partial_sC(s,t))\,dt\\ &= f(C(s,1))\partial_sC(s,1) - f(C(s,0))\partial_sC(s,0). \end{align*} If $c_0$ and $c_1$ are closed curves and each intermediate curve $C(s,\cdot)$ is also closed, then $$ C(s,0) = C(s,1)\text{ and }\partial_sC(s,0) = \partial_sC(s,1). $$ If $c_0$ and $c_1$ are homotopic with fixed endpoints, then $$ \partial_sC(s,0) = \partial_sC(s,1) = 0. $$ In both cases, it follows that $I'(s) = 0$ for every $s$, so $I(0) = I(1)$, that is, $$ \int_{c_0}f(z)\,dz = \int_{c_1}f(z)\,dz. $$

Q.E.D
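The following sketch illustrates Theorem 1 numerically on the non-simply-connected domain $O = \C\setminus\{0\}$. The homotopy `C`, the function `f`, and the midpoint rule are assumptions chosen for illustration: $C(s,t) = (1+s)e^{2\pi i t} + s/2$ deforms the unit circle into a shifted circle of radius $2$ through closed curves that never pass through $0$.

```python
import cmath
import math

def C(s, t):
    """Smooth homotopy through closed curves in C minus {0} (illustrative choice)."""
    return (1 + s) * cmath.exp(2j * math.pi * t) + 0.5 * s

def dC_dt(s, t):
    """Partial derivative of C with respect to t."""
    return (1 + s) * 2j * math.pi * cmath.exp(2j * math.pi * t)

def f(z):
    """Holomorphic on C minus {0}; the 1/z term makes the integral nonzero."""
    return 1 / z + z ** 2

def contour_integral(s, n=2000):
    """Midpoint-rule approximation of I(s) = int_0^1 f(C(s,t)) dC/dt dt."""
    total = 0j
    for k in range(n):
        t = (k + 0.5) / n
        total += f(C(s, t)) * dC_dt(s, t)
    return total / n

integrals = [contour_integral(s) for s in (0.0, 0.5, 1.0)]
```

All three approximations should agree with one another, and with $2\pi i$, up to quadrature error.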

Corollary 1. If $O$ is simply connected, then for any holomorphic $f: O \rightarrow \C$ and closed curve $c$, $$ \int_c f(z)\,dz = 0. $$

Corollary 2. If $O$ is simply connected, then for any holomorphic $f: O \rightarrow \C$, there exists a holomorphic $F: O \rightarrow \C$ such that $$ F' = f. $$

Remark. The crucial calculation works because $f$ has a local antiderivative $F$ and \begin{align*} f(C(s,t))\partial_sC(s,t) &= \partial_sF(C(s,t))\\ f(C(s,t))\partial_tC(s,t) &= \partial_tF(C(s,t)). \end{align*} Therefore, since partials commute, \begin{align*} \partial_s(f(C(s,t))\partial_tC(s,t)) &= \partial_s(\partial_tF(C(s,t)))\\ &= \partial_t(\partial_sF(C(s,t)))\\ &= \partial_t(f(C(s,t))\partial_sC(s,t)). \end{align*}

Chain Rule for Maps

2024-03-09T00:00:00+00:00

$\newcommand\R{\mathbb{R}}\newcommand\C{\mathbb{C}}\newcommand\Z{\mathbb{Z}}$

Chain Rule for Map between Open Subsets of Euclidean Space

Let $M$ be an open subset of $\R^m$, $N$ be an open subset of $\R^n$, and $O$ be an open subset of $\R^p$. Let $$F: M \rightarrow N\text{ and }G: N \rightarrow O$$ be $C^1$ maps. We want to prove the chain rule for the composition $G\circ F: M \rightarrow O$.

Differential of a Map

Recall that the directional derivative of $F$ at $x \in M$ in the direction $v \in \R^m$ is defined to be $$ D_vF(x) = (F\circ c)'(0), $$ where $c: I \rightarrow M$ is a $C^1$ curve such that $$ c(0) = x\text{ and }c'(0) = v. $$ In other words, if $v \in \R^m$ is the velocity vector of a curve $c$ passing through $x \in M$, then $D_vF(x)$ is the velocity vector of the curve $F\circ c$ at $F(x)$.

It is easy to check that the map $$ v \mapsto D_vF(x) $$ is independent of the curve $c$ (as long as $c(0)=x$ and $c'(0)=v$) and is linear. The differential of $F$ at $x$ is defined to be this linear map and is denoted $$ \partial F(x): \R^m \rightarrow \R^n. $$ This is also called the Jacobian and often written as a matrix of partial derivatives. We avoid doing that here.
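To make the independence of the curve concrete, here is a small numerical sketch. The map `F`, the point `x0`, the vector `v`, and the two curves are all hypothetical choices for illustration. Both curves pass through `x0` with velocity `v` at $t = 0$ but have different accelerations, and the velocity of $F\circ c$ at $t=0$ is estimated by a central difference.

```python
import math

def F(p):
    """Sample C^1 map from R^2 to R^2 (illustrative choice)."""
    x, y = p
    return (x * x * y, math.sin(x) + y)

x0 = (1.0, 2.0)
v = (0.5, -1.0)

def straight(t):
    """Straight line with c(0) = x0 and c'(0) = v."""
    return (x0[0] + t * v[0], x0[1] + t * v[1])

def curved(t):
    """Same position and velocity at t = 0, but nonzero acceleration."""
    return (x0[0] + t * v[0] + 3 * t * t, x0[1] + t * v[1] - 2 * t * t)

def velocity_at_zero(curve, h=1e-5):
    """Central-difference approximation of (F o curve)'(0)."""
    plus, minus = F(curve(h)), F(curve(-h))
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

d_straight = velocity_at_zero(straight)
d_curved = velocity_at_zero(curved)

# Matrix of partial derivatives of F at x0, applied to v, for comparison.
jacobian_v = (2 * x0[0] * x0[1] * v[0] + x0[0] ** 2 * v[1],
              math.cos(x0[0]) * v[0] + v[1])
```

Up to discretization error, `d_straight`, `d_curved`, and `jacobian_v` should coincide.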

Chain Rule

Given $x \in M$, we want to find the formula for the differential of $$ G\circ F: M \rightarrow O. $$

Given $x \in M$ and $v \in \R^m$, let $c: I \rightarrow M$ be a curve such that $c(0) = x$ and $c'(0) = v$. This in turn defines a curve $$ F\circ c: I \rightarrow N, $$ which satisfies $(F\circ c)(0) = F(x)$ and, repeating what we said above, \begin{align*} (F\circ c)'(0) &= \partial F(x)v. \end{align*} Therefore, \begin{align*} \partial(G\circ F)(x)v &= \left.\frac{d}{dt}\right|_{t=0}(G\circ F)(c(t))\\ &= \left.\frac{d}{dt}\right|_{t=0}G((F\circ c)(t))\\ &= \partial G(F(x))(F\circ c)'(0)\\ &= \partial G(F(x))\partial F(x)v\\ &= (\partial G(F(x))\circ\partial F(x))v \end{align*} This proves the chain rule $$ \partial(G\circ F)(x) = \partial G(F(x))\circ \partial F(x), $$

as depicted here:

Chain Rule for Maps Between Manifolds

Since everything above is done without using coordinates, the same proof works for manifolds.

Let $M$ be an $m$-manifold, $N$ be an $n$-manifold, and $O$ be a $p$-manifold. Let $$F: M \rightarrow N\text{ and }G: N \rightarrow O $$ be $C^1$ maps. We want to prove the chain rule for the composition $G\circ F: M \rightarrow O$.

Pushforward Map

If $M$ and $N$ are manifolds, then the differential of $F$ is also called the pushforward map, $$ F_*: T_xM \rightarrow T_{F(x)}N, $$ which is defined as follows: For any $v \in T_xM$, $$ F_*v = (F\circ c)'(0), $$ where $c: I \rightarrow M$ is a $C^1$ curve such that $c(0)=x$ and $c'(0)=v$. In other words, if $v \in T_xM$ is the velocity vector of a parameterized curve $c$ passing through $x \in M$, then $F_*v \in T_{F(x)}N$ is the velocity vector of the curve $F\circ c: I \rightarrow N$ at $F(x)$.

Chain Rule

Given $x \in M$, we want to find the formula for the pushforward map $$ (G\circ F)_*: T_xM \rightarrow T_{(G\circ F)(x)}O. $$

Given $x \in M$ and $v \in T_xM$, let $c: I \rightarrow M$ be a curve such that $c(0) = x$ and $c'(0) = v$. This in turn defines a curve $$ F\circ c: I \rightarrow N, $$ which satisfies $(F\circ c)(0) = F(x)$ and, by the definition of the pushforward map \begin{align*} (F\circ c)'(0) &= F_*v \end{align*} Therefore, \begin{align*} (G\circ F)_*v &= \left.\frac{d}{dt}\right|_{t=0}(G\circ F)(c(t))\\ &= \left.\frac{d}{dt}\right|_{t=0}G((F\circ c)(t))\\ &= G_*(F\circ c)'(0)\\ &=G_*F_*v\\ &= (G_*\circ F_*)v \end{align*} This proves the chain rule $$ (G\circ F)_* = G_*\circ F_*, $$ as depicted here:

Harvard Way to Define Trace

2024-01-11T00:00:00+00:00

$\newcommand\F{\mathbb{F}}\newcommand\R{\mathbb{R}}\newcommand\C{\mathbb{C}}\newcommand\Z{\mathbb{Z}}\newcommand\tr{\operatorname{trace}}\newcommand\End{\operatorname{End}}$

The trace of a square matrix $A$ is defined to be the sum of the elements along the diagonal. A basic fact is that for any invertible matrix $M$, \begin{equation}\tag{*} \tr(M^{-1}AM) = \tr(A). \end{equation}

Let $V$ be an $n$-dimensional vector space over a field $\F$. The trace of a linear transformation $L: V \rightarrow V$ is usually defined as follows: Let $(v_1, \dots, v_n)$ be a basis of $V$. There exists a matrix $A$ such that $$ L(v_i) = A_i^jv_j, $$ where the repeated index $j$ is summed from $1$ to $n$. Then the trace of $L$ is defined to be $$ \tr(L) = \tr(A). $$ Fact (*) implies that the right side remains the same, no matter which basis of $V$ is used.

A natural question is whether there is a way to define the trace of a linear transformation directly without using a basis or matrix. This would prove (*). One way is to use the universal property of the tensor product of two vector spaces.

Universal property of tensor product: Let $V$ and $W$ be vector spaces over a field $\F$. The tensor product $V\otimes W$ is a vector space with a bilinear map \begin{align*} B: V\times W &\rightarrow V\otimes W\\ (v,w) &\mapsto v\otimes w \end{align*} such that the following universal property holds: For any vector space $Z$ and bilinear map $$ b: V\times W \rightarrow Z, $$ there exists a unique linear map $$ \bar{b}: V\otimes W \rightarrow Z $$ such that $$ b = \bar{b}\circ B. $$

Space of linear transformations is tensor product: Let $\End(V)$ denote the space of linear transformations from $V$ to itself. The set of all rank $1$ linear transformations is the image of the following bilinear map \begin{align*} \phi: V \times V^* &\rightarrow \End(V), \end{align*} where for any $(v,\ell) \in V\times V^*$, the map $\phi(v,\ell): V \rightarrow V$ is the rank $1$ map such that for any $w \in V$, $$ \phi(v, \ell)(w) = (\ell(w))v. $$ By the universal property above, this extends uniquely to a linear map $$ \bar\phi: V\otimes V^* \rightarrow \End(V). $$ It is straightforward to verify that $\bar\phi$ is an isomorphism.

Trace of linear transformation: The trace of any rank 1 linear transformation is given by the following natural bilinear function \begin{align*} e: V \times V^* &\rightarrow \F\\ (v,\ell) &\mapsto \ell(v) \end{align*} By the universal property above, this extends uniquely to a linear map $$ \bar{e}: V\otimes V^* \rightarrow \F. $$ The trace of a linear transformation $L \in \End(V)$ can now be defined to be $$ \tr(L) = \bar{e}\circ\bar{\phi}^{-1}(L). $$
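As a concrete check in coordinates, the sketch below takes $V = \R^3$ with the standard basis (an assumption made only for illustration; the definition above needs no basis). It verifies that the matrix trace of the rank $1$ map $\phi(v,\ell)$ equals $\ell(v)$, and that writing a matrix $A$ as $\sum_j \phi(Ae_j, e^j)$ and applying $e$ to each term recovers the sum of the diagonal entries. All function names are hypothetical.

```python
def outer(v, ell):
    """Matrix of the rank-one map phi(v, ell): w -> ell(w) v."""
    return [[vi * lj for lj in ell] for vi in v]

def matrix_trace(A):
    """Sum of the diagonal entries."""
    return sum(A[i][i] for i in range(len(A)))

def evaluate(v, ell):
    """The bilinear pairing e(v, ell) = ell(v)."""
    return sum(vi * li for vi, li in zip(v, ell))

def trace_via_tensor(A):
    """Decompose A = sum_j phi(A e_j, e^j) and apply e to each summand."""
    n = len(A)
    total = 0
    for j in range(n):
        column_j = [A[i][j] for i in range(n)]            # the vector A e_j
        dual_j = [1 if i == j else 0 for i in range(n)]   # the dual basis covector e^j
        total += evaluate(column_j, dual_j)
    return total

v, ell = [1, 2, 3], [4, 0, -1]
A = [[2, 1, 0], [0, 3, 5], [7, 0, -4]]
```

Here `matrix_trace(outer(v, ell))` and `evaluate(v, ell)` should agree, as should `trace_via_tensor(A)` and `matrix_trace(A)`.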

Acknowledgement: This post was inspired by Levent Alpoge's Princeton PhD generals exam. Just look for "Harvard way".

Singular Value Decomposition

2023-01-05T00:00:00+00:00

$\newcommand\R{\mathbb{R}}\newcommand\C{\mathbb{C}}\newcommand\Z{\mathbb{Z}}$

The singular value decomposition is usually defined for a matrix. Here, we will show directly that any linear map between inner product spaces has a singular value decomposition. The singular value decomposition of a matrix follows by letting the inner product spaces be $\R^n$ or $\C^n$ with the standard inner product. Only real inner product spaces will be considered here, but the complex case is almost exactly the same.

Let $M$ be an $m$-dimensional real inner product space, $N$ be an $n$-dimensional real inner product space, and $L: M \rightarrow N$ be a linear map.

We start with a geometric description of a singular value decomposition of $L$. A singular value decomposition of $L$ consists of an orthonormal basis $(v_1, \dots, v_m)$ of $M$, an orthonormal basis $(u_1, \dots, u_n)$ of $N$, and positive real scalars $\lambda_1, \dots, \lambda_r$, where $r$ is the rank of $L$, such that the following hold:

$(v_{r+1}, \dots, v_m)$ is a basis of $\ker(L)$
$(u_{r+1}, \dots, u_n)$ is a basis of $(\operatorname{image}(L))^\perp$
$L(v_j) = \lambda_ju_j$ for each $1 \le j \le r$

The distinct values of $\lambda_1, \dots, \lambda_r$ are called the **singular values** of the map $L$.

Equivalently, a singular value decomposition of $L$ is $$ L = U\Sigma V^*, $$ where $V: \R^m \rightarrow M$ and $U: \R^n\rightarrow N$ are isometries and $\Sigma$ is a diagonal $n$-by-$m$ matrix such that \begin{align*} \Sigma_j^j &= \lambda_j \text{ if }1 \le j \le r\\ \Sigma_j^a &= 0\text{ otherwise}. \end{align*}

The equivalence follows by observing that if $1 \le j \le r$, then \begin{align*} U\Sigma V^*(v_j) &= U\Sigma e_j\\ &= \lambda_j Ue_j\\ &= \lambda_j u_j \end{align*} and if $r+1 \le j \le m$, then \begin{align*} U\Sigma V^*(v_j) &= U\Sigma e_j\\ &= 0. \end{align*}

A singular value decomposition of any linear map $L: M \rightarrow N$ can be constructed as follows:

The linear map $L^*L$ is nonnegative-definite and self-adjoint and therefore has nonnegative real eigenvalues $\lambda_1^2, \dots, \lambda_m^2$. We can assume that $\lambda_1, \dots, \lambda_r > 0$ and $\lambda_{r+1}, \dots, \lambda_m = 0$. Let $s_1, \dots, s_k$ be the distinct values of $\lambda_1, \dots, \lambda_r$.

The eigenspaces of $L^*L$ are mutually orthogonal. Denote the dimensions of the eigenspaces for $s_1^2, \dots, s_k^2$ by $d_1, \dots, d_k$, respectively. Observe that $$ r = d_1 + \cdots + d_k $$ is the rank of $L$. Given an ordering of the singular values, we can assume that the ordering of $\lambda_1, \dots, \lambda_m$ is given by \begin{align*} \lambda_{d_1+\cdots+d_{j-1}+1}=\cdots=\lambda_{d_1+\cdots+d_{j-1}+d_j}&= s_j\text{ for each }1 \le j \le k\\ \lambda_{r+1} = \cdots = \lambda_m &= 0. \end{align*}

Let $(v_1, \dots, v_m)$ be an orthonormal basis of $M$ of eigenvectors of $L^*L$, where each $v_j$ is an eigenvector for the eigenvalue $\lambda_j^2$. Let $V: \R^m \rightarrow M$ be the linear map such that $$ V(e_j) = v_j,\ 1 \le j \le m, $$ where $(e_1, \dots, e_m)$ is the standard basis of $\R^m$.

For each $1 \le j \le r$, let \[ \bar{u}_j = L(v_j) \in N. \] For each $1 \le i, j \le r$, \begin{align*} \langle \bar{u}_i,\bar{u}_j\rangle &= \langle L(v_i), L(v_j)\rangle\\ &= \langle v_i, L^*L(v_j)\rangle\\ &= \lambda_j^2 \langle v_i,v_j\rangle\\ &= \lambda_j^2\delta_{ij}. \end{align*} Since $\lambda_1, \dots, \lambda_r > 0$, it follows that $$ u_1 = \frac{\bar{u}_1}{\lambda_1}, \dots, u_r = \frac{\bar{u}_r}{\lambda_r} $$ is an orthonormal basis of $\operatorname{image}(L) \subset N$. This can be extended to an orthonormal basis $(u_1, \dots, u_n)$ of $N$. By their construction, the bases $(v_1, \dots, v_m)$ and $(u_1, \dots, u_n)$ satisfy the geometric definition of a singular value decomposition of $L$.

If we now define $V: \R^m \rightarrow M$ and $U: \R^n \rightarrow N$ such that \begin{align*} V(e_j) &= v_j,\ 1 \le j \le m\\ U(e_a) &= u_a,\ 1 \le a \le n, \end{align*} then they satisfy the second definition of a singular value decomposition.
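The construction can be traced by hand for a small example. The sketch below assumes $M = N = \R^2$ with the standard inner product and a particular $2\times 2$ matrix `A` of full rank (both are illustrative assumptions). It diagonalizes $L^*L = A^TA$ with the closed-form formulas for a symmetric $2\times 2$ matrix, then sets $u_j = L(v_j)/\lambda_j$.

```python
import math

A = [[3.0, 0.0], [4.0, 5.0]]  # matrix of L in the standard bases (illustrative)

def apply(M, x):
    """Matrix-vector product."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    """Standard inner product."""
    return sum(a * b for a, b in zip(x, y))

def unit(x):
    """Rescale x to unit length."""
    norm = math.sqrt(dot(x, x))
    return [c / norm for c in x]

# L^*L is the symmetric matrix A^T A = [[a, b], [b, d]].
a = A[0][0] ** 2 + A[1][0] ** 2
b = A[0][0] * A[0][1] + A[1][0] * A[1][1]
d = A[0][1] ** 2 + A[1][1] ** 2

# Eigenvalues lambda_1^2 >= lambda_2^2 of a symmetric 2x2 matrix.
mean = (a + d) / 2
radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
eigenvalues = [mean + radius, mean - radius]

# (b, mu - a) is an eigenvector for eigenvalue mu; this formula needs b != 0,
# which holds for this particular A.
v = [unit([b, mu - a]) for mu in eigenvalues]
lam = [math.sqrt(mu) for mu in eigenvalues]
u = [[c / lam[j] for c in apply(A, v[j])] for j in range(2)]
```

For this `A` the eigenvalues of $A^TA$ are $45$ and $5$, and the vectors in `u` come out orthonormal, as the computation of $\langle \bar{u}_i,\bar{u}_j\rangle$ predicts.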

A natural question is to what extent $U$ and $V^*$ are unique. It is clear from the above construction that if we fix an ordering of the singular values $s_1, \dots, s_k$ and the corresponding ordering of $\lambda_1, \dots, \lambda_r$, as specified above, then $V$ is unique up to rotations in each eigenspace of $L^*L$ and, given $V$, $U$ is unique up to rotations of $(\operatorname{image}(L))^\perp$.

Visualization of Moore-Penrose Pseudoinverse

2022-12-16T00:00:00+00:00

$\newcommand\R{\mathbb{R}}\newcommand\C{\mathbb{C}}\newcommand\Z{\mathbb{Z}}$

Exponential functions and Euler’s formula

2022-06-05T00:00:00+00:00

$\newcommand\R{\mathbb{R}}\newcommand\C{\mathbb{C}}\newcommand\Z{\mathbb{Z}}\newcommand\Q{\mathbb{Q}}$

This post was driven by a desire to explain and prove Euler's formula, \begin{equation}\label{euler} e^{i\theta} = \cos\theta + i\sin \theta, \end{equation} in a more conceptual way than the standard explanation using power series. Power series are still used as the key technical tool for proving the existence and uniqueness of exponential functions.

Acknowledgements. This was provoked by a Quanta Magazine article by Steven Strogatz on how to prove Euler's formula using power series. The discussion below was inspired by Michael Hutchings' observation that Euler's formula can be proved by solving an ordinary differential equation. I'd also like to thank Dan Lee, Keith Conrad, and my other Facebook friends for their comments and corrections.

Introduction

At first, even the meaning of the left side of Euler's formula \eqref{euler} is unclear. The standard approach is to use power series to define $e^{i\theta}$ and prove Euler's formula. The argument is outlined below. It is simple and elegant. I have, however, always found it unsatisfying, since the power series alone provides little intuition into what's going on.

Here, I provide an alternative approach that I find easier to understand intuitively. Moreover, it accomplishes much more. Using only calculus, it provides definitions of the constants $\pi$ and $e$, as well as definitions and basic properties of the exponential, sine, and cosine functions, including Euler's formula.

The exposition here does not require the use of power series. Power series are needed only for the rigorous proof of the existence and uniqueness theorem of exponential functions.

Power series proof of Euler’s formula

Here is a brief outline of how to prove Euler's formula using power series. The starting point is the power series \begin{align} e^{x} &= \sum_{k=0}^\infty \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} +\cdots \label{taylor}\\ \sin x &= \sum_{k=0}^\infty (-1)^k\frac{x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \notag\\ \cos x &= \sum_{k=0}^\infty (-1)^k\frac{x^{2k}}{(2k)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots. \notag \end{align} It follows by the ratio test that these series converge for every $x \in \R$. The key observation is that these power series also converge if $x$ is complex, extending the domains of these functions from $\R$ to $\C$. Euler's formula now follows by setting $x = i\theta$ in \eqref{taylor} and splitting the series into its real and imaginary parts, \begin{align*} e^{i\theta} &= \sum_{k=0}^\infty \frac{(i\theta)^k}{k!}\\ &= \sum_{k=0}^\infty (-1)^k \frac{\theta^{2k}}{(2k)!} + i\sum_{k=0}^\infty (-1)^k \frac{\theta^{2k+1}}{(2k+1)!}\\ &= \cos\theta + i\sin\theta. \end{align*} This is simple and elegant, but the proof provides little intuition into what is going on. Euler's formula appears magically.
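The splitting into real and imaginary parts can be checked numerically with partial sums. This is a minimal sketch: the truncation lengths and the sample value of `theta` are arbitrary choices, and the standard library's `math.cos` and `math.sin` are used only as an outside reference.

```python
import math

def exp_series(z, terms=40):
    """Partial sum of sum_k z^k / k! for a complex number z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)
    return total

def cos_series(x, terms=20):
    """Partial sum of sum_k (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

def sin_series(x, terms=20):
    """Partial sum of sum_k (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

theta = 1.2
lhs = exp_series(1j * theta)
rhs = complex(cos_series(theta), sin_series(theta))
reference = complex(math.cos(theta), math.sin(theta))
```

The values `lhs`, `rhs`, and `reference` should agree to within floating-point rounding.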

Below is an alternative approach that I think elucidates how the exponential function of both real and imaginary numbers arises and why the exponential of an imaginary number has a geometric interpretation. The explanation below requires only basic calculus. No knowledge of differential equations is needed. The appendix provides a rigorous discussion showing how to use power series to solve a simple ordinary differential equation. The uniqueness of the solution follows from basic properties of the integral.

Exponential function of a real number

We start by postulating what we mean by an exponential function.

Linear function

We start with the definition of a linear function. A function $f: \R \rightarrow \R$ is linear if the change in output depends only on the change in input and not on the input itself. More precisely, for any change in input $\Delta \in \R$, there exists $c(\Delta) \in \R$ such that $$ f(x+\Delta) - f(x) = c(\Delta),\ \forall x \in \R. $$ Observe that $c(0) = 0$. Therefore, if $f$ is assumed to be differentiable, then \begin{align*} f'(x) &= \lim_{\Delta\rightarrow 0} \frac{f(x+\Delta)-f(x)}{\Delta}\\ &= \lim_{\Delta\rightarrow 0} \frac{c(\Delta)-c(0)}{\Delta}\\ &= c'(0), \end{align*} which is a constant. If we set $m = c'(0)$ and $b = f(0)$, this implies that if $f$ is linear, then $$ f(x) = mx + b. $$ The converse also holds.

Exponential functions

An exponential function has a similar definition except that change in output is replaced by relative change in output. In other words, a function $E: \R \rightarrow \R$ is exponential if the percentage or relative change of $E$ depends only on the change in input and not on the input itself. More precisely, for any $\Delta \in \R$, there exists $C(\Delta)$ such that \begin{equation}\label{relative-change} \frac{E(x+\Delta)-E(x)}{E(x)} = C(\Delta). \end{equation} Observe that $C(0) = 0$. From this, it follows that, if $E$ is differentiable, then \begin{align*} \frac{E'(x)}{E(x)} &= \lim_{\Delta\rightarrow 0}\frac{E(x+\Delta)-E(x)}{E(x)\Delta}\\ &= \lim_{\Delta\rightarrow 0}\frac{C(\Delta)-C(0)}{\Delta}\\ &= C'(0). \end{align*} This implies that $C(\Delta)$, as a function of $\Delta$, is differentiable at $\Delta = 0$, and $$ E'(x) = \kappa E(x), $$ where $\kappa = C'(0)$ is a constant.

The following holds:

Given any $\kappa, e_0 \in \R$, there exists a unique differentiable function $E: \R \rightarrow \R$ satisfying \begin{equation}\label{ode} E' = \kappa E\text{ and }E(0) = e_0. \end{equation}

The proof of existence is given in the Appendix. The proof of uniqueness is below.

Given $\kappa \in \R$, let $e_\kappa$ denote the unique function such that \begin{equation} e_\kappa' = \kappa e_\kappa\text{ and }e_\kappa(0) = 1. \end{equation} The function $e_1$ will be called the standard exponential function.

Observe that if $E$ satisfies \eqref{ode}, then by the uniqueness statement of the theorem and the chain rule, $$ E(x) = e_0e_1(\kappa x),\ \forall x \in \R. $$ In particular, $$ e_\kappa(x) = e_1(\kappa x),\ \forall x \in \R. $$

Uniqueness of exponential functions

Also proved in the Appendix is that if $e_0$ and $\kappa$ are positive, then the function $E$ given by \eqref{ode} is positive and strictly increasing. This proves uniqueness of $E$ by the following argument.

Let $E_1$ and $E_2$ both satisfy \eqref{ode}. It follows by the quotient rule for differentiation that $$ \left(\frac{E_1}{E_2}\right)' = \frac{E_2E_1'-E_1E_2'}{E_2^2} = \frac{E_1}{E_2}\left(\frac{E_1'}{E_1}-\frac{E_2'}{E_2}\right) = \frac{E_1}{E_2}(\kappa - \kappa) = 0. $$ Therefore $E_1/E_2$ is constant. Since $E_1(0) = E_2(0)$, the constant is $1$, and it follows that $E_1 = E_2$.

Translation invariance

The most important property of the function is called translation invariance, which says that for any $\kappa \ne 0$, $$ e_\kappa(s+t) = e_\kappa(s)e_\kappa(t),\ \forall s, t \in \R. $$ This follows from the fact that if $s$ is held fixed, each side defines a function that satisfies $$ E_s'(t) = \kappa E_s(t)\text{ and }E_s(0) = e_\kappa(s). $$

Relative Change in Output of an Exponential Function

For any function $E$ satisfying \eqref{ode} and any $x, \Delta \in \R$, \begin{align*} E(x+\Delta)-E(x) &= e_0e_\kappa(x+\Delta) - e_0e_\kappa(x)\\ &= e_0(e_\kappa(x)e_\kappa(\Delta) - e_\kappa(x))\\ &= e_0e_\kappa(x)(e_\kappa(\Delta) - 1)\\ &= E(x)C(\Delta), \end{align*} where $$ C(\Delta) = e_\kappa(\Delta) - 1. $$ From this, it follows that a differentiable function $E$ satisfies \eqref{relative-change} if and only if it satisfies \eqref{ode}.
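A quick numerical illustration of \eqref{relative-change}: the sketch below uses the standard library's `math.exp` as a stand-in for $e_1$ (an assumption, since the identification $e_1(x) = e^x$ is only justified later in the post) and arbitrary sample values for $e_0$, $\kappa$, and $\Delta$.

```python
import math

E0 = 3.0      # sample initial value e_0 (arbitrary)
KAPPA = 0.7   # sample growth rate kappa (arbitrary)

def E(x):
    """E(x) = e_0 * e_kappa(x), with math.exp standing in for e_1."""
    return E0 * math.exp(KAPPA * x)

def relative_change(x, delta):
    """(E(x + delta) - E(x)) / E(x)."""
    return (E(x + delta) - E(x)) / E(x)

delta = 0.25
changes = [relative_change(x, delta) for x in (-2.0, 0.0, 1.5, 10.0)]
predicted = math.exp(KAPPA * delta) - 1  # e_kappa(delta) - 1
```

Each entry of `changes` should match `predicted` regardless of the input `x`, which is exactly the defining property of an exponential function.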

Euler’s constant $e$

We define Euler’s constant to be $e = e_1(1).$

By translation invariance, it follows that for any nonnegative $k \in \Z$, $$ e_1(k+1) = e_1(k)e_1(1) = e\,e_1(k). $$ By induction and the fact that $e_1(0)=1$, it follows that $$ e_1(k) = e^k. $$ Also, since $$ 1 = e_1(0) = e_1(k+(-k)) = e_1(k)e_1(-k), $$ it follows that for any positive integer $k$, $$ e_1(-k) = e^{-k}. $$ Therefore, for any $k \in \Z$, $$ e_1(k) = e^k. $$

For any rational number $\frac{n}{d}$, where $n \in \Z$ and $d \in \Z_+$, \begin{align*} e^n &= e_1(n)\\ &= e_1\left(\frac{n}{d}+\cdots+\frac{n}{d}\right)\\ &= \left(e_1\left(\frac{n}{d}\right)\right)^d \end{align*} It follows that \begin{align*} e_1\left(\frac{n}{d}\right) &= (e^n)^{\frac{1}{d}} = e^{\frac{n}{d}}. \end{align*} In short, for any $r \in \Q$, $$ e_1(r) = e^r. $$

These properties justify the following definition: For any $x \in \R$, we denote the standard exponential function by $$ e^x = e_1(x), $$ which satisfies \begin{align*} e^x &> 0,\ \forall x \in \R\\ e^0 &= 1\\ e^{x+y} &= e^xe^y,\ \forall x,y \in \R. \end{align*} Note that if $x$ is irrational, this is the definition of $e^x$.

Moreover, any exponential function $E: \R \rightarrow \R$ can be written as $$ E(t) = e_0e^{\kappa t}, $$ where $e_0$ and $\kappa = E'(0)/E(0)$ are constants.

Exponential of an imaginary number

Definition

Now suppose you want to extend the definition of $e^t$ to $t \in \C$. Let us focus first on the case $t = i$. One possible approach is to set $$ e^i = e_1(i). $$ But this is problematic, because it would mean trying to solve \eqref{ode} with $t$ being complex instead of real. A better approach is to use \eqref{ode} as a template and define $$ e^i = e_i(1), $$ where $e_i$ satisfies the equation \begin{equation} e_i' = ie_i\text{ and }e_i(0) = 1. \label{ode2} \end{equation} By the existence and uniqueness theorem in the Appendix, there is a unique solution $e_i: \R \rightarrow \C$ to \eqref{ode2}. Moreover, for any $t_1, t_2 \in \R$, $$ e_i(t_1 + t_2) = e_i(t_1)e_i(t_2), $$ which justifies writing, for any $t \in \R$, $$ e_i(t) = e^{it}. $$

Geometric properties of the exponential of an imaginary number

We now want to understand the function $e^{it}$ better. An unexpected twist is that geometry and trigonometry naturally appear in the description of this function.

If we write $e_i(t) = x(t) + iy(t)$, then \eqref{ode2} is equivalent to the system \begin{align}\label{ode3} (x',y') &= (-y,x)\text{ and }(x(0),y(0)) = (1,0). \end{align} A solution to this satisfies $$ (x',y')\cdot (x,y) = x'x + y'y = -yx + xy = 0. $$ Therefore, $(x(t),y(t))$, $t \in \R$, is a parameterized curve whose velocity vector $$ v = (x', y') $$ is always orthogonal to the position vector $(x,y)$. It follows easily from \eqref{ode3} that \begin{align*} x^2 + y^2 &= 1\\ (x')^2+(y')^2 &= 1. \end{align*} The first equation says the curve always lies on the unit circle centered at the origin. The second says that the speed, which is the norm of the velocity vector, is always equal to $1$. Putting this all together, we see that the solution is a unit speed parameterization of the circle.
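The geometric claims can be observed numerically by integrating \eqref{ode3} directly. The sketch below uses the classical fourth-order Runge-Kutta method; the choice of method, step count, and end time are assumptions made for illustration, and `math.cos` and `math.sin` appear only as an outside reference.

```python
import math

def rhs(x, y):
    """Right-hand side of the system (x', y') = (-y, x)."""
    return -y, x

def rk4_step(x, y, h):
    """One classical Runge-Kutta step of size h."""
    k1 = rhs(x, y)
    k2 = rhs(x + h / 2 * k1[0], y + h / 2 * k1[1])
    k3 = rhs(x + h / 2 * k2[0], y + h / 2 * k2[1])
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    x_new = x + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    y_new = y + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x_new, y_new

def solve(t_end, steps=1000):
    """Approximate (x(t_end), y(t_end)) starting from (1, 0)."""
    x, y = 1.0, 0.0
    h = t_end / steps
    for _ in range(steps):
        x, y = rk4_step(x, y, h)
    return x, y

x, y = solve(2.0)
radius_squared = x * x + y * y            # should stay equal to 1
velocity = rhs(x, y)
velocity_dot_position = velocity[0] * x + velocity[1] * y  # should be 0
```

Up to integration error, `radius_squared` stays at $1$, and the computed point should match `(math.cos(2.0), math.sin(2.0))`.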

Since the parameterization has unit speed, it is intuitively clear that, as $t$ increases, the solution $(x(t),y(t))$ to \eqref{ode3} goes around the entire circle at least once. In particular, there exists $T > 0$ such that $$ e_i(T) = e_i(0). $$ The translation invariance of \eqref{ode2} implies $$ e_i(t + T) = e_i(t). $$ A function with this property is called periodic.

Definition of $\pi$

If $T$ is the smallest positive constant such that $$ e^{iT} = 1, $$ the constant $\pi$ is defined to be $$ \pi = \frac{T}{2}. $$ Since, for each $t \in [0,2\pi]$, $$ e_i(t) = (x(t),y(t)) $$ is the point reached by traveling at unit speed along the circle starting from $(1,0)$, we can define the length of the arc from $(1,0)$ to $(x(t),y(t))$, where $0 \le t < 2\pi$, to be $t$. In particular, the circumference of the circle, which is defined to be the length of the full circle, is $2\pi$.
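Since the full circle has length $2\pi$ and is traversed at unit speed, the period $T$ equals $2\pi$, so $\pi$ can be estimated from the ODE alone. The following sketch integrates \eqref{ode2} with a Runge-Kutta scheme until the solution first returns to the positive real axis. The integrator, step size, and linear interpolation are illustrative assumptions, not part of the definition.

```python
import math

def rk4_step(z, h):
    """One classical Runge-Kutta step for z' = i z."""
    k1 = 1j * z
    k2 = 1j * (z + h / 2 * k1)
    k3 = 1j * (z + h / 2 * k2)
    k4 = 1j * (z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def first_return_time(h=1e-3, max_steps=100_000):
    """Smallest t > 0 at which the solution starting at 1 comes back to 1."""
    z, t = 1 + 0j, 0.0
    for _ in range(max_steps):
        z_next = rk4_step(z, h)
        # The imaginary part changes from negative to nonnegative exactly
        # when the solution completes its first full circuit.
        if z.imag < 0 <= z_next.imag:
            # Linear interpolation for the zero of the imaginary part.
            return t + h * (-z.imag) / (z_next.imag - z.imag)
        z, t = z_next, t + h
    raise RuntimeError("no return to the positive real axis found")

T = first_return_time()
pi_estimate = T / 2
```

The value `pi_estimate` should agree with the standard library constant `math.pi` to roughly the accuracy of the integrator.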

The angle in radians from $(1,0)$ to the point $(x,y)$ on the unit circle is defined to be the unique $t \in [0,2\pi)$ such that $$ e^{it} = x + iy. $$

Definitions and properties of trig functions

We can now define the basic trig functions to be, for any $\theta \in \R$, \begin{align*} \cos\theta &= x(\theta)\\ \sin\theta &= y(\theta), \end{align*} where $$ e^{i\theta} = x(\theta) + iy(\theta). $$ In other words, $$ e^{i\theta} = \cos\theta + i\sin\theta. $$ Euler's formula is therefore the definition of the sine and cosine functions.

Straightforward consequences of everything above include $$ (\sin\theta)^2 + (\cos\theta)^2 = 1, $$ Euler's formula $$ e^{i\theta} = \cos\theta + i\sin\theta, $$ and the standard differentiation formulas, \begin{align*} \frac{d}{d\theta}(\sin\theta) &= \cos\theta\\ \frac{d}{d\theta}(\cos\theta) &= -\sin\theta. \end{align*} It is also straightforward to use the symmetries of \eqref{ode2} to derive all of the basic properties of the sine and cosine functions, such as: \begin{align*} \sin(n\pi) &= 0\\ \sin\left(\frac{\pi}{2}+2\pi n\right) &= 1\\ \sin\left(\frac{3\pi}{2}+2\pi n\right) &= -1\\ \cos(2n\pi) &= 1\\ \cos((2n+1)\pi) &= -1\\ \cos\left(\frac{\pi}{2}+n\pi\right) &= 0, \end{align*} where $n$ is any integer. Many trigonometric identities also follow easily by raising $e^{i\theta}$ to integer and rational powers.

Exponential of a complex number

Finally, it is now obvious how to define the exponential of a complex number $z$, namely $$ e^{z} = e_z(1), $$ where $e_z$ is the unique solution to the ODE \begin{equation}\label{ode4} e_z' = ze_z\text{ and }e_z(0) = 1. \end{equation} Again, it is straightforward to show that $$ e^{z_1+z_2} = e^{z_1}e^{z_2} $$ and, in particular, $$ e^{x+iy} = e^xe^{iy} = e^x(\cos y + i\sin y). $$

Summary

We have succeeded in using only calculus to obtain the following in a natural and intuitive way:

Definitions of the constants $e$ and $\pi$
Definitions and fundamental properties of the exponential functions $e^z$ for any $z \in \C$
Definitions of the trigonometric functions $\sin\theta$ and $\cos\theta$
Euler's formula

Appendix

Existence and uniqueness of solutions to \eqref{ode}, \eqref{ode2}, \eqref{ode4}

Given $e_0, z \in \C$, there exists a unique differentiable function $E: \R \rightarrow \C$ satisfying \begin{equation}\label{ivp} E' = zE\text{ and }E(0) = e_0. \end{equation}

We use power series to prove that a solution to \eqref{ivp} exists. Uniqueness is proved above.

Consider the power series $$ E(t) = \sum_{k=0}^\infty c_kt^k. $$ If $E$ satisfies the equation \eqref{ivp}, then $c_0 = e_0$ and $$ \sum_{k=0}^\infty (k+1)c_{k+1}t^k = \sum_{k=0}^\infty zc_kt^k, $$ which implies that for each $k \ge 0$, $$ c_{k+1} = \frac{z}{k+1}c_k. $$ By induction, we see that $$ c_k = e_0\frac{z^k}{k!}. $$ Therefore, the power series for $E$ is $$ e_0\sum_{k=0}^\infty \frac{(zt)^k}{k!}. $$ It is easily checked that by the ratio test for series, the series converges absolutely for all $t \in \R$. We therefore define $$ E(t) = e_0\sum_{k=0}^\infty\frac{(zt)^k}{k!}. $$

It remains to show that $E$ really does satisfy \eqref{ivp}. First, we derive the power series for $E'$ as follows: \begin{align*} E'(x) &= \lim_{y\rightarrow x} \frac{E(y)-E(x)}{y-x}\\ &= \lim_{y\rightarrow x} \frac{1}{y-x}\left(e_0\sum_{k=0}^\infty \frac{(z y)^k}{k!} - e_0\sum_{k=0}^\infty \frac{(z x)^k}{k!}\right)\\ &= e_0\lim_{y\rightarrow x} \sum_{k=0}^\infty \frac{1}{k!}\left(\frac{(z y)^k-(z x)^k}{y-x}\right)\\ &= e_0\lim_{y\rightarrow x} \sum_{k=1}^\infty \frac{z^k}{k!}(y^{k-1}+y^{k-2}x + \cdots + yx^{k-2} + x^{k-1})\\ &= e_0\sum_{k=1}^\infty \frac{z^k}{k!}kx^{k-1}\\ &= z e_0\sum_{k=0}^\infty \frac{(z x)^k}{k!}\\ &= z E(x). \end{align*} Since, by the ratio test, all of the series in the calculation above converge absolutely, it is a valid calculation and therefore $$ E' = z E. $$
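The recursion for the coefficients and the resulting series can be checked numerically. In the sketch below the sample values of `z`, `e0`, and `t`, the truncation at 60 terms, and the finite-difference step are all arbitrary assumptions. The standard library's `cmath.exp` is used only as an outside reference.

```python
import cmath

def coefficients(z, e0, count):
    """c_0 = e0 and c_{k+1} = z / (k + 1) * c_k, for k = 0, ..., count - 2."""
    c = [complex(e0)]
    for k in range(count - 1):
        c.append(z / (k + 1) * c[k])
    return c

def E(t, z, e0, count=60):
    """Partial sum of sum_k c_k t^k, evaluated by Horner's rule."""
    total = 0j
    for ck in reversed(coefficients(z, e0, count)):
        total = total * t + ck
    return total

z, e0, t = 0.5 + 2j, 1.5 - 1j, 0.8
value = E(t, z, e0)

# Central-difference approximation of E'(t), to compare with z * E(t).
h = 1e-6
derivative = (E(t + h, z, e0) - E(t - h, z, e0)) / (2 * h)
reference = e0 * cmath.exp(z * t)
```

The quantity `derivative` should be close to `z * value`, and `value` should match `reference`, consistent with $E' = zE$ and $E(0) = e_0$.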

A solution to \eqref{ode2} goes around the whole circle

Details coming.

Orientation of a manifold

2022-06-04T00:00:00+00:00

$\newcommand\R{\mathbb{R}}$ $\newcommand\extV{\Lambda^nV^*}$ $\newcommand\extVo{\Lambda^nV^*\backslash\{0\}}$ $\newcommand\extT{\Lambda^nT^*M}$

It always starts with linear algebra. I like to say that differential geometry is the study of parameterized families of vector spaces.

Orientation of a vector space

The first observation is that a $1$-dimensional vector space with the origin removed has two connected components. There is no natural way of labeling one as positive and the other as negative. The second observation is that if $V$ is an $n$-dimensional vector space, then the vector space $\extV$ of exterior $n$-tensors is $1$-dimensional.

An orientation on $V$ is defined by choosing one of the connected components of $\extVo$. Let’s call that component $\extV_+$. Given any nonzero $\Theta \in \extV$, we say that $\Theta$ has positive orientation if $\Theta \in \extV_+$ and negative orientation otherwise.

Conversely, any nonzero $\Theta \in \extV$ uniquely determines an orientation on $V$ by letting $\extV_+$ be the connected component of $\extVo$ that contains $\Theta$.

Also, note that $\Theta_1, \Theta_2 \in \extVo$ have the same orientation if and only if $\Theta_2 = c \Theta_1$ for some $c > 0$.
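To make this concrete, here is a small worked example. The choice $V = \R^2$ with standard coordinates $x, y$ is an assumption made only for illustration. Every element of $\extV$ is of the form $c\,dx\wedge dy$ with $c \in \R$, and the two connected components of $\extVo$ are $\{c > 0\}$ and $\{c < 0\}$. Let $\Theta_1 = dx\wedge dy$. Then $$ \Theta_2 = 3\,dx\wedge dy = 3\,\Theta_1, \qquad \Theta_3 = dy\wedge dx = -\,\Theta_1, $$ so $\Theta_1$ and $\Theta_2$ define the same orientation on $V$, while $\Theta_3$ defines the opposite one.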

Orientation of a manifold

Now let $M$ be a smooth $n$-dimensional manifold. We say that $M$ is orientable if there exists a continuous nowhere zero exterior $n$-form $\Theta$ on $M$. If such a form exists, then it uniquely determines an orientation on each $T_pM$, $p \in M$.

Any nowhere zero $n$-form on a connected oriented manifold is either positively or negatively oriented. If the manifold is not connected, then the form has a sign on each connected component but the signs do not have to be the same.

Volume form

A volume form on $M$ is defined to be a continuous nowhere zero $n$-form. If $M$ is connected and oriented, then a volume form is either positively or negatively oriented.

Integral of a volume form

The definition of the integral of a volume form on an $n$-dimensional manifold has an ambiguity in its sign. If the manifold is oriented, then the ambiguity is resolved as follows:

If a volume form is positively oriented, then the sign of its integral is chosen to be positive. If it has negative orientation, then the sign is chosen to be negative.

More generally, if $\Theta$ is a volume form and $f$ is a continuous nonnegative function on $M$, then the sign of $\int_M f\Theta$ is chosen to be nonnegative.

Integral of an $n$-form

Let $\Theta$ be a positively oriented volume form on an oriented manifold $M$. If $\Omega$ is a continuous $n$-form, there is a continuous function $f$ on $M$ such that $\Omega = f\Theta$. If we let $f_+$ and $f_-$ denote the positive and negative parts of $f$, then we define the integral of $\Omega$ to be $$ \int_M \Omega = \int_M f_+\Theta - \int_M f_-\Theta. $$
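For instance (an illustrative choice, not taken from the discussion above), let $M = (-1,1) \subset \R$ with the orientation defined by $\Theta = dx$, and let $\Omega = x\,dx$. Then $f(x) = x$, $f_+(x) = \max(x,0)$, $f_-(x) = \max(-x,0)$, and $$ \int_M \Omega = \int_0^1 x\,dx - \int_{-1}^0 (-x)\,dx = \frac{1}{2} - \frac{1}{2} = 0. $$ Each integral on the right is the integral of a nonnegative function times a positively oriented volume form, so its sign is fixed by the convention above.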

It is now clear that if $M$ is oriented but not connected, then for any $c \in \R$ there exists a volume form $\Theta$ such that $\int_M \Theta = c.$
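A sketch of why, using an example chosen only for illustration: let $M = (0,1)\cup(2,3) \subset \R$ with the orientation defined by $dx$, and let $\Theta = a\,dx$ on $(0,1)$ and $\Theta = b\,dx$ on $(2,3)$, where $a, b \ne 0$. Then $\Theta$ is a volume form and $$ \int_M \Theta = a + b. $$ Any $c \in \R$ can be written as $a + b$ with $a, b \ne 0$: take $a = c+1$, $b = -1$ if $c \ne -1$, and $a = -2$, $b = 1$ if $c = -1$.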

The definition of a manifold

2022-03-22T00:00:00+00:00

$\newcommand{\R}{\mathbb{R}}$ I’ve always disliked the standard ways to define a manifold in differential geometry. First, the definition always starts with a topological space $M$. I don’t understand why you need to make this assumption. I prefer to show that the topology of $M$ is a natural consequence of the definition. Second, the definition always uses two technical terms, paracompact and Hausdorff. I prefer to describe the properties concretely in terms of coordinate maps.

Below is how I prefer to define a manifold. It turns out that Peter Olver, in his book Applications of Lie Groups to Differential Equations (published in 1986) defines a manifold in exactly the same way (see Definition 1.1).

Start with a set $M$, just a set. Define a coordinate map to be an injective map $\phi: O \rightarrow \mathbb{R}^n$, where $O$ is a subset of $M$ and $\phi(O) \subset \mathbb{R}^n$ is open. Define an atlas to be a countable collection of coordinate maps, where the domains of the maps cover $M$. No assumptions about topology or smoothness yet.

A topological atlas is one where for any two coordinate maps $\phi_1: O_1 \rightarrow \mathbb{R}^n$ and $\phi_2: O_2\rightarrow \mathbb{R}^n$ such that $O_1\cap O_2 \ne \emptyset$, the change of coordinate map

\[\phi_2\circ\phi_1^{-1}: \phi_1(O_1\cap O_2) \rightarrow \phi_2(O_1\cap O_2)\]

is a homeomorphism. A topological manifold is a set $M$ with a topological atlas.

Observe that this defines a topology on $M$, where any coordinate map $\phi: O \rightarrow \phi(O) \subset \R^n$ is a homeomorphism. The assumption on the change of coordinate maps is exactly what is needed for this definition to be logically consistent.

Usually (but not always), there is one more assumption made, which I like to state as follows:

You can separate points using coordinate charts.

More precisely, given two different points $p_1, p_2 \in M$, there exist coordinate maps $\phi_1: O_1\rightarrow \R^n$, $\phi_2: O_2\rightarrow \R^n$ and open subsets $U_1 \subset O_1$, $U_2\subset O_2$ such that

$p_1 \in U_1$ and $p_2 \in U_2$
$U_1\cap U_2 = \emptyset$

The fact that the atlas has countably many coordinate maps implies that $M$ is second countable and therefore, once $M$ is Hausdorff, paracompact. The fact that points can be separated by coordinate charts is equivalent to $M$ being Hausdorff.

The definition of a smooth manifold is exactly the same, except that the change of coordinate maps are also assumed to be smooth, i.e., diffeomorphisms.
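As an illustration of a change of coordinate map, here is a minimal Python sketch for the unit circle with two stereographic charts. The circle, the charts `phi1` and `phi2`, and all function names are assumptions chosen for this example; they do not come from the definition above. The change of coordinate map $\phi_2\circ\phi_1^{-1}$ works out to $u \mapsto 1/u$ on $\R\backslash\{0\}$, which is smooth with smooth inverse.

```python
def phi1(p):
    """Stereographic chart on the circle minus the point (0, 1)."""
    x, y = p
    return x / (1.0 - y)


def phi1_inv(u):
    """Inverse of phi1: maps u in R to a point on the unit circle."""
    d = u * u + 1.0
    return (2.0 * u / d, (u * u - 1.0) / d)


def phi2(p):
    """Stereographic chart on the circle minus the point (0, -1)."""
    x, y = p
    return x / (1.0 + y)


def transition(u):
    """Change of coordinate map phi2 o phi1^{-1}, defined for u != 0."""
    return phi2(phi1_inv(u))


for u in (-3.0, -0.5, 0.25, 2.0):
    # The last two printed values should agree up to rounding.
    print(u, transition(u), 1.0 / u)
```

The value $u = 0$ is excluded because $\phi_1^{-1}(0) = (0,-1)$ is the one point of the circle not in the domain of $\phi_2$.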

What’s the intuition behind the coarea formula?

2021-07-09T00:00:00+00:00

$\newcommand{\R}{\mathbb{R}}$

(This is based on an answer to this question on math.stackexchange.com)

The coarea formula is a way to write an integral on a Riemannian manifold in terms of integrals over level sets of a function. It is widely used for proving functional inequalities, such as Sobolev inequalities, via symmetrization arguments.

For simplicity, we restrict to an integral over an open domain $A \subset\R^n$. The coarea formula states the following:

If $f: A \rightarrow \R^k$ is Lipschitz, where $k < n$, and $\phi: A \rightarrow \R$ is $L^1$, then \begin{align*} \int_A \phi(x)|\partial f(x)|\,dx &= \int_{\R^k} \left(\int_{f^{-1}(y)} \phi(x)\,dH_{n-k}(x)\right)\,dy, \end{align*} where $H_{n-k}$ is $(n-k)$-dimensional Hausdorff measure on each $f^{-1}(y)$, $\partial f$ is the Jacobian of $f$, and $$ |\partial f| = \sqrt{\det (\partial f \partial f^T)} $$

This is most easily understood when $k = 1$ and $f$ is a bounded smooth function whose gradient is everywhere nonzero. In that case $|\partial f| = |\nabla f|$.

The level sets of $f$ are non-intersecting hypersurfaces. Suppose first that you want to compute the volume of $A$ in terms of the surface areas of the hypersurfaces. Let $a = \inf f$ and $b = \sup f$. You can divide the interval $[a,b]$ into equal sized subintervals $I_1 = [y_0,y_1], \dots, I_N = [y_{N-1},y_N]$ of size $\Delta y = (b-a)/N$. The volume of $A$ is the sum of the volumes of $f^{-1}(I_k)$. On the other hand, each $f^{-1}(I_k)$ is a shell with varying thickness. At each point, the thickness is roughly $\Delta y/|\nabla f|$. So the volume of the shell is roughly $$ V(f^{-1}(I_k)) \simeq \Delta y\int_{f^{-1}(y_k)}\frac{dH_{n-1}}{|\nabla f|}, $$ where $dH_{n-1}$ is the $(n-1)$-dimensional Hausdorff measure on $f^{-1}(y_k)$. Adding up the volumes of the shells and taking the limit $N \rightarrow \infty$, we see that the volume of $A$ is $$ V(A) = \int_{-\infty}^{\infty} \left(\int_{f^{-1}(y)} \frac{dH_{n-1}}{|\nabla f|}\right)\,dy. $$

If, on the other hand, you integrate $\phi|\nabla f|$ on $A$ using the same approach, you get $$ \int_{f^{-1}(I_k)} \phi|\nabla f|\,dx \simeq \Delta y\int_{f^{-1}(y_k)}\phi|\nabla f|\,\frac{dH_{n-1}}{|\nabla f|}, $$ and therefore in the limit $N \rightarrow \infty$, $$ \int_A \phi|\nabla f|\,dx = \int_{-\infty}^{\infty} \left(\int_{f^{-1}(y)} \phi\,dH_{n-1}\right)\,dy $$
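The $k = 1$ formula can be checked numerically on a simple example. The following Python sketch uses assumptions chosen only for illustration: $A$ is the annulus $1 < |x| < 2$ in $\R^2$, $f(x) = |x|^2$ (so $|\nabla f| = 2|x|$ is nonzero on $A$), and $\phi = 1$. The level set $f^{-1}(y)$ is a circle of radius $\sqrt{y}$, so its $1$-dimensional Hausdorff measure is $2\pi\sqrt{y}$. A hand computation gives $28\pi/3$ for both sides. The function names and grid sizes are hypothetical.

```python
import math


def lhs(n=400):
    """Midpoint-rule estimate of the integral of |grad f| over the annulus 1 < |x| < 2."""
    h = 4.0 / n                      # grid spacing on the square [-2, 2]^2
    total = 0.0
    for i in range(n):
        x1 = -2.0 + (i + 0.5) * h
        for j in range(n):
            x2 = -2.0 + (j + 0.5) * h
            r2 = x1 * x1 + x2 * x2   # f(x) = |x|^2
            if 1.0 < r2 < 4.0:
                total += 2.0 * math.sqrt(r2) * h * h   # |grad f| = 2|x|
    return total


def rhs(n=1000):
    """Midpoint-rule estimate of the integral over 1 < y < 4 of the length of f^{-1}(y)."""
    h = 3.0 / n
    return sum(2.0 * math.pi * math.sqrt(1.0 + (k + 0.5) * h) * h for k in range(n))


exact = 28.0 * math.pi / 3.0         # value of both sides computed by hand
print(lhs(), rhs(), exact)
```

The left side is computed on a square grid that only approximates the round boundary of $A$, so it agrees with the other two numbers less closely than they agree with each other.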

The $k>1$ case is similar, except you chop $\mathbb{R}^k$ into small rectangular pieces $B_\alpha$ and use the fact that the cross section of the inverse image of each piece is roughly a parallelogram whose volume is $|\partial f|^{-1}$ times the volume of $B_\alpha$. The argument analogous to the one above yields $$ \int_A \phi|\partial f|\,dx = \int_{\mathbb{R}^k} \left(\int_{f^{-1}(y)}\phi\,dH_{n-k}\right)\,dy. $$