Definition: Let $X$ be a vector space. An affine hyperplane $H$ is a level set of some linear functional $\zeta \in X^*$, ie. $H= \{x \in X \mid \langle \zeta,x \rangle=c \}$ with $c \in \mathbb{R}$.

In particular, such an hyperplane disconects $X$ into two half-spaces $\{x \in X \mid \langle \zeta,x \rangle \leq c\}$ and $\{x \in X \mid \langle \zeta,x \rangle \geq c\}$, and we say that $H$ separates two subsets if they are included in two different half-spaces.

Theorem: (separation) Let $X$ be a vector space and $C_1,C_2 \subset X$ be two nonempty disjoint convex subsets. If $C_1$ is open then there exists an affine hyperplane separating $C_1$ and $C_2$.

Sketch of proof. Let $x_0 \in C_1$ and $y_0 \in C_2$. Define $z=x_0-y_0$ and $C=C_2-C_1+z$, and consider the following Minkowski functionnal:

$g : \left\{ \begin{array}{ccc} X & \to & [0,+ \infty) \\ x & \mapsto & \inf \{ \lambda >0 \mid x \in \lambda C \} \end{array} \right.$

Notice that $g$ is well-defined because $C$ is open. Then, define the linear functional:

$\xi : \left\{ \begin{array}{ccc} \mathbb{R}z & \to & \mathbb{R} \\ tz & \mapsto & t \end{array} \right.$

If $t \leq 0$, clearly $\langle \xi, tz \rangle =t\leq g(tz)$. If $t>0$, then $\langle \xi ,tz \rangle =t \leq tg(z)=g(tz)$ since $z \notin C$ implies $g(z) \geq 1$. Therefore $\xi \leq p$.

Now, using Hahn-Banach theorem, $\xi$ can be extended into a linear functional $\zeta : X \to \mathbb{R}$ satisfying $\zeta \leq p$, hence

$\displaystyle \langle \zeta,x \rangle- \langle \zeta,y \rangle + \underset{=1}{\underbrace{\langle \zeta,z \rangle}} \leq g( \underset{\in C}{\underbrace{x-y+z}} ) \leq 1, \ \forall x \in C_1, \ \forall y \in C_2.$

That is,

$\langle \zeta,x \rangle \leq \langle \zeta,y \rangle, \ \forall x \in C_1, \ \forall y \in C_2. \hspace{1cm} \square$

Definition: Let $X$ be a normed vector space, $C \subset X$ be a closed convex subset and $x_0 \in \partial C$. A supporting hyperplane at $x_0$ is an affine hyperplane separating $\{x_0\}$ and $C$.

Our main geometrical results are:

Theorem 1: Let $X$ be a normed space and $C \subset X$ be a closed subset whose interior is nonempty. Then $C$ is convex if, and only if, $C$ admits a supporting hyperplane at any $x \in \partial C$.

Proof. Suppose that $C$ is convex and let $x \in \partial C$. Applying separation theorem, there exists an affine hyperplane $H$ separating $x$ and $\mathrm{int}(C)$. Then $H$ is a supporting hyperplane at $x$.

Conversely, suppose that $C$ admits a supporting hyperplane at any point of $\partial C$ and suppose by contradiction that $C$ is not convex. Therefore, there exist $x,y \in \partial C$ and $\lambda \in [0,1]$ such that $t:=(1-\lambda)x+ \lambda y \notin C$. Let $z \in \mathrm{int}(C)$ and let $\Delta$ be the triangle whose vertices are $x$, $y$ and $z$.

If $H$ is a supporting hyperplane at $x$, then $\Delta \cap H$ is a straight line separating $t$ and $z$ and therefore $y$ and $z$. We deduce that $H$ itself separates $y$ and $z$ (two elements of $C$), a contradiction. $\square$

Theorem 2: Let $X$ be a normed vector space and $C \subset X$ be a closed convex subset whose interior is nonempty. Then the intersection of the half-spaces containing $C$ coincides with $C$.

Proof. Let $M$ be the given intersection. Clearly, $C \subset M$. Moreover, if $x \notin C$ then separation theorem can be used to separate $\{x\}$ and $\mathrm{int}(C)$ so that $x \notin M$. So $M \subset C$.

Notice that the converse is true: an intersection of closed convex sets is closed and convex. $\square$

The result allowing us to link convex sets and convex functions is the following (the (classic and easy) proof is left to the reader):

Property 1: Let $X$ be a vector space and $f : X \to \mathbb{R}$ be a function. Then $f$ is convex if, and only if, $\mathrm{epi}(f)= \{ (x,y) \in X \times \mathbb{R} \mid x \in X, y \geq f(x) \}$ is convex.

In the following, we only work in finite dimension, and a useful fact is that convex functions are continuous:

Property 2: Let $X$ be a finite-dimensional normed vector space and $f : X \to \mathbb{R}$ be a convex function. Then $f$ is continuous.

Proof. Without loss of generality, suppose that $X = \mathbb{R}^m$. Let $(x_n)$ be a sequence converging to some $x \in X$ and let $C$ be a cube containing the sequence whose vertices are $\{c_1,\dots,c_r\}$.

If $y \in C$ then we can write $y= \sum\limits_{i=1}^r \lambda_i c_i$ with $\lambda_i \in [0,1]$ and $\sum\limits_{i=1}^r \lambda_i=1$. Therefore,

$\displaystyle f(y) \leq \sum\limits_{i=1}^r \lambda_i f(c_i) \leq M:= \max\limits_{1 \leq i \leq r} f(c_i)$

Thus, $f$ is bounded above by $M$ on $C$.

For all $n \geq 0$, there exist $a_n,b_n \in \partial C$ such that $x \in [x_n,a_n]$ and $x_n \in [x,b_n]$. Therefore, there exist $\lambda_n, \mu_n \in [0,1]$ such that

$x=(1-\lambda_n)x_n+ \lambda_n a_n \ \text{and} \ x_n= (1-\mu_n)x+\mu_nb_n,$

hence by convexity,

$\displaystyle \frac{f(x)- \lambda_nf(a_n)}{1-\lambda_n} \leq f(x_n) \leq (1-\mu_n) f(x)+ \mu_nf(b_n).$

To conclude that $f(x_n) \underset{n \to + \infty}{\longrightarrow} f(x)$, and so that $f$ is continuous, it is sufficient to notice that $\lambda_n,\mu_n \underset{n \to + \infty}{\longrightarrow} 0$. Indeed, $\lambda_n \geq 0$ and

$\displaystyle \lambda_n = \frac{\|x-x_n\|}{\|a_n-x_n\|} \leq \frac{\|x-x_n\|}{\|a_n-x\|-\|x-x_n\|} \leq \frac{\|x-x_n\|}{d(x,\partial C)-\|x-x_n\|} \underset{n \to + \infty}{\longrightarrow} 0.$

The same argument works for $(\mu_n)$. $\square$

Now we use theorem 1 and theorem 2 to prove two results in convex analysis:

Definition: Let $X$ be a normed vector space and $f : X \to \mathbb{R}$ be a function. We say that $f$ is affine if there exist $\zeta \in X^*$ and $c \in \mathbb{R}$ such that $f : x \mapsto \langle \zeta,x \rangle + c$.

Theorem 3: Let $X$ be a finite-dimensional normed space and $f : X \to \mathbb{R}$ be a function. Then $f$ is convex if, and only if, there exists a family of affine functions $\{ \varphi_{\alpha} \mid \alpha \in A\}$ such that $f = \sup\limits_{\alpha \in A} \varphi_{\alpha}$.

Proof. Suppose $f$ convex. According to properties 1 and 2, $\mathrm{epi}(f) \subset X \times \mathbb{R}$ is a closed convex subset whose interior is nonempty. Therefore, using theorem 2, there exists a family of affine hyperplanes $\{ H_{\xi} \mid \xi \in A \}$ where $A \subset (X \times \mathbb{R})^*$ and $H_{\xi} = \{x \in X \mid \langle \xi,x \rangle = c_{\xi} \}$ such that

$\mathrm{epi}(f)= \bigcap\limits_{\xi \in A} D(H_{\xi})$

where $D(H_{\xi})$ is a half-space determined by $H_{\xi}$.

Without loss of generality, we may suppose that

$\langle \xi,(x,y) \rangle \geq c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)$

(Otherwise, replace $\xi$ by $- \xi$.) According to Riesz representation theorem, there exist $\zeta \in X^*$ and $\alpha \in \mathbb{R}$ such that $\xi : (a,b) \mapsto - \langle \zeta,a \rangle + \alpha \cdot b$. Thus,

$- \langle \zeta,x \rangle + \alpha \cdot y \geq c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)$

If $\alpha<0$, we get a contradiction as $y \to + \infty$. If $\alpha=0$, we deduce that $\zeta$ is bounded above hence $\zeta=0$ and finally $\xi=0$, a contradiction with the fact that $H$ is a hyperplane. Therefore, $\alpha>0$ and we may suppose without loss of generality that $\alpha=1$ (otherwise, replace $\zeta$ with $\frac{1}{\alpha} \zeta$). So

$y \geq \langle \zeta,x \rangle+ c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)$

Hence

$f(x) \geq \langle \zeta,x \rangle + c_{\xi}, \ \forall x \in X$

Let $\varphi_{\xi} : x \mapsto \langle \zeta,x \rangle + c_{\xi}$. We just proved that $f \geq \sup\limits_{\xi \in A} \varphi_{\xi}$.

Let $y. Then $(x,y) \notin \mathrm{epi}(f)$ so there exists an affine hyperplane $H_{\xi}$ ($\xi \in A$) separating $\{(x,y)\}$ and $\mathrm{epi}(f)$, that is

$\langle \xi, (x,z) \geq c_{\xi} \geq \langle \xi,(x,y) \rangle, \ \forall z \in [f(x),+ \infty)$

$z \geq \langle \zeta,x \rangle+ c_{\xi} = \varphi_{\xi}(x) \geq y, \ \forall z \in [f(x),+ \infty)$

Therefore, $y \leq \sup\limits_{\xi \in A} \varphi_{\xi}(x)$ hence $f(x) \leq \sup\limits_{\xi \in A} \varphi_{\xi}(x)$ as $y \to f(x)$. Finally, we deduce that $f= \sup\limits_{\xi \in A} \varphi_{\xi}$.

The converse is clear, since the upper envelope of a family of convex functions is convex. $\square$

Theorem: (Jensen’s inequality) Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ be a convex function and $g \in L^1([0,1],\mathbb{R}^n)$. Then

$\displaystyle \varphi \left( \int_0^1 g(t)dt \right) \leq \int_0^1 \varphi(g(t))dt.$

Lemma 1: Let $X$ be finite-dimensional normed vector space, $f : X \to \mathbb{R}$ be a convex function and $x \in X$. Then there exists $\zeta \in X^*$ such that

$f(y)-f(x) \geq \langle \zeta, y-x \rangle, \ \forall y \in X.$

Proof. According to properties 1 and 2, $\mathrm{epi}(f) \subset X \times \mathbb{R}$ is a closed convex set whose interior is nonempty. Using theorem 1, $\mathrm{epi}(f)$ admits a supporting hyperplane $H= \{z \in X \mid \langle \xi,z \rangle =c \}$ at $(x,f(x)) \in \partial \mathrm{epi}(f)$.

Without loss of generality, we may suppose that:

$\langle \xi, (y,z) \rangle \geq \langle \xi,(x,f(x)) \rangle, \ \forall y \in X, \ \forall z \in [f(y),+ \infty).$

(Otherwise, replace $\xi$ with $- \xi$.)  According to Riesz representation theorem, there exist $\zeta \in X^*$ and $\alpha \in \mathbb{R}$ such that $\xi : (a,b) \mapsto -\langle \zeta,a \rangle+ \alpha \cdot b$, hence

$\alpha \cdot (z-f(x)) \geq \langle \zeta,y-x \rangle, \ \forall y \in X, \ \forall z \in [f(x),+ \infty).$

If $\alpha<0$, we get a contradiction as $z \to + \infty$. If $\alpha=0$, then $\zeta$ is bounded above hence $\zeta=0$ and finally $\xi=0$ which is impossible since $H$ is a hyperplane. Therefore, $\alpha>0$.

Without loss of generality, we may suppose that $\alpha=0$ (otherwise, replace $\zeta$ with $\frac{1}{\alpha} \zeta$). We finally deduce

$f(y)-f(x) \geq \langle \zeta,y-x \rangle, \ \forall y \in X. \hspace{1cm} \square$

Proof of Jensen’s inequality. Let $\zeta$ be the linear functional given by the previous lemma for $\varphi$ at the point $\int_0^1 g(t)dt \in \mathbb{R}^n$, ie.

$\displaystyle \varphi(y)- \varphi \left( \int_0^1 g(t)dt \right) \geq \langle \zeta,y- \int_0^1 g(t)dt \rangle, \ \forall y \in \mathbb{R}^n.$

In particular,

$\displaystyle \varphi(g(t))- \varphi \left( \int_0^1 g(t)dt \right) \geq \langle \zeta, g(t)- \int_0^1 g(t)dt \rangle, \ \forall t \in [0,1].$

By integrating:

$\displaystyle \int_0^1 \varphi(g(t))dt- \varphi \left( \int_0^1 g(t)dt \right) \geq \int_0^1 \langle \zeta,g(t) \rangle dt - \langle \zeta, \int_0^1 g(t)dt \rangle=0.$

Hence

$\displaystyle \int_0^1 \varphi(g(t)) dt \geq \varphi \left( \int_0^1 g(t)dt \right). \hspace{1cm} \square$

Theorem 3 can be generalized in infinite dimension; however, $f$ has to be supposed lower semi-continuous to assure $\mathrm{epi}(f)$ be closed. Also, lemma 1 suggests to introduce the subdifferential:

$\partial f (x) = \{ \zeta \in X^* \mid \forall y \in X, \ f(y)-f(x) \geq \langle \zeta, y-x \rangle\}$

Then, lemma 1 just says that $\partial f(x) \neq \emptyset$. The subdifferential generalizes the differential in the sense that, if $f$ is differentiable, $\partial f(x)= \{ d_xf\}$. Lemma 1 can also be generalized in infinite dimension, but $x$ needs to be supposed a point of continuity of $f$.

For more information about the previous generalizations, see for example Francis Clarke’s book, Function Analysis, Calculus of Variations and Optimal Control.