Definition: Let X be a vector space. An affine hyperplane H is a level set of some linear functional \zeta \in X^*, ie. H= \{x \in X \mid \langle \zeta,x \rangle=c \} with c \in \mathbb{R}.

In particular, such an hyperplane disconects X into two half-spaces \{x \in X \mid \langle \zeta,x \rangle \leq c\} and \{x \in X \mid \langle \zeta,x \rangle \geq c\}, and we say that H separates two subsets if they are included in two different half-spaces.

Theorem: (separation) Let X be a vector space and C_1,C_2 \subset X be two nonempty disjoint convex subsets. If C_1 is open then there exists an affine hyperplane separating C_1 and C_2.

Sketch of proof. Let x_0 \in C_1 and y_0 \in C_2. Define z=x_0-y_0 and C=C_2-C_1+z, and consider the following Minkowski functionnal:

g : \left\{ \begin{array}{ccc} X & \to & [0,+ \infty) \\ x & \mapsto & \inf \{ \lambda >0 \mid x \in \lambda C \} \end{array} \right.

Notice that g is well-defined because C is open. Then, define the linear functional:

\xi : \left\{ \begin{array}{ccc} \mathbb{R}z & \to & \mathbb{R} \\ tz & \mapsto & t \end{array} \right.

If t \leq 0, clearly \langle \xi, tz \rangle =t\leq g(tz). If t>0, then \langle \xi ,tz \rangle =t \leq tg(z)=g(tz) since z \notin C implies g(z) \geq 1. Therefore \xi \leq p.

Now, using Hahn-Banach theorem, \xi can be extended into a linear functional \zeta : X \to \mathbb{R} satisfying \zeta \leq p, hence

\displaystyle \langle \zeta,x \rangle- \langle \zeta,y \rangle + \underset{=1}{\underbrace{\langle \zeta,z \rangle}} \leq g( \underset{\in C}{\underbrace{x-y+z}} ) \leq 1, \ \forall x \in C_1, \ \forall y \in C_2.

That is,

\langle \zeta,x \rangle \leq \langle \zeta,y \rangle, \ \forall x \in C_1, \ \forall y \in C_2. \hspace{1cm} \square

Definition: Let X be a normed vector space, C \subset X be a closed convex subset and x_0 \in \partial C. A supporting hyperplane at x_0 is an affine hyperplane separating \{x_0\} and C.

Our main geometrical results are:

Theorem 1: Let X be a normed space and C \subset X be a closed subset whose interior is nonempty. Then C is convex if, and only if, C admits a supporting hyperplane at any x \in \partial C.

Proof. Suppose that C is convex and let x \in \partial C. Applying separation theorem, there exists an affine hyperplane H separating x and \mathrm{int}(C). Then H is a supporting hyperplane at x.

Conversely, suppose that C admits a supporting hyperplane at any point of \partial C and suppose by contradiction that C is not convex. Therefore, there exist x,y \in \partial C and \lambda \in [0,1] such that t:=(1-\lambda)x+ \lambda y \notin C. Let z \in \mathrm{int}(C) and let \Delta be the triangle whose vertices are x, y and z.

If H is a supporting hyperplane at x, then \Delta \cap H is a straight line separating t and z and therefore y and z. We deduce that H itself separates y and z (two elements of C), a contradiction. \square

Theorem 2: Let X be a normed vector space and C \subset X be a closed convex subset whose interior is nonempty. Then the intersection of the half-spaces containing C coincides with C.

Proof. Let M be the given intersection. Clearly, C \subset M. Moreover, if x \notin C then separation theorem can be used to separate \{x\} and \mathrm{int}(C) so that x \notin M. So M \subset C.

Notice that the converse is true: an intersection of closed convex sets is closed and convex. \square

The result allowing us to link convex sets and convex functions is the following (the (classic and easy) proof is left to the reader):

Property 1: Let X be a vector space and f : X \to \mathbb{R} be a function. Then f is convex if, and only if, \mathrm{epi}(f)= \{ (x,y) \in X \times \mathbb{R} \mid x \in X, y \geq f(x) \} is convex.

In the following, we only work in finite dimension, and a useful fact is that convex functions are continuous:

Property 2: Let X be a finite-dimensional normed vector space and f : X \to \mathbb{R} be a convex function. Then f is continuous.

Proof. Without loss of generality, suppose that X = \mathbb{R}^m. Let (x_n) be a sequence converging to some x \in X and let C be a cube containing the sequence whose vertices are \{c_1,\dots,c_r\}.

If y \in C then we can write y= \sum\limits_{i=1}^r \lambda_i c_i with \lambda_i \in [0,1] and \sum\limits_{i=1}^r \lambda_i=1. Therefore,

\displaystyle f(y) \leq \sum\limits_{i=1}^r \lambda_i f(c_i) \leq M:= \max\limits_{1 \leq i \leq r} f(c_i)

Thus, f is bounded above by M on C.

For all n \geq 0, there exist a_n,b_n \in \partial C such that x \in [x_n,a_n] and x_n \in [x,b_n]. Therefore, there exist \lambda_n, \mu_n \in [0,1] such that

x=(1-\lambda_n)x_n+ \lambda_n a_n \ \text{and} \ x_n= (1-\mu_n)x+\mu_nb_n,

hence by convexity,

\displaystyle \frac{f(x)- \lambda_nf(a_n)}{1-\lambda_n} \leq f(x_n) \leq (1-\mu_n) f(x)+ \mu_nf(b_n).

To conclude that f(x_n) \underset{n \to + \infty}{\longrightarrow} f(x), and so that f is continuous, it is sufficient to notice that \lambda_n,\mu_n \underset{n \to + \infty}{\longrightarrow} 0. Indeed, \lambda_n \geq 0 and

\displaystyle \lambda_n = \frac{\|x-x_n\|}{\|a_n-x_n\|} \leq \frac{\|x-x_n\|}{\|a_n-x\|-\|x-x_n\|} \leq \frac{\|x-x_n\|}{d(x,\partial C)-\|x-x_n\|} \underset{n \to + \infty}{\longrightarrow} 0.

The same argument works for (\mu_n). \square

Now we use theorem 1 and theorem 2 to prove two results in convex analysis:

Definition: Let X be a normed vector space and f : X \to \mathbb{R} be a function. We say that f is affine if there exist \zeta \in X^* and c \in \mathbb{R} such that f : x \mapsto \langle \zeta,x \rangle + c.

Theorem 3: Let X be a finite-dimensional normed space and f : X \to \mathbb{R} be a function. Then f is convex if, and only if, there exists a family of affine functions \{ \varphi_{\alpha} \mid \alpha \in A\} such that f = \sup\limits_{\alpha \in A} \varphi_{\alpha}.

Proof. Suppose f convex. According to properties 1 and 2, \mathrm{epi}(f) \subset X \times \mathbb{R} is a closed convex subset whose interior is nonempty. Therefore, using theorem 2, there exists a family of affine hyperplanes \{ H_{\xi} \mid \xi \in A \} where A \subset (X \times \mathbb{R})^* and H_{\xi} = \{x \in X \mid \langle \xi,x \rangle = c_{\xi} \} such that

\mathrm{epi}(f)= \bigcap\limits_{\xi \in A} D(H_{\xi})

where D(H_{\xi}) is a half-space determined by H_{\xi}.

Without loss of generality, we may suppose that

\langle \xi,(x,y) \rangle \geq c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)

(Otherwise, replace \xi by - \xi.) According to Riesz representation theorem, there exist \zeta \in X^* and \alpha \in \mathbb{R} such that \xi : (a,b) \mapsto - \langle \zeta,a \rangle + \alpha \cdot b. Thus,

- \langle \zeta,x \rangle + \alpha \cdot y \geq c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)

If \alpha<0, we get a contradiction as y \to + \infty. If \alpha=0, we deduce that \zeta is bounded above hence \zeta=0 and finally \xi=0, a contradiction with the fact that H is a hyperplane. Therefore, \alpha>0 and we may suppose without loss of generality that \alpha=1 (otherwise, replace \zeta with \frac{1}{\alpha} \zeta). So

y \geq \langle \zeta,x \rangle+ c_{\xi}, \ \forall x \in X, \ \forall y \in [f(x),+ \infty)

Hence

f(x) \geq \langle \zeta,x \rangle + c_{\xi}, \ \forall x \in X

Let \varphi_{\xi} : x \mapsto \langle \zeta,x \rangle + c_{\xi}. We just proved that f \geq \sup\limits_{\xi \in A} \varphi_{\xi}.

Let y<f(x). Then (x,y) \notin \mathrm{epi}(f) so there exists an affine hyperplane H_{\xi} (\xi \in A) separating \{(x,y)\} and \mathrm{epi}(f), that is

\langle \xi, (x,z) \geq c_{\xi} \geq \langle \xi,(x,y) \rangle, \ \forall z \in [f(x),+ \infty)

z \geq \langle \zeta,x \rangle+ c_{\xi} = \varphi_{\xi}(x) \geq y, \ \forall z \in [f(x),+ \infty)

Therefore, y \leq \sup\limits_{\xi \in A} \varphi_{\xi}(x) hence f(x) \leq \sup\limits_{\xi \in A} \varphi_{\xi}(x) as y \to f(x). Finally, we deduce that f= \sup\limits_{\xi \in A} \varphi_{\xi}.

The converse is clear, since the upper envelope of a family of convex functions is convex. \square

Theorem: (Jensen’s inequality) Let \varphi : \mathbb{R}^n \to \mathbb{R} be a convex function and g \in L^1([0,1],\mathbb{R}^n). Then

\displaystyle \varphi \left( \int_0^1 g(t)dt \right) \leq \int_0^1 \varphi(g(t))dt.

Lemma 1: Let X be finite-dimensional normed vector space, f : X \to \mathbb{R} be a convex function and x \in X. Then there exists \zeta \in X^* such that

f(y)-f(x) \geq \langle \zeta, y-x \rangle, \ \forall y \in X.

Proof. According to properties 1 and 2, \mathrm{epi}(f) \subset X \times \mathbb{R} is a closed convex set whose interior is nonempty. Using theorem 1, \mathrm{epi}(f) admits a supporting hyperplane H= \{z \in X \mid \langle \xi,z \rangle =c \} at (x,f(x)) \in \partial \mathrm{epi}(f).

Without loss of generality, we may suppose that:

\langle \xi, (y,z) \rangle \geq \langle \xi,(x,f(x)) \rangle, \ \forall y \in X, \ \forall z \in [f(y),+ \infty).

(Otherwise, replace \xi with - \xi.)  According to Riesz representation theorem, there exist \zeta \in X^* and \alpha \in \mathbb{R} such that \xi : (a,b) \mapsto -\langle \zeta,a \rangle+ \alpha \cdot b, hence

\alpha \cdot (z-f(x)) \geq \langle \zeta,y-x \rangle, \ \forall y \in X, \ \forall z \in [f(x),+ \infty).

If \alpha<0, we get a contradiction as z \to + \infty. If \alpha=0, then \zeta is bounded above hence \zeta=0 and finally \xi=0 which is impossible since H is a hyperplane. Therefore, \alpha>0.

Without loss of generality, we may suppose that \alpha=0 (otherwise, replace \zeta with \frac{1}{\alpha} \zeta). We finally deduce

f(y)-f(x) \geq \langle \zeta,y-x \rangle, \ \forall y \in X. \hspace{1cm} \square

Proof of Jensen’s inequality. Let \zeta be the linear functional given by the previous lemma for \varphi at the point \int_0^1 g(t)dt \in \mathbb{R}^n, ie.

\displaystyle \varphi(y)- \varphi \left( \int_0^1 g(t)dt \right) \geq \langle \zeta,y- \int_0^1 g(t)dt \rangle, \ \forall y \in \mathbb{R}^n.

In particular,

\displaystyle \varphi(g(t))- \varphi \left( \int_0^1 g(t)dt \right) \geq \langle \zeta, g(t)- \int_0^1 g(t)dt \rangle, \ \forall t \in [0,1].

By integrating:

\displaystyle \int_0^1 \varphi(g(t))dt- \varphi \left( \int_0^1 g(t)dt \right) \geq \int_0^1 \langle \zeta,g(t) \rangle dt - \langle \zeta, \int_0^1 g(t)dt \rangle=0.

Hence

\displaystyle \int_0^1 \varphi(g(t)) dt \geq \varphi \left( \int_0^1 g(t)dt \right). \hspace{1cm} \square

Theorem 3 can be generalized in infinite dimension; however, f has to be supposed lower semi-continuous to assure \mathrm{epi}(f) be closed. Also, lemma 1 suggests to introduce the subdifferential:

\partial f (x) = \{ \zeta \in X^* \mid \forall y \in X, \ f(y)-f(x) \geq \langle \zeta, y-x \rangle\}

Then, lemma 1 just says that \partial f(x) \neq \emptyset. The subdifferential generalizes the differential in the sense that, if f is differentiable, \partial f(x)= \{ d_xf\}. Lemma 1 can also be generalized in infinite dimension, but x needs to be supposed a point of continuity of f.

For more information about the previous generalizations, see for example Francis Clarke’s book, Function Analysis, Calculus of Variations and Optimal Control.

Advertisements