Diagonalization is often presented as a useful tool for computing powers of matrices; however, if only a few matrices were diagonalizable, the method would not be very powerful. Here, we prove from several viewpoints that almost all matrices are diagonalizable over \mathbb{C}.

Definition: Let P,Q \in k[X] be two polynomials. The resultant \mathrm{res}(P,Q) is defined as the product

\displaystyle p^{\deg(Q)} q^{\deg(P)} \prod\limits_{(x,y) \in \tilde{k}^2, \ P(x)=Q(y)=0} (x-y),

where \tilde{k} denotes the algebraic closure of k and p,q denote the leading coefficients of P and Q respectively; in the case of multiple roots, the factors are repeated according to their multiplicities.

Notice that \mathrm{res}(P,Q) is a polynomial expression that is symmetric in the roots of P and symmetric in the roots of Q in \tilde{k}. Therefore, according to the fundamental theorem of symmetric polynomials and to Vieta's formulas, \mathrm{res}(P,Q) \in k.

Definition: The discriminant of a polynomial P \in k[X] is defined by \Delta(P)= \mathrm{res}(P,P').

Lemma 1: Let P,Q \in k[X] be two polynomials. Then \mathrm{res}(P,Q) \neq 0 if and only if P and Q are coprime.

Proof. If P and Q have a common non-constant divisor, then they clearly have a common root in \tilde{k}, hence \mathrm{res}(P,Q)=0. Conversely, suppose that P and Q are coprime. According to Bézout's identity for polynomials, there exist two polynomials U,V \in k[X] such that

UP+VQ=1.

Because the equality still holds in \tilde{k}[X], we deduce that P and Q do not have any common root in \tilde{k}, hence \mathrm{res}(P,Q) \neq 0. \square

We easily deduce the following corollary:

Lemma 2: Let P \in \mathbb{C}[X] be a polynomial. Then P has only simple roots if and only if \Delta(P) \neq 0.
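These notions are easy to experiment with. Below is a minimal sketch, assuming the Python library sympy is available; note that sympy normalizes resultants and discriminants slightly differently (by a non-zero constant factor), which does not affect whether they vanish.

```python
# Sanity checks for lemmas 1 and 2 (sympy's resultant/discriminant
# agree with the definitions above up to a non-zero constant factor).
from sympy import symbols, resultant, discriminant, expand

x = symbols('x')

# P and Q share the root 1, so res(P, Q) = 0 (lemma 1).
P = expand((x - 1)*(x - 2))
Q = expand((x - 1)*(x + 3))
print(resultant(P, Q, x))                   # -> 0

# Coprime polynomials have a non-zero resultant.
print(resultant(x - 1, x + 1, x))           # -> 2

# A double root makes the discriminant vanish (lemma 2).
print(discriminant(expand((x - 1)**2), x))  # -> 0
print(discriminant(x**2 - 3*x + 2, x))      # -> 1 (simple roots)
```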

Because \Delta(P) is a polynomial in the coefficients of P, we will be able to deduce the following theorems from lemmas 3, 4 and 5:

Theorem 1: The set D \subset M_n(\mathbb{C}) of diagonalizable matrices is dense in M_n(\mathbb{C}).

Theorem 2: The set D \subset M_n(\mathbb{C}) of diagonalizable matrices is a set of full measure (i.e. its complement has measure zero).

Lemma 3: Let M \in M_n(\mathbb{C}) be a matrix. If its characteristic polynomial P has only simple roots then M is diagonalizable.

Proof. Let \lambda_1, \dots, \lambda_n \in \mathbb{C} be the (pairwise distinct) eigenvalues of M, i.e. the roots of P, and for every 1 \leq i \leq n, let v_i be an eigenvector associated to \lambda_i.

We show that (v_1, \dots, v_n) is a basis of \mathbb{C}^n by induction on n. For n=1, it is clear. Suppose that it is true for n-1 and let \alpha_1, \dots, \alpha_n \in \mathbb{C} be such that

\alpha_1 \cdot v_1 + \alpha_2 \cdot v_2+ \cdots + \alpha_n \cdot v_n=0.

Suppose by contradiction that one \alpha_i is non-zero, say \alpha_1. Then

\displaystyle v_1= - \frac{\alpha_2}{\alpha_1} \cdot v_2 - \cdots - \frac{\alpha_n}{\alpha_1} \cdot v_n,

hence

\displaystyle \lambda_1 \cdot v_1 = - \lambda_1 \frac{\alpha_2}{\alpha_1} \cdot v_2- \cdots - \lambda_1 \frac{\alpha_n}{\alpha_1} \cdot v_n.

On the other hand,

\displaystyle \lambda_1 \cdot v_1= M \cdot v_1= - \lambda_2 \frac{\alpha_2}{\alpha_1} \cdot v_2- \cdots - \lambda_n \frac{\alpha_n}{\alpha_1} \cdot v_n.

We deduce

\displaystyle (\lambda_1- \lambda_2) \frac{\alpha_2}{\alpha_1} \cdot v_2 + \cdots + (\lambda_1- \lambda_n) \frac{\alpha_n}{\alpha_1} \cdot v_n=0.

Because \lambda_1 \neq \lambda_i when i \neq 1, our induction hypothesis implies that \alpha_2= \cdots = \alpha_n=0, hence \alpha_1 \cdot v_1=0, i.e. \alpha_1=0 (since v_1 \neq 0): a contradiction. \square
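Lemma 3 can also be observed numerically. The following sketch (assuming numpy; the matrix size, seed and tolerance are arbitrary choices) draws a random complex matrix, whose eigenvalues are pairwise distinct with probability one, and checks that its eigenvectors form a basis that diagonalizes it.

```python
# Numerical illustration of lemma 3: a matrix with pairwise distinct
# eigenvalues is diagonalized by any choice of associated eigenvectors.
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

eigenvalues, V = np.linalg.eig(M)   # the columns of V are eigenvectors v_i

# (v_1, ..., v_n) is a basis of C^n: V is invertible.
assert abs(np.linalg.det(V)) > 1e-10

# M = V diag(lambda_1, ..., lambda_n) V^{-1}, up to rounding errors.
assert np.allclose(M, V @ np.diag(eigenvalues) @ np.linalg.inv(V))
```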

Corollary 1: Let D \subset M_n(\mathbb{C}) be the set of diagonalizable matrices. Then \mathrm{int}(D) is the set of matrices whose eigenvalues are pairwise distinct.

Proof. For every matrix M, let P_M denote its characteristic polynomial, and let \varphi : M \mapsto \Delta(P_M). In particular, \varphi is a polynomial in the entries of M, so \varphi is continuous. According to lemmas 2 and 3, the set \varphi^{-1}(\mathbb{C} \backslash \{0\}) of matrices whose eigenvalues are pairwise distinct is open and contained in D.

Let M be a diagonalizable matrix with an eigenvalue \lambda of multiplicity at least two. There exist P \in Gl_n(\mathbb{C}) and two diagonal matrices D_1,D_2 such that

M=P \left( \begin{matrix} D_1 & 0 & 0 \\ 0 & \begin{matrix} \lambda & 0 \\ 0 & \lambda \end{matrix} & 0 \\ 0 & 0 & D_2 \end{matrix} \right)P^{-1}.

For all \epsilon>0, let

M_{\epsilon} = P \left( \begin{matrix} D_1 & 0 & 0 \\ 0 & \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} & 0 \\ 0 & 0 & D_2 \end{matrix} \right) P^{-1}.

Suppose that M_{\epsilon} is diagonalizable; in particular, there exists a polynomial Q \in \mathbb{C}[X] with simple roots such that Q(M_{\epsilon})=0. Clearly, Q also annihilates the matrix \left( \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} \right), so this matrix is diagonalizable, that is, there exists S \in Gl_2(\mathbb{C}) such that

\left( \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} \right)= S \left( \begin{matrix} \lambda & 0 \\ 0 & \lambda \end{matrix} \right) S^{-1} = \lambda \cdot \mathrm{I_2},

hence \epsilon =0. Therefore, M_{\epsilon} is diagonalizable if and only if \epsilon=0; because M_{\epsilon} \underset{\epsilon \to 0}{\longrightarrow} M, we deduce that D \backslash \varphi^{-1}(\mathbb{C} \backslash \{0\}) \subset \partial D.

Finally, we deduce that \mathrm{int}(D) = \varphi^{-1}(\mathbb{C} \backslash \{0\}). \square
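The 2 \times 2 computation at the heart of this proof can be watched numerically; a minimal sketch, assuming numpy (the values of \lambda and \epsilon are arbitrary):

```python
# lambda*I_2 is diagonalizable, but the perturbed matrix M_eps is a
# Jordan block for eps != 0, hence not diagonalizable.
import numpy as np

lam, eps = 2.0, 1e-6
M_eps = np.array([[lam, eps],
                  [0.0, lam]])

_, V = np.linalg.eig(M_eps)
# A Jordan block has a one-dimensional eigenspace: the two computed
# "eigenvectors" are (nearly) parallel, so V is (nearly) singular.
print(np.linalg.det(V))   # ~ 0: no eigenbasis
```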

Lemma 4: Let P \in \mathbb{C}[X_1,\dots,X_n] be a non-zero polynomial. Then P^{-1}(0) is nowhere dense in \mathbb{C}^n.

Proof. Because P is continuous, P^{-1}(0) is closed, so it suffices to show that it has empty interior. Suppose that P vanishes on an open ball B(x_0,r) \subset \mathbb{C}^n and let x_1 \in \mathbb{C}^n. The restriction of P to the complex line passing through x_0 and x_1 coincides with a polynomial p in one variable. Because P vanishes on B(x_0,r), p vanishes on an open subset of the line, hence p has infinitely many roots, so p=0 and in particular P(x_1)=0. Therefore, P=0, contradicting our hypothesis. \square
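For concreteness, here is the restriction trick on a toy example (a sketch assuming sympy; the polynomial, base point and direction are arbitrary choices):

```python
# Restricting a polynomial P(x, y) to the parametrized line
# t -> x0 + t*d yields a polynomial in the single variable t.
from sympy import symbols, expand, Poly

x, y, t = symbols('x y t')
P = x**2 * y - y + 3                  # an arbitrary non-zero polynomial

x0, d = (1, 2), (1, -1)               # base point and direction
p = expand(P.subs({x: x0[0] + t*d[0], y: x0[1] + t*d[1]}))
print(Poly(p, t))                     # Poly(-t**3 + 4*t + 3, t, domain='ZZ')
# p is non-zero, so it has finitely many roots: P does not vanish on
# any open subset of this line.
```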

Lemma 5: Let P \in \mathbb{R}[X_1,\dots,X_n] be a non-zero polynomial. Then P^{-1}(0) has measure zero in \mathbb{R}^n.

Proof. We prove by induction on n that the zero set of a non-zero polynomial in \mathbb{R}[X_1, \dots, X_n] has measure zero. For n=1, it is clear since P^{-1}(0) is finite. Suppose the result holds for n-1 and let P \in \mathbb{R} [X_1,\dots ,X_n] be non-zero.

Let I = \{x_1 \in \mathbb{R} \mid P(x_1, \cdot)=0\}. It is possible to write P as

\displaystyle P(x_1,x_2, \dots, x_n)= \sum\limits_{k=0}^m P_k(x_1) \cdot m_k(x_2, \dots, x_n),

where P_k \in \mathbb{R}[X_1], the m_k \in \mathbb{R}[X_2,\dots,X_n] are pairwise distinct monomials and P_m \neq 0. Notice that P(x_1, \cdot)=0 implies P_m(x_1)=0, so I is finite since the non-zero polynomial P_m has only finitely many roots. Then, by Tonelli's theorem,

\begin{array}{cl} \mu_n(P^{-1}(0)) & \displaystyle = \int_{\mathbb{R}^n} \mathbf{1}_{P^{-1}(0)} (x_1,\dots,x_n) \, dx_1 \cdots dx_n \\ \\ & \displaystyle = \int_{\mathbb{R}} \left( \int_{\mathbb{R}^{n-1}} \mathbf{1}_{P^{-1}(0)} (x_1,x_2,\dots,x_n) \, dx_2 \cdots dx_n \right) dx_1 \\ \\ & \displaystyle = \int_{\mathbb{R}} \underset{=:f(x_1)}{\underbrace{ \mu_{n-1} \left( \{ (x_2, \dots ,x_n) \in \mathbb{R}^{n-1} \mid P(x_1,x_2, \dots,x_n)=0 \} \right) }} \, dx_1 \\ \\ & \displaystyle = \int_I f(x_1)\,dx_1 + \int_{\mathbb{R} \backslash I} f(x_1)\,dx_1. \end{array}

Because I is finite, \mu_1(I)=0, and we deduce that \displaystyle \int_I f(x_1)dx_1=0.

Then, if x_1 \notin I we have P(x_1, \cdot) \neq 0, hence

\mu_{n-1} \left( \{ (x_2,\dots,x_n) \in \mathbb{R}^{n-1} \mid P(x_1,x_2,\dots,x_n) =0 \} \right)=0,

by our induction hypothesis. Therefore, \displaystyle \int_{\mathbb{R} \backslash I} f(x_1)dx_1=0 and finally \mu_n(P^{-1}(0))=0. \square

Proof of theorems 1 and 2. For every matrix M \in M_n(\mathbb{C}), let P_M \in \mathbb{C}[X] denote its characteristic polynomial and let \varphi : M \mapsto \Delta(P_M). Then \varphi is a non-zero polynomial in the entries of M (non-zero because, for example, \varphi(\mathrm{diag}(1,2,\dots,n)) \neq 0). According to corollary 1 and lemma 4, \mathrm{int}(D)= \varphi^{-1}(\mathbb{C} \backslash \{0\}) is dense in M_n(\mathbb{C}); identifying M_n(\mathbb{C}) with \mathbb{R}^{2n^2} and applying lemma 5 to the sum of the squares of the real and imaginary parts of \varphi, it is also a set of full measure. \square
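A quick Monte Carlo experiment agrees with theorems 1 and 2; in the sketch below (assuming numpy; the distribution, sample size and seed are arbitrary choices), matrices drawn from a continuous distribution never exhibit a repeated eigenvalue.

```python
# Over many random draws, the smallest gap between two eigenvalues
# stays strictly positive: repeated eigenvalues are never observed.
import numpy as np

rng = np.random.default_rng(1)
trials, n = 10_000, 5
min_gap = np.inf
for _ in range(trials):
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    ev = np.linalg.eigvals(M)
    off_diag = ~np.eye(n, dtype=bool)
    min_gap = min(min_gap, np.abs(ev[:, None] - ev[None, :])[off_diag].min())
print(min_gap)   # strictly positive in practice
```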

In fact, theorem 1 (resp. lemma 4) is a consequence of theorem 2 (resp. lemma 5): the complement of a null set is dense, and a closed null set is nowhere dense.

Notice that there is no analogue of theorems 1 and 2 for diagonalization over \mathbb{R}. Indeed, the characteristic polynomial of M= \left( \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right) has a negative discriminant, so there exists a neighborhood V of M on which the discriminant remains negative; in particular, the matrices in V have no real eigenvalues, so they are not diagonalizable over \mathbb{R}.

However, it is possible to determine the closure of the set of diagonalizable matrices in M_n(\mathbb{R}):

Property: The closure of the set of diagonalizable matrices (over \mathbb{R}) in M_n(\mathbb{R}) is the set of triangularizable matrices.

Proof. Let M \in M_n(\mathbb{R}) be a triangularizable matrix. Then there exist P \in Gl_n(\mathbb{R}) and \lambda_1, \dots, \lambda_n \in \mathbb{R} such that

M=P \left( \begin{matrix} \lambda_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_n \end{matrix} \right) P^{-1}.

For all \epsilon>0, let 0<\epsilon_1,\dots, \epsilon_n<\epsilon be such that \lambda_i+ \epsilon_i \neq \lambda_j + \epsilon_j when i \neq j. Then the matrix

M_{\epsilon}= P \left( \begin{matrix} \lambda_1+ \epsilon_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_n + \epsilon_n \end{matrix} \right) P^{-1}

is diagonalizable according to lemma 3, since its eigenvalues \lambda_1+\epsilon_1, \dots, \lambda_n+\epsilon_n are pairwise distinct. Because M_{\epsilon} \underset{\epsilon \to 0}{\longrightarrow} M, we deduce that M belongs to the closure of the set of diagonalizable matrices.

Conversely, let M be a limit of a sequence of diagonalizable matrices (M_j). Let P_j (resp. P) be the characteristic polynomial of M_j (resp. M). We can write

\displaystyle P_j = \prod\limits_{k=1}^n (X- \lambda_{j,k} ) and \displaystyle P= \prod\limits_{k=1}^n (X-\lambda_k),

where \lambda_{j,k} \in \mathbb{R} and \lambda_k \in \mathbb{C}.

From now on, we fix some 1 \leq k \leq n. For all s \geq 0, let 1 \leq i_s \leq n be such that |\lambda_k- \lambda_{s,i_s}| = \min\limits_{1 \leq j \leq n} |\lambda_k-\lambda_{s,j}|. Therefore,

\displaystyle |P_s(\lambda_k)| = \prod\limits_{j=1}^n |\lambda_k- \lambda_{s,j}| \geq |\lambda_k- \lambda_{s,i_s}|^n

hence

\displaystyle |\lambda_k-\lambda_{s,i_s}| \leq \sqrt[n]{|P_s(\lambda_k)|} \underset{s \to + \infty}{\longrightarrow} \sqrt[n]{|P(\lambda_k)|}=0.

(We used the fact that the coefficients of the characteristic polynomial are polynomials in the entries of the matrix to justify that P_s \to P coefficientwise, hence P_s(\lambda_k) \to P(\lambda_k).)

Therefore, the complex roots of P are limits of the real roots of the P_s, so \lambda_1,\dots,\lambda_n \in \mathbb{R}. The characteristic polynomial of M thus splits over \mathbb{R}, and we deduce that M is triangularizable. \square
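The perturbation used in the first half of the proof is easy to reproduce numerically; a sketch assuming numpy, with a hypothetical triangular matrix whose eigenvalue 1 is repeated:

```python
# Nudging the diagonal entries of a real triangular matrix yields
# pairwise distinct eigenvalues, hence (by lemma 3) a diagonalizable
# matrix arbitrarily close to T.
import numpy as np

T = np.array([[1.0, 5.0, -3.0],
              [0.0, 1.0,  2.0],
              [0.0, 0.0,  4.0]])          # the eigenvalue 1 is repeated

eps = 1e-4
T_eps = T + np.diag([eps, 2 * eps, 0.0])  # diagonal entries now distinct

print(np.linalg.eigvals(T_eps))           # pairwise distinct eigenvalues
print(np.linalg.norm(T_eps - T))          # small: T_eps -> T as eps -> 0
```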

Now, we adopt a probabilistic viewpoint; intuitively, the following theorem states that a matrix in M_n(\mathbb{Z}) is diagonalizable over \mathbb{C} with probability one:

Theorem 3: Let T_n(k) be the number of matrices in M_n(\mathbb{Z}) whose coefficients are in \{ -k , \dots ,k\} and let R_n(k) be the number of such matrices whose (complex) eigenvalues are not pairwise distinct. Then

\displaystyle \lim\limits_{k \to + \infty} \frac{R_n(k)}{T_n(k)} = 0.

Proof. Let \varphi : M \mapsto \Delta(P_M), where P_M denotes the characteristic polynomial of M. We already know that \varphi is a polynomial g_0(x_1, \dots , x_{n^2}) in the entries of M and that \varphi(M)=0 if and only if M has a complex eigenvalue of multiplicity at least two. Therefore, if

S= \{ (x_1, \dots, x_{n^2}) \in \{-k, \dots,k\}^{n^2} \mid g_0(x_1,\dots, x_{n^2})=0 \}

then R_n(k)= |S|.

By induction, we define g_i \in \mathbb{Z}[X_{i+1}, \dots, X_{n^2}] for 1 \leq i \leq n^2 as the leading coefficient of g_{i-1}, viewed as an element of \mathbb{Z} [X_{i+1}, \dots, X_{n^2}] [X_i], and we set m= \deg(g_0). In particular, notice that g_{n^2} \in \mathbb{Z} \backslash \{0\} and \deg(g_i) \leq m for all i. For all i \geq 1, let us introduce the set

\begin{array}{lc} S_i = & \{ (x_1,\dots, x_{n^2}) \in S \mid g_0(x_1,\dots,x_{n^2})= \cdots = g_{i-1}(x_i,\dots,x_{n^2})=0, \\ & g_i(x_{i+1},\dots,x_{n^2}) \neq 0 \} \end{array}

Then \displaystyle S= \coprod\limits_{i=1}^{n^2} S_i so \displaystyle |S|= \sum\limits_{i=1}^{n^2} |S_i|.

If g_i(x_{i+1}, \dots, x_{n^2}) \neq 0, the polynomial g_{i-1}( \cdot,x_{i+1}, \dots,x_{n^2}) \in \mathbb{Z}[X_i] is non-zero (its leading coefficient is precisely g_i(x_{i+1},\dots,x_{n^2})), so it has at most \deg(g_{i-1}) \leq m roots. We deduce that

\displaystyle |S_i| \leq m(2k+1)^{n^2-1}.

(There are at most m choices for x_i and at most 2k+1 choices for each x_j with j \neq i.)

Therefore, |S| \leq mn^2(2k+1)^{n^2-1}, and because T_n(k)= (2k+1)^{n^2}:

\displaystyle \frac{R_n(k)}{T_n(k)} \leq \frac{mn^2}{2k+1} \underset{k \to + \infty}{\longrightarrow} 0. \hspace{1cm} \square
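For n=2, the ratio R_2(k)/T_2(k) can be computed exactly by brute force: the characteristic polynomial of \left( \begin{matrix} a & b \\ c & d \end{matrix} \right) is X^2-(a+d)X+(ad-bc), whose discriminant is (a-d)^2+4bc. A sketch in Python (the values of k are arbitrary):

```python
# R_2(k)/T_2(k): proportion of integer 2x2 matrices, entries in
# {-k, ..., k}, whose two complex eigenvalues coincide.
from itertools import product

def ratio(k):
    entries = range(-k, k + 1)
    repeated = sum(1 for a, b, c, d in product(entries, repeat=4)
                   if (a - d)**2 + 4*b*c == 0)
    return repeated / (2*k + 1)**4

for k in (1, 2, 4, 8):
    print(k, ratio(k))   # decreases towards 0, as theorem 3 predicts
```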

Again, theorem 3 does not hold for diagonalization over \mathbb{R} or \mathbb{Q}. For example,

\displaystyle \lim\limits_{k \to + \infty} \frac{D_2(k)}{T_2(k)} = \frac{49}{72}

over \mathbb{R}, and even

\displaystyle \lim\limits_{k \to + \infty} \frac{D_2(k)}{T_2(k)} =0

over \mathbb{Q}, where D_2(k) denotes the number of matrices with coefficients in \{-k, \dots,k \} that are diagonalizable over \mathbb{R} (resp. over \mathbb{Q}).
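The limit 49/72 \approx 0.68 can be checked empirically: a 2 \times 2 real matrix is diagonalizable over \mathbb{R} whenever (a-d)^2+4bc>0 (the boundary case (a-d)^2+4bc=0 is asymptotically negligible), and after rescaling, entries uniform in \{-k,\dots,k\} with k large behave like entries uniform in [-1,1]. A Monte Carlo sketch (standard library only; the seed and sample size are arbitrary choices):

```python
# Estimate the probability that a random 2x2 matrix with entries
# uniform in [-1, 1] has two distinct real eigenvalues.
import random

random.seed(0)
trials = 10**6
hits = 0
for _ in range(trials):
    a, b, c, d = (random.uniform(-1, 1) for _ in range(4))
    if (a - d)**2 + 4*b*c > 0:
        hits += 1
print(hits / trials, 49/72)   # both close to 0.6805...
```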

More information can be found in the paper The probability that a matrix of integers is diagonalizable.
