Diagonalization is often presented as a useful tool for computing powers of matrices; however, if only a few matrices were diagonalizable, the method would not be very powerful. Here, we prove from several viewpoints that almost all matrices are diagonalizable over $\mathbb{C}$.

Definition: Let $P,Q \in k[X]$ be two polynomials. The resultant $\mathrm{res}(P,Q)$ is defined as the product

$\displaystyle p^{\deg(Q)} q^{\deg(P)} \prod\limits_{(x,y) \in \tilde{k}^2, \ P(x)=Q(y)=0} (x-y)$,

where $\tilde{k}$ denotes the algebraic closure of $k$, and $p,q$ denote the leading coefficients of $P$ and $Q$ respectively; in the case of multiple roots, the factors are repeated according to their multiplicities.

Notice that $\mathrm{res}(P,Q)$ is a symmetric polynomial in the roots of $P$ and $Q$ in $\tilde{k}$. Therefore, according to the fundamental theorem of symmetric polynomials and to Vieta’s formulas, $\mathrm{res}(P,Q) \in k$.
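As a quick sanity check of this fact, the sketch below (assuming the sympy library is available; sympy computes resultants via the Sylvester determinant, which agrees with the definition above) shows that the resultant of two integer polynomials is indeed an integer:

```python
# Hypothetical demo (not from the text): assumes sympy, whose `resultant`
# is computed via the Sylvester determinant and so stays in k even though
# the definition involves roots in the algebraic closure.
from sympy import symbols, resultant

x = symbols('x')
P = x**2 - 1   # roots: 1, -1
Q = x**2 - 4   # roots: 2, -2

# leading coefficients are 1, so res(P, Q) = (1-2)(1+2)(-1-2)(-1+2) = 9
r = resultant(P, Q, x)
print(r)  # 9
```

Even though the four roots are irrational in general, the symmetric combination above always collapses to an element of the base field.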

Definition: The discriminant of a polynomial $P \in k[X]$ is defined by $\Delta(P)= \mathrm{res}(P,P')$.

Lemma 1: Let $P,Q \in k[X]$ be two polynomials. Then $\mathrm{res}(P,Q) \neq 0$ if and only if $P$ and $Q$ are coprime.

Proof. If $P$ and $Q$ have a common non-constant divisor, then they clearly have a common root in $\tilde{k}$, hence $\mathrm{res}(P,Q)=0$. Conversely, suppose that $P$ and $Q$ are coprime. According to Bézout’s identity for polynomials, there exist two polynomials $U,V \in k[X]$ such that

$UP+VQ=1$.

Because the equality still holds in $\tilde{k}[X]$, we deduce that $P$ and $Q$ have no common root in $\tilde{k}$, hence $\mathrm{res}(P,Q) \neq 0$. $\square$

We easily deduce the following corollary:

Lemma 2: Let $P \in \mathbb{C}[X]$ be a polynomial. Then $P$ has only simple roots if and only if $\Delta(P) \neq 0$.
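Lemma 2 is easy to check numerically. The sketch below (again assuming sympy) computes $\Delta(P)=\mathrm{res}(P,P')$ directly from the definition; note that sympy's built-in `discriminant` uses a slightly different normalization, so we avoid it:

```python
# Assumes sympy; Delta(P) = res(P, P') exactly as defined above.  (sympy's
# built-in `discriminant` differs by a normalization factor, so it is not
# used here.)
from sympy import symbols, diff, resultant

x = symbols('x')

def disc(P):
    return resultant(P, diff(P, x), x)

print(disc((x - 1)**2))  # 0: the double root is shared with P' = 2(x - 1)
print(disc(x**2 - 1))    # -4, nonzero: both roots are simple
```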

Because $\Delta(P)$ is a polynomial in the coefficients of $P$, we will be able to deduce the following theorems from lemmas 3, 4 and 5:

Theorem 1: The set $D \subset M_n(\mathbb{C})$ of diagonalizable matrices is dense in $M_n(\mathbb{C})$.

Theorem 2: The set $D \subset M_n(\mathbb{C})$ of diagonalizable matrices is a set of full measure (i.e. its complement has measure zero).

Lemma 3: Let $M \in M_n(\mathbb{C})$ be a matrix. If its characteristic polynomial $P$ has only simple roots then $M$ is diagonalizable.

Proof. Let $\lambda_1, \dots, \lambda_n \in \mathbb{C}$ be the distinct eigenvalues of $M$ (i.e. the roots of $P$), and for every $1 \leq i \leq n$, let $v_i$ be an eigenvector associated to $\lambda_i$.

We show by induction on $n$ that the family $(v_1, \dots, v_n)$ is linearly independent, hence a basis of $\mathbb{C}^n$. For $n=1$, it is clear. Suppose the result holds for families of $n-1$ eigenvectors and let $\alpha_1, \dots, \alpha_n \in \mathbb{C}$ be such that

$\alpha_1 \cdot v_1 + \alpha_2 \cdot v_2+ \cdots + \alpha_n \cdot v_n=0$.

Suppose by contradiction that one $\alpha_i$ is non-zero, say $\alpha_1$. Then

$\displaystyle v_1= - \frac{\alpha_2}{\alpha_1} \cdot v_2 - \cdots - \frac{\alpha_n}{\alpha_1} \cdot v_n$,

hence

$\displaystyle \lambda_1 \cdot v_1 = - \lambda_1 \frac{\alpha_2}{\alpha_1} \cdot v_2- \cdots - \lambda_1 \frac{\alpha_n}{\alpha_1} \cdot v_n$.

On the other hand,

$\displaystyle \lambda_1 \cdot v_1= M \cdot v_1= - \lambda_2 \frac{\alpha_2}{\alpha_1} \cdot v_2- \cdots - \lambda_n \frac{\alpha_n}{\alpha_1} \cdot v_n$.

We deduce

$\displaystyle (\lambda_1- \lambda_2) \frac{\alpha_2}{\alpha_1} \cdot v_2 + \cdots + (\lambda_1- \lambda_n) \frac{\alpha_n}{\alpha_1} \cdot v_n=0$.

Because $\lambda_1 \neq \lambda_i$ when $i \neq 1$, and according to our induction hypothesis, we deduce that $\alpha_2= \cdots = \alpha_n=0$, hence $\alpha_1 \cdot v_1=0$, i.e. $\alpha_1=0$: a contradiction. $\square$
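A small numerical illustration of lemma 3 (a sketch assuming numpy, not part of the proof): a matrix with pairwise distinct eigenvalues is diagonalized by the basis of its eigenvectors.

```python
# Assumes numpy.  M has distinct eigenvalues 2 and 3, so the eigenvectors
# form a basis and V^{-1} M V is diagonal, as lemma 3 asserts.
import numpy as np

M = np.array([[2.0, 1.0],
              [0.0, 3.0]])

w, V = np.linalg.eig(M)          # columns of V are the eigenvectors v_i
D = np.linalg.inv(V) @ M @ V     # change of basis to the eigenbasis

assert np.allclose(D, np.diag(w))
print(np.round(D, 10))
```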

Corollary 1: Let $D \subset M_n(\mathbb{C})$ be the set of diagonalizable matrices. Then $\mathrm{int}(D)$ is the set of matrices whose eigenvalues are pairwise distinct.

Proof. For every matrix $M$, let $P_M$ denote its characteristic polynomial, and let $\varphi : M \mapsto \Delta(P_M)$. In particular, $\varphi$ is a polynomial in the coefficients of $M$, so $\varphi$ is continuous. According to lemmas 2 and 3, the set $\varphi^{-1}(\mathbb{C} \backslash \{0\})$ of matrices whose eigenvalues are pairwise distinct is an open subset of $M_n(\mathbb{C})$ contained in $D$, hence $\varphi^{-1}(\mathbb{C} \backslash \{0\}) \subset \mathrm{int}(D)$.

Now let $M$ be a diagonalizable matrix with an eigenvalue $\lambda$ of multiplicity at least two. There exist $P \in GL_n(\mathbb{C})$ and two diagonal matrices $D_1,D_2$ such that

$M=P \left( \begin{matrix} D_1 & 0 & 0 \\ 0 & \begin{matrix} \lambda & 0 \\ 0 & \lambda \end{matrix} & 0 \\ 0 & 0 & D_2 \end{matrix} \right)P^{-1}$.

For all $\epsilon>0$, let

$M_{\epsilon} = P \left( \begin{matrix} D_1 & 0 & 0 \\ 0 & \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} & 0 \\ 0 & 0 & D_2 \end{matrix} \right) P^{-1}$.

Suppose that $M_{\epsilon}$ is diagonalizable; in particular, there exists a polynomial $Q \in \mathbb{C}[X]$ with simple roots such that $Q(M_{\epsilon})=0$. Clearly, the polynomial $Q$ also annihilates the block $\left( \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} \right)$, so this block is diagonalizable; that is, there exists $S \in GL_2(\mathbb{C})$ such that

$\left( \begin{matrix} \lambda & \epsilon \\ 0 & \lambda \end{matrix} \right)= S \left( \begin{matrix} \lambda & 0 \\ 0 & \lambda \end{matrix} \right) S^{-1} = \lambda \cdot \mathrm{I_2}$,

hence $\epsilon =0$. Therefore, $M_{\epsilon}$ is diagonalizable if and only if $\epsilon=0$, and $M_{\epsilon} \underset{\epsilon \to 0}{\longrightarrow} M$, so $D \backslash \varphi^{-1}(\mathbb{C} \backslash \{0\}) \subset \partial D$.

Finally, we deduce that $\mathrm{int}(D) = \varphi^{-1}(\mathbb{C} \backslash \{0\})$. $\square$
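The perturbation used in the proof can be observed numerically (a sketch assuming numpy): for $\epsilon \neq 0$, the block with $\epsilon$ above the repeated eigenvalue has a one-dimensional eigenspace, hence is not diagonalizable.

```python
# Assumes numpy.  For eps != 0 the block [[l, eps], [0, l]] has a single
# eigenvalue l whose eigenspace is 1-dimensional: it is not diagonalizable.
import numpy as np

l, eps = 2.0, 1e-6
J = np.array([[l, eps],
              [0.0, l]])

# dim ker(J - l*I) = 2 - rank(J - l*I) = 1 < 2
rank = np.linalg.matrix_rank(J - l * np.eye(2))
print(rank)  # 1
```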

Lemma 4: Let $P \in \mathbb{C}[X_1,\dots,X_n]$ be a non-zero polynomial. Then $P^{-1}(0)$ is nowhere dense in $\mathbb{C}^n$.

Proof. Because $P^{-1}(0)$ is closed, it suffices to show that it has empty interior. Suppose that $P$ vanishes on an open ball $B(x_0,r) \subset \mathbb{C}^n$ and let $x_1 \in \mathbb{C}^n$. The restriction of $P$ to the line through $x_0$ and $x_1$ coincides with a polynomial $p$ in one variable. Because $P$ vanishes on $B(x_0,r)$, we deduce that $p$ has infinitely many roots, hence $p=0$ and in particular $P(x_1)=0$. Therefore, $P=0$. $\square$

Lemma 5: Let $P \in \mathbb{R}[X_1,\dots,X_n]$ be a non-zero polynomial. Then $P^{-1}(0)$ has measure zero in $\mathbb{R}^n$.

Proof. We prove by induction on $n$ that the null set of a non-zero polynomial in $\mathbb{R}[X_1, \dots, X_n]$ has measure zero. For $n=1$, it is clear since $P^{-1}(0)$ is finite. Suppose the result holds for $n-1$ and let $P \in \mathbb{R} [X_1,\dots ,X_n]$.

Let $I = \{x_1 \in \mathbb{R} \mid P(x_1, \cdot)=0\}$. It is possible to write $P$ as

$\displaystyle P(x_1,x_2, \dots, x_n)= \sum\limits_{k=0}^m P_k(x_1) \cdot m_k(x_2, \dots, x_n)$,

where $P_k \in \mathbb{R}[X_1]$, $m_k \in \mathbb{R}[X_2,\dots,X_n]$ are monomials and $P_m \neq 0$. Notice that $P(x_1, \cdot)=0$ implies $P_m(x_1)=0$ so $I$ is finite since $P_m$ has only finitely many roots. Then

$\begin{array}{cl} \mu_n(P^{-1}(0)) & \displaystyle = \int_{\mathbb{R}^n} \mathbf{1}_{P^{-1}(0)} (x_1,\dots,x_n) \, dx_1 \dots dx_n \\ \\ & \displaystyle = \int_{\mathbb{R}} \int_{\mathbb{R}^{n-1}} \mathbf{1}_{P^{-1}(0)} (x_1,x_2,\dots,x_n) \, dx_2 \dots dx_n \, dx_1 \\ \\ & \displaystyle = \int_{\mathbb{R}} \underset{=: f(x_1)}{\underbrace{ \mu_{n-1} \left( \{ (x_2, \dots ,x_n) \in \mathbb{R}^{n-1} \mid P(x_1,x_2, \dots,x_n)=0 \} \right) }} \, dx_1 \\ \\ & \displaystyle = \int_I f(x_1) \, dx_1 + \int_{\mathbb{R} \backslash I} f(x_1) \, dx_1 \end{array}$

Because $I$ is finite, $\mu_1(I)=0$, so $\displaystyle \int_I f(x_1)dx_1=0$.

Then, if $x_1 \notin I$ we have $P(x_1, \cdot) \neq 0$, hence

$\mu_{n-1} \left( \{ (x_2,\dots,x_n) \in \mathbb{R}^{n-1} \mid P(x_1,x_2,\dots,x_n) =0 \} \right)=0$,

by our induction hypothesis. Therefore, $\displaystyle \int_{\mathbb{R} \backslash I} f(x_1)dx_1=0$ and finally $\mu_n(P^{-1}(0))=0$. $\square$

Proof of theorems 1 and 2. For every matrix $M \in M_n(\mathbb{C})$, let $P_M \in \mathbb{C}[X]$ denote its characteristic polynomial, and let $\varphi : M \mapsto \Delta(P_M)$. Because $\varphi$ is a polynomial in the coefficients of $M$, we deduce from corollary 1 and lemmas 4 and 5 that $\mathrm{int}(D)= \varphi^{-1}(\mathbb{C} \backslash \{0\})$ is dense in $M_n(\mathbb{C})$ and is a set of full measure; a fortiori, so is $D$. $\square$
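Theorems 1 and 2 are easy to observe empirically (a sketch assuming numpy, and interpreting "random" as i.i.d. Gaussian entries, an assumption not made in the text): sampled matrices essentially always have pairwise distinct eigenvalues, hence are diagonalizable.

```python
# Assumes numpy; "random" here means i.i.d. Gaussian entries (an assumption
# for illustration only).  Every sampled matrix turns out to have pairwise
# distinct eigenvalues, in line with theorems 1 and 2.
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    M = rng.normal(size=(4, 4))
    w = np.linalg.eigvals(M)
    gaps = [abs(w[i] - w[j]) for i in range(4) for j in range(i + 1, 4)]
    assert min(gaps) > 1e-6  # eigenvalues are pairwise distinct
print("all 100 samples had distinct eigenvalues")
```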

In fact, theorem 1 (resp. lemma 4) is a consequence of theorem 2 (resp. lemma 5): a set of full measure is dense, and a closed set of measure zero is nowhere dense.

Notice that there is no analogue of theorems 1 and 2 for diagonalization over $\mathbb{R}$. Indeed, the characteristic polynomial of $M= \left( \begin{matrix} 0 & -1 \\ 1 & 0 \end{matrix} \right)$ is $X^2+1$, whose discriminant is negative, so there exists a neighborhood $V$ of $M$ on which the discriminant remains negative; in particular, no matrix in $V$ is diagonalizable over $\mathbb{R}$.
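For $2 \times 2$ matrices the discriminant is explicit: the characteristic polynomial of $\left( \begin{matrix} a & b \\ c & d \end{matrix} \right)$ is $X^2-(a+d)X+(ad-bc)$, with discriminant $(a+d)^2-4(ad-bc)$. A minimal check in plain Python:

```python
# Plain-Python check: for a 2x2 real matrix [[a, b], [c, d]] the
# characteristic polynomial is X^2 - (a+d)X + (ad - bc), so its
# discriminant is (a+d)^2 - 4(ad - bc).
def char_poly_disc(a, b, c, d):
    return (a + d) ** 2 - 4 * (a * d - b * c)

# the rotation matrix [[0, -1], [1, 0]]: discriminant -4 < 0, so no real
# eigenvalues and no diagonalization over R
print(char_poly_disc(0, -1, 1, 0))  # -4
```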

However, it is possible to determine the closure of the set of diagonalizable matrices in $M_n(\mathbb{R})$:

Property: The closure of the set of diagonalizable matrices (over $\mathbb{R}$) in $M_n(\mathbb{R})$ is the set of triangularizable matrices.

Proof. Let $M \in M_n(\mathbb{R})$ be a triangularizable matrix. Then there exist $P \in GL_n(\mathbb{R})$ and $\lambda_1, \dots, \lambda_n \in \mathbb{R}$ such that

$M=P \left( \begin{matrix} \lambda_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_n \end{matrix} \right) P^{-1}$.

For all $\epsilon>0$, let $0<\epsilon_1,\dots, \epsilon_n<\epsilon$ be such that $\lambda_i+ \epsilon_i \neq \lambda_j + \epsilon_j$ when $i \neq j$. Therefore, the matrix

$M_{\epsilon}= P \left( \begin{matrix} \lambda_1+ \epsilon_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_n + \epsilon_n \end{matrix} \right) P^{-1}$

is diagonalizable according to lemma 3. Because $M_{\epsilon} \underset{\epsilon \to 0}{\longrightarrow} M$, we deduce that $M$ belongs to the closure of the set of diagonalizable matrices.
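The diagonal perturbation above is easy to reproduce numerically (a sketch assuming numpy):

```python
# Assumes numpy.  Perturbing the diagonal of a triangular matrix by pairwise
# distinct epsilons makes its eigenvalues (the diagonal entries) distinct.
import numpy as np

T = np.triu(np.ones((3, 3)))        # upper triangular, eigenvalue 1 three times
eps = np.array([1e-6, 2e-6, 3e-6])  # pairwise distinct perturbations
T_eps = T + np.diag(eps)

w = np.linalg.eigvals(T_eps)
# the eigenvalues of a triangular matrix are its diagonal entries, now distinct
assert len(np.unique(np.round(w.real, 7))) == 3
print(np.sort(w.real))
```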

Conversely, let $M$ be the limit of a sequence of diagonalizable matrices $(M_j)$. Let $P_j$ (resp. $P$) be the characteristic polynomial of $M_j$ (resp. $M$). We can write

$\displaystyle P_j = \prod\limits_{k=1}^n (X- \lambda_{j,k} )$ and $\displaystyle P= \prod\limits_{k=1}^n (X-\lambda_k)$,

where $\lambda_{j,k} \in \mathbb{R}$ and $\lambda_k \in \mathbb{C}$.

From now on, we fix some $1 \leq k \leq n$. For all $j \geq 0$, let $1 \leq i_j \leq n$ be such that $|\lambda_k- \lambda_{j,i_j}| = \min\limits_{1 \leq l \leq n} |\lambda_k-\lambda_{j,l}|$. Therefore,

$\displaystyle |P_j(\lambda_k)| = \prod\limits_{l=1}^n |\lambda_k- \lambda_{j,l}| \geq |\lambda_k- \lambda_{j,i_j}|^n$

hence

$\displaystyle |\lambda_k-\lambda_{j,i_j}| \leq \sqrt[n]{|P_j(\lambda_k)|} \underset{j \to + \infty}{\longrightarrow} \sqrt[n]{|P(\lambda_k)|}=0$.

(We used the fact that the coefficients of a characteristic polynomial are polynomials in the coefficients of the matrix to justify that $P_j \to P$.)

Therefore, the complex roots of $P$ are limits of roots of the $P_j$, so $\lambda_1,\dots,\lambda_n \in \mathbb{R}$. We deduce that $M$ is triangularizable. $\square$

Now, we turn to a probabilistic viewpoint; intuitively, the following theorem states that a matrix in $M_n(\mathbb{Z})$ is diagonalizable over $\mathbb{C}$ with probability one:

Theorem 3: Let $T_n(k)$ be the number of matrices in $M_n(\mathbb{Z})$ whose coefficients are in $\{ -k , \dots ,k\}$ and $R_n(k)$ be the number of such matrices whose (complex) eigenvalues are not pairwise distinct. Then

$\displaystyle \lim\limits_{k \to + \infty} \frac{R_n(k)}{T_n(k)} = 0$.

Proof. Let $\varphi : M \mapsto \Delta(P_M)$ where $P_M$ denotes the characteristic polynomial of $M$. We already know that $\varphi$ is a polynomial $g_0(x_1, \dots , x_{n^2})$ in the coefficients of $M$, and that $\varphi(M)=0$ if and only if $M$ has a complex eigenvalue of multiplicity at least two. Therefore, if

$S= \{ (x_1, \dots, x_{n^2}) \in \{-k, \dots,k\}^{n^2} \mid g_0(x_1,\dots, x_{n^2})=0 \}$

then $R_n(k)= |S|$.

By induction, we define $g_i \in \mathbb{Z}[X_{i+1}, \dots, X_{n^2}]$ for $1 \leq i \leq n^2$ as the leading coefficient of $g_{i-1}$, viewed as a polynomial in $\mathbb{Z} [X_{i+1}, \dots, X_{n^2}] [X_i]$. Let $m$ denote the total degree of $g_0$; in particular, $g_{n^2} \in \mathbb{Z} \backslash \{0\}$ and $\deg(g_i) \leq m$ for every $i$. For all $i \geq 1$, let us introduce the set

$\begin{array}{lc} S_i = & \{ (x_1,\dots, x_{n^2}) \in S \mid g_0(x_1,\dots,x_{n^2})= \cdots = g_{i-1}(x_i,\dots,x_{n^2})=0, \\ & g_i(x_{i+1},\dots,x_{n^2}) \neq 0 \} \end{array}$

Then $\displaystyle S= \coprod\limits_{i=1}^{n^2} S_i$ so $\displaystyle |S|= \sum\limits_{i=1}^{n^2} |S_i|$.

If $g_i(x_{i+1}, \dots, x_{n^2}) \neq 0$, the polynomial $g_{i-1}( \cdot,x_{i+1}, \dots,x_{n^2}) \in \mathbb{Z}[X_i]$ is non-zero so it has at most $\deg(g_{i-1}) \leq m$ roots. We deduce that

$\displaystyle |S_i| \leq m(2k+1)^{n^2-1}$.

(There are at most $m$ choices for $x_i$ and $2k+1$ choices for each $x_j$ with $j \neq i$.)

Therefore, $|S| \leq mn^2(2k+1)^{n^2-1}$, and because $T_n(k)= (2k+1)^{n^2}$:

$\displaystyle \frac{R_n(k)}{T_n(k)} \leq \frac{mn^2}{2k+1} \underset{k \to + \infty}{\longrightarrow} 0. \hspace{1cm} \square$
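For $n=2$, theorem 3 can be checked by brute force (a plain-Python sketch; as noted above, the $2 \times 2$ discriminant simplifies to $(a-d)^2+4bc$):

```python
# Plain-Python brute force for n = 2: count matrices [[a, b], [c, d]] with
# entries in {-k, ..., k} whose eigenvalues coincide, i.e. whose
# characteristic discriminant (a+d)^2 - 4(ad-bc) = (a-d)^2 + 4bc vanishes.
from itertools import product

def repeated_ratio(k):
    entries = range(-k, k + 1)
    total = (2 * k + 1) ** 4  # T_2(k)
    repeated = sum(1 for a, b, c, d in product(entries, repeat=4)
                   if (a - d) ** 2 + 4 * b * c == 0)  # R_2(k)
    return repeated / total

# the ratio R_2(k) / T_2(k) decreases toward 0 as k grows
print(repeated_ratio(1), repeated_ratio(4))
```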

Again, theorem 3 does not hold for diagonalization over $\mathbb{R}$ or $\mathbb{Q}$. For example,

$\displaystyle \lim\limits_{k \to + \infty} \frac{D_2(k)}{T_2(k)} = \frac{49}{72}$

over $\mathbb{R}$, and even

$\displaystyle \lim\limits_{k \to + \infty} \frac{D_2(k)}{T_2(k)} =0$

over $\mathbb{Q}$, where $D_2(k)$ is the number of diagonalizable (over $\mathbb{R}$ or $\mathbb{Q}$) matrices whose coefficients are in $\{-k, \dots,k \}$.

More information can be found in the paper The probability that a matrix of integers is diagonalizable.