Why the Set of Languages over Any Non Empty Alphabet Is Uncountable

Published on 2019-12-27

Warning: this is not very rigorous, I just want to provide an intuitive explanation.

I have seen a lot of different proofs on why the set of languages over an alphabet $\Sigma$ is uncountable, however the one I'm about to show you now is the simplest in my opinion. It involves using Cantor's diagonalisation argument and forming a couple of bijections.

Firstly, we can see that if we prove that the set of languages over the alphabet $\Sigma = \{0\}$ denoted by $\mathcal{L}$ is uncountable, then the set of languages over any non-empty alphabet is uncountable, as we can just take a one element subset of the alphabet, form a bijection to $\{0\}$ , and as the set of languages with that subset is uncountable, and this set is a subset of all languages possible from the alphabet, then the set of languages for the whole alphabet must be uncountable. Thus it is sufficient to prove that the set of languages over the alphabet $\Sigma = \{0\}$ is uncountable.

Ok, first of all, let us observe the set of all possible finite-length strings formed from the alphabet $\Sigma = \{0\}$ . will just be strings of a finite number of 0s concatenated i.e.,

$S = \{0, 00, 000, 0000, ... \}$
We can create an injective function
$S \rightarrow \mathbb{N}$
$s \mapsto |s|$ where |s|

is the length of the string

. Clearly, this is injective, and thus the set

is countable.

As is countable, we can assign a unique natural number to each element of , i.e., we can say that is of the form

$S = \{s_1, s_2, s_3, ...\}$
Let $L \in \mathcal{L}$ . Then, $\forall s_i \in S$ , either $s_i \in L$ or $s_i \notin L$ . Thus, we can represent

as so:
$\begin{tabular}{||c c c c c c||} \hline $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] \hline\hline $L:$ & 0 & 1 & 1 & 0 & ...\\ \hline \end{tabular}$
where a 0 means that $s_i \notin L$ and a 1 means that $s_i \in L$ for the s_i

in that column.

Now, we want to prove that $\mathcal{L}$ is uncountable. We will do this using a proof by contradiction. Let us assume that $\mathcal{L}$ is countable. This means that there must be some enumeration of elements in $\mathcal{L}$ where we assign a unique natural number to each element of $\mathcal{L}$ . Let us take one such enumeration.
For this enumeration, we can create a similar representation in terms of 0s and 1s for all the $L \in \mathcal{L}$ :

$\begin{tabular}{||c c c c c c||} \hline $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] \hline\hline $L_1:$ & 0 & 1 & 1 & 0 & ...\\ \hline $L_2:$ & 1 & 0 & 0 & 1 & ...\\ \hline $L_3:$ & 1 & 1 & 0 & 0 & ...\\ \hline $L_4:$ & 0 & 0 & 0 & 1 & ...\\ \hline \end{tabular}$
and so on. If we can prove that this enumeration does not contain every language over $\Sigma$ , then we know that this enumeration is not valid as it is not surjective. Now, let us create a language

, where the column containing s_i

is 0 if it is 1 for L_i

, and 1 if it is 0 for L_i

. That is, we take the complement of every element in the diagonal of the above table:
$\begin{tabular}{||c c c c c c||} \hline $S:$ & $s_1$ & $s_2$ & $s_3$ & $s_4$ & ... \\ [0.5ex] \hline\hline $L_1:$ & {\bfseries 0} & 1 & 1 & 0 & ...\\ \hline $L_2:$ & 1 & {\bfseries 0} & 0 & 1 & ...\\ \hline $L_3:$ & 1 & 1 & {\bfseries 0} & 0 & ...\\ \hline $L_4:$ & 0 & 0 & 0 & {\bfseries 1} & ...\\ \hline \end{tabular}$
So in this case,

includes

but doesn't include s_4

as that is what the complement of every element in the above diagonal represents. We can see that the $i^{th}$ element in

differs with the $i^{th}$ element in L_i

, and thus

is different from every other L_i

. Thus this enumeration does not cover every element in $\mathcal{L}$ . As the above argument can be applied to any enumeration, no such enumeration can exist. Thus we cannot form a bijection to $\mathbb{N}$ , and so $\mathcal{L}$ is uncountable.

References:

Email a comment to the public inbox!

This work by Chinmaya Krishnan Mahesh is licensed under a Creative Commons Attribution 4.0 International License. Source Code.