Published on 2019-12-27

*Warning: this is not very rigorous, I just want to provide an intuitive explanation.*

I have seen a lot of different proofs on why the set of languages over an alphabet is uncountable, however the one I'm about to show you now is the simplest in my opinion. It involves using Cantor's diagonalisation argument and forming a couple of bijections.

Firstly, we can see that if we prove that the set of languages over the alphabet denoted by is uncountable, then the set of languages over any non-empty alphabet is uncountable, as we can just take a one element subset of the alphabet, form a bijection to , and as the set of languages with that subset is uncountable, and this set is a subset of all languages possible from the alphabet, then the set of languages for the whole alphabet must be uncountable. Thus it is sufficient to prove that the set of languages over the alphabet is uncountable.

Ok, first of all, let us observe the set of all possible finite-length strings formed from the alphabet . will just be strings of a finite number of 0s concatenated i.e.,

We can create an injective function

As is countable, we can assign a unique natural number to each element of , i.e., we can say that is of the form

Let . Then, , either or . Thus, we can represent as so:

where a 0 means that and a 1 means that for the in that column.

Now, we want to prove that is uncountable. We will do this using a proof by contradiction. Let us assume that is countable. This means that there must be some enumeration of elements in where we assign a unique natural number to each element of . Let us take one such enumeration.

For this enumeration, we can create a similar representation in terms of 0s and 1s for all the :

and so on. If we can prove that this enumeration does not contain every language over , then we know that this enumeration is not valid as it is not surjective. Now, let us create a language , where the column containing is 0 if it is 1 for , and 1 if it is 0 for . That is, we take the complement of every element in the diagonal of the above table:

So in this case, includes but doesn't include as that is what the complement of every element in the above diagonal represents. We can see that the element in differs with the element in , and thus is different from every other . Thus this enumeration does not cover every element in . As the above argument can be applied to any enumeration, no such enumeration can exist. Thus we cannot form a bijection to , and so is uncountable.

- Diego Assencio's proof for why the set of languages over {0,1} is uncountable
- Jeff Erickson's regular languages notes

Email a comment to the public inbox!

This work by Chinmaya Krishnan Mahesh is licensed under a Creative Commons Attribution 4.0 International License. Source Code.