Fancy Coder

share my projects and discoveries

“Regular Expression” matches all non-prime numbers

I found a “regular expression” which matches all strings consists of non-prime number of 1s.

This is quite surprising because it can be proven that non-prime number of 1s is not a regular language. That is to say, no regular expression can match that language.

However, this regular expression works.
Following python code will print out all prime numbers within [0,99]

import re
filter(lambda n:not re.match(r"^1?$|^(111*)\1+$","1"*n),range(100))

So why it works? Let’s look at the regular expression: ^1?$ matches “” and “1”, which is trivial. In ^(111*)\1+$, (111*) matches 2 or more 1s, so (111*)\1+ matches n*k 1s, k>=2, n>=2, which is exactly all composite numbers.

Then what’s wrong with the theory? Actually, nothing is wrong. The “regular expression” in most programming language is much more expressive than the formal regular expression defined in computational theory. In this case, we do not allow “\1” in formal regular language. Actually, this “\1” may potentially require unbounded amount of memory, because the program need to remember the matched string in “\1”, which can be arbitrarily long. In formal regular language, the memory cost should be bounded when regular expression is given: bounded by the size of corresponding DFA. Thus, “\1” kind of stuff gives regular expression much more power.

Next question is: is “regular expression” in programming language equivalent to Turing Machine?


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: