Saturday, August 24, 2013

Writing a Java Regular Expression Without Reading the ***** Manual


Writing and/or maintaining regular expressions is part of every developer's routine work. And hey, we usually can't stand it. It's annoying, the syntax is impossible to memorize, and overall it is an experience one wants to leave behind as quickly as possible in order to move on to the actual problem at hand. We wonder what would happen if we estimated an RE problem with Planning Poker: how far would the estimate be from the time it really took?
You see, when we need to write a new RE we go through the following steps: 
  1. Visit the Pattern Javadoc for a quick recap of the syntax. 
  2. Describe the RE in English. It goes something like: "start with 4 digits, followed by spaces, then the string "DUR", then again some spaces, and finally one digit"
  3. Translate the English description to RE syntax: "\d{4}\s+DUR\s+\d" (inside a Java string literal each backslash is doubled: "\\d{4}\\s+DUR\\s+\\d")
  4. Come up with examples. So here it will be something like: "1234 DUR 9" 
  5. Write a test validating the examples, thinking about edge cases, and making sure the RE is valid.
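To make the cost of steps 3 to 5 concrete, here is a minimal sketch of what they look like with plain java.util.regex. The class name DurPatternCheck and the non-matching edge cases are our own assumptions, chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the edge cases are illustrative.
class DurPatternCheck {
    // Step 3: the RE \d{4}\s+DUR\s+\d as a Java string literal (backslashes doubled).
    static final Pattern DUR = Pattern.compile("\\d{4}\\s+DUR\\s+\\d");

    // True only when the whole input matches the pattern.
    static boolean matches(String input) {
        return DUR.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Step 4: examples that should match, and edge cases that should not.
        String[] shouldMatch = {"1234 DUR 9", "4423   DUR 1"};
        String[] shouldNotMatch = {"123 DUR 9", "1234DUR 9", "1234 DUR 99"};

        // Step 5: a bare-bones test over the examples.
        for (String s : shouldMatch) {
            if (!matches(s)) {
                throw new AssertionError("expected a match: " + s);
            }
        }
        for (String s : shouldNotMatch) {
            if (matches(s)) {
                throw new AssertionError("expected no match: " + s);
            }
        }
        System.out.println("All examples behave as expected");
    }
}
```

Even for a pattern this small, the RE itself is one line and everything else is scaffolding around it.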
The situation is even worse when one needs to change an existing regular expression. Here we need to translate the RE syntax back to English, apply the changes and translate it back to RE syntax. This is again followed by examples and testing.
We are not alone in facing this problem. Several solutions exist to help ease the process (e.g. txt2re). The problems with these solutions are:
  • They always require leaving the IDE.
  • They usually don't help with understanding an existing RE, but rather only help create new ones.
So what do we suggest? We present you with the Regular Expression Wizard, a new approach to writing and maintaining Java regular expressions. This is a Java-based project that aims to help you write REs fluently using the Wizard Design Pattern.
How simple can it get? Let's write the RE from our previous example using the new wizard. Just create a wizard object, and then, using static methods, build your RE step by step, followed by examples for testing. 
    RE_Wizard re = new RE_Wizard();
    String dur = re.start().
            a_character_described_as(a_digit).exactly(4L).then().
            a_character_described_as(a_whitespace_character).once_or_more().then().
            a_fixed_string("DUR").then().
            a_character_described_as(a_whitespace_character).once_or_more().then().
            a_character_described_as(a_digit).
            for_example("1234 DUR 9").for_example("4423   DUR 1").the_end();

Here there is no need for steps 1 (syntax recap), 3 (translating to RE syntax) and 5 (writing a test). Note that if a stated example does not match the regular expression, then an ExampleDoesNotMatchRegularExpression exception will be thrown. All you need to do is describe the RE in English and come up with some examples. The best part comes later on, when you need to change it. Again, you do not need to deal with weird syntax. You only need to know English.
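To illustrate the idea of examples that are checked when the RE is built, here is a minimal sketch of a fluent builder with a fail-fast example check. It is not the RE_Wizard implementation: the class TinyRegexBuilder, its method names and the use of IllegalArgumentException are all hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the RE_Wizard code from the project.
class TinyRegexBuilder {
    private final StringBuilder regex = new StringBuilder();
    private final List<String> examples = new ArrayList<>();

    // Appends "exactly count digits".
    TinyRegexBuilder digits(int count) {
        regex.append("\\d{").append(count).append("}");
        return this;
    }

    // Appends "one or more whitespace characters".
    TinyRegexBuilder whitespaceOnceOrMore() {
        regex.append("\\s+");
        return this;
    }

    // Appends a literal string, quoted so special characters are not interpreted.
    TinyRegexBuilder fixedString(String text) {
        regex.append(Pattern.quote(text));
        return this;
    }

    // Records an example that the finished RE must match.
    TinyRegexBuilder forExample(String example) {
        examples.add(example);
        return this;
    }

    // Compiles the RE and fails fast if any recorded example does not match it.
    String build() {
        String result = regex.toString();
        Pattern compiled = Pattern.compile(result);
        for (String example : examples) {
            if (!compiled.matcher(example).matches()) {
                throw new IllegalArgumentException(
                        "Example \"" + example + "\" does not match " + result);
            }
        }
        return result;
    }
}
```

With this sketch, new TinyRegexBuilder().digits(4).whitespaceOnceOrMore().fixedString("DUR").whitespaceOnceOrMore().digits(1).forExample("1234 DUR 9").build() returns the RE string, while an example that does not fit makes build() throw, so a wrong RE is caught where it is defined rather than where it is first used.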

Let us take another example. Mkyong wrote a post on "10 Java Regular Expression Examples You Should Know". We took the one that matches a time in 24-hour format and rewrote it with the wizard:

    //([01]?[0-9]|2[0-3]):[0-5][0-9]
    RE_Wizard re = new RE_Wizard();
    String timeRE = re.start().start_group().
            any_character_in("01").no_more_then(1L).then().
            any_character_in_the_range("0","9").
            or().
            a_fixed_string("2").then().
            any_character_in_the_range("0","3").then().
            close_group().
            then().
            a_fixed_string(":").then().
            any_character_in_the_range("0","5").then().
            any_character_in_the_range("0","9").then().
            for_example("06:58").
            for_example("6:45").
            for_example("23:12").
            the_end();
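For comparison, the raw RE from the comment at the top of the listing can be checked directly with java.util.regex. This is only a sketch: the class name Time24Check and the rejected inputs are our own assumptions, while the three accepted inputs are the examples used above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the rejected inputs are illustrative.
class Time24Check {
    // The 24-hour time RE quoted in the comment of the wizard listing.
    static final Pattern TIME_24H = Pattern.compile("([01]?[0-9]|2[0-3]):[0-5][0-9]");

    // True only when the whole input is a time in 24-hour format.
    static boolean isValid(String input) {
        return TIME_24H.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The same examples that are passed to for_example(...) above.
        String[] valid = {"06:58", "6:45", "23:12"};
        // Inputs that should be rejected: hour 24, minute 60, one-digit minute.
        String[] invalid = {"24:00", "12:60", "7:5"};

        for (String s : valid) {
            if (!isValid(s)) {
                throw new AssertionError("expected a match: " + s);
            }
        }
        for (String s : invalid) {
            if (isValid(s)) {
                throw new AssertionError("expected no match: " + s);
            }
        }
        System.out.println("All time examples behave as expected");
    }
}
```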

So where can you get hold of this? The wizard code can be found at https://github.com/azarian/wizards. Use it, share it, send us feedback, and forget about losing time writing REs. 

Disclaimers
  • We did not implement all of the Java regular expression syntax, mostly due to time limitations. If anyone wishes to contribute, it will be highly appreciated.
  • We do not include instructions on how to use the builder. We hope it is straightforward. If it is not, then we are missing the point, so please let us know.