Your task is to implement a simple regular expression parser. We will have a parser that outputs the following AST of a regular expression:
data RegExp = Normal Char -- ^ A character that is not in "()*|."
| Any -- ^ Any character
| ZeroOrMore RegExp -- ^ Zero or more occurances of the same regexp
| Or RegExp RegExp -- ^ A choice between 2 regexps
| Str [RegExp] -- ^ A sequence of regexps.
RegExp* normal (char); // ^ A character that is not in "()*|."
RegExp* any (); // ^ Any character
RegExp* zeroOrMore (RegExp *starred); // ^ Zero or more occurances of the same regexp
RegExp* or (RegExp* left, RegExp* right); // ^ A choice between 2 regexps
RegExp* str (RegExp *first); // ^ A sequence of regexps, first element
RegExp* add (RegExp *str, RegExp* next); // ^ A sequence of regexps, additional element
Reg.Exp normal (char); // ^ A character that is not in "()*|."
Reg.Exp any (); // ^ Any character
Reg.Exp zeroOrMore (Reg.Exp starred); // ^ Zero or more occurances of the same regexp
Reg.Exp or (Reg.Exp left, Reg.Exp right); // ^ A choice between 2 regexps
Reg.Exp str (Reg.Exp first); // ^ A sequence of regexps, first element
Reg.Exp add (Reg.Exp str, Reg.Exp next); // ^ A sequence of regexps, additional element
RegExp* normal (char); // ^ A character that is not in "()*|."
RegExp* any (); // ^ Any character
RegExp* zeroOrMore (RegExp *starred); // ^ Zero or more occurances of the same regexp
RegExp* orr (RegExp* left, RegExp* right); // ^ A choice between 2 regexps
RegExp* str (RegExp *first); // ^ A sequence of regexps, first element
RegExp* add (RegExp *str, RegExp* next); // ^ A sequence of regexps, additional element
Normal (c) # ^ A character (string) that is not in "()*|."
Any () # ^ Any character
ZeroOrMore (regexp) # ^ Zero or more occurances of the same regexp
Or (regexp, regexp) # ^ A choice between 2 regexps
Str ([regexps]) # ^ A sequence of regexps.
new Void() // An empty RegExp object, used only // as default output for wrong inputs. new Normal(char c) // A character (string) that is not in "()*|." new Any() // Any character new ZeroOrMore(RegExp regexp) // Zero or more occurances of the same regexp new Or(RegExp r1, RegExp r2) // A choice between 2 regexps new Str(List<RegExp> lstRegexps]) // A sequence of regexps.
// All these objects implement the toString(), hashCode() and equals() methods and extend the RegExp abstract class. In addition, you can use the getType() method to obtain a string representing the structure of the object and its children (using Str, Or, Normal, ... dnominations).
new Normal(c) // A character that is not in "()*|."
new Any() // Any character
new ZeroOrMore(regexp) // Zero or more occurances of the same regexp
new Or(regexp,regexp) // A choice between 2 regexps
new Str([regexp]) // A sequence of regexps
As with the usual regular expressions, Any
is denoted by the '.'
character, ZeroOrMore
is denoted by a subsequent '*'
and Or
is denoted by '|'
. Brackets, (
and )
, are allowed to group a sequence of regular expressions into the Str
object.
Or
is not associative, so "a|(a|a)"
and "(a|a)|a"
are both valid regular expressions, whereas "a|a|a"
is not.
Operator precedences from highest to lowest are: *
, sequencing and |
. Therefore the followings hold:
"ab*" -> Str [Normal 'a', ZeroOrMore (Normal 'b')]
"(ab)*" -> ZeroOrMore (Str [Normal 'a', Normal 'b'])
"ab|a" -> Or (Str [Normal 'a',Normal 'b']) (Normal 'a')
"a(b|a)" -> Str [Normal 'a',Or (Normal 'b') (Normal 'a')]
"a|b*" -> Or (Normal 'a') (ZeroOrMore (Normal 'b'))
"(a|b)*" -> ZeroOrMore (Or (Normal 'a') (Normal 'b'))
"ab*" -> add (str (normal ('a')), zeroOrMore (normal ('b')))
"(ab)*" -> zeroOrMore (add (str (normal ('a')), normal ('b')))
"ab|a" -> or (add (str (normal ('a')), normal ('b')), normal ('a'))
"a(b|a)" -> add (str (normal ('a')), or (normal ('b'), normal ('a')))
"a|b*" -> or (normal ('a'), zeroOrMore (normal ('b')))
"(a|b)*" -> zeroOrMore (or (normal ('a'), normal ('b')))
"ab*" -> Reg.add (Reg.str (Reg.normal ('a')), Reg.zeroOrMore (Reg.normal ('b')))
"(ab)*" -> Reg.zeroOrMore (Reg.add (str (Reg.normal ('a')), Reg.normal ('b')))
"ab|a" -> Reg.or (Reg.add (Reg.str (Reg.normal ('a')), Reg.normal ('b')), Reg.normal ('a'))
"a(b|a)" -> Reg.add (Reg.str (Reg.normal ('a')), Reg.or (Reg.normal ('b'), Reg.normal ('a')))
"a|b*" -> Reg.or (Reg.normal ('a'), Reg.zeroOrMore (Reg.normal ('b')))
"(a|b)*" -> Reg.zeroOrMore (Reg.or (Reg.normal ('a'), Reg.normal ('b')))
"ab*" -> add (str (normal ('a')), zeroOrMore (normal ('b')))
"(ab)*" -> zeroOrMore (add (str (normal ('a')), normal ('b')))
"ab|a" -> orr (add (str (normal ('a')), normal ('b')), normal ('a'))
"a(b|a)" -> add (str (normal ('a')), orr (normal ('b'), normal ('a')))
"a|b*" -> orr (normal ('a'), zeroOrMore (normal ('b')))
"(a|b)*" -> zeroOrMore (orr (normal ('a'), normal ('b')))
"ab*" -> Str ([Normal ('a'), ZeroOrMore (Normal ('b'))])
"(ab)*" -> ZeroOrMore (Str ([Normal ('a'), Normal ('b')]))
"ab|a" -> Or (Str ([Normal ('a'), Normal ('b')]), Normal ('a'))
"a(b|a)" -> Str ([Normal ('a'), Or (Normal ('b'), Normal ('a'))])
"a|b*" -> Or (Normal ('a'), ZeroOrMore (Normal ('b')))
"(a|b)*" -> ZeroOrMore (Or (Normal ('a'), Normal ('b')))
"ab*" -> new Str ([new Normal ('a'), new ZeroOrMore (new Normal ('b'))])
"(ab)*" -> new ZeroOrMore (new Str ([new Normal ('a'), new Normal ('b')]))
"ab|a" -> new Or (new Str ([new Normal ('a'), new Normal ('b')]), new Normal ('a'))
"a(b|a)" -> new Str ([new Normal ('a'), new Or (new Normal ('b'), new Normal ('a'))])
"a|b*" -> new Or (new Normal ('a'), new ZeroOrMore (new Normal ('b')))
"(a|b)*" -> new ZeroOrMore (new Or (new Normal ('a'), new Normal ('b')))
"ab*" -> Str ([Normal ('a'), ZeroOrMore (Normal ('b'))])
"(ab)*" -> ZeroOrMore (Str ([Normal ('a'), Normal ('b')]))
"ab|a" -> Or (Str ([Normal ('a'), Normal ('b')]), Normal ('a'))
"a(b|a)" -> Str ([Normal ('a'), Or (Normal ('b'), Normal ('a'))])
"a|b*" -> Or (Normal ('a'), ZeroOrMore (Normal ('b')))
"(a|b)*" -> ZeroOrMore (Or (Normal ('a'), Normal ('b')))
"ab*" -> new Str([ new Normal('a'), new ZeroOrMore(new Normal ('b')) ])
"(ab)*" -> new ZeroOrMore(new Str([ new Normal('a'), new Normal('b') ]))
"ab|a" -> new Or( new Str([ new Normal('a'), new Normal('b') ]), new Normal ('a') )
"a(b|a)" -> new Str([ new Normal('a'), new Or( new Normal('b'), new Normal('a') ) ])
"a|b*" -> new Or( new Normal('a'), new ZeroOrMore(new Normal('b')) )
"(a|b)*" -> new ZeroOrMore(new Or( new Normal('a'), new Normal('b') ))
Some examples:
"a" -> Normal 'a'
"ab" -> Str [ Normal 'a', Normal 'b']
"a.*" -> Str [ Normal 'a', ZeroOrMore Any ]
"(a.*)|(bb)" -> Or (Str [Normal a, ZeroOrMore Any]) (Str [Normal 'b', Normal 'b'])
"a" -> normal ('a')
"ab" -> add (str (normal ('a')), normal ('b'))
"a.*" -> add (str (normal ('a')), zeroOrMore (any ()))
"(a.*)|(bb)" -> or (add (str (normal ('a')), zeroOrMore (any ())), add (str (normal ('b')), normal (b)))
"a" -> Reg.normal ('a')
"ab" -> Reg.add (Reg.str (Reg.normal ('a')), Reg.normal ('b'))
"a.*" -> Reg.add (Reg.str (Reg.normal ('a')), Reg.zeroOrMore (Reg.any ()))
"(a.*)|(bb)" -> Reg.or (Reg.add (Reg.str (Reg.normal ('a')), Reg.zeroOrMore (Reg.any ())), Reg.add (Reg.str (Reg.normal ('b')), Reg.normal (b)))
"a" -> normal ('a')
"ab" -> add (str (normal ('a')), normal ('b'))
"a.*" -> add (str (normal ('a')), zeroOrMore (any ()))
"(a.*)|(bb)" -> orr (add (str (normal ('a')), zeroOrMore (any ())), add (str (normal ('b')), normal (b)))
"a" -> Normal ('a')
"ab" -> Str ([ Normal ('a'), Normal ('b')])
"a.*" -> Str ([ Normal ('a'), ZeroOrMore (Any ()) ])
"(a.*)|(bb)" -> Or (Str ([Normal ('a'), ZeroOrMore (Any ())]), Str ([Normal ('b'), Normal ('b')]))
"a" -> new Normal('a')
"ab" -> new Str([ new Normal('a'), new Normal('b') ])
"a.*" -> new Str([ new Normal('a'), new ZeroOrMore( new Any() ) ])
"(a.*)|(bb)" -> new Or( new Str([ new Normal('a'), new ZeroOrMore( new Any() ) ]),
new Str([ new Normal('b'), new Normal('b') ]) )
"a" -> Normal('a')
"ab" -> Str([ Normal('a'), Normal('b') ])
"a.*" -> Str([ Normal('a'), ZeroOrMore( Any() ) ])
"(a.*)|(bb)" -> Or( Str([ Normal('a'), ZeroOrMore( Any() ) ]),
Str([ Normal('b'), Normal('b') ]) )
"a" -> new Normal('a')
"ab" -> new Str([ new Normal('a'), new Normal('b') ])
"a.*" -> new Str([ new Normal('a'), new ZeroOrMore(new Any()) ])
"(a.*)|(bb)" -> new Or( new Str([ new Normal('a'), new ZeroOrMore(new Any()) ])
, new Str([ new Normal('b'), new Normal('b') ])
)
The followings are invalid regexps and the parser should return Nothing
in Haskell / 0
in C or C++ / null
in JavaScript or C# / ""
in Python / new Void()
in Java/Void()
in Kotlin:
""
, ")("
, "*"
, "a("
, "()"
, "a**"
, etc.
Feel free to use any parser combinator libraries available on codewars, or implement the parser "from scratch".