Skip to content

Creates a compressed regex from a list of strings

Notifications You must be signed in to change notification settings

gleenn/regex_compressor

Repository files navigation

Regex Compressor

Download

Regex Compressor is a Java library intended to take a list of strings and generate a compact, Java-compatible regex that matches those exact strings.

Usage

import static com.gleenn.regex_compressor.RegexCompressor.*;

This example shortens the regex by two characters, and also improves the regex performance because it won't have to check the "aaaa" prefix 4 times when matching "aaaad", only once. This can have a large impact on performance. compress(asList("aaaab", "aaaac", "aaaad", "aaaae")) => "aaaa[bcde]"

compress(asList("foo", "bar", "baz")) => "foo|ba[rz]"

Use the pattern method instead of compress to generate a Java Pattern directly.

Notes

This should have improved regex performance on large numbers of strings. It uses a trie data structure to compress common prefixes.

Works with Unicode!

Insertion order matters, it can affect the result of matches if you care about more than just does the string match true/false.

Future work

  • Use radix trees to improve performance when compressing long prefixes
  • Compress suffixes (although this is pretty non-trivial)
  • Use vistor pattern to not expose internals

Related work

This project is based on a Clojure implementation called Frak.

Radis tree implemented in Java by thegedge.

Licence

Eclipse Public License - v 1.0

About

Creates a compressed regex from a list of strings

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •