Java String.split() vs. Pattern.split()

If you need to split a string into an array of parts based on a delimiter, Java provides a simple convenience method: call split() on the String object you would like to split. Note that the argument is interpreted as a regular expression, not as a plain character.

For example:

String myString = "Hy my name is bob";
String[] words = myString.split(" ");
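Because the argument to split() is a regular expression, a delimiter such as "." behaves differently from a space. The following is a minimal sketch of that difference; the class and method names (SplitBasics, splitOnSpace, splitOnDot) are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitBasics {
    // Splits on a single space; a space has no special meaning in a regex.
    static String[] splitOnSpace(String input) {
        return input.split(" ");
    }

    // Splits on a literal dot. Pattern.quote() escapes the dot,
    // which would otherwise match any character.
    static String[] splitOnDot(String input) {
        return input.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnSpace("Hy my name is bob"))); // [Hy, my, name, is, bob]
        System.out.println(Arrays.toString(splitOnDot("a.b.c")));               // [a, b, c]
        // Unquoted, "." matches every character: only empty strings are left,
        // and split() drops trailing empty strings, so the array is empty.
        System.out.println("a.b.c".split(".").length);                          // 0
    }
}
```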

If you need to perform the same kind of split frequently, the Java Regular Expressions API offers a good opportunity for a performance improvement.
The API provides a Pattern class to precompile a split pattern and execute it on a string.

The example above would look like:

import java.util.regex.Pattern;

// Compile the pattern once, then reuse p for every split.
Pattern p = Pattern.compile(" ");
String myString = "Hy my name is bob";
String[] words = p.split(myString);

Now, why would this lead to a performance improvement?

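In older JDKs (up to Java 6), String.split() was implemented as shown here. Since Java 7, the method has a fast path that bypasses the regex engine for simple delimiters such as a single literal character, but any other regex still ends up in the same Pattern.compile() call: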

public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}

As you can see, String.split() instantiates a new Pattern object on the fly. The garbage collector can remove it quickly, but if you perform the same split again and again, the repeated compilation adds considerable overhead.
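To get a rough feeling for that overhead, a naive timing loop like the one below can help. It is only a sketch: the class name SplitTiming, the regex and the iteration count are arbitrary choices for illustration, and a loop like this ignores JIT warm-up, so a proper benchmark harness such as JMH is needed for reliable numbers. The regex is deliberately more than a single character so that it really has to be compiled.

```java
import java.util.regex.Pattern;

public class SplitTiming {
    // Hypothetical delimiter: a comma with optional surrounding whitespace.
    static final String REGEX = "\\s*,\\s*";
    static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    // Compiles the regex on every iteration via String.split().
    static int countWithStringSplit(String input, int iterations) {
        int total = 0;
        for (int i = 0; i < iterations; i++) {
            total += input.split(REGEX).length;
        }
        return total;
    }

    // Reuses the precompiled Pattern on every iteration.
    static int countWithPattern(String input, int iterations) {
        int total = 0;
        for (int i = 0; i < iterations; i++) {
            total += PRECOMPILED.split(input).length;
        }
        return total;
    }

    public static void main(String[] args) {
        String input = "a , b,c ,d";
        int iterations = 1_000_000;

        long t0 = System.nanoTime();
        int a = countWithStringSplit(input, iterations);
        long t1 = System.nanoTime();
        int b = countWithPattern(input, iterations);
        long t2 = System.nanoTime();

        // The part counts are printed so the loops cannot be optimized away entirely.
        System.out.println("String.split():  " + (t1 - t0) / 1_000_000 + " ms (" + a + " parts)");
        System.out.println("Pattern.split(): " + (t2 - t1) / 1_000_000 + " ms (" + b + " parts)");
    }
}
```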
So what you can do is:

  1. Compile your Pattern in advance.
  2. Store it either in an object that lives long enough to provide access to the Pattern instance, or in a static field.
  3. Use the split() method of the precompiled Pattern instead of the convenience method String.split().
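The three steps above could be sketched as follows. The class name WordSplitter and its words() method are hypothetical. Keeping the Pattern in a static final field is safe because Pattern instances are immutable and can be shared between threads.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class WordSplitter {
    // Steps 1 and 2: compiled once when the class is loaded and kept in a static field.
    private static final Pattern SPACE = Pattern.compile(" ");

    // Step 3: every call reuses the precompiled pattern instead of String.split().
    public static String[] words(String text) {
        return SPACE.split(text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Hy my name is bob"))); // [Hy, my, name, is, bob]
    }
}
```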

Further recommendations

Google Guava Splitter
The Google Guava library provides a nice Strings API, including a Splitter class that is worth a look.

Hidden evils of Java’s String.split()…
Prashant Deva wrote an interesting blog entry about split() performance that includes some statistics.
