If you need to split a string into an array of parts based on a delimiter, Java provides a simple convenience method: just call split() on the String object you want to split. Note that the argument is interpreted as a regular expression, not as a plain character.
For example:
String myString = "Hi my name is bob";
String[] words = myString.split(" ");
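Because the argument to split() is a regular expression, characters such as "." or "|" are not matched literally. The small sketch below (the class name SplitRegexDemo is just for illustration) shows the difference and how to escape a metacharacter:

```java
import java.util.Arrays;

class SplitRegexDemo {
    public static void main(String[] args) {
        // A plain space is matched literally.
        String[] words = "Hi my name is bob".split(" ");
        System.out.println(Arrays.toString(words)); // [Hi, my, name, is, bob]

        // "." is a regex metacharacter (any character), so every character is a
        // delimiter; all tokens are empty, and trailing empty strings are removed.
        System.out.println("a.b.c".split(".").length); // 0

        // Escape the dot (or use Pattern.quote) to split on a literal '.'.
        System.out.println(Arrays.toString("a.b.c".split("\\."))); // [a, b, c]

        // Consecutive delimiters produce empty tokens unless the regex absorbs them.
        System.out.println(Arrays.toString("a  b".split(" ")));   // [a, , b]
        System.out.println(Arrays.toString("a  b".split(" +"))); // [a, b]
    }
}
```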
If you need to perform the same kind of split frequently, the Java Regular Expressions API (java.util.regex) offers a good opportunity for a performance improvement.
The API provides a Pattern class that lets you compile a split pattern once and then apply it to any number of strings.
Rewritten with Pattern, the example above looks like this:
import java.util.regex.Pattern;

Pattern p = Pattern.compile(" ");
String myString = "Hi my name is bob";
String[] words = p.split(myString);
Now, why would this lead to a performance improvement?
In Java 6, String.split() is implemented internally as:
public String[] split(String regex, int limit) {
    return Pattern.compile(regex).split(this, limit);
}
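As you can see, String.split() compiles a new Pattern object on every call. The garbage collector can reclaim that short-lived object quickly, but compiling the same regular expression again and again adds noticeable overhead when you perform the same split many times. Since Java 7, String.split() has a fast path that skips Pattern compilation for a single literal character such as " ", so precompiling pays off mainly for longer delimiters and real regular expressions.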
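To see the two approaches side by side, here is a rough sketch using a whitespace regex ("\\s+"), which is long enough that String.split() compiles a Pattern on every call. The class and method names are hypothetical, and the timing loop is naive (no JIT warm-up, single run), so treat the numbers as indicative only; a benchmark harness such as JMH gives reliable measurements.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitCostDemo {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Compiles the regex on every call (what String.split() does for this regex).
    static String[] splitEachTime(String s) {
        return s.split("\\s+");
    }

    // Reuses the precompiled Pattern.
    static String[] splitPrecompiled(String s) {
        return WHITESPACE.split(s);
    }

    public static void main(String[] args) {
        String line = "Hi  my name\tis bob";

        // Both approaches produce the same tokens.
        System.out.println(Arrays.equals(splitEachTime(line), splitPrecompiled(line))); // true

        // Naive timing: results vary by JVM and machine.
        int iterations = 200_000;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            splitEachTime(line);
        }
        long eachTime = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            splitPrecompiled(line);
        }
        long precompiled = System.nanoTime() - start;

        System.out.printf("String.split:        %d ms%n", eachTime / 1_000_000);
        System.out.printf("precompiled Pattern: %d ms%n", precompiled / 1_000_000);
    }
}
```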
So what you can do is:
- Compile your Pattern in advance.
- Store it either in an object that lives long enough to provide access to the Pattern instance, or in a static field.
- Use the split() method of the precompiled Pattern instead of the convenience method String.split().
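The three steps above can be sketched with a hypothetical DelimitedLineParser class that keeps its precompiled Pattern in an instance field for as long as the parser object lives. It assumes the delimiter should be matched literally, hence Pattern.quote(). Since Pattern objects are immutable and thread-safe, one parser instance could also be shared between threads.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical example class: splits lines on a delimiter chosen at runtime.
class DelimitedLineParser {
    // Steps 1 and 2: compile once and keep the Pattern for the lifetime of this object.
    private final Pattern delimiter;

    DelimitedLineParser(String literalDelimiter) {
        // Pattern.quote makes metacharacters such as "|" or "." match literally.
        this.delimiter = Pattern.compile(Pattern.quote(literalDelimiter));
    }

    // Step 3: use the precompiled Pattern instead of String.split().
    String[] parse(String line) {
        return delimiter.split(line);
    }

    public static void main(String[] args) {
        DelimitedLineParser parser = new DelimitedLineParser("|");
        for (String line : new String[] {"a|b|c", "x|y"}) {
            System.out.println(Arrays.toString(parser.parse(line)));
        }
        // prints [a, b, c] and then [x, y]
    }
}
```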
Further recommendations
Google Guava Splitter
The Google Guava library provides a rich set of string utilities, including a Splitter class that is worth a look.
Hidden evils of Java’s String.split()…
Prashant Deva wrote an interesting blog entry about the performance of split() that includes some statistics.