The Java Pro Seeker: Tips to tune your Java Code for Performance Optimization

Using double/long vs BigDecimal for monetary calculations: double, long, java.math.BigDecimal, java.lang.String :

If you want to implement fast and correct monetary arithmetic operations in Java, stick to the following rules:

Store monetary values in the smallest currency units (for example, cents) in long variables.
Avoid working with non-integral values while using double (calculate in the smallest currency units).
Add/subtract using long.
Round any multiplication/division results using Math.round/rint/ceil/floor (per your system requirements).
All intermediate results should fit into 52 bits (the size of a double mantissa), otherwise integral values held in a double lose precision.
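A minimal sketch of the rules above, assuming amounts are kept in cents. The names MoneyMath, add and applyPercent are hypothetical, Math.addExact requires Java 8, and Math.round is just one of the rounding options listed above.

```java
final class MoneyMath {
    /** Adds two amounts in cents; Math.addExact (Java 8+) throws on overflow instead of wrapping. */
    static long add(long aCents, long bCents) {
        return Math.addExact(aCents, bCents);
    }

    /** Applies a percentage to an amount in cents and rounds the result back to whole cents. */
    static long applyPercent(long cents, double percent) {
        // The intermediate product is a double, so it has to stay within the exact integer range.
        return Math.round(cents * percent / 100.0);
    }
}
```

For example, MoneyMath.applyPercent(1999, 7.5) yields 150 cents, because 149.925 is rounded to the nearest cent.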

Always use MathContext for BigDecimal multiplication and division in order to avoid ArithmeticException for infinitely long decimal results. Don't use MathContext.UNLIMITED for that reason - it is equivalent to no context at all.
Do not convert double to BigDecimal, instead convert String to BigDecimal when possible.
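The two BigDecimal rules can be illustrated with a short sketch. The class and method names are hypothetical, and MathContext.DECIMAL64 is only an assumed choice of precision; pick the context your system requires.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class BigDecimalTips {
    /** Divides with an explicit MathContext, so a non-terminating quotient is rounded. */
    static BigDecimal divide(String dividend, String divisor) {
        return new BigDecimal(dividend).divide(new BigDecimal(divisor), MathContext.DECIMAL64);
    }

    /** Returns true if dividing without a MathContext throws for this pair of values. */
    static boolean failsWithoutContext(String dividend, String divisor) {
        try {
            new BigDecimal(dividend).divide(new BigDecimal(divisor));
            return false;
        } catch (ArithmeticException e) {
            return true; // the exact quotient has an infinite decimal expansion
        }
    }

    /** new BigDecimal(double) keeps the binary approximation, new BigDecimal(String) does not. */
    static boolean doubleAndStringDiffer() {
        return !new BigDecimal(0.1).equals(new BigDecimal("0.1"));
    }
}
```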

Changes to String internal representation made in Java 1.7.0_06: java.lang.String, java.util.HashMap, java.util.Hashtable, java.util.HashSet, java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.WeakHashMap and java.util.concurrent.ConcurrentHashMap :

From Java 1.7.0_06 String.substring always creates a new underlying char[] value for every String it creates. This means that this method now has a linear complexity compared to previous constant complexity. The advantage of this change is a slightly smaller memory footprint of a String (8 bytes less than before) and a guarantee to avoid memory leaks caused by String.substring (see String packing part 1: converting characters to bytes for more details on Java object memory layout).
Java 7u6+ functionality. Removed in Java 8. Starting from the same Java update, the String class got a second hashing method called hash32. This method is currently not public and can be accessed without reflection only via the sun.misc.Hashing.stringHash32(String) call. It is used by 7 JDK hash-based collections if their size exceeds the jdk.map.althashing.threshold system property. This is an experimental function and currently I don't recommend using it in your code.
Java 7u6 (inclusive) to Java 7u40 (exclusive) functionality. Not applicable to Java 8. All standard JDK non-concurrent maps and sets in all Java versions between Java 7u6 (inclusive) and Java 7u40 (exclusive) are affected by a performance bug caused by the new hashing implementation. This bug affects only multithreaded applications that create a large number of maps per second. See this article for more details. This problem was fixed in Java 7u40.

Performance of various methods of binary serialization in Java: java.nio.ByteBuffer, sun.misc.Unsafe, java.io.DataInputStream, java.io.DataOutputStream, java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream: comparison of binary serialization performance using various classes:

It is extremely slow to write single bytes to direct byte buffers. You should avoid using direct byte buffers for writing records with mostly single byte fields.
If you have primitive array fields - always use bulk methods to process them. ByteBuffer bulk methods performance is close to that of Unsafe (though ByteBuffer methods are always a little slower). If you need to store/load any primitive array other than byte[] - use a ByteBuffer.as[YourType]Buffer().put(array) method call followed by an update of your byte buffer position. Do not call the ByteBuffer.put[YourType] method in a loop!
The higher your average field length - the slower is a heap buffer and the faster is a direct byte buffer. Unsafe access even to separate fields is still faster.
In Java 7 many types of ByteBuffer accesses were seriously optimized compared to Java 6.
Always try to serialize primitive arrays using direct byte buffer with your platform native byte order - its performance is very close to Unsafe performance and it is portable, unlike Unsafe code.
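The bulk write followed by a position update can be sketched as follows. BulkIntWriter and its methods are hypothetical names; the sketch assumes an int[] field and a direct buffer in the platform native byte order, and Integer.BYTES requires Java 8.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BulkIntWriter {
    /** Writes the whole array through an IntBuffer view, then advances the byte buffer manually. */
    static void putInts(ByteBuffer buf, int[] values) {
        buf.asIntBuffer().put(values); // the view has its own position; buf is not moved
        buf.position(buf.position() + values.length * Integer.BYTES);
    }

    /** Writes the array to a direct buffer in native byte order and reads it back in bulk. */
    static int[] roundTrip(int[] values) {
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES)
                .order(ByteOrder.nativeOrder());
        putInts(buf, values);
        buf.flip();
        int[] result = new int[values.length];
        buf.asIntBuffer().get(result);
        return result;
    }
}
```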

Java collections overview: all JDK 1.6/1.7 standard collections are described and categorized in this overview :

Try to follow these rules while using ArrayList:

Add elements to the end of the list
Remove elements from the end too
Avoid contains, indexOf and remove(Object) methods
Even more avoid removeAll and retainAll methods
Use subList(int, int).clear() idiom to quickly clean a part of the list
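The subList(int, int).clear() idiom from the last rule looks like this in practice. ArrayListTips, removeRange and demo are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrayListTips {
    /** Removes the elements in [fromInclusive, toExclusive) with a single range operation. */
    static void removeRange(List<?> list, int fromInclusive, int toExclusive) {
        list.subList(fromInclusive, toExclusive).clear();
    }

    /** Builds the list 0..9 and removes the elements at indices 2, 3 and 4. */
    static List<Integer> demo() {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            list.add(i); // appending to the end is the cheap case
        }
        removeRange(list, 2, 5);
        return list;
    }
}
```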

java.util.LinkedList performance: java.util.LinkedList, java.util.ArrayDeque:

Consider using ArrayDeque for queue-based algorithms
Use ListIterator with LinkedList
Avoid any LinkedList methods which accept or return an index of an element in the list - they have to traverse the list and are slow
Check if you have a reason to use LinkedList.remove/removeFirst/removeLast methods, which throw an exception on an empty list; otherwise use pollFirst/pollLast, which return null instead
Try batch processing LinkedList
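A short sketch of two of these rules: filtering a LinkedList through its ListIterator and draining an ArrayDeque with pollFirst. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.LinkedList;
import java.util.ListIterator;

final class QueueTips {
    /** Removes negative values in one pass; the iterator unlinks nodes without index lookups. */
    static void removeNegatives(LinkedList<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() < 0) {
                it.remove();
            }
        }
    }

    /** Drains the queue from the head; pollFirst returns null once the queue is empty. */
    static long drainSum(ArrayDeque<Integer> queue) {
        long sum = 0;
        Integer head;
        while ((head = queue.pollFirst()) != null) {
            sum += head;
        }
        return sum;
    }
}
```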

Bit sets: java.util.BitSet, java.util.Set<Integer>: representing set of integers in the most compact form, using bit sets to store set of Long/long values:

Do not forget about bit sets when you need to map a large number of integer keys to boolean flags.
Sets of integer values should be replaced with bit sets in a lot of cases in order to save a lot of memory.
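As a rough illustration, a BitSet can stand in for a Set<Integer> of non-negative IDs. BitSetTips and flagsFor are hypothetical names, and the sketch assumes the IDs are small enough that one bit per possible value is acceptable.

```java
import java.util.BitSet;

final class BitSetTips {
    /** Marks every given non-negative id as present; each id costs one bit instead of an Integer object. */
    static BitSet flagsFor(int... ids) {
        BitSet flags = new BitSet();
        for (int id : ids) {
            flags.set(id);
        }
        return flags;
    }
}
```

Membership checks then become flags.get(id), and flags.cardinality() replaces Set.size().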

java.util.IdentityHashMap: a discussion of why IdentityHashMap is so special and what alternatives it has.

java.util.IdentityHashMap uses System.identityHashCode to get the object identity hash code. Avoid IdentityHashMap if your objects have a primary key field (use it as a key for an ordinary HashMap instead). If you need your own equals and hashCode logic but can't update the objects you are working on, use Trove maps with a custom hashing strategy.
Do not try to iterate IdentityHashMap contents, because iteration order will be different on every run of your program, thus making your program results inconsistent.
Accessing the object identity hash code is a very cheap Java intrinsic operation.
Beware that an object with the calculated identity hash code can not be used for biased locking. While very rare in normal circumstances, you may end up in this situation if your lock will be accessed by any Java object graph traversal algorithm (serialization, for example).

Regexp-related methods of String: java.util.regex.Pattern, java.util.regex.Matcher, java.lang.String: pattern/matcher logic:

Always (or nearly always) replace String.matches, split, replaceAll, replaceFirst methods with Matcher and Pattern methods - it will save you from unnecessary pattern compilation.
In Java 7, String.split is optimized for splitting by a single-character string which is not a regex special character. Always use String.split to split by such strings in Java 7.
In other simple cases, consider handwriting parsing methods for the time-critical code. You can easily gain a 10 times speedup by replacing Pattern methods with handcrafted methods.
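The three rules above can be sketched side by side: a precompiled Pattern, a handwritten check for the same condition, and String.split with a single ordinary character. All names in the block are hypothetical.

```java
import java.util.regex.Pattern;

final class RegexTips {
    // Compiled once; String.matches would recompile the pattern on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isDigitsRegex(String s) {
        return DIGITS.matcher(s).matches();
    }

    /** Handwritten equivalent of the pattern above for ASCII digits. */
    static boolean isDigitsManual(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    /** A single character that is not regex-special is the case String.split handles without a regex. */
    static String[] splitByComma(String line) {
        return line.split(",");
    }
}
```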

java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance: java.util.Date, java.util.Calendar, java.text.SimpleDateFormat: date storage, parsing and converting back to string:

Do not use java.util.Date unless you have to use it. Use an ordinary long instead.
java.util.Calendar is useful for all sorts of date calculations and i18n, but avoid either storing a lot of such objects or creating them extensively - they consume a lot of memory and are expensive to create.
java.text.SimpleDateFormat is useful for general case datetime parsing, but it is better to avoid it if you have to parse a lot of dates in the same format (especially dates without time). Implement a parser manually instead.
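A manual parser for one fixed format can be very small. The sketch below assumes dates in the yyyy-MM-dd form and packs them into an int such as 20140504. FastDateParser is a hypothetical name, and the day is not validated against the month length.

```java
final class FastDateParser {
    /** Parses "yyyy-MM-dd" into an int like 20140504; returns -1 for malformed input. */
    static int parseIsoDate(String s) {
        if (s == null || s.length() != 10 || s.charAt(4) != '-' || s.charAt(7) != '-') {
            return -1;
        }
        int year = digits(s, 0, 4);
        int month = digits(s, 5, 7);
        int day = digits(s, 8, 10);
        if (year < 0 || month < 1 || month > 12 || day < 1 || day > 31) {
            return -1;
        }
        return year * 10000 + month * 100 + day;
    }

    /** Converts the digits in [from, to) to an int; returns -1 if any character is not a digit. */
    private static int digits(String s, int from, int to) {
        int value = 0;
        for (int i = from; i < to; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
            value = value * 10 + (c - '0');
        }
        return value;
    }
}
```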

Joda Time library performance: org.joda.time.DateTime, org.joda.time.format.DateTimeFormat, org.joda.time.format.DateTimeFormatter: a comparison of Joda Time library classes performance with standard JDK classes performance (java.util.Date, java.util.Calendar, java.text.SimpleDateFormat).

All Joda Time date/time objects are built on top of a long timestamp, so it is cheap to construct those objects from a long.
Joda Time ver 2.1-2.3 is affected by a performance issue in the timezone offset calculation logic - all years after the last daylight savings rule change in the given timezone use a slow calculation path (European timezones are affected particularly badly). In essence it means that all zones will perform badly in all years after the Joda Time release you are using.
Date/time objects construction and date/time arithmetics in Joda work 1.5-3 times faster than GregorianCalendar for the years not affected by the above-mentioned performance issue. For affected years date operations performance in Joda plummets and could be 4 times slower than in GregorianCalendar.
Joda does not keep the human time - year/month/day/hour/min/second inside its objects (unlike GregorianCalendar). It means that accessing human time on Joda objects is more expensive if you need to get more than one field.
Date/time parsing in Joda is working a little faster than in JDK SimpleDateFormat. The advantage of Joda parsing is that constructing a parser - DateTimeFormatter object is extremely cheap, unlike an expensive SimpleDateFormat, so you don't have to cache parsers anymore.

JSR 310 - Java 8 Date/Time library performance (as well as Joda Time 2.3 and j.u.Calendar): an overview of a new Java 8 date/time implementation also known as JSR-310 and its performance comparison with Joda Time 2.3 and j.u.GregorianCalendar.

Java 8 date/time classes are built on top of human time - year/month/day/hour/minute/second/nanos. It makes them fast for human datetime arithmetics/conversion. Nevertheless, if you are processing computer time (a.k.a. millis since epoch), especially computer time in a short date range (a few days), a manual implementation based on int/long values would be much faster.
Date/time component getters like getDayOfMonth have O(1) complexity in the Java 8 implementation. Joda getters require the computer-to-human time calculation on every getter call, which makes Joda a bottleneck in such scenarios.
Parsing of OffsetDateTime/OffsetTime/ZonedDateTime is very slow in Java 8 ea b121 due to exceptions thrown and caught internally in the JDK.

java.io.ByteArrayOutputStream: java.io.ByteArrayOutputStream, java.nio.ByteBuffer: why you should not use ByteArrayOutputStream in the performance critical code.

For performance critical code try to use ByteBuffer instead of ByteArrayOutputStream. If you still want to use ByteArrayOutputStream - get rid of its synchronization.
If you are working on a method which writes some sort of message to unknown OutputStream, always write your message to the ByteArrayOutputStream first and use its writeTo(OutputStream) method after that. In some rare cases when you are building a String from its byte representation, do not forget about ByteArrayOutputStream.toString methods.
In most cases avoid ByteArrayOutputStream.toByteArray method - it creates a copy of internal byte array. Garbage collecting these copies may take a noticeable time if your application is using a few gigabytes of memory (see Inefficient byte[] to String constructor article for another example).
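The writeTo(OutputStream) advice can be sketched as below. MessageWriter and the length-prefixed message layout are assumptions made for the example, and UncheckedIOException requires Java 8.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class MessageWriter {
    /** Builds a length-prefixed message in memory, then hands it to the target stream in one go. */
    static void writeMessage(OutputStream out, String text) {
        try {
            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream(4 + payload.length);
            DataOutputStream data = new DataOutputStream(buffer);
            data.writeInt(payload.length);
            data.write(payload);
            data.flush();
            buffer.writeTo(out); // writes the internal array directly, no toByteArray() copy
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```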

java.io.BufferedInputStream and java.util.zip.GZIPInputStream: java.io.BufferedInputStream, java.util.zip.GZIPInputStream, java.nio.channels.FileChannel: some minor performance pitfalls in these two streams.

Both BufferedInputStream and GZIPInputStream have internal buffers. The default size for the former is 8192 bytes and for the latter is 512 bytes. Generally it is worth increasing either of these sizes to at least 65536.
Do not use a BufferedInputStream as an input for a GZIPInputStream; instead, explicitly set the GZIPInputStream buffer size in the constructor. Keeping a BufferedInputStream is still safe, though.
If you have a new BufferedInputStream( new FileInputStream( file ) ) object and you call its available method rather often (for example, once or twice per each input message), consider overriding BufferedInputStream.available method. It will greatly speed up file reading.
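Setting the GZIPInputStream buffer size in the constructor might look like the sketch below. GzipTips is a hypothetical name, and an in-memory stream stands in for the file so that the example is self-contained; the gzip helper exists only to produce test data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipTips {
    static final int BUFFER_SIZE = 65536;

    /** Reads a gzip stream with a 64K internal buffer instead of the 512 byte default. */
    static long countUncompressedBytes(InputStream compressed) {
        try (GZIPInputStream in = new GZIPInputStream(compressed, BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            long total = 0;
            int read;
            while ((read = in.read(chunk)) != -1) {
                total += read;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: compresses a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes, BUFFER_SIZE)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```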

java.lang.Byte, Short, Integer, Long, Character (boxing and unboxing): java.lang.Byte, java.lang.Short, java.lang.Integer, java.lang.Long, java.lang.Character:

Never call java.lang.Number subclasses valueOf(String) methods. If you need a primitive value - call parse[Type]. If you want an instance of a wrapper class, still call parse[Type] method and rely on the JVM-implemented boxing. It will support caching of most frequently used values. Never call wrapper classes constructors - they always return a new Object, thus bypassing the caching support. Here is the summary of caching support for primitive replacement classes:
Byte, Short, Long: from -128 to 127
Character: from 0 to 127
Integer: from -128 to java.lang.Integer.IntegerCache.high or 127, whichever is bigger
Float, Double: no caching
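The parse[Type] plus boxing combination can be sketched like this. BoxingTips and its methods are hypothetical names; the identity comparison is there only to make the cache visible and should not be used in real code.

```java
final class BoxingTips {
    /** Primitive result: no wrapper object is involved at all. */
    static int parsePrimitive(String s) {
        return Integer.parseInt(s);
    }

    /** Boxed result: parse to int, then let autoboxing pick a cached instance where one exists. */
    static Integer parseBoxed(String s) {
        return Integer.parseInt(s);
    }

    /** Compares references on purpose: true when both calls returned the same cached object. */
    static boolean sameCachedInstance(String s) {
        return parseBoxed(s) == parseBoxed(s);
    }
}
```

BoxingTips.sameCachedInstance("127") is true on any JVM, since values from -128 to 127 are always cached.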

Map.containsKey/Set.contains: java.util.Map, java.util.Set and most of their implementations:

For sets, contains+add/remove call pairs should be replaced with single add/remove calls even if some extra logic was guarded by contains call.
For maps, a containsKey+get pair should always be replaced with a get followed by a null check of its result (provided the map does not store null values). A containsKey+remove pair should be replaced with a single remove call and a check of its result.
Same ideas are applicable to Trove maps and sets too.
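A sketch of the single-call replacements, assuming the maps hold no null values. LookupTips and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

final class LookupTips {
    /** One add call instead of contains + add; the result tells whether the value was new. */
    static boolean markSeen(Set<String> seen, String value) {
        return seen.add(value);
    }

    /** One get call and a null check instead of containsKey + get. */
    static int lengthOrZero(Map<String, String> map, String key) {
        String value = map.get(key);
        return value != null ? value.length() : 0;
    }

    /** One remove call instead of containsKey + remove; non-null means something was removed. */
    static boolean evict(Map<String, String> map, String key) {
        return map.remove(key) != null;
    }
}
```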

java.util.zip.CRC32 and java.util.zip.Adler32 performance: java.util.zip.CRC32, java.util.zip.Adler32 and java.util.zip.Checksum:

If you can choose which checksum implementation you can use - try Adler32 first. If its quality is sufficient for you, use it instead of CRC32. In any case, use Checksum interface in order to access Adler32/CRC32 logic.
Try to update checksum by at least 500 byte blocks. Shorter blocks will require a noticeable time to be spent in JNI calls.
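Coding against the Checksum interface keeps the choice between Adler32 and CRC32 open. In this sketch ChecksumTips is a hypothetical name, and the whole array is passed in a single update call, in line with the advice on block sizes.

```java
import java.util.zip.Checksum;

final class ChecksumTips {
    /** Computes a checksum over the whole array with one update call. */
    static long checksum(Checksum algorithm, byte[] data) {
        algorithm.reset();
        algorithm.update(data, 0, data.length);
        return algorithm.getValue();
    }
}
```

The caller decides on the implementation, for example ChecksumTips.checksum(new java.util.zip.Adler32(), data).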

hashCode method performance tuning: java.lang.String, java.util.HashMap, java.util.HashSet, java.util.Arrays:

Try to improve distribution of results of your hashCode method. This is far more important than to optimize that method speed. Never write a hashCode method which returns a constant.
String.hashCode results distribution is nearly perfect, so you can sometimes substitute Strings with their hash codes. If you are working with sets of strings, try to end up with BitSets, as described in this article. Performance of your code will greatly improve.

Throwing an exception in Java is very slow: why it is too expensive to throw exceptions in Java: java.lang.Throwable, java.lang.Exception, java.lang.RuntimeException, sun.misc.BASE64Decoder, java.lang.NumberFormatException:

Never use exceptions as return code replacement or for any likely to happen events. Throwing an exception is too expensive - you may experience 100 times slowdown for simple methods.
Avoid using any Number subclass parse*/valueOf methods if you call them for each piece of your data and you expect a lot of non-numerical data. Parse such values manually for top performance.
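One way to parse manually without exceptions is to return a fallback value for anything that is not a number. SafeParse is a hypothetical name, and the sketch assumes non-negative values of at most 9 digits, which always fit into an int.

```java
final class SafeParse {
    /** Parses a non-negative decimal int; returns fallback instead of throwing on bad input. */
    static int parseNonNegativeInt(String s, int fallback) {
        if (s == null || s.isEmpty() || s.length() > 9) {
            return fallback; // longer inputs are rejected rather than checked for overflow
        }
        int result = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return fallback;
            }
            result = result * 10 + (c - '0');
        }
        return result;
    }
}
```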

Java logging performance pitfalls: how to lose as little as possible performance while writing log messages: java.util.logging.Logger, java.util.logging.Handler, java.util.logging.Formatter, java.text.MessageFormat:

If you make expensive calculations while preparing data for log messages, either use Logger.isLoggable and do all data preparation inside or write an object which does all calculations in its toString method.
Never call Object.toString method in order to obtain a log message argument - just pass an original object. Logging framework will call toString method on your object.
Do not build a format string by concatenation when you also pass log arguments - a malicious concatenated string may contain format placeholders of its own, allowing an application user to break your logging or to access data which was not supposed to be available to the user.
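A minimal sketch of these rules with java.util.logging. LoggingTips and logState are hypothetical names; the message template is a constant and the state object is passed as an argument, so its toString method runs only if the message is actually published.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingTips {
    /** Logs the state at FINE level; returns false if the level is disabled and nothing was done. */
    static boolean logState(Logger log, Object state) {
        if (!log.isLoggable(Level.FINE)) {
            return false; // any expensive data preparation would be skipped here
        }
        // Constant template plus argument: no concatenation, no explicit toString() call.
        log.log(Level.FINE, "Current state: {0}", state);
        return true;
    }
}
```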

Base64 encoding and decoding performance: an overview of several well-known Base64 Java implementations from the performance perspective: sun.misc.BASE64Encoder, sun.misc.BASE64Decoder, java.util.Base64 (Java 8 specific), javax.xml.bind.DatatypeConverter (Java 6+), org.apache.commons.codec.binary.Base64, com.google.common.io.BaseEncoding (Google Guava), http://iharder.net/base64, MiGBase64:

If you are looking for a fast and reliable Base64 codec - do not look outside the JDK. There is a new codec in Java 8: java.util.Base64 and there is also one hidden from many eyes (from Java 6): javax.xml.bind.DatatypeConverter. Both are fast, reliable and do not suffer from integer overflows.
2 out of 4 3rd party codecs described here are very fast: MiGBase64 and IHarder. Unfortunately, if you need to process hundreds of megabytes at a time, only Google Guava will allow you to decode 2GB of data at a time (360MB in case of MiGBase64, 720MB in case of IHarder and Apache Commons). Unfortunately, Guava does not support byte[] -> byte[] encoding.
Do not try to call String.getBytes(Charset) on huge strings if your charset is a multibyte one - you may get the whole gamut of integer overflow related exceptions.
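For reference, the Java 8 codec is used as in the sketch below. Base64Tips is a hypothetical wrapper name; the calls themselves belong to java.util.Base64.

```java
import java.util.Base64;

final class Base64Tips {
    /** Encodes bytes with the basic Base64 alphabet, including padding. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Decodes a basic Base64 string; throws IllegalArgumentException on invalid input. */
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }
}
```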

A possible memory leak in the manual MultiMap implementation: an overview of multimap implementations in Java 8, Google Guava and Scala 2.10 as well as a description of a possible memory leak you can have while manually implementing a multimap using Java 6 or 7.
As you have seen, it is quite easy to miss a memory leak while implementing a multilevel map. You need to be careful and split read and write accesses to the outer map.
Newer frameworks and languages, like Google Guava, Java 8 and Scala already provide you more convenient syntax and wider choice of collections thus allowing you to avoid possible memory leaks in the multilevel maps.
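As an illustration of the more convenient Java 8 syntax, Map.computeIfAbsent creates the inner collection and registers it in the outer map in a single call. MultiMapTips is a hypothetical name, and this sketch is one possible shape rather than the implementation discussed in the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MultiMapTips {
    /** Adds a value under the key, creating the inner list only when the key is new. */
    static <K, V> void put(Map<K, List<V>> multimap, K key, V value) {
        multimap.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}
```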

java.util.Random and java.util.concurrent.ThreadLocalRandom in multithreaded environments: an overview of java.util.Random and java.util.concurrent.ThreadLocalRandom in single and multithreaded environments as well as some low level analysis of their performance.

Do not share an instance of java.util.Random between several threads in any circumstances, wrap it in ThreadLocal instead.
From Java 7 prefer java.util.concurrent.ThreadLocalRandom to java.util.Random in all circumstances - it is backwards compatible with existing code, but uses cheaper operations internally.
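Using ThreadLocalRandom requires no shared state at all. RandomTips and rollDie are hypothetical names; the important part is that current() is called at the point of use instead of caching the instance in a field.

```java
import java.util.concurrent.ThreadLocalRandom;

final class RandomTips {
    /** Returns a value from 1 to 6 using the generator that belongs to the calling thread. */
    static int rollDie() {
        return ThreadLocalRandom.current().nextInt(1, 7); // the upper bound is exclusive
    }
}
```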

Charset encoding and decoding in Java 7/8: we will check how fast are Charset encoders/decoders in Java 7 and what are the performance improvements in Java 8.

Always prefer national charsets like windows-1252 or Shift_JIS to UTF-8: they produce a more compact binary representation (as a rule) and they are faster to encode/decode (there are some exceptions in Java 7, but it is becoming a rule in Java 8).
ISO-8859-1 always works faster than US-ASCII in Java 7 and 8. Choose ISO-8859-1 if you don't have any solid reasons to use US-ASCII.
You can write a very fast String->byte[] conversion for US-ASCII/ISO-8859-1, but you can not beat Java decoders - they have direct access to the output String they create.
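A handwritten String->byte[] conversion for ISO-8859-1 can be as simple as the sketch below. Latin1Tips is a hypothetical name; characters above U+00FF are replaced with '?' and surrogate pairs get no special treatment, which is an assumption of this example.

```java
final class Latin1Tips {
    /** Encodes a string as ISO-8859-1 by narrowing each char to a byte. */
    static byte[] encodeLatin1(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            out[i] = (byte) (c <= 0xFF ? c : '?'); // unmappable chars become '?'
        }
        return out;
    }
}
```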

Sunday, May 4, 2014
