Resolving Java Performance Problems in Production

Posted: February 20th, 2016 | Author: | Filed under: General

Performance problems in production can be tricky. This post describes a proven sequence for resolving urgent performance problems in Java applications running in production.

Sources of Performance Problems

The main sources of performance problems in production are:

  • coding errors
  • internal and external bottlenecks
  • garbage collection (GC)

Coding errors are places in the code where the application does something that significantly reduces its speed, such as repeating a heavy calculation inside a loop. Internal bottlenecks include improper or excessive synchronization and exhausted thread pools and connection pools. External bottlenecks are serially accessed data sources such as databases, files and sockets.
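To make the first category concrete, here is a small Java illustration of a heavy calculation repeated inside a loop. The names (expensiveRate, totalSlow, totalFast) are hypothetical and the "heavy" work is simulated; the point is only that a value which does not change between iterations should be computed once, outside the loop.

```java
import java.util.List;

// Illustrative only: expensiveRate() is a hypothetical stand-in for any
// costly calculation whose result does not change inside the loop.
final class LoopHoisting {

    static double expensiveRate(String currency) {
        double rate = 1.0;
        for (int i = 0; i < 100_000; i++) {
            rate = Math.sqrt(rate + currency.length()); // simulated heavy work
        }
        return rate;
    }

    // Coding error: the same expensive value is recomputed for every element.
    static double totalSlow(List<Double> amounts, String currency) {
        double total = 0;
        for (double amount : amounts) {
            total += amount * expensiveRate(currency);
        }
        return total;
    }

    // Fix: compute the loop-invariant value once, before the loop.
    static double totalFast(List<Double> amounts, String currency) {
        double rate = expensiveRate(currency);
        double total = 0;
        for (double amount : amounts) {
            total += amount * rate;
        }
        return total;
    }
}
```

Both methods return the same total; the slow version simply pays for the expensive calculation once per element instead of once per call.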

Manifestation of Performance Problems

Performance problems in production usually manifest themselves as users complaining that the application is ‘slow’. While ‘slow’ may be a matter of perception, it generally means that the application responds abnormally slowly compared to its normal behavior. Performance problems can seriously damage your company’s business and must be resolved as soon as possible. The problem resolution sequence below has worked well for me for many years.

Sequence for Resolving Performance Problems

1. Make sure the JVM has correct heap settings. The maximum and minimum heap sizes should be set to the same value, for example: -Xmx1000m -Xms1000m.
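One way to confirm that the heap flags were actually applied is to ask the running JVM. The sketch below (the class name HeapSettingsCheck is hypothetical) prints the maximum and committed heap as reported by Runtime, plus the -Xms/-Xmx arguments from the RuntimeMXBean. Where to call it, for example once at application startup, is an assumption; the reported maximum can be slightly lower than the -Xmx value.

```java
import java.lang.management.ManagementFactory;

// Minimal sketch: report the heap limits the running JVM actually uses,
// to confirm that -Xms and -Xmx were applied as intended.
final class HeapSettingsCheck {

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() approximates -Xmx; it can be slightly lower than the flag value.
        System.out.println("max heap (MB):       " + toMegabytes(rt.maxMemory()));
        // totalMemory() is the heap currently committed; with -Xms equal to -Xmx
        // it should stay at or near the configured size.
        System.out.println("committed heap (MB): " + toMegabytes(rt.totalMemory()));
        // The raw heap flags as they were passed on the command line.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xms") || arg.startsWith("-Xmx")) {
                System.out.println("flag: " + arg);
            }
        }
    }
}
```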

2. Check memory consumption on the server. Make sure that the system is not swapping. Run top or ps to check that neither the server as a whole nor the application itself is swapping. If the server doesn’t have enough memory, add more memory or move some applications to other servers.
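Besides top and ps, a JVM running on HotSpot/OpenJDK can report the server’s swap usage itself through the com.sun.management extension of OperatingSystemMXBean. The sketch below assumes that extension is available and falls back to a message if it is not. Note that non-zero swap usage only shows that pages were swapped out at some point; whether the system is swapping right now is better seen in the si/so columns of vmstat.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Minimal sketch, assuming a HotSpot/OpenJDK JVM that provides the
// com.sun.management extension of OperatingSystemMXBean.
final class SwapCheck {

    // Fraction of swap space in use, or -1 when no swap is configured or the value is unknown.
    static double swapUsedFraction(long totalSwapBytes, long freeSwapBytes) {
        if (totalSwapBytes <= 0 || freeSwapBytes < 0) {
            return -1;
        }
        return (double) (totalSwapBytes - freeSwapBytes) / totalSwapBytes;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (!(os instanceof com.sun.management.OperatingSystemMXBean)) {
            System.out.println("Swap statistics are not available on this JVM");
            return;
        }
        com.sun.management.OperatingSystemMXBean hs = (com.sun.management.OperatingSystemMXBean) os;
        double used = swapUsedFraction(hs.getTotalSwapSpaceSize(), hs.getFreeSwapSpaceSize());
        if (used < 0) {
            System.out.println("No swap configured (or swap size unknown)");
        } else {
            System.out.printf("Swap in use: %.1f%%%n", used * 100);
        }
    }
}
```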

3. Check server CPU usage. If the server is underloaded while users see slowness, requests are mostly waiting rather than computing, which usually indicates an internal or external bottleneck. Thread dumps, heap dumps and profiling should help identify the bottlenecks.
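A rough in-process version of this check is sketched below. It compares the 1-minute load average with the number of CPUs and, where the com.sun.management extension is available, prints the JVM’s own CPU load. The 50% threshold for "underloaded" is a hypothetical rule of thumb, not a recommendation, and on Linux the load average also counts tasks waiting on disk I/O, so treat the result as a signal to confirm with top.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Minimal sketch: relate the server's load to its CPU count and print this JVM's CPU usage.
final class CpuCheck {

    // Hypothetical rule of thumb: below 50% utilization counts as "underloaded".
    static final double UNDERLOADED_THRESHOLD = 0.5;

    // True when a known (non-negative) utilization is below the threshold.
    static boolean looksUnderloaded(double utilization) {
        return utilization >= 0 && utilization < UNDERLOADED_THRESHOLD;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double loadAverage = os.getSystemLoadAverage(); // 1-minute average; negative if unavailable (e.g. Windows)
        double utilization = loadAverage < 0 ? -1 : loadAverage / cpus;
        System.out.printf("CPUs: %d, 1-minute load average: %.2f%n", cpus, loadAverage);

        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            com.sun.management.OperatingSystemMXBean hs = (com.sun.management.OperatingSystemMXBean) os;
            hs.getProcessCpuLoad(); // prime the measurement; the first reading may not be meaningful
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.printf("JVM process CPU load: %.1f%%%n", hs.getProcessCpuLoad() * 100);
        }

        if (looksUnderloaded(utilization)) {
            System.out.println("Server looks underloaded: suspect an internal or external bottleneck");
        }
    }
}
```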

4. Take and analyze thread dumps. This method is great because it doesn’t require any instrumentation, takes a minute, and is often enough to isolate the problem. During periods of slowness, take 5-6 thread dumps about 10 seconds apart. If your application runs under *nix, just send kill -3 <pid> to the JVM; the thread dump is written to the JVM’s standard output. Look for threads that remain within the same execution stack for extended periods of time; most of the time, the problem stack traces will contain application classes. Watch for thread pools all stuck in file or network I/O, for exhausted thread pools, and for execution threads blocked on the same objects. Fix the problems you find.
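kill -3 (or jstack) remains the simplest way to get thread dumps. As an in-process alternative, the sketch below takes five snapshots ten seconds apart through ThreadMXBean and prints threads whose stack did not change between snapshots, together with the lock they wait on, if any. The class name is hypothetical. Idle pool threads waiting for work also keep identical stacks, so focus on the stacks that contain application classes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal in-process sketch: take several thread snapshots and report threads
// whose stack trace stayed identical across all of them.
final class StuckThreadFinder {

    // One snapshot: thread id -> stack trace at that moment.
    static Map<Long, StackTraceElement[]> snapshot(ThreadMXBean mx) {
        Map<Long, StackTraceElement[]> stacks = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            stacks.put(info.getThreadId(), info.getStackTrace());
        }
        return stacks;
    }

    // Ids of threads with a non-empty stack that is the same in every snapshot.
    static Set<Long> unchangedStacks(List<Map<Long, StackTraceElement[]>> snapshots) {
        Set<Long> result = new HashSet<>();
        Map<Long, StackTraceElement[]> first = snapshots.get(0);
        for (Map.Entry<Long, StackTraceElement[]> e : first.entrySet()) {
            if (e.getValue().length == 0) {
                continue; // no Java frames to compare
            }
            boolean same = true;
            for (Map<Long, StackTraceElement[]> s : snapshots) {
                if (!Arrays.equals(e.getValue(), s.get(e.getKey()))) {
                    same = false; // thread moved on or exited
                    break;
                }
            }
            if (same) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int dumps = 5;
        long intervalMs = 10_000; // 5 snapshots, 10 seconds apart
        List<Map<Long, StackTraceElement[]>> snapshots = new ArrayList<>();
        try {
            for (int i = 0; i < dumps; i++) {
                snapshots.add(snapshot(mx));
                if (i < dumps - 1) {
                    Thread.sleep(intervalMs);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        for (long id : unchangedStacks(snapshots)) {
            ThreadInfo info = mx.getThreadInfo(id, Integer.MAX_VALUE);
            if (info == null) {
                continue; // thread has exited since the last snapshot
            }
            System.out.println(info.getThreadName() + " " + info.getThreadState()
                    + (info.getLockName() != null ? " on " + info.getLockName() : ""));
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
        }
    }
}
```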

5. Take and analyze heap dumps. Using jmap, take 2-3 heap dumps during normal operation and 2-3 more while the system is slowing down or the JVM is completely unresponsive. Analyze the heap dumps in JProfiler, looking for objects with a large number of instances. Also, compare a heap dump taken during normal operation with one taken during the slowdown, and find object counts that grew significantly. Such objects strongly indicate a memory leak. Find and fix the problem in the code.
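Running jmap -histo <pid> prints a class histogram (instance count and bytes per class) without writing a full heap dump, which makes it a quick way to do the comparison described above. The sketch below parses two saved histograms and lists classes whose instance counts grew sharply. The column layout it expects is based on typical HotSpot output and may differ between JDK versions; the 10x growth factor and 1,000-instance floor in main are arbitrary assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch: compare two class histograms saved from `jmap -histo <pid>`
// and report classes whose instance counts grew sharply.
final class HistogramDiff {

    // Parses rows such as "   1:   12345   678900  java.lang.String (java.base@11)"
    // into class name -> instance count. Header, separator and "Total" rows are skipped.
    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue; // not a data row
            }
            try {
                counts.merge(cols[3], Long.parseLong(cols[1]), Long::sum);
            } catch (NumberFormatException e) {
                // unexpected layout; ignore the row
            }
        }
        return counts;
    }

    // Classes whose instance count grew at least `factor` times, ignoring classes
    // with fewer than `minInstances` instances in the slow snapshot.
    static List<String> grown(Map<String, Long> normal, Map<String, Long> slow,
                              double factor, long minInstances) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : slow.entrySet()) {
            long before = normal.getOrDefault(e.getKey(), 0L);
            long after = e.getValue();
            if (after >= minInstances && after >= factor * Math.max(before, 1)) {
                result.add(e.getKey() + ": " + before + " -> " + after);
            }
        }
        result.sort(null); // alphabetical output
        return result;
    }

    // Usage: java HistogramDiff normal-histo.txt slow-histo.txt
    public static void main(String[] args) throws IOException {
        Map<String, Long> normal = parse(Files.readAllLines(Paths.get(args[0])));
        Map<String, Long> slow = parse(Files.readAllLines(Paths.get(args[1])));
        for (String line : grown(normal, slow, 10.0, 1_000)) {
            System.out.println(line);
        }
    }
}
```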

6. Profile while running a load test. In the QA environment, run a load test that simulates the same sequence of user actions that production users are having problems with. Use a profiler such as JProfiler to identify CPU and allocation hot spots. Fix the problems you find.
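A dedicated load-testing tool is the usual choice here. To show the idea, the sketch below is a minimal in-JVM driver: several simulated users run the same scenario concurrently while per-call latencies are recorded and summarized as nearest-rank percentiles. The scenario in main is a hypothetical placeholder for the user actions reported as slow, and the user and iteration counts are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal load-driver sketch: N simulated users run the same scenario concurrently.
final class MiniLoadTest {

    // Runs `scenario` iterationsPerUser times on each of `users` threads;
    // returns per-call latencies in nanoseconds.
    static List<Long> run(Runnable scenario, int users, int iterationsPerUser) {
        List<Long> latencies = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(users);
        CountDownLatch finished = new CountDownLatch(users);
        for (int u = 0; u < users; u++) {
            pool.execute(() -> {
                try {
                    for (int i = 0; i < iterationsPerUser; i++) {
                        long start = System.nanoTime();
                        scenario.run();
                        latencies.add(System.nanoTime() - start);
                    }
                } finally {
                    finished.countDown();
                }
            });
        }
        try {
            finished.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // return whatever was collected so far
        }
        pool.shutdown();
        synchronized (latencies) {
            return new ArrayList<>(latencies);
        }
    }

    // Nearest-rank percentile (0 < p <= 100) of a non-empty list.
    static long percentile(List<Long> values, double p) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }

    public static void main(String[] args) {
        // Hypothetical scenario: replace with the user actions reported as slow.
        Runnable scenario = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<Long> latencies = run(scenario, 20, 50);
        System.out.printf("calls: %d, p50: %.1f ms, p99: %.1f ms%n", latencies.size(),
                percentile(latencies, 50) / 1e6, percentile(latencies, 99) / 1e6);
    }
}
```

Percentiles are reported instead of a plain average because a few very slow calls, which are what users complain about, can hide behind a normal-looking mean.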

7. Profile in production. Profiling adds some overhead, so the application will run slower than normal. If you run a cluster, profile a single server only. To minimize overhead, configure JProfiler as follows:

  • Use sampling instead of instrumentation
  • Add a narrow list of packages that are interesting from a user perspective
  • Do not record allocations
  • Only record CPU when there are reports from users about performance issues
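To illustrate why sampling with a narrow package list is cheap, here is a toy sampling profiler built on ThreadMXBean. It is not how JProfiler works internally; it only demonstrates the technique: at each interval it takes a snapshot of thread stacks and counts the innermost frame that belongs to the packages of interest, instead of instrumenting every method call. The package prefix com.example.app. in main is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sampling profiler (not how JProfiler is implemented): periodically samples
// thread stacks and counts the innermost frame that belongs to a package of interest.
final class ToySampler {

    // Innermost frame whose class name starts with one of the given prefixes, or null.
    static StackTraceElement firstInteresting(StackTraceElement[] stack, List<String> prefixes) {
        for (StackTraceElement frame : stack) {
            for (String prefix : prefixes) {
                if (frame.getClassName().startsWith(prefix)) {
                    return frame;
                }
            }
        }
        return null;
    }

    // Takes `samples` samples, `intervalMs` apart; returns "Class.method" -> hit count.
    static Map<String, Integer> sample(List<String> prefixes, int samples, long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                // RUNNABLE approximates "using CPU"; it also includes threads blocked in native I/O.
                if (info.getThreadState() != Thread.State.RUNNABLE) {
                    continue;
                }
                StackTraceElement frame = firstInteresting(info.getStackTrace(), prefixes);
                if (frame != null) {
                    hits.merge(frame.getClassName() + "." + frame.getMethodName(), 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "com.example.app." is a hypothetical application package prefix.
        Map<String, Integer> hits = sample(List.of("com.example.app."), 100, 50);
        hits.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

The cost scales with the sampling rate rather than with the number of method calls, which is why sampling stays cheap enough for production use.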

8. Tune GC. Enable GC logging in production with the command-line option -XX:+PrintGCDetails (on JDK 9 and later, the unified logging option -Xlog:gc* replaces it). Watch for major GCs occurring abnormally often. The main causes of frequent major GCs are the application running out of heap because of memory leaks or excessive object creation, and incorrect GC parameters. Profile the application for memory leaks and excessive object creation, and fix the problems you find. Only then adjust GC parameters for better performance: tuning GC should be the last thing to do. Check our collection of useful resources on tuning Java GC.
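GC activity can also be watched from inside the application through GarbageCollectorMXBean. The sketch below samples collection counts and accumulated collection times twice, a minute apart (an arbitrary window), and prints the difference per collector and the share of wall-clock time spent in GC. Collector names depend on the GC in use, for example "G1 Young Generation" and "G1 Old Generation" for G1, or "PS Scavenge" and "PS MarkSweep" for the parallel collector; for concurrent collectors the reported time may not match pause time exactly.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: sample per-collector GC counts and times twice and report the deltas.
final class GcWatch {

    // Collector name -> {collection count, accumulated collection time in ms}.
    static Map<String, long[]> snapshot() {
        Map<String, long[]> stats = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined; clamp to 0.
            stats.put(gc.getName(), new long[] {
                    Math.max(gc.getCollectionCount(), 0), Math.max(gc.getCollectionTime(), 0) });
        }
        return stats;
    }

    // Fraction of the elapsed wall-clock time spent in GC between two snapshots.
    static double gcTimeFraction(Map<String, long[]> before, Map<String, long[]> after, long elapsedMs) {
        long gcMs = 0;
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.getOrDefault(e.getKey(), new long[] {0, 0});
            gcMs += e.getValue()[1] - prev[1];
        }
        return (double) gcMs / elapsedMs;
    }

    public static void main(String[] args) {
        long intervalMs = 60_000; // hypothetical observation window
        Map<String, long[]> before = snapshot();
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        Map<String, long[]> after = snapshot();
        for (Map.Entry<String, long[]> e : after.entrySet()) {
            long[] prev = before.getOrDefault(e.getKey(), new long[] {0, 0});
            System.out.printf("%-25s collections: %d, time: %d ms%n",
                    e.getKey(), e.getValue()[0] - prev[0], e.getValue()[1] - prev[1]);
        }
        System.out.printf("time in GC: %.1f%%%n", 100 * gcTimeFraction(before, after, intervalMs));
    }
}
```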

9. Cache hard-to-get data. If the bottleneck turns out to be a database, another serially accessed data source, or data that is expensive to compute, cache that data. Here are a few articles on how to approach caching. Or drop a message in the Cacheonix support forum; we’ll try to help.
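A caching product such as Cacheonix is one option. For a single JVM, the idea can be sketched with the JDK alone as a bounded read-through LRU cache; this is a minimal illustration, not the Cacheonix API. The loader function stands for the slow data source (for example a database query) and is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of a bounded read-through LRU cache (not the Cacheonix API).
final class ReadThroughCache<K, V> {

    private final Function<K, V> loader; // the slow data source, e.g. a database query
    private final Map<K, V> entries;

    ReadThroughCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    V get(K key) {
        synchronized (this) {
            V cached = entries.get(key);
            if (cached != null) {
                return cached;
            }
        }
        // Load outside the lock so one slow load does not block other readers.
        // Trade-off: concurrent misses for the same key may load it more than once.
        V loaded = loader.apply(key);
        if (loaded != null) {
            synchronized (this) {
                entries.put(key, loaded);
            }
        }
        return loaded;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Loading outside the lock is deliberate: holding the lock during a slow load would serialize all readers and turn the cache itself into the kind of internal bottleneck described at the start of this post. Null values are not cached in this sketch.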

Hope this helps.

Slava Imeshev
