Saturday, September 29, 2007

More human-friendly Java GC timestamps

Update: If you are using the Sun (Oracle) JVM version 6u4 or newer then using the command line option -XX:+PrintGCDateStamps instead of -XX:+PrintGCTimeStamps will output a current date rather than a relative time in the GC logs.

If you've working on a medium to large Java project then one of the aspects of application performance you're probably (should be!) monitoring is garbage collection.

If you're using the Sun JVM or compatible JVM then I recommend you enable verbose garbage collection output with at least the following command-line options:


java -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc ...


This will print something like the following to stdout :


83.304: [GC 83.304: [DefNew: 890K->64K(960K), 0.0044947 secs] 890K->217K(5056K), 0.0046062 secs]
83.390: [GC 83.390: [DefNew: 960K->64K(960K), 0.0043936 secs] 1113K->376K(5056K), 0.0044796 secs]
83.534: [GC 83.534: [DefNew: 951K->64K(960K), 0.0046934 secs] 1263K->467K(5056K), 0.0048648 secs]
83.638: [GC 83.638: [DefNew: 960K->64K(960K), 0.0041580 secs] 1363K->658K(5056K), 0.0042530 secs]
83.783: [GC 83.784: [DefNew: 960K->64K(960K), 0.0030418 secs] 1554K->748K(5056K), 0.0031353 secs]
83.921: [GC 83.921: [DefNew: 960K->55K(960K), 0.0019457 secs] 1644K->794K(5056K), 0.0020575 secs]
91.655: [GC 91.655: [DefNew: 951K->19K(960K), 0.0012843 secs] 1690K->808K(5056K), 0.0013694 secs]
94.198: [Full GC (System) 94.198: [Tenured: 788K->759K(4096K), 0.0428238 secs] 883K->759K(5056K), [Perm : 2095K->2095K(12288K)], 0.0429641 secs]


As an alternative to writing to stdout - e.g. gc logging may get mixed up with a lot of non-gc related output - you can have the JVM log gc to a file with timestamps automatically enabled by using the -Xloggc option:


java -XX:+PrintGCDetails -Xloggc:logs/gc.log ...


One small annoyance I have with this logging is that the timestamp - which is of the format SSSS.MMMM where SSSS it the elapsed seconds and MMMM is the elapsed millisecs - is relative to when the JVM started. I would find it more useful if the timestamp told me the date and time of day that the logging occurred.

So here is a short list of requirements for more "human-friendly" timestamping:

  • Log a timestamp that contains the date and time of day of the logging event, preferably in a configurable format. e.g. see java.text.SimpleDateFormat
  • Work in a JVM-neutral and platform-neutral way
Given the above requirements what are my options?

Diving into the openjdk source code I found that the verbose gc logging is implemented in native C code.

From Java 5 onwards some of the garbage collection information is exposed via the GarbageCollectorMXBean JMX bean, but what is not exposed is any type of notification/callback when the gc actually occurs or more collector-specific information. e.g. for the parallel garbage collector how much time was spent in stop-the-world phase and how much time was spent in other phases.

Given this info I found two solutions:

1. Use a native wrapper around the JVM that timestamps messages logged to stdout

For this option I could use something like the very mature and useful utility Java Service Wrapper that provides this sort of wrapper around stdout (among other useful features), or possibly roll my own native code that redirects stdout (nightmare!).

A disadvantage of this approach is that you will have to deploy native libraries along with your application, which may not be easy to integrate into application servers, clustered environments, etc.

2. Create a thread that occasionally polls the GarbageCollectorMXBeans and logs the number of collections and collection time since the last poll

The code for this approach looks something like this:


package robertmaldon.gc;

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

public class GCTimeStamper implements Runnable {

private SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss,SSS");

private GarbageCollectorMXBean minorGCBean;
private long lastMinorCollectionCount;
private long lastMinorCollectionTime;

private GarbageCollectorMXBean fullGCBean;
private long lastFullCollectionCount;
private long lastFullCollectionTime;

public GCTimeStamper() {
}

public void run() {
List<GarbageCollectorMXBean> gcMBeans = ManagementFactory.getGarbageCollectorMXBeans();

for (GarbageCollectorMXBean gcBean : gcMBeans) {
if ("Copy".equals(gcBean.getName())) {
minorGCBean = gcBean;
} else if ("MarkSweepCompact".equals(gcBean.getName())) {
fullGCBean = gcBean;
} else if ("ParNew".equals(gcBean.getName())) {
minorGCBean = gcBean;
} else if ("ConcurrentMarkSweep".equals(gcBean.getName())) {
fullGCBean = gcBean;
} else {
System.err.println("Unable to classify GarbageCollectorMXBean [" + gcBean.getName() + "]");
}
}

try {
while (true) {
if (minorGCBean.getCollectionCount() != lastMinorCollectionCount) {
long diffCount = minorGCBean.getCollectionCount() - lastMinorCollectionCount;
long diffTime = minorGCBean.getCollectionTime() - lastMinorCollectionTime;
System.out.println(getTimeStamp() + "Minor GC x " + diffCount + ", " + diffTime + " millisecs");

lastMinorCollectionCount = minorGCBean.getCollectionCount();
lastMinorCollectionTime = minorGCBean.getCollectionTime();
}

if (fullGCBean.getCollectionCount() != lastFullCollectionCount) {
long diffCount = fullGCBean.getCollectionCount() - lastFullCollectionCount;
long diffTime = fullGCBean.getCollectionTime() - lastFullCollectionTime;
System.out.println(getTimeStamp() + "Full GC x " + diffCount + ", " + diffTime + " millisecs");

lastFullCollectionCount = fullGCBean.getCollectionCount();
lastFullCollectionTime = fullGCBean.getCollectionTime();
}

// Sleep a little bit before the next poll
Thread.sleep(1000);
}
} catch (Exception ex) {
// Do nothing except exit this thread
}
}

private String getTimeStamp() {
Date now = new Date();

long upTime = ManagementFactory.getRuntimeMXBean().getUptime();
long upTimeSecs = upTime / 1000;
long upTimeMillis = upTime % 1000;

return dateFormat.format(now) + " [" + upTimeSecs + "." + upTimeMillis + "] ";
}
}


The above class would be started with a new Thread(new GCTimeStamper()).start() or a TaskExecutor, and you can replace the System.out statements with calls to log4j or whatever.

The output of this class mixed with the gc output looks something like this:


83.304: [GC 83.304: [DefNew: 890K->64K(960K), 0.0044947 secs] 890K->217K(5056K), 0.0046062 secs]
83.390: [GC 83.390: [DefNew: 960K->64K(960K), 0.0043936 secs] 1113K->376K(5056K), 0.0044796 secs]
83.534: [GC 83.534: [DefNew: 951K->64K(960K), 0.0046934 secs] 1263K->467K(5056K), 0.0048648 secs]
83.638: [GC 83.638: [DefNew: 960K->64K(960K), 0.0041580 secs] 1363K->658K(5056K), 0.0042530 secs]
83.783: [GC 83.784: [DefNew: 960K->64K(960K), 0.0030418 secs] 1554K->748K(5056K), 0.0031353 secs]
83.921: [GC 83.921: [DefNew: 960K->55K(960K), 0.0019457 secs] 1644K->794K(5056K), 0.0020575 secs]
2007-09-28 17:03:59,037 [84.80] Minor GC x 6, 22 millisecs
91.655: [GC 91.655: [DefNew: 951K->19K(960K), 0.0012843 secs] 1690K->808K(5056K), 0.0013694 secs]
2007-09-28 17:04:07,036 [92.79] Minor GC x 1, 1 millisecs
94.198: [Full GC (System) 94.198: [Tenured: 788K->759K(4096K), 0.0428238 secs] 883K->759K(5056K), [Perm : 2095K->2095K(12288K)], 0.0429641 secs]
2007-09-28 17:04:10,036 [95.79] Full GC x 1, 42 millisecs


This class will print the approximate date/time a gc occurred as well as the time relative to the start of the JVM, so you can then use that to correlate with the more verbose output in the gc.log file.

Another interesting problem is determining which gc was minor and which was full (major). If you're using the Sun JVM and the default gc settings then a collector named "Copy" does minor collections and a collector named "MarkSweepCompact" does full collections. If you're using the Sun JVM with the -XX:+UseConcMarkSweepGC option then a collector named "ParNew" seems to do minor collections and a collector named "ConcurrentMarkSweep" seems to do full collections. (I say "seems to" because the behavior changes depending on if you're running on a one cpu or multi-cpu machine.)

Not an ideal solution, but if anybody knows a better way I would be interested in hearing from you!

10 comments:

Anonymous said...

Thanks Robert .. I was really confused looking at the GarbageCollectorMXBean output and didnt really understood those 'time' and 'collection' part of data.

Actually your code is exactly what I want.

Probably as you mentioned, a callback from JVM to tell you that a GC has happened now with these collection would have been best, but for now this is good.

Cheers

Anonymous said...

As of 6u4 -XX:+PrintGCDateStamps is available. See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6517301.

GNeeraj said...

Hi,

Just wanted to know, how to use this in our application. I mean how this will interact with GC.logs

Thanks Neer

Robert Maldon said...

Hi GNeeraj, the output from the sample code is independent of anything written to the log file specified by the "-Xloggc:..." option.

As noted in the comments if you are running a newish version of Java 6 or Java 7 then you are better off using the built-in -XX:+PrintGCDateStamps option.

destroyedlolo said...

Hello Roberts and thanks for this code.

But, something is missing for my own monitoring : I mean, during CMS, there is a "stop-the-world-phase", but ... which API to use to monitor that ?

Bye

Laurent

Robert Maldon said...

Hi Laurent,

Apologies for the late reply - busy time at work.

If you want to drill down into the CMS collector in more detail then try this code: http://blogs.oracle.com/poonam/entry/how_to_programmatically_obtain_gc

destroyedlolo said...

Thanks Robert for your reply.

Unfortunately, Oracle's code doesn't enter in detail and provides only the full duration of a CMS, and not the corresponding stop-the-world.

I'm a bit frustrated as the only working working examples are :
- people directly modifying C Java's source code to expose that in a custom java beam (but I can't as I'm using a commercial JVM).
- awking on GC.log ... but it's heavy for the system (additional forks) and will not help me to make the link b/w business and GC activities ...

Any other ideas ?

Bye

Laurent

Robert Maldon said...

Hi Laurent,

After putting on my thinking cap for a couple of days I can't come up with any other options beyond the ones you suggested (hacking the C code and parsing the logs from another process). However, can you describe in a little more detail what you are trying to achieve? Are you trying to directly correlate GC activity with some program activity? e.g. verify some computation, which might produce lots of objects, causes a "stop the world" GC pause?

destroyedlolo said...

Hi Roverts,

I'm working a JAVA application (SAG's webMethods) and I already built my own monitoring to track its own businesses (documents processes, threads, memory, ...).

The problem is during pick of activity, we are facing some errors and I'm incline to think it's due to bad response time from the application caused by frequent/long "stop-the-world".

Unfortunately, I didn't find a way to monitor such stoppage so I can't implement stop adequate monitoring ...

Bye

Robert Maldon said...

Hi destroyedlolo, I've come across a newly released open source product that might get you closer to your goals called jHiccup (http://www.azulsystems.com/content/jhiccup-product-redirect). It claims to measure "platform" (non-application) pauses caused by GC, OS swapping and others. You will have to manually correlate the platform pauses with the GC logs.

Alternatively you might like to roll your own monitoring using a similar approach. e.g. create a thread that wakes up every 1 ms; if the thread finds it was actually asleep for more than 1 ms then some sort of pause happened (it may be GC or some other platform event).

Good luck!