Friday, August 18, 2006

Finding Memory Leaks in Java : Using HeapSummary

One of the Java questions I like to ask at interviews is: Can you have a memory leak in Java? (or .NET, or any other language with built-in garbage collection). Surprisingly about 70% of people say no... What this can tell me about a candidate (I think) is that they have not worked on a large-scale, heavily used application.

For the record here are my thoughts on causes of leaks and how you track them down.

Symptoms of Leaks

In Java the main symptom is "OutOfMemoryError" exceptions in your log files, followed closely by the application dying.

Causes of Leaks

The most common causes of various memory probelms in Java are:
  1. Applications code that holds on to object references. e.g. the application keeps adding objects to global/singleton Maps or Lists and never removing these objects.
  2. An Application-level cache is misconfigured or does not behave as the developers expected.
  3. (More rare) An application server or third-party library has a leak for the same reasons as above.
  4. (Very rare) The C implementation of the JVM has a classic memory leak or suffers from fragmentation.

When a Leak is not a Leak

Sometimes a leak is not really a leak. Sometimes you just need to allocate more memory to your application than you initially guessed/estimated, particulalry if you have more users than expected. So how do you determine the write memory size and garbage collector settings for your application?

Application Load Testing

The best way to both size an application (hopefully before you release to production) and to track down any memory problems is to put the application under a realistic load. The key word in that sentence is realisitc, meaning the load should be approximately the same load pattern you expect to see in production. It's always frustrating to explain to people that you should probably expect an application to fall flat on its face if your first load test is to throw 1 million simultaneous requests at it. Instead you need to measure the application scalability, in throughput per second, at loads approximate to (both above and below, but around) the production load.

What tools can you use to generate load? The most common tools I've seen used are JMeter (my tools of choice, very easy to get up and running), The Grinder and LoadRunner. Those tools simulate client requests, but because they are not web browsers they can't interpret javascript, so may not be suitable to load test applications that are heavily DHTML based. For DHTML types of applications tools that drive a browser like Selenium and WinRunner are more suitable - but you will need a lot of client machines to generate a decent load.

Profiling

To help detect a leak you will need the ability to profile your application.

The most basic information you need from a profiler is how much memory is being used by all of the classes in the JVM. If one class in particular, say a domain object called StockOption, keeps using up more and more memory as your load test continues (if there are more instances of the object that keep getting added to a Map then of course that class will use up more memory) then that is a good leak candidate to investigate.

A really useful profiler should have the ability to show you how instances of objects are related to each other ("heap walking") so you can trace where objects are allocated and where they are referenced. Only the commercial profilers like JProfiler and OptimizeIt have these features. I wish a decent open source profiler was available :(

The most basic memory leak can be found by:

  • Start the application
  • Record memory usage of classes (number of instances and amount of memory used)
  • Apply some load, then stop
  • Run a full GC (or two GCs, just to be sure)
  • Compare the class memory usage to before. The cause of the leak should be at or near the top of the list of most amount of memory being used.

One thing to note is that profilers tend to slow down the application by at least 10 times, so learn to be patient :)

Application Tuning

This is a really big topic. If I get the energy I'll tackle it in another blog entry.

Gathering Memory Statistics from Production : HeapSummary

One thing I like to recommend when dealing with production memory leaks is to get some basic heap information from the production systems themselves. Ideally you want to be able to reproduce memory leaks in a non-production system such as Staging or QA, but sometimes you just don't know which area of the code to start looking at or don't have enough hardware to mirror production problems. Doing anything with a production system makes some people nervous, but I do recommend it when you are short on time or not having luck reproducing the leak in a non-production environment.

So what are your options for gathering some stats from a production system? You could install a profiler, but that slows the application down 10 to 20 times, which is a no-no. You could use "monitoring tools" like Glassbox Inspector or Introscope that profile only selected classes - and therefore adds minimal overhead - but you could easily miss profiling the class that is the cause of the leak.

My preferred solution is to do a JVM heap dump shortly before the application dies with an OutOfMemoryError. The IBM JVM has a number of "dump heap on exit" features and tools to analyze those dumps - see the Diagnostic Guide for your version of the JVM. The Sun Java 5 JVM has a dump on exit feature, which has only very recenlty been backported to the most recent 1.4.2 patch (-XX:+HeapDumpOnOutOfMemoryError command line option added to the Sun 1.4.2_12 JVM), and very few tools to analyze those dumps (jhat doesn't work on dumps from all versions of the JVM).

One lightweight tool I developed to just dump summary information - total number of instances and total memory used for each class - I call HeapSummary. This is a small C module (see source code at end of this blog) that hooks into the JVM via the JVMPI interface. When you send a SIGQUIT/CTRL-Break signal to the JVM process it will dump a text file summary of the heap usage whose output looks something like this:




26664400 562392 TOTAL
9361944 135484 [char
2201272 24938 [java.util.HashMap$Entry
1750704 72946 java.util.HashMap$Entry
1230624 51276 java.lang.String
733480 18337 java.util.HashMap
534328 11448 [java.lang.Object
296928 6186 org.apache.commons.collections.FastHashMap




HeapSummary lays dormant until it receives the SIGQUIT signal, so it adds zero overhead during normal application use. Once it has been activated two full GCs are forced, the heap is "walked" to calculate instance and memory summaries (which can take four or five minutes for heaps of 2GB or greater), the summary file is written and the application goes back to processing as normal.

See the comments at the beginning of heapsummary.c for instructions on compiling and installing.

NOTE: I've only tested HeapSummary on Java 1.4 and 1.3 JVMs, not on Java 5 JVMs yet.

heapsummary.c (source code)




/*

Copyright (c) 2006, Robert Maldon

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


Based on the original code of heapprofile (http://www.virtualmachine.de/)
by Matthias Ernst


INSTALLING AND RUNNING HEAPSUMMARY

Compiling Win32 /Cygwin:
gcc -mno-cygwin -I$JAVA_HOME/include -I$JAVA_HOME/include/win32 -Wl,--add-stdcall-alias -shared -o heapsummary.dll heapsummary.c
Compiling Win32/Visual C++ (thanks to Jeppe Cramon ):
Create a Visual C++ Dynamic DLL project
Comment out #include
Add #define WIN32_LEAN_AND_MEAN and #include
Change sleep(5); to Sleep(5);
Compiling Win32/VC++ Toolkit (incomplete):
Execute %VC_HOME%\VCVARS32 to set up the command line
compilation environment, then compile with
cl -LDd -Zi -I. -I%JDK_HOME%\include -I%JDK_HOME%\include\win32 -Tcheapsummary.c -o heapsummary.dll

Compiling Solaris:
gcc -c -G -O -DUNIX -DSUN -ansi -fPIC -I$JAVA_HOME/include -I$JAVA_HOME/include/solaris heapsummary.c -o heapsummary.o
gcc -shared -o libheapsummary.so heapsummary.o
Compiling Linux:
gcc -o libheapsummary.so -shared -Wl,-soname,libheapsummary.so -I$JAVA_HOME/include -I$JAVA_HOME/include/linux heapsummary.c -static -lc
Compiling OS/X:
cc -bundle -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -I/System/Library/Frameworks/JavaVM.framework/Headers -o libheapsummary.jnilib -framework JavaVM heapsummary.c
Compiling HP/UX 11 PA/RISC:
cc -b -I$JAVA_HOME/include -I$JAVA_HOME/include/hp-ux -o libheapsummary.sl heapsummary.c
Compiling HP/UX 11 Itanium 2 (for 32 bit) like above, only name the library .so instead of .sl


Running:
1) Make sure that the library is in your JVM path either by using
LD_LIBRARY_PATH under Solaris/Linux or PATH under Win32.
2) Load the JVM with the parameter -Xrunheapsummary (see optional
Usage below).

Usage:

-Xrunheapsummary[:[dir=SUMMARY_DIR][,][sort=sizecount]]

where

SUMMARY_DIR is the directory to write summary files to.
SUMMARY_DIR may be an absolute path (e.g. /tmp/summarylogs)
or relative to the JVM start directory (e.g. logs/summarylogs).
If the "dir" option is not specified then summary files will
be written to the JVM start directory.

The "sort" option can have a value of "size" (sorted by total
memory size of a class) or "count" (sorted by number of
instances of a class). If the sort option is not specified
then sorting will default to size.

The name of the summary file is of the format
"heapsummary.YYYYMMDDhhmmss" where YYYY is the year, MM is the
month, DD is the day, hh is the hour, mm is the minute and ss is
the second the summary file is created.

Generating a Heap Summary:

kill -QUIT
or CTRL-\ or CTRL-Break


Format of the Heap Summary File:

MEMORY_SIZE MEMORY_COUNT CLASS_NAME

where MEMORY_SIZE is the total amount of memory in use by a
class, MEMORY_COUNT is the total number of instances of a
class and CLASS_NAME is the name of the class. e.g.

26664400 562392 TOTAL
9361944 135484 [char
2201272 24938 [java.util.HashMap$Entry
1750704 72946 java.util.HashMap$Entry
1230624 51276 java.lang.String
733480 18337 java.util.HashMap
534328 11448 [java.lang.Object
296928 6186 org.apache.commons.collections.FastHashMap


Changelog

1.0 24th July 2006 Robert Maldon

Initial release version.

*/
#include
#include
#include
#include
#include

/* global jvmpi interface pointer */
static JavaVM *jvm;
static JVMPI_Interface *jvmpi_interface;
static JNIEnv *jni;
static jobjectID object;
static jobjectID clazz;
static int objectSize;
static const char *className;
static char buf[1024];
static char toString[64];

static jobjectID clazzCache;
static const char *nameCache;

#define OUTFILENAME_PREFIX "heapsummary."
static char outFileName[256];
static char *outFilePtr;

#define SORT_KEY_SIZE 0
#define SORT_KEY_COUNT 1
static int sortKey = SORT_KEY_SIZE;

/* START - hashtable */

#define HASH_KEY_MAX_SIZE 256
#define HASH_LIMIT 0.5

/* hash table top level data structure */
typedef struct hash_t {
struct hash_node_t **bucket; /* array of hash nodes */
int size; /* size of the array */
int entries; /* number of entries in table */
int downShift; /* shift count, used in hash function */
int mask; /* used to select bits for hashing */
} hash_t;

typedef struct hash_node_t {
long long memSize; /* total memory size of java class */
long long memCount; /* total number of instances of java class */
const char key[HASH_KEY_MAX_SIZE]; /* key for hash lookup - java class name */
struct hash_node_t *next; /* next node in hash chain */
struct hash_node_t *sortNext; /* next node in sort linked list */
} hash_node_t;

static hash_t *hashTable = NULL;

static hash_node_t *freeNodeList = NULL;

static hash_node_t *sortedList = NULL;

/* calculate the hash number for a given key */
static int hash(const char *key) {
int i = 0;
int hashValue;

while (*key != '\0') {
i = (i<<3) hashvalue =" (((i*1103515249)">> hashTable->downShift) & hashTable->mask);
if (hashValue < hashvalue =" 0;" buckets ="=" buckets =" 16;">entries = 0;
hashTable->size = 2;
hashTable->mask = 1;
hashTable->downShift = 29;

/* make sure buckets are a power of 2 */
while (hashTable->size <>size <<= 1; hashTable->mask = (hashTable->mask <<>downShift--;
}

/* allocate memory for the table */
hashTable->bucket = (hash_node_t **)calloc(hashTable->size, sizeof(hash_node_t *));

return;
}

/* create new hash table when the old one fills up */
static void hashTableRebuild() {
hash_node_t **oldBucket, *oldHash, *tmp;
int oldSize, h, i;

oldBucket = hashTable->bucket;
oldSize = hashTable->size;

/* create a new table and rehash the old buckets */
hashTableInit(oldSize << i =" 0;" oldhash =" oldBucket[i];" tmp =" oldHash;" oldhash =" oldHash-">next;
h = hash(tmp->key);
tmp->next = hashTable->bucket[h];
hashTable->bucket[h] = tmp;
hashTable->entries++;
}
}

/* free memory used by old table */
free(oldBucket);

return;
}

/* look for an entry of the given key in the hashtable */
static hash_node_t *hashTableLookup(const char *key) {
int h;
hash_node_t *node;

/* find the entry in the hash table */
h = hash(key);
for (node = hashTable->bucket[h]; node != NULL; node = node->next) {
if (!strcmp(node->key, key)) {
break;
}
}

/* return the entry if it exists, or NULL if it doesn't */
return node;
}

/* allocate a node for inserting into the hashtable. reuse a node from
the free list if available, otherwise malloc the space for the node */
static hash_node_t *hashTableAllocateNode() {
hash_node_t *node;

if (freeNodeList == NULL) {
node = (struct hash_node_t *) malloc(sizeof(hash_node_t));
} else {
node = freeNodeList;
freeNodeList = node->next;
}

memset(node, 0, sizeof(hash_node_t));
return node;
}

/* insert an entry into the hashtable */
static hash_node_t *hashTableInsert(const char *key, int memSize) {
hash_node_t *node;
int h;

/* if this was a generic hash table insert method then we would
first check to see if an entry that matches the given key
already exists. however, in the case of heapsummary we can
guarantee the key does not already exist in the hash table, so
for perf reasons we can bypass this check.
if ((node = hashTableLookup(key)) != NULL) {
return node;
}
*/

/* expand the table if needed */
while (hashTable->entries >= HASH_LIMIT * hashTable->size) {
hashTableRebuild();
}

/* insert the new entry */
h = hash(key);
node = hashTableAllocateNode();
node->memSize = memSize;
node->memCount = 1;
strncpy((char *)node->key, key, HASH_KEY_MAX_SIZE - 1);
node->next = hashTable->bucket[h];
hashTable->bucket[h] = node;
hashTable->entries++;

return node;
}

/* destroy the hash table, moving all of the nodes to a free list for
future reuse */
static void hashTableDestroy() {
hash_node_t *node, *next;
int i;

/* move the nodes to the free node list */
for (i = 0; i <>size; i++) {
next = hashTable->bucket[i];
while (next != NULL) {
node = next;
next = node->next;
node->next = freeNodeList;
freeNodeList = node;
}

/* null out the bucket pointer */
hashTable->bucket[i] = NULL;
}

hashTable->entries = 0;
}

/* sort the contents of the hashtable into a linked list */
static void hashTableSort() {
int i, greater;
hash_node_t *node;
hash_node_t *listNode;
hash_node_t *listPrev;

sortedList = NULL;

for (i = 0; i <>size; i++) {
for (node = hashTable->bucket[i]; node != NULL; node = node->next) {
if (sortedList == NULL) {
sortedList = node;
} else {
for (listNode = sortedList, listPrev = NULL; listNode != NULL;
listPrev = listNode, listNode = listNode->sortNext) {

if (sortKey == SORT_KEY_SIZE) {
if (node->memSize > listNode->memSize) {
greater = 1;
} else {
greater = 0;
}
} else {
if (node->memCount > listNode->memCount) {
greater = 1;
} else {
greater = 0;
}
}

if (greater) {
node->sortNext = listNode;
if (listNode == sortedList) {
sortedList = node;
} else {
listPrev->sortNext = node;
}
break;
}
}
if (listNode == NULL) {
listPrev->sortNext = node;
}
}
}
}
}

/* END - hashtable */

void loadClass() {
jvmpi_interface->RequestEvent(JVMPI_EVENT_OBJECT_ALLOC, object);
}

void loadClassName() {
if(clazz == NULL) className = "NULL?";
else if(clazz != clazzCache) {
className = NULL;
jvmpi_interface->RequestEvent(JVMPI_EVENT_CLASS_LOAD, clazz);
clazzCache = clazz;
nameCache = className;
}
}

jobjectID asObject(char *curr) {
char ptr[4];
memcpy(ptr, curr, 4);
return *((jobjectID*)ptr);
}

FILE *openSummaryFile() {
time_t tm;
struct tm *tmptr;

tm = time(NULL);
tmptr = localtime(&tm);
strftime(outFilePtr, sizeof(outFileName) - (outFilePtr - outFileName) - 1,
"%Y%m%d%H%M%S", tmptr);

return fopen(outFileName, "w");
}

/* function for handling event notification */
void notifyEvent(JVMPI_Event *event) {

JVMPI_HeapDumpArg arg;
char* curr;
long long totalCount;
long long totalSize;
int type;
hash_node_t *node;
FILE *summaryFile;

switch(event->event_type) {
case JVMPI_EVENT_CLASS_LOAD JVMPI_REQUESTED_EVENT:
className = event->u.class_load.class_name;
if (className == NULL) className = "NULL?";
break;
case JVMPI_EVENT_OBJECT_ALLOC JVMPI_REQUESTED_EVENT:
objectSize = event->u.obj_alloc.size;

switch (event->u.obj_alloc.is_array) {
case JVMPI_CHAR : className = "[char"; break;
case JVMPI_BOOLEAN : className = "[boolean"; break;
case JVMPI_FLOAT : className = "[float"; break;
case JVMPI_DOUBLE : className = "[double"; break;
case JVMPI_BYTE : className = "[byte"; break;
case JVMPI_SHORT : className = "[short"; break;
case JVMPI_INT : className = "[int"; break;
case JVMPI_LONG : className = "[long"; break;
case JVMPI_NORMAL_OBJECT: clazz = event->u.obj_alloc.class_id; loadClassName(); break;
case JVMPI_CLASS: clazz = event->u.obj_alloc.class_id; loadClassName(); sprintf(buf, "[%s", className); className = buf; clazzCache = NULL; break;
default :
sprintf(buf, "type %d ???", event->u.obj_alloc.is_array);
className = buf;
}

break;
case JVMPI_EVENT_DATA_DUMP_REQUEST:
case JVMPI_EVENT_JVM_SHUT_DOWN:
/* we do a full gc twice because, depending on the garbage collector
settings, the first gc may not collect all of the objects we
would like it to */
jvmpi_interface->RunGC();
sleep(1);
jvmpi_interface->RunGC();

(*jvm)->AttachCurrentThread(jvm, (void **)&jni, NULL);
arg.heap_dump_level = JVMPI_DUMP_LEVEL_0;
jvmpi_interface->RequestEvent(JVMPI_EVENT_HEAP_DUMP, &arg);

break;
case JVMPI_EVENT_HEAP_DUMP JVMPI_REQUESTED_EVENT: {
if ((summaryFile = openSummaryFile()) == NULL) {
fprintf(stderr, "heapsummary> Failed to open dump file %s\n",
outFileName);
return;
}

if (hashTable == NULL) {
hashTable = (struct hash_t *)malloc(sizeof(hash_t));
hashTableInit(512);
}

totalCount = 0;
totalSize = 0;

curr = event->u.heap_dump.begin;
while(curr <>u.heap_dump.end) {
/* if(((++totalCount)%1000) == 0) fprintf(stderr, "."); */
totalCount = totalCount + 1;
totalSize += objectSize;
type = *(curr++);
object = asObject(curr);
if(object == NULL) fprintf(stderr, "object null\n");
curr += 4;
loadClass();

node = hashTableLookup(className);
if (node != NULL) {
node->memSize = node->memSize + objectSize;
node->memCount = node->memCount + 1;
} else {
hashTableInsert(className, objectSize);
}
}

/* print totals */
fprintf(summaryFile, "%20lld%20lld TOTAL\n", totalSize, totalCount);

/* sort the elements of the hash table into a linked list, then
iterate through each item and print out its values */
hashTableSort();
for (node = sortedList; node != NULL; node = node->sortNext) {
fprintf(summaryFile, "%20lld%20lld %s\n",
node->memSize, node->memCount, node->key);
}

fclose(summaryFile);

hashTableDestroy();

break;
}
}
}

/* profiler agent entry point */
JNIEXPORT jint JNICALL JVM_OnLoad(JavaVM *theJvm, char *options, void *reserved) {
char *token;
jvmdiError err;

jvm = theJvm;
fprintf(stderr, "heapsummary> Initializing...\n");
err=(*jvm)->GetEnv(jvm,(void**)&jvmpi_interface, JVMPI_VERSION_1_1);
/* get jvmpi interface pointer */
if (err != 0) {
fprintf(stderr, "heapsummary> Error in obtaining jvmpi interface pointer\n");
return JNI_ERR;
}

jvmpi_interface->NotifyEvent = notifyEvent;
jvmpi_interface->EnableEvent(JVMPI_EVENT_DATA_DUMP_REQUEST, NULL);
jvmpi_interface->EnableEvent(JVMPI_EVENT_JVM_SHUT_DOWN, NULL);

*outFileName = NULL;
outFilePtr = outFileName;

for (token = strtok(options, ","); token != NULL; token = strtok(NULL, ",")) {
if (strncmp(token, "dir=", 4) == 0) {
strcpy(outFileName, token + 4);
outFilePtr = outFilePtr + strlen(outFileName);
#ifdef WINDOWS
*outFilePtr = '\\';
#else
*outFilePtr = '/';
#endif
outFilePtr = outFilePtr + 1;
} else if (strncmp(token, "sort=", 5) == 0) {
if (strcmp(token + 5, "size") == 0) {
sortKey = SORT_KEY_SIZE;
fprintf(stderr, "heapsummary> Sorting on memory size\n");
} else if (strcmp(token + 5, "count") == 0) {
sortKey = SORT_KEY_COUNT;
fprintf(stderr, "heapsummary> Sorting on memory count\n");
} else {
fprintf(stderr, "heapsummary> Unknown sort key [%s]. Defaulting to sorting on memory size\n", token + 5);
}
}
}

strcpy(outFilePtr, OUTFILENAME_PREFIX);
outFilePtr = outFilePtr + sizeof(OUTFILENAME_PREFIX) - 1;
strcpy(outFilePtr, "YYYYMMDDhhmmss");

fprintf(stderr, "heapsummary> Running. Will output to summary file %s\n",
outFileName);

return JNI_OK;
}

5 comments:

Anonymous said...

Nice! Good to see some useful C family code posted :)

Wolfie said...

Good article but strictly speaking, apart from the VM leaking or garbage collector not running correctly, its memory misappropriation or exhaustion really what you are describing.

Anonymous said...

Hi there,

Am using a singleton pattren to maintain my cache and in that object I load a hashmap for the first time from DB and am using that same object everytime there is a request. Now my Server memory is constantly building up and after analyzing the heap dumps, came to know the reason as HashMap which is not getting Garbase Collected. Occupying 98% of the Heap. Is something wrong with my way of using HashMap? I am sure I load it only once with probably abt 500 Objects and then stays there.
EMai: Lalith.Vaka@highmark.com / Lvaka@yahoo.com

Robert Maldon said...

Hi Lalith, I didn't get a reply from the email I sent you a couple of weeks ago so I'll assume you found the cause of your leaks :)

Anonymous said...

Hi Robert,

I'm currently implementing a JVMPI heap dump lib myself trying to chase down a memory leak and my c code unsurprisingly looks a look like yours.

The nice part of yours is the hash table implementation which if you have no objection i'd like to nab (with appropriate credit of course - being a Java programmer, C is not my strong point!). Unfortunately the blog seems to have done crazy things to the hash function (check out all the quotes). Do you by chance have the source still you could send? (mulcahyb@hotmail.com)

Btw - have you run this on Solaris? Wondered if you had the same fun and games I had with the JVMPI shutdown event getting fired twice leading to 2 heap dumps and java getting rather upset.

Many thanks,

Bri