Saturday, August 26, 2006

XML more compressed than MS Office binary format?

An interesting tidbit from an interview on the new "Ribbon" interface in Office 2007:

One likely incentive for that migration will be reduced storage costs. Microsoft claims that file sizes for the new Office 2007 XML-based formats are up to 75 percent less than existing Office formats.

How can Microsoft have been so inefficient with the previous Office binary formats? Or will the new XML formats be impossible for humans to read?

Thursday, August 24, 2006

Winning the War on Terror... through PowerPoint

An interesting blog about how the US military has recently been using PowerPoint slides to convey battle strategies instead of clearly written orders:

[Army Lt. General David] McKiernan had another, smaller but nagging issue: He couldn't get Franks to issue clear orders that stated explicitly what he wanted done, how he wanted to do it, and why. Rather, Franks passed along PowerPoint briefing slides that he had shown to Rumsfeld...

Here is one of the slides used to convey how the occupation phase of the Iraq conflict was expected to work:

Reminds me of The Gettysburg Powerpoint Presentation, a what-if Lincoln had access to PowerPoint when giving one of his most famous speeches.

Stargate SG-1 : End of an era

I was disappointed yesterday to see the news that the sci-fi channel will be cancelling Stargate SG-1 at the end of this record-breaking 10th season. I didn't get into SG-1 until I was unemployed for a few months in the wake of the 9/11 economic downturn, when I had the chance to watch SG-1 reruns and view both the good and the bad eprisodes.

I thought the battles with the new uber-enemy the Ori were developing into some good plots, but I guess they were running out of ideas.

At least I have the Peabody-winning Battlestar Galactica to keep my sci-fi interest going. (And the derivative Stargate Atlantis if nothing else is on :))

Friday, August 18, 2006

Finding Memory Leaks in Java : Using HeapSummary

One of the Java questions I like to ask at interviews is: Can you have a memory leak in Java? (or .NET, or any other language with built-in garbage collection). Surprisingly about 70% of people say no... What this can tell me about a candidate (I think) is that they have not worked on a large-scale, heavily used application.

For the record here are my thoughts on causes of leaks and how you track them down.

Symptoms of Leaks

In Java the main symptom is "OutOfMemoryError" exceptions in your log files, followed closely by the application dying.

Causes of Leaks

The most common causes of various memory probelms in Java are:
  1. Applications code that holds on to object references. e.g. the application keeps adding objects to global/singleton Maps or Lists and never removing these objects.
  2. An Application-level cache is misconfigured or does not behave as the developers expected.
  3. (More rare) An application server or third-party library has a leak for the same reasons as above.
  4. (Very rare) The C implementation of the JVM has a classic memory leak or suffers from fragmentation.

When a Leak is not a Leak

Sometimes a leak is not really a leak. Sometimes you just need to allocate more memory to your application than you initially guessed/estimated, particulalry if you have more users than expected. So how do you determine the write memory size and garbage collector settings for your application?

Application Load Testing

The best way to both size an application (hopefully before you release to production) and to track down any memory problems is to put the application under a realistic load. The key word in that sentence is realisitc, meaning the load should be approximately the same load pattern you expect to see in production. It's always frustrating to explain to people that you should probably expect an application to fall flat on its face if your first load test is to throw 1 million simultaneous requests at it. Instead you need to measure the application scalability, in throughput per second, at loads approximate to (both above and below, but around) the production load.

What tools can you use to generate load? The most common tools I've seen used are JMeter (my tools of choice, very easy to get up and running), The Grinder and LoadRunner. Those tools simulate client requests, but because they are not web browsers they can't interpret javascript, so may not be suitable to load test applications that are heavily DHTML based. For DHTML types of applications tools that drive a browser like Selenium and WinRunner are more suitable - but you will need a lot of client machines to generate a decent load.


To help detect a leak you will need the ability to profile your application.

The most basic information you need from a profiler is how much memory is being used by all of the classes in the JVM. If one class in particular, say a domain object called StockOption, keeps using up more and more memory as your load test continues (if there are more instances of the object that keep getting added to a Map then of course that class will use up more memory) then that is a good leak candidate to investigate.

A really useful profiler should have the ability to show you how instances of objects are related to each other ("heap walking") so you can trace where objects are allocated and where they are referenced. Only the commercial profilers like JProfiler and OptimizeIt have these features. I wish a decent open source profiler was available :(

The most basic memory leak can be found by:

  • Start the application
  • Record memory usage of classes (number of instances and amount of memory used)
  • Apply some load, then stop
  • Run a full GC (or two GCs, just to be sure)
  • Compare the class memory usage to before. The cause of the leak should be at or near the top of the list of most amount of memory being used.

One thing to note is that profilers tend to slow down the application by at least 10 times, so learn to be patient :)

Application Tuning

This is a really big topic. If I get the energy I'll tackle it in another blog entry.

Gathering Memory Statistics from Production : HeapSummary

One thing I like to recommend when dealing with production memory leaks is to get some basic heap information from the production systems themselves. Ideally you want to be able to reproduce memory leaks in a non-production system such as Staging or QA, but sometimes you just don't know which area of the code to start looking at or don't have enough hardware to mirror production problems. Doing anything with a production system makes some people nervous, but I do recommend it when you are short on time or not having luck reproducing the leak in a non-production environment.

So what are your options for gathering some stats from a production system? You could install a profiler, but that slows the application down 10 to 20 times, which is a no-no. You could use "monitoring tools" like Glassbox Inspector or Introscope that profile only selected classes - and therefore adds minimal overhead - but you could easily miss profiling the class that is the cause of the leak.

My preferred solution is to do a JVM heap dump shortly before the application dies with an OutOfMemoryError. The IBM JVM has a number of "dump heap on exit" features and tools to analyze those dumps - see the Diagnostic Guide for your version of the JVM. The Sun Java 5 JVM has a dump on exit feature, which has only very recenlty been backported to the most recent 1.4.2 patch (-XX:+HeapDumpOnOutOfMemoryError command line option added to the Sun 1.4.2_12 JVM), and very few tools to analyze those dumps (jhat doesn't work on dumps from all versions of the JVM).

One lightweight tool I developed to just dump summary information - total number of instances and total memory used for each class - I call HeapSummary. This is a small C module (see source code at end of this blog) that hooks into the JVM via the JVMPI interface. When you send a SIGQUIT/CTRL-Break signal to the JVM process it will dump a text file summary of the heap usage whose output looks something like this:

26664400 562392 TOTAL
9361944 135484 [char
2201272 24938 [java.util.HashMap$Entry
1750704 72946 java.util.HashMap$Entry
1230624 51276 java.lang.String
733480 18337 java.util.HashMap
534328 11448 [java.lang.Object
296928 6186 org.apache.commons.collections.FastHashMap

HeapSummary lays dormant until it receives the SIGQUIT signal, so it adds zero overhead during normal application use. Once it has been activated two full GCs are forced, the heap is "walked" to calculate instance and memory summaries (which can take four or five minutes for heaps of 2GB or greater), the summary file is written and the application goes back to processing as normal.

See the comments at the beginning of heapsummary.c for instructions on compiling and installing.

NOTE: I've only tested HeapSummary on Java 1.4 and 1.3 JVMs, not on Java 5 JVMs yet.

heapsummary.c (source code)


Copyright (c) 2006, Robert Maldon

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
See the License for the specific language governing permissions and
limitations under the License.

Based on the original code of heapprofile (
by Matthias Ernst


Compiling Win32 /Cygwin:
gcc -mno-cygwin -I$JAVA_HOME/include -I$JAVA_HOME/include/win32 -Wl,--add-stdcall-alias -shared -o heapsummary.dll heapsummary.c
Compiling Win32/Visual C++ (thanks to Jeppe Cramon ):
Create a Visual C++ Dynamic DLL project
Comment out #include
Add #define WIN32_LEAN_AND_MEAN and #include
Change sleep(5); to Sleep(5);
Compiling Win32/VC++ Toolkit (incomplete):
Execute %VC_HOME%\VCVARS32 to set up the command line
compilation environment, then compile with
cl -LDd -Zi -I. -I%JDK_HOME%\include -I%JDK_HOME%\include\win32 -Tcheapsummary.c -o heapsummary.dll

Compiling Solaris:
gcc -c -G -O -DUNIX -DSUN -ansi -fPIC -I$JAVA_HOME/include -I$JAVA_HOME/include/solaris heapsummary.c -o heapsummary.o
gcc -shared -o heapsummary.o
Compiling Linux:
gcc -o -shared -Wl,-soname, -I$JAVA_HOME/include -I$JAVA_HOME/include/linux heapsummary.c -static -lc
Compiling OS/X:
cc -bundle -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -I/System/Library/Frameworks/JavaVM.framework/Headers -o libheapsummary.jnilib -framework JavaVM heapsummary.c
Compiling HP/UX 11 PA/RISC:
cc -b -I$JAVA_HOME/include -I$JAVA_HOME/include/hp-ux -o heapsummary.c
Compiling HP/UX 11 Itanium 2 (for 32 bit) like above, only name the library .so instead of .sl

1) Make sure that the library is in your JVM path either by using
LD_LIBRARY_PATH under Solaris/Linux or PATH under Win32.
2) Load the JVM with the parameter -Xrunheapsummary (see optional
Usage below).




SUMMARY_DIR is the directory to write summary files to.
SUMMARY_DIR may be an absolute path (e.g. /tmp/summarylogs)
or relative to the JVM start directory (e.g. logs/summarylogs).
If the "dir" option is not specified then summary files will
be written to the JVM start directory.

The "sort" option can have a value of "size" (sorted by total
memory size of a class) or "count" (sorted by number of
instances of a class). If the sort option is not specified
then sorting will default to size.

The name of the summary file is of the format
"heapsummary.YYYYMMDDhhmmss" where YYYY is the year, MM is the
month, DD is the day, hh is the hour, mm is the minute and ss is
the second the summary file is created.

Generating a Heap Summary:

kill -QUIT
or CTRL-\ or CTRL-Break

Format of the Heap Summary File:


where MEMORY_SIZE is the total amount of memory in use by a
class, MEMORY_COUNT is the total number of instances of a
class and CLASS_NAME is the name of the class. e.g.

26664400 562392 TOTAL
9361944 135484 [char
2201272 24938 [java.util.HashMap$Entry
1750704 72946 java.util.HashMap$Entry
1230624 51276 java.lang.String
733480 18337 java.util.HashMap
534328 11448 [java.lang.Object
296928 6186 org.apache.commons.collections.FastHashMap


1.0 24th July 2006 Robert Maldon

Initial release version.


/* global jvmpi interface pointer */
static JavaVM *jvm;
static JVMPI_Interface *jvmpi_interface;
static JNIEnv *jni;
static jobjectID object;
static jobjectID clazz;
static int objectSize;
static const char *className;
static char buf[1024];
static char toString[64];

static jobjectID clazzCache;
static const char *nameCache;

#define OUTFILENAME_PREFIX "heapsummary."
static char outFileName[256];
static char *outFilePtr;

#define SORT_KEY_SIZE 0
#define SORT_KEY_COUNT 1
static int sortKey = SORT_KEY_SIZE;

/* START - hashtable */

#define HASH_KEY_MAX_SIZE 256
#define HASH_LIMIT 0.5

/* hash table top level data structure */
typedef struct hash_t {
struct hash_node_t **bucket; /* array of hash nodes */
int size; /* size of the array */
int entries; /* number of entries in table */
int downShift; /* shift count, used in hash function */
int mask; /* used to select bits for hashing */
} hash_t;

typedef struct hash_node_t {
long long memSize; /* total memory size of java class */
long long memCount; /* total number of instances of java class */
const char key[HASH_KEY_MAX_SIZE]; /* key for hash lookup - java class name */
struct hash_node_t *next; /* next node in hash chain */
struct hash_node_t *sortNext; /* next node in sort linked list */
} hash_node_t;

static hash_t *hashTable = NULL;

static hash_node_t *freeNodeList = NULL;

static hash_node_t *sortedList = NULL;

/* calculate the hash number for a given key */
static int hash(const char *key) {
int i = 0;
int hashValue;

while (*key != '\0') {
i = (i<<3) hashvalue =" (((i*1103515249)">> hashTable->downShift) & hashTable->mask);
if (hashValue < hashvalue =" 0;" buckets ="=" buckets =" 16;">entries = 0;
hashTable->size = 2;
hashTable->mask = 1;
hashTable->downShift = 29;

/* make sure buckets are a power of 2 */
while (hashTable->size <>size <<= 1; hashTable->mask = (hashTable->mask <<>downShift--;

/* allocate memory for the table */
hashTable->bucket = (hash_node_t **)calloc(hashTable->size, sizeof(hash_node_t *));


/* create new hash table when the old one fills up */
static void hashTableRebuild() {
hash_node_t **oldBucket, *oldHash, *tmp;
int oldSize, h, i;

oldBucket = hashTable->bucket;
oldSize = hashTable->size;

/* create a new table and rehash the old buckets */
hashTableInit(oldSize << i =" 0;" oldhash =" oldBucket[i];" tmp =" oldHash;" oldhash =" oldHash-">next;
h = hash(tmp->key);
tmp->next = hashTable->bucket[h];
hashTable->bucket[h] = tmp;

/* free memory used by old table */


/* look for an entry of the given key in the hashtable */
static hash_node_t *hashTableLookup(const char *key) {
int h;
hash_node_t *node;

/* find the entry in the hash table */
h = hash(key);
for (node = hashTable->bucket[h]; node != NULL; node = node->next) {
if (!strcmp(node->key, key)) {

/* return the entry if it exists, or NULL if it doesn't */
return node;

/* allocate a node for inserting into the hashtable. reuse a node from
the free list if available, otherwise malloc the space for the node */
static hash_node_t *hashTableAllocateNode() {
hash_node_t *node;

if (freeNodeList == NULL) {
node = (struct hash_node_t *) malloc(sizeof(hash_node_t));
} else {
node = freeNodeList;
freeNodeList = node->next;

memset(node, 0, sizeof(hash_node_t));
return node;

/* insert an entry into the hashtable */
static hash_node_t *hashTableInsert(const char *key, int memSize) {
hash_node_t *node;
int h;

/* if this was a generic hash table insert method then we would
first check to see if an entry that matches the given key
already exists. however, in the case of heapsummary we can
guarantee the key does not already exist in the hash table, so
for perf reasons we can bypass this check.
if ((node = hashTableLookup(key)) != NULL) {
return node;

/* expand the table if needed */
while (hashTable->entries >= HASH_LIMIT * hashTable->size) {

/* insert the new entry */
h = hash(key);
node = hashTableAllocateNode();
node->memSize = memSize;
node->memCount = 1;
strncpy((char *)node->key, key, HASH_KEY_MAX_SIZE - 1);
node->next = hashTable->bucket[h];
hashTable->bucket[h] = node;

return node;

/* destroy the hash table, moving all of the nodes to a free list for
future reuse */
static void hashTableDestroy() {
hash_node_t *node, *next;
int i;

/* move the nodes to the free node list */
for (i = 0; i <>size; i++) {
next = hashTable->bucket[i];
while (next != NULL) {
node = next;
next = node->next;
node->next = freeNodeList;
freeNodeList = node;

/* null out the bucket pointer */
hashTable->bucket[i] = NULL;

hashTable->entries = 0;

/* sort the contents of the hashtable into a linked list */
static void hashTableSort() {
int i, greater;
hash_node_t *node;
hash_node_t *listNode;
hash_node_t *listPrev;

sortedList = NULL;

for (i = 0; i <>size; i++) {
for (node = hashTable->bucket[i]; node != NULL; node = node->next) {
if (sortedList == NULL) {
sortedList = node;
} else {
for (listNode = sortedList, listPrev = NULL; listNode != NULL;
listPrev = listNode, listNode = listNode->sortNext) {

if (sortKey == SORT_KEY_SIZE) {
if (node->memSize > listNode->memSize) {
greater = 1;
} else {
greater = 0;
} else {
if (node->memCount > listNode->memCount) {
greater = 1;
} else {
greater = 0;

if (greater) {
node->sortNext = listNode;
if (listNode == sortedList) {
sortedList = node;
} else {
listPrev->sortNext = node;
if (listNode == NULL) {
listPrev->sortNext = node;

/* END - hashtable */

void loadClass() {
jvmpi_interface->RequestEvent(JVMPI_EVENT_OBJECT_ALLOC, object);

void loadClassName() {
if(clazz == NULL) className = "NULL?";
else if(clazz != clazzCache) {
className = NULL;
jvmpi_interface->RequestEvent(JVMPI_EVENT_CLASS_LOAD, clazz);
clazzCache = clazz;
nameCache = className;

jobjectID asObject(char *curr) {
char ptr[4];
memcpy(ptr, curr, 4);
return *((jobjectID*)ptr);

FILE *openSummaryFile() {
time_t tm;
struct tm *tmptr;

tm = time(NULL);
tmptr = localtime(&tm);
strftime(outFilePtr, sizeof(outFileName) - (outFilePtr - outFileName) - 1,
"%Y%m%d%H%M%S", tmptr);

return fopen(outFileName, "w");

/* function for handling event notification */
void notifyEvent(JVMPI_Event *event) {

JVMPI_HeapDumpArg arg;
char* curr;
long long totalCount;
long long totalSize;
int type;
hash_node_t *node;
FILE *summaryFile;

switch(event->event_type) {
className = event->u.class_load.class_name;
if (className == NULL) className = "NULL?";
objectSize = event->u.obj_alloc.size;

switch (event->u.obj_alloc.is_array) {
case JVMPI_CHAR : className = "[char"; break;
case JVMPI_BOOLEAN : className = "[boolean"; break;
case JVMPI_FLOAT : className = "[float"; break;
case JVMPI_DOUBLE : className = "[double"; break;
case JVMPI_BYTE : className = "[byte"; break;
case JVMPI_SHORT : className = "[short"; break;
case JVMPI_INT : className = "[int"; break;
case JVMPI_LONG : className = "[long"; break;
case JVMPI_NORMAL_OBJECT: clazz = event->u.obj_alloc.class_id; loadClassName(); break;
case JVMPI_CLASS: clazz = event->u.obj_alloc.class_id; loadClassName(); sprintf(buf, "[%s", className); className = buf; clazzCache = NULL; break;
default :
sprintf(buf, "type %d ???", event->u.obj_alloc.is_array);
className = buf;

/* we do a full gc twice because, depending on the garbage collector
settings, the first gc may not collect all of the objects we
would like it to */

(*jvm)->AttachCurrentThread(jvm, (void **)&jni, NULL);
arg.heap_dump_level = JVMPI_DUMP_LEVEL_0;
jvmpi_interface->RequestEvent(JVMPI_EVENT_HEAP_DUMP, &arg);

if ((summaryFile = openSummaryFile()) == NULL) {
fprintf(stderr, "heapsummary> Failed to open dump file %s\n",

if (hashTable == NULL) {
hashTable = (struct hash_t *)malloc(sizeof(hash_t));

totalCount = 0;
totalSize = 0;

curr = event->u.heap_dump.begin;
while(curr <>u.heap_dump.end) {
/* if(((++totalCount)%1000) == 0) fprintf(stderr, "."); */
totalCount = totalCount + 1;
totalSize += objectSize;
type = *(curr++);
object = asObject(curr);
if(object == NULL) fprintf(stderr, "object null\n");
curr += 4;

node = hashTableLookup(className);
if (node != NULL) {
node->memSize = node->memSize + objectSize;
node->memCount = node->memCount + 1;
} else {
hashTableInsert(className, objectSize);

/* print totals */
fprintf(summaryFile, "%20lld%20lld TOTAL\n", totalSize, totalCount);

/* sort the elements of the hash table into a linked list, then
iterate through each item and print out its values */
for (node = sortedList; node != NULL; node = node->sortNext) {
fprintf(summaryFile, "%20lld%20lld %s\n",
node->memSize, node->memCount, node->key);




/* profiler agent entry point */
JNIEXPORT jint JNICALL JVM_OnLoad(JavaVM *theJvm, char *options, void *reserved) {
char *token;
jvmdiError err;

jvm = theJvm;
fprintf(stderr, "heapsummary> Initializing...\n");
err=(*jvm)->GetEnv(jvm,(void**)&jvmpi_interface, JVMPI_VERSION_1_1);
/* get jvmpi interface pointer */
if (err != 0) {
fprintf(stderr, "heapsummary> Error in obtaining jvmpi interface pointer\n");
return JNI_ERR;

jvmpi_interface->NotifyEvent = notifyEvent;
jvmpi_interface->EnableEvent(JVMPI_EVENT_DATA_DUMP_REQUEST, NULL);
jvmpi_interface->EnableEvent(JVMPI_EVENT_JVM_SHUT_DOWN, NULL);

*outFileName = NULL;
outFilePtr = outFileName;

for (token = strtok(options, ","); token != NULL; token = strtok(NULL, ",")) {
if (strncmp(token, "dir=", 4) == 0) {
strcpy(outFileName, token + 4);
outFilePtr = outFilePtr + strlen(outFileName);
#ifdef WINDOWS
*outFilePtr = '\\';
*outFilePtr = '/';
outFilePtr = outFilePtr + 1;
} else if (strncmp(token, "sort=", 5) == 0) {
if (strcmp(token + 5, "size") == 0) {
sortKey = SORT_KEY_SIZE;
fprintf(stderr, "heapsummary> Sorting on memory size\n");
} else if (strcmp(token + 5, "count") == 0) {
fprintf(stderr, "heapsummary> Sorting on memory count\n");
} else {
fprintf(stderr, "heapsummary> Unknown sort key [%s]. Defaulting to sorting on memory size\n", token + 5);

strcpy(outFilePtr, OUTFILENAME_PREFIX);
outFilePtr = outFilePtr + sizeof(OUTFILENAME_PREFIX) - 1;
strcpy(outFilePtr, "YYYYMMDDhhmmss");

fprintf(stderr, "heapsummary> Running. Will output to summary file %s\n",

return JNI_OK;

Wednesday, August 16, 2006

VB coming to a JVM real soon now

Following the rise (at least in hype terms) of dynamic languages like Groovy and JRuby for the JVM, Project Simplice (pronounced Sim-pleah-chee) promises to make Visual Basic run on the JVM. Sort of (see the gotchas). Scary... (and too late)

Thursday, August 10, 2006


A good summary of the new Web 2.0 craze of 'mashups':

A mashup Web site is characterized by the way in which it spreads roots across the Web, drawing upon content and functionality retrieved from data sources that lay outside of its organizational boundaries...

Programmable Web indexes a number of interesting mashups, including:

  • Stock Cloud is a tag cloud-like display of stock ticker symbols. The larger the ticker symbol the more frequent that company distributes press releases.
  • Venture Capital funding in Silicon Valley.
  • Bogozo integrates CraigsList real estate data with GoogleMaps.
  • DrinkDeal finds NYC drink specials by neighborhood and day.
  • AntennaSearch tells you where cell phone towers are, including future towers (so you can move your family away from them!).
  • Goggles is a web-based flight simulator built on top of Google Maps.

Now if only someone would create a mashup for locating hedge funds :)

Saturday, August 05, 2006

Hacking Gmail

Last weekend I had a quick flick through Hacking Gmail at my local Barnes and Noble. Most of the book focuses on building client applications in PHP/Python/Perl to do interesting stuff like download your email and archive it in a standard format like mbox format, but it contained a few other quick tips worth noting:
  • Gmail uses the syntax to sort and filter emails (I'de forgot that syntax was still in use!).
  • Many of the Gmail client APIs simply invoke HTTP requests. This works because Gmail has a non-ajax interface - just disable javascript in your browser and see for yourself!
  • Gmail actually supports https for all mail operations, not just login. The catch is that all of the URLs returned from gmail are http URLs, so even if you started your session with https you'll soon find you're back to http. You can force an https session in Firefox by first installing the Greasemonkey plugin - which lets you to add bits of DHTML ("user scripts") to any web page to change its behavior - and this one line script that changes all http URLs to https.
  • Instead of using one of the gazillion RSS readers available on the net, you can use a service like to email publications to your inbox.

Lunch places in Midtown

midtownlunch is a great blog on places to get lunch, from the street meat carts to the expensive restaurants, in the heart of Manhattan. I've been looking for this type of info for a long time. Thanks Metro!

Friday, August 04, 2006

Matt Damon is the new Captain Kirk?

Slashdot is reporting that Matt Damon will be the young Kirk in the upcoming StarTrek XI film. Shattner is not pleased...

Thursday, August 03, 2006

MSDN library is now free

Microsoft are now giving away MSDN Library, which contains doco and examples for many core development tools, for free!

I wonder what the motivation is? Are those tools now no longer part of the core product set, and .NET is the future? (Although Vista is built mostly in C and Visual C++ is now free.) Or is Microsoft trying to compete against the open source movement where so much intellectual property is given away for free? (e.g. IBM donated lots of stuff to Eclipse and Apache, Intel donated a large chunk of core JVM code to Harmony, Hibernate and Spring have reasonably extensive free doco.)

Wednesday, August 02, 2006

Quick inlining of C code in a Ruby script

A Ruby module called RubyInline allows you to do a really groovy thing: insert a small snippet of C code in your script, which is then compiled on the fly!

As an example, lets say you have the following Ruby script that calculates prime numbers (example from this blog post):


for num in 1..10_000 do
is_prime = 1
for x in 2..(num - 1) do
if (num % x == 0)
is_prime = x
if is_prime == 1
puts "#{num} is a prime number"
puts "#{num} equals #{is_prime} * #{num/is_prime}"

You can replace a chunk of this with "inlined C" by doing something like the following:


require "rubygems"
require "inline"

class Primes
inline do |builder|
builder.c '
int prime(int num) {
int x;
for (x = 2; x < (num - 1) ; x++) {
if (num == 2) {
return 1;
if (num % x == 0) {
return x;
return 1;

p =

for num in 2..10_000 do
is_prime =
if is_prime == 1
puts "#{num} is a prime number"
puts "#{num} equals #{is_prime} * #{num/is_prime}"

Really cool. Saves having to write whole libraries/modules just to access a few C methods. I wish Java and .NET had the ability to inline small snippets of other languages if required.

Reminds me of that old geek classic "Write in C" (sung to the tune of the Beatle's song "Let it Be"):

When I find my code in tons of trouble,
Friends and colleagues come to me, Speaking words of wisdom:
"Write in C."

As the deadline fast approaches,
And bugs are all that I can see,
Somewhere, someone whispers:
"Write in C."...