Optimizing Printf for Java Calls

Printf formatting is often used in loops to produce neatly formatted tables. Sharkysoft's Printf for Java was designed with the ability to accelerate printf formatting within loops. To demonstrate the many ways in which Printf for Java can be optimized, we'll examine a simple program that outputs an ASCII table. Each of the versions shown below produces exactly the same output.

asciitable_1

This first example shows the easiest way to use printf in Java.

import lava.clib.Stdio;
import lava.clib.stdarg.Va_list;

class asciitable_1
{

	public static void main (String[] args)
	{
		for (int i = 0; i < 256; ++ i)
			Stdio.printf
			(
				"%#10c%+10d%10.4o%#10.2x\n",
				new Va_list ()
				. add ((char) i)
				. add (i)
				. add (i)
				. add (i)
			);
	}

}

Note that printf is a method of the Stdio class, a reminder that in C, printf is part of the stdio library. Thus, before you can use printf in Java, you must first import the Stdio class. This feels remarkably similar to including the "stdio.h" header file in a C program.

Since Java does not support variable argument lists, a special class, Va_list, is used to emulate them. This class is analogous to the va_list type (declared in the stdarg.h header) used to implement variable argument lists in C. (In fact, most C implementations of printf rely on it.) Va_list's add method is overloaded for all of the primitive Java types. The completion of the argument list is (optionally) indicated by calling the done() method.

asciitable_2

We're just kidding ourselves. We all know that no matter how you dress it up, in Java an untyped argument list is nothing more than a vector of Objects. In fact, I'll confess now that Va_list uses java.util.Vector in its implementation. In the above example, each add call appends its argument to the list, and the printf method automatically converts the vector into an Object array.
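To make that idea concrete, here is a minimal sketch of how a Vector-backed argument list could be put together. MiniVaList and its toArray method are hypothetical names invented for this illustration. This is not Lava's Va_list source, and it assumes only what the paragraph above states: overloaded add methods that box each value, and a final copy into an Object array. Integer.valueOf and friends are used for boxing because the wrapper constructors are deprecated in current JDKs.

```java
import java.util.Vector;

// Hypothetical stand-in for illustration only; NOT Lava's Va_list source.
class MiniVaList {
    // Arguments accumulate here one at a time, boxed as Objects.
    private final Vector<Object> items = new Vector<Object>();

    // One overload per primitive type of interest; each boxes its argument
    // and returns this so that calls can be chained.
    MiniVaList add(char c)   { items.addElement(Character.valueOf(c)); return this; }
    MiniVaList add(int i)    { items.addElement(Integer.valueOf(i));   return this; }
    MiniVaList add(double d) { items.addElement(Double.valueOf(d));    return this; }
    MiniVaList add(Object o) { items.addElement(o);                    return this; }

    // Copies the vector's contents into a fresh Object array.
    Object[] toArray() {
        Object[] copy = new Object[items.size()];
        items.copyInto(copy);
        return copy;
    }

    public static void main(String[] args) {
        Object[] list = new MiniVaList()
            .add((char) 65).add(65).add(65).add(65)
            .toArray();
        // prints "4 arguments, first is A"
        System.out.println(list.length + " arguments, first is " + list[0]);
    }
}
```

Every add call goes through the vector, and toArray allocates and fills a second container. That is the overhead the next version avoids.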

So let's skip the overhead of building an Object array one element at a time and do it the fast way instead:

import lava.clib.Stdio;

class asciitable_2
{

	public static void main (String[] args)
	{
		for (int i = 0; i < 256; ++ i)
			Stdio.printf
			(
				"%#10c%+10d%10.4o%#10.2x\n",
				new Object[]
				{
					new Character ((char) i),
					new Integer (i),
					new Integer (i),
					new Integer (i)
				}
			);
	}

}

Pay careful attention to the syntax used here to build the argument list. Not only does this syntax cause the Java compiler to choose the correct length for the Object array automatically (in this case, 4), but it also populates the array with the four objects. This approach requires significantly less processing overhead than Va_list or Vector, because the final array length is known ahead of time (eliminating the need for dynamic array growth) and because there is no need to copy the vector's contents when you are done.
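The difference between the two approaches can be seen side by side in the following sketch, which uses only the standard JDK. ArgArrayDemo, direct, and viaVector are hypothetical names chosen for this illustration. The Vector path is an assumption about the general shape of the work involved, not a copy of Lava's code.

```java
import java.util.Arrays;
import java.util.Vector;

class ArgArrayDemo {
    // Array initializer: one allocation, and the compiler infers the length.
    static Object[] direct(int i) {
        return new Object[] {
            Character.valueOf((char) i),
            Integer.valueOf(i),
            Integer.valueOf(i),
            Integer.valueOf(i)
        };
    }

    // Same contents built through a growable Vector, then copied out.
    static Object[] viaVector(int i) {
        Vector<Object> v = new Vector<Object>();
        v.addElement(Character.valueOf((char) i));
        v.addElement(Integer.valueOf(i));
        v.addElement(Integer.valueOf(i));
        v.addElement(Integer.valueOf(i));
        Object[] copy = new Object[v.size()];
        v.copyInto(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(direct(65).length);                          // prints 4
        System.out.println(Arrays.equals(direct(65), viaVector(65)));   // prints true
    }
}
```

Both methods hand back equal arrays, but direct gets there without the intermediate container or the final copy.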

asciitable_3

Here's another handy trick. Since the last three entries in the argument list hold the same value, why waste time creating three identical objects? Instead, let's optimize the loop further by creating just one Integer object, and then reusing it to populate the array:

import lava.clib.Stdio;

class asciitable_3
{
	public static void main (String[] args)
	{
		for (int n = 0; n < 256; ++ n)
		{
			Integer i = new Integer (n);
			Stdio.printf
			(
				"%#10c%+10d%10.4o%#10.2x\n",
				new Object[]
				{
					new Character ((char) n),
					i,
					i,
					i
				}
			);
		}
	}
}

This optimization creates two fewer objects per iteration.
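Sharing one wrapper among several slots is safe because Integer objects are immutable. The short sketch below (SharedArgDemo and row are hypothetical names) confirms that the three slots really do refer to a single object.

```java
class SharedArgDemo {
    // Fills the last three slots with one shared wrapper object.
    static Object[] row(int n) {
        Integer i = Integer.valueOf(n);   // obtained once per row
        return new Object[] { Character.valueOf((char) n), i, i, i };
    }

    public static void main(String[] args) {
        Object[] r = row(200);
        // Reference comparison: all three slots point at the same Integer.
        System.out.println(r[1] == r[2] && r[2] == r[3]);   // prints true
    }
}
```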

asciitable_4

Before printf can produce output, the format string must be parsed. In the above examples, the format string is parsed once for each pass through the loop. What a waste of time! Why not just parse the format string outside the loop, and then reuse the same parse results inside the loop? This would certainly be an optimization. In Lava, printf format strings can be "pre-parsed" by constructing instances of PrintfFormatString. When PrintfFormatString objects are used in place of regular String objects for format strings, the resulting printf operation is always faster. PrintfFormatStrings are like ready-to-go "formatting engines." Here's what the revised code looks like:

import lava.clib.Stdio;
import lava.clib.stdio.PrintfFormatString;

class asciitable_4
{

	public static void main (String[] args)
	{
		PrintfFormatString fmt =
			new PrintfFormatString ("%#10c%+10d%10.4o%#10.2x\n");
		for (int n = 0; n < 256; ++ n)
		{
			Integer i = new Integer (n);
			Stdio.printf
			(
				fmt,
				new Object[]
				{
					new Character ((char) n),
					i,
					i,
					i
				}
			);
		}
	}

}

This little trick saves us the overhead of parsing the format string with each iteration through the loop. Wherever printf is used in a loop or in a method that is called repeatedly, the programmer should consider storing the pre-initialized format string as a private static variable in the class. This way it will be initialized just once, when the class is loaded, and ready for use each time it is needed. It's too bad the C version of printf didn't have this feature.
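The private static variable pattern can be sketched without Lava at all. In the example below, Template is a hypothetical, deliberately tiny "pre-parsed format" that understands only %s placeholders. It stands in for PrintfFormatString purely to show the parse-once, reuse-many shape, and makes no claim about how Lava implements it.

```java
import java.util.ArrayList;
import java.util.List;

class PreparsedDemo {
    // Hypothetical mini format: the pattern is scanned once, in the constructor.
    static final class Template {
        private final String[] literals;   // text between "%s" placeholders

        Template(String pattern) {
            List<String> parts = new ArrayList<String>();
            int from = 0;
            int at;
            while ((at = pattern.indexOf("%s", from)) >= 0) {
                parts.add(pattern.substring(from, at));
                from = at + 2;
            }
            parts.add(pattern.substring(from));
            literals = parts.toArray(new String[0]);
        }

        // Formatting only stitches pieces together; no parsing happens here.
        String format(Object[] args) {
            StringBuilder sb = new StringBuilder(literals[0]);
            for (int k = 0; k < args.length && k + 1 < literals.length; ++k)
                sb.append(args[k]).append(literals[k + 1]);
            return sb.toString();
        }
    }

    // Parsed once, when the class is loaded, then shared by every call.
    private static final Template ROW = new Template("%s = %s\n");

    static String row(Object key, Object value) {
        return ROW.format(new Object[] { key, value });
    }

    public static void main(String[] args) {
        // prints "A = 65", "B = 66", "C = 67" on separate lines
        for (int n = 65; n < 68; ++n)
            System.out.print(row(Character.valueOf((char) n), Integer.valueOf(n)));
    }
}
```

However many times row is called, the pattern string is examined only once. A method that is invoked repeatedly from elsewhere gets the same benefit as a loop.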

asciitable_5

You may think that we have completed all the optimization one could hope for, but would you believe there's more? Many C programmers are unaware that printf has a return value. The ANSI C specification of printf declares that in normal circumstances, the value returned by printf shall be the total number of characters written.

So what does this have to do with speeding up asciitable? Well, the relationship is indirect. You see, Stdio.printf doesn't really do any work. Instead, it delegates the job to another method, which performs the formatting and returns the results in a String. The Stdio.printf method simply writes the String to System.out and returns the String's length.
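The delegation just described might look roughly like the following sketch. DelegatingPrintf is a hypothetical class, and the JDK's own String.format is used as a stand-in for Lava's formatting routine, so this shows the shape of the wrapper rather than Lava's actual code.

```java
import java.io.PrintStream;
import java.util.Locale;

class DelegatingPrintf {
    // The "workhorse": produces the formatted text and nothing else.
    // String.format is the JDK formatter, standing in for Lava's here.
    static String format(String fmt, Object[] args) {
        return String.format(Locale.ROOT, fmt, args);
    }

    // Thin wrapper: format, write to the stream, report the length.
    static int printf(PrintStream out, String fmt, Object[] args) {
        String s = format(fmt, args);
        out.print(s);
        return s.length();
    }

    public static void main(String[] args) {
        int count = printf(
            System.out,
            "%5d|%4x\n",
            new Object[] { Integer.valueOf(65), Integer.valueOf(65) }
        );
        // The formatted line is "   65|  41" plus a newline: 11 characters.
        System.out.println(count + " characters");
    }
}
```

A caller who has no use for the count can call format directly and decide for itself where the text goes.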

In our case, however, we don't really care about counting characters, so we can bypass the syntactic sugar of Stdio.printf and call the formatting routine ourselves.

The main printf formatting workhorse is -- you guessed it -- the Printf class (that's right, I said "class," not "method"). The Printf class has just one (overloaded) method, format. Like Stdio.printf, when you call it you must supply a format and arguments. Unlike Stdio.printf, however, Printf.format does not produce stream output; it only returns the formatted string. What you do with it when you get it is your own business, not Printf's. Let's have a look:

import lava.clib.stdio.Printf;
import lava.clib.stdio.PrintfFormatString;

class asciitable_5
{

	public static void main (String[] args)
	{
		PrintfFormatString fmt =
			new PrintfFormatString ("%#10c%+10d%10.4o%#10.2x\n");
		for (int n = 0; n < 256; ++ n)
		{
			Integer i = new Integer (n);
			System.out . print
			(
				Printf.format
				(
					fmt,
					new Object[]
					{
						new Character ((char) n),
						i,
						i,
						i
					}
				)
			);
		}
	}

}

You see what's happening here? We're taking the formatted String returned by format and passing it to System.out.print for printing, bypassing the call to Stdio.printf entirely.

Conclusion

Compare asciitable_1 to asciitable_5 and see how much our code has evolved. The first version was syntactically sweet, but the price for this sugar was unnecessary processing overhead at runtime (burning more calories, to keep the analogy). The final version is not as pleasant to look at (health food), and requires significantly more labor from the programmer (gardening), but the optimizing techniques employed result in much faster output (OK, analogies fail me here).

A simple benchmark I ran, based on these examples, indicates that printf calls complete approximately 32% faster when the optimization techniques of asciitable_5 are used. For small formatting jobs, this speedup may not be worth the extra coding effort. However, if your program uses printf to produce large, formatted tables that are thousands of lines long, this 32% speedup may be worth the effort.

©1998-2001 Sharkysoft. Comments on this article are welcome.