Printf for Java Specification 3.0

Sharkysoft's Printf for Java specification offers Java programmers all the robust text- and data-formatting features of the C programming language's printf function and more. Printf for Java allows the programmer to format data into strings using customizable formatting templates. This specification assumes that the reader is already familiar with C's printf function.

1. data vectors

In C, printf calls are made using variable argument lists, or parameter lists of arbitrary length and type. Java versions before 5.0, however, do not support variable-length argument lists. To overcome this limitation, Printf for Java accepts an Object array in place of the variable argument list. Throughout this specification, this array is referred to as the data vector.

The printf data vector may be any length and may contain objects of any type, as long as its contents are compatible with the data types indicated in the format string. Because the data vector is an Object array, values of primitive types, such as int, must be wrapped in compatible objects before they can be placed in it. The primitive types and their corresponding wrapper classes are listed below.

primitive type | wrapper class
boolean | java.lang.Boolean
byte | java.lang.Byte
short | java.lang.Short
int | java.lang.Integer
long | java.lang.Long
float | java.lang.Float
double | java.lang.Double
char | java.lang.Character
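The following sketch shows how a data vector might be assembled by wrapping each primitive in the class listed above. It does not call Printf for Java; the class and method names are hypothetical. On Java 5 and later the compiler's autoboxing performs this wrapping automatically (for example, `new Object[] { 52 }`), but the explicit form works on every version that has these `valueOf` factories.

```java
// Hypothetical sketch: building a data vector from primitive values by
// wrapping each one in its java.lang wrapper class.
class DataVectorSketch {
    static Object[] build() {
        boolean flag = true;
        byte b = 7;
        short s = 300;
        int i = 52;
        long l = 1234567890123L;
        float f = 1.5f;
        double d = 2.25;
        char c = 'x';
        // valueOf() is the preferred factory for each wrapper class; the
        // wrapper constructors (e.g. new Integer(52)) also work but are
        // deprecated since Java 9.
        return new Object[] {
            Boolean.valueOf(flag),
            Byte.valueOf(b),
            Short.valueOf(s),
            Integer.valueOf(i),
            Long.valueOf(l),
            Float.valueOf(f),
            Double.valueOf(d),
            Character.valueOf(c)
        };
    }

    public static void main(String[] args) {
        // Print each element's runtime class and value.
        for (Object o : build()) {
            System.out.println(o.getClass().getName() + " = " + o);
        }
    }
}
```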

2. format strings

A format string is a concatenated mixture of literal strings and format specifiers. Literal strings are copied verbatim to the formatted output. Format specifiers describe conversion operations (stringification) which are usually applied to elements in the data vector. All printf format strings that are valid (and sensible) in C are also valid in Printf for Java, and can be expected to perform the same function. However, Printf for Java adds a few additional capabilities which are not available in C-based implementations. These new capabilities are indicated in highlighted text.

3. format specifiers

A format specifier has the following syntax:

%[flags][width][.precision][input_size]conversion_type

The percent character ('%') always signals the beginning of a format specifier. The conversion type character is always present and marks the end of the format specifier. The flags, width, precision, and input size are optional, but if they are present, they must appear between the percent character and conversion type character, in the order given above.
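To make the grammar concrete, here is a hypothetical regular-expression parser for a single specifier. It only illustrates the syntax above and is not Printf for Java's own parser. It assumes the flag characters, input size modifiers, and conversion type characters listed later in this specification, and it does not handle the escape conversions (%% and %\n).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for %[flags][width][.precision][input_size]conversion_type.
class SpecifierParser {
    // Group 1: flags, 2: width, 3: precision, 4: input size, 5: conversion type.
    // Width and precision may be digits or '*'; %z[n] / %Z[n] carry a decimal radix.
    private static final Pattern SPEC = Pattern.compile(
        "%([-^0+ #]*)(\\d+|\\*)?(?:\\.(\\d+|\\*))?([bBhl])?([cdeEfgGopsuxXn]|[zZ]\\[\\d+\\])");

    // Returns {flags, width, precision, input_size, conversion}; absent
    // optional parts are null, and absent flags are the empty string.
    static String[] parse(String spec) {
        Matcher m = SPEC.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a format specifier: " + spec);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4), m.group(5) };
    }

    public static void main(String[] args) {
        String[] names = { "flags", "width", "precision", "input_size", "conversion" };
        String[] parts = parse("%-08.3lf");
        for (int i = 0; i < parts.length; i++) {
            System.out.println(names[i] + " = " + parts[i]);
        }
    }
}
```

Because the flag group is matched greedily, a leading '0' is read as a flag, so "%-08.3lf" yields flags "-0" and width "8", mirroring how C distinguishes the '0' flag from the width.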

4. input conversion types

The conversion type character indicates the type of input data to be stringified and formatted. There are several input conversion types, one output conversion type, and two escape conversion types. An input conversion type consumes at least one element from the data vector and generates formatted data based on the input value. The complete list of input conversion types is shown below. For convenience, the flags permitted are also shown, though these are not discussed until later.

conversion type character | default input type | stringified as | flags permitted
%c | char | Character, possibly encoded. | '-', '^', '#'
%d | int | Signed decimal integer. | '-', '^', '0', '+', ' '
%e, %E | float | Real number in scientific notation, with a lowercase or uppercase exponent marker. | '-', '^', '#'
%f | float | Real number in standard notation. | '-', '^', '0', '+', ' ', '#'
%g, %G | float | Same as %f or %e, depending on the value. Scientific notation is used only if the exponent is greater than the precision or less than -4. | '-', '^', '0', '#'
%o | int | Unsigned octal integer. | '-', '^', '0', '#'
%p | Object | Object identity hash code (as returned by System.identityHashCode), in unsigned hexadecimal. This value is often, but not necessarily, derived from the object's address. | '-', '^', '0'
%s | String | String. | '-', '^', '#'
%u | int | Unsigned decimal integer. | '-', '^', '0'
%x, %X | int | Unsigned hexadecimal integer, lowercase or uppercase. | '-', '^', '0', '#'
%z[n], %Z[n] | int | Unsigned integer in base n (n given in decimal), lowercase or uppercase. The square brackets are part of the specifier. | '-', '^', '0'

The other conversion types are described later.
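The stringification behind two of the less familiar conversions, %z[n]/%Z[n] and %p, can be approximated with standard JDK calls. This is a sketch under stated assumptions: the helper names are hypothetical, padding and flags are ignored, and Printf for Java's own code may differ.

```java
// Hypothetical sketch of the core of %z[n]/%Z[n] and %p using only the JDK.
class RadixSketch {
    // %z[n]: unsigned integer in base n; %Z[n] uppercases the letter digits.
    static String zConversion(int value, int radix, boolean upper) {
        // Integer.toUnsignedString requires Java 8 or later.
        String digits = Integer.toUnsignedString(value, radix);
        return upper ? digits.toUpperCase() : digits;
    }

    // %p: the identity hash code in unsigned hexadecimal.
    static String pConversion(Object o) {
        return Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        System.out.println(zConversion(255, 2, false));  // 11111111
        System.out.println(zConversion(35, 36, true));   // Z
        System.out.println(zConversion(-1, 16, false));  // ffffffff
        System.out.println(pConversion(new Object()));   // varies from run to run
    }
}
```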

5. input size modifiers

Input conversion type characters can be preceded by an input size modifier to override the default input type. All supported input size modifiers are listed below:

default input type | input size modifier | modified data type
int | (none) | int
int | b | byte
int | B | BigInteger
int | h | short
int | l | long
float | (none) | float
float | B | BigDecimal
float | l | double

6. output field width specifiers

An optional output field width specifier, if present, specifies the minimum output field width, or the minimum number of characters that the formatted data will span in the output. If the stringified value does not fill the whole field, then the field will be padded. The default behavior, in this case, is to right-align the output in the field by left-padding it with spaces.

On the other hand, if the formatted value is longer than the field width, the output will not be truncated; the field will be widened as necessary to display the entire result. When no output width is specified, there is no minimum field width. In this case, the field will only be as wide as necessary to display the result.

Example:

The code

Printf.out("%6d", new Object[] {Integer.valueOf(52)});

will output

"    52"

(with four spaces on the left).
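The default width rule can be sketched as a small helper: pad on the left with spaces up to the minimum width, and leave longer values untouched. The helper name is hypothetical and it works on an already-stringified value.

```java
// Hypothetical sketch of the default field-width rule: right-align by
// left-padding with spaces, and never truncate a value wider than the field.
class FieldWidthSketch {
    static String rightAlign(String value, int width) {
        StringBuilder sb = new StringBuilder();
        // Add one space for each column the value does not fill.
        for (int i = value.length(); i < width; i++) {
            sb.append(' ');
        }
        return sb.append(value).toString();
    }

    public static void main(String[] args) {
        System.out.println("\"" + rightAlign("52", 6) + "\"");      // "    52"
        System.out.println("\"" + rightAlign("1234567", 6) + "\""); // "1234567"
    }
}
```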

7. precision specifiers

A precision specifier, if present, controls the precision with which the input data will be converted. The effect of setting a precision on a value depends on the conversion type. The effects of precision for each input type, along with the default precision values, are given below:

conversion type | effect of precision | default precision
integer conversions: %d, %o, %u, %x, %X, %z[n], %Z[n] | Precision controls the minimum number of digits. The converted value will be prepended with zeros if necessary. If the precision is 0 and the input value is zero, then nothing will be output if no output width is specified, and the entire field will be padding if an output width is specified. | 1
real conversions: %f, %e, %E, %g, %G | Precision controls the number of fractional digits after the decimal point. The converted value will be rounded if necessary. | 6
string conversion: %s | Precision controls the maximum number of characters from the input that will be displayed. If the string is longer than the precision, it will be truncated. | infinity

Precision cannot be specified for types not listed in the table above. If no precision is specified, then the default precision will be used.
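The integer and string rows of the precision table can be sketched as two hypothetical helpers. They cover only the precision step; flags and field width would be applied afterwards, and the real conversions are omitted.

```java
// Hypothetical sketch of the precision rules for %d and %s.
class PrecisionSketch {
    // Integer conversion: precision is the minimum number of digits.
    static String intWithPrecision(long value, int precision) {
        if (precision == 0 && value == 0) {
            return ""; // a zero with precision 0 produces no digits at all
        }
        String text = Long.toString(value);
        // Strip the sign textually so Long.MIN_VALUE is handled correctly.
        String digits = value < 0 ? text.substring(1) : text;
        StringBuilder sb = new StringBuilder();
        if (value < 0) {
            sb.append('-');
        }
        for (int i = digits.length(); i < precision; i++) {
            sb.append('0'); // leading zeros go after the sign
        }
        return sb.append(digits).toString();
    }

    // String conversion: precision is the maximum number of characters shown.
    // Note: substring counts UTF-16 code units, not Unicode code points.
    static String stringWithPrecision(String s, int precision) {
        return s.length() > precision ? s.substring(0, precision) : s;
    }

    public static void main(String[] args) {
        System.out.println(intWithPrecision(42, 5));              // 00042
        System.out.println(intWithPrecision(-7, 3));              // -007
        System.out.println(stringWithPrecision("Sharkysoft", 5)); // Shark
    }
}
```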

8. variable width and precision

Field widths and precisions can be given as an asterisk ('*') to indicate that their values should be obtained from the data vector. Each asterisk in a format specifier consumes an int from the data vector, in the order in which the asterisks appear in the specifier. The actual data to be converted is consumed last.

Example:

If the format specifier is "%*hd", then the data vector must contain two elements for this format specifier. The first element is an int specifying the field width, and the second element is a short that will be rendered as a signed decimal string.
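The consumption order for "%*hd" can be sketched with a cursor into the data vector: the width is read first, then the value. The class, its cursor field, and the method name are hypothetical; values are read through Number, so an Integer works for the 'h' slot too.

```java
// Hypothetical sketch of how "%*hd" consumes the data vector: the '*' width
// first, then the short value, advancing a shared cursor each time.
class StarArgsSketch {
    private final Object[] data;
    private int next; // index of the next unconsumed data-vector element

    StarArgsSketch(Object[] data) {
        this.data = data;
    }

    String formatStarHd() {
        int width = ((Number) data[next++]).intValue();     // '*' is consumed first
        short value = ((Number) data[next++]).shortValue(); // then the 'h' value
        String digits = Short.toString(value);
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append(' '); // default right alignment
        }
        return sb.append(digits).toString();
    }

    int consumed() {
        return next;
    }
}
```

Calling `formatStarHd()` twice on the vector {6, 52, 3, -1} would correspond to the format string "%*hd%*hd", yielding "    52" and then " -1", with all four elements consumed in order.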

9. flags

Flags are single characters that indicate exceptions to the conversion type's default formatting behavior. A format specifier may have multiple flags, but some flags are mutually exclusive. Multiple flags can appear in any order. The following table lists all of the formatting flags supported by Printf for Java:

flag | effect | applicable conversion types
'-' | Output will be left-aligned in the field. This flag is meaningless if no field width is specified. | %c, %d, %e, %E, %f, %g, %G, %o, %p, %s, %u, %x, %X, %z[n], %Z[n]
'^' | Output will be centered in the field. This flag is meaningless if no field width is specified. | %c, %d, %e, %E, %f, %g, %G, %o, %p, %s, %u, %x, %X, %z[n], %Z[n]
'0' | The field will be padded with leading zeros, inserted between the sign character, if any, and the value. | %d, %e, %E, %f, %g, %G, %o, %p, %u, %x, %X, %z[n], %Z[n]
'+' | Non-negative values will begin with a plus character ('+'). | %d, %f
' ' | Non-negative values will begin with a space character (' '). | %d, %f
'#' | Data will be represented in an "alternate form," which depends on the conversion type:
    %o: Non-negative octal values will begin with a zero ('0').
    %x, %X: Hexadecimal values will begin with "0x" or "0X" (matching the case of the conversion type character).
    %e, %E, %f: The integer portion of the result will end with a decimal point ('.'), even if the fractional portion is zero.
    %g, %G: The fractional portion always appears, even if it is zero.
    %c: If the character is special or unprintable, it will be output in escaped form. Surrounding the output with single quotes then yields a syntactically valid Java character literal.
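The alignment flags can be sketched as a padding step applied to an already-stringified value. This is a hypothetical helper, not Printf for Java's code. Two assumptions are marked: the table does not say where '^' puts an odd leftover space, so this sketch puts it on the right; and the '0' case recognizes only a one-character sign, so a "0x" prefix produced by '#' is not handled here.

```java
// Hypothetical sketch of the '-', '^', and '0' flags applied to a stringified value.
class AlignSketch {
    // flag: '-' (left), '^' (center), '0' (zero-pad); any other char means
    // the default right alignment.
    static String pad(String value, int width, char flag) {
        int fill = width - value.length();
        if (fill <= 0) {
            return value; // never truncate
        }
        switch (flag) {
            case '-': // left-align: pad on the right
                return value + repeat(' ', fill);
            case '^': { // center; ASSUMPTION: an odd extra space goes on the right
                int left = fill / 2;
                return repeat(' ', left) + value + repeat(' ', fill - left);
            }
            case '0': { // zeros go between the sign character (if any) and the digits
                char first = value.charAt(0);
                boolean signed = first == '-' || first == '+' || first == ' ';
                String sign = signed ? value.substring(0, 1) : "";
                String digits = signed ? value.substring(1) : value;
                return sign + repeat('0', fill) + digits;
            }
            default: // no alignment flag: right-align
                return repeat(' ', fill) + value;
        }
    }

    private static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, `pad("52", 6, '^')` gives "  52  " and `pad("-52", 6, '0')` gives "-00052".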

10. literal percent characters

In format strings, the percent character ('%') normally signals the beginning of a format specifier. This causes a problem, however, when you actually want to output a percent character. To solve this problem, an escaping mechanism is used: to output a single percent character, write two percent characters ("%%") in the format string. This conversion type does not consume input from the data vector.

11. platform-neutral line separators

It is sometimes convenient to embed line separators in format strings. However, line separators vary from platform to platform. To avoid platform-specific line separators, use "%\n" (a percent character followed by a newline character) instead of "\n", "\r\n", or "\r". Printf for Java then outputs the host platform's native line separator. This conversion type does not consume input from the data vector.

Note: If platform-independence is not a concern, and the software will run primarily on Unix- and Windows-based hosts, the traditional newline character '\n' will probably produce acceptable results, with slightly better performance.
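The two escape conversions can be sketched as a single pass over the format string: "%%" becomes one '%', and '%' followed by a newline character becomes the host line separator. The method name is hypothetical, and all other specifiers are copied through unchanged because this sketch handles escapes only.

```java
// Hypothetical sketch of the escape conversions "%%" and "%\n".
class EscapeSketch {
    static String expandEscapes(String format) {
        // System.lineSeparator() needs Java 7+; older JDKs can use
        // System.getProperty("line.separator").
        String eol = System.lineSeparator();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '%' && i + 1 < format.length()) {
                char n = format.charAt(i + 1);
                if (n == '%') {        // "%%" -> "%"
                    out.append('%');
                    i++;
                    continue;
                }
                if (n == '\n') {       // '%' + newline -> native line separator
                    out.append(eol);
                    i++;
                    continue;
                }
            }
            out.append(c);             // everything else is copied verbatim
        }
        return out.toString();
    }
}
```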

12. character counting

The character count conversion type, "%n", does not format data or generate formatted output. Instead, when this format specifier appears in the format string, the number of characters formatted up to that point is counted, and the resulting Integer value is written into the data vector. Thus, when building the data vector for a format string containing this conversion type, a slot must be reserved for the result. The original contents of the slot are ignored and overwritten.

Example:

This program prints the string "My friend Sharky is 27 years old." It uses %n to obtain character positions so that it can underline the number 27.

// These are the parameters:
String name = "Sharky";
int age = 27;
// This is the data vector. We leave empty (null) slots where the %n
// conversion type will write its results:
Object[] params = new Object[]
{   name,
    null, // filled in by the first %n
    Integer.valueOf(age),
    null  // filled in by the second %n
};
// Now we'll generate the first line of output:
Printf.out("My friend %s is %n%d%n years old.%\n", params);
// Next, we'll use the feedback from %n to output the underline:
int start = ((Integer) params[1]).intValue();
int stop  = ((Integer) params[3]).intValue();
for (int pos = 0; pos < start; ++ pos)
    Printf.out(" ");
for (int pos = start; pos < stop; ++ pos)
    Printf.out("-");
// Finish the underline with the platform's line separator:
Printf.out("%\n");

Results:

My friend Sharky is 27 years old.
                    --
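Because the example above depends on the Printf for Java library, here is a self-contained, hypothetical stand-in that emulates only %s, %d, and %n so the slot-filling behavior can be run and checked. At each %n it stores the current output length, as an Integer, in the next data-vector slot, overwriting whatever was there.

```java
// Hypothetical emulation of %n (plus %s and %d) without Printf for Java.
class CharCountSketch {
    static String format(String fmt, Object[] data) {
        StringBuilder out = new StringBuilder();
        int next = 0; // index of the next data-vector element
        for (int i = 0; i < fmt.length(); i++) {
            char c = fmt.charAt(i);
            if (c != '%' || i + 1 >= fmt.length()) {
                out.append(c); // literal text
                continue;
            }
            char conv = fmt.charAt(++i);
            switch (conv) {
                case 's': out.append(data[next++]); break;                       // toString()
                case 'd': out.append(((Number) data[next++]).intValue()); break;
                case 'n': data[next++] = Integer.valueOf(out.length()); break;   // overwrite slot
                default:  out.append('%').append(conv); break;                   // not emulated
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Object[] params = { "Sharky", null, Integer.valueOf(27), null };
        System.out.println(format("My friend %s is %n%d%n years old.", params));
        int start = (Integer) params[1];
        int stop = (Integer) params[3];
        StringBuilder underline = new StringBuilder();
        for (int pos = 0; pos < stop; pos++) {
            underline.append(pos < start ? ' ' : '-');
        }
        System.out.println(underline);
    }
}
```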

13. abstract data conversion

Most format specifiers are not picky about the actual data types furnished in the data vector, as long as they are instances of the required base class. Printf for Java uses the object's own conversion methods to obtain input data of the correct type. The required class for each input type, and the conversion method used, are given in the table below:

input type | recommended data type | required data type | conversion method
boolean | java.lang.Boolean | java.lang.Boolean | booleanValue()
byte | java.lang.Byte | java.lang.Number | byteValue()
short | java.lang.Short | java.lang.Number | shortValue()
int | java.lang.Integer | java.lang.Number | intValue()
long | java.lang.Long | java.lang.Number | longValue()
float | java.lang.Float | java.lang.Number | floatValue()
double | java.lang.Double | java.lang.Number | doubleValue()
char | java.lang.Character | java.lang.Character | charValue()
java.math.BigInteger | java.math.BigInteger | java.math.BigInteger | n/a
java.math.BigDecimal | java.math.BigDecimal | java.math.BigDecimal | n/a
java.lang.String | java.lang.String | java.lang.Object | toString()
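A sketch of how the table above could drive conversion: any java.lang.Number feeds a numeric conversion through the matching xxxValue() method, selected here by the input size modifier, and any Object feeds %s through toString(). The class, the method names, and the use of '\0' for "no modifier" are hypothetical conventions of this sketch, not Printf for Java's dispatch code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical sketch of "abstract data conversion" for data-vector elements.
class AbstractConversionSketch {
    // Integer conversions (%d, %o, %u, %x, ...): 'b', 'h', 'l', 'B', or '\0' for none.
    static Object integerInput(Object element, char sizeModifier) {
        if (sizeModifier == 'B') {
            return (BigInteger) element; // BigInteger required; no conversion method
        }
        Number n = (Number) element;     // ClassCastException if not a Number
        switch (sizeModifier) {
            case 'b': return n.byteValue();   // autoboxed to Byte
            case 'h': return n.shortValue();  // autoboxed to Short
            case 'l': return n.longValue();   // autoboxed to Long
            default:  return n.intValue();    // autoboxed to Integer
        }
    }

    // Real conversions (%e, %f, %g, ...): 'l' selects double, 'B' BigDecimal.
    static Object realInput(Object element, char sizeModifier) {
        if (sizeModifier == 'B') {
            return (BigDecimal) element;
        }
        Number n = (Number) element;
        if (sizeModifier == 'l') {
            return Double.valueOf(n.doubleValue());
        }
        return Float.valueOf(n.floatValue());
    }

    // String conversion (%s): any Object, via toString().
    static String stringInput(Object element) {
        return element.toString();
    }
}
```

Under these rules a Long passed to "%bd" is narrowed by byteValue() (300 becomes 44), and a Double passed to "%d" is truncated by intValue().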

With these capabilities, the following observations can be made:

14. unsigned BigInteger conversion

The %u, %x, and %o format specifiers normally represent unsigned integer conversions. This is well defined for primitive types such as int, because their bit widths are fixed and known. A BigInteger, however, behaves like an "infinitely long" two's-complement bit vector: a negative value has infinitely many leading one bits, so it has no finite unsigned representation. Because of this, the format specifiers %Bx, %Bo, and %Bu produce undefined results when the input values are negative. (Good luck.)
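The difference is easy to see with standard JDK calls: an int has a fixed 32-bit unsigned form, while BigInteger simply prints a minus sign. If a caller knows the bit width it intends, one common workaround (not a Printf for Java feature) is to add 2^bits to negative values before formatting; the helper below is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical workaround: reinterpret a negative BigInteger as an unsigned
// value of a caller-chosen bit width by adding 2^bits.
class UnsignedBigIntegerSketch {
    // ASSUMPTION: -2^bits <= value; smaller values would remain negative.
    static BigInteger asUnsigned(BigInteger value, int bits) {
        return value.signum() < 0 ? value.add(BigInteger.ONE.shiftLeft(bits)) : value;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(-1));                              // ffffffff
        System.out.println(BigInteger.valueOf(-1).toString(16));                  // -1
        System.out.println(asUnsigned(BigInteger.valueOf(-1), 32).toString(16));  // ffffffff
        System.out.println(asUnsigned(BigInteger.valueOf(-1), 64).toString(16));  // ffffffffffffffff
    }
}
```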

15. nonsensical format strings

With all of the formatting options that are available in this specification, it should not be surprising that some options are mutually exclusive. Printf for Java validates each format string and rejects it if conflicting or incompatible options are discovered. A few of the conditions which might cause a printf format string to be rejected are listed below, with examples:

Most C implementations of printf silently ignore errors and generate unpredictable results without complaining. Printf for Java, however, is not so forgiving. Printf for Java rejects ambiguous format strings by throwing an exception, and hence a little more discipline is required from the programmer. We think this is a good thing. :-)
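Which combinations Printf for Java rejects, and which exception type it throws, are not reproduced in this section, so the following validator is only a hypothetical illustration. It checks each flag against the "flags permitted" column of the input conversion table and, as an assumption, treats '-' combined with '^' as a conflict. IllegalArgumentException is this sketch's choice, not necessarily the library's.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical validator based on the "flags permitted" column of the
// input conversion table. Not Printf for Java's actual rules.
class ValidationSketch {
    private static final Map<Character, String> PERMITTED = new HashMap<>();
    static {
        PERMITTED.put('c', "-^#");
        PERMITTED.put('d', "-^0+ ");
        PERMITTED.put('e', "-^#");
        PERMITTED.put('E', "-^#");
        PERMITTED.put('f', "-^0+ #");
        PERMITTED.put('g', "-^0#");
        PERMITTED.put('G', "-^0#");
        PERMITTED.put('o', "-^0#");
        PERMITTED.put('p', "-^0");
        PERMITTED.put('s', "-^#");
        PERMITTED.put('u', "-^0");
        PERMITTED.put('x', "-^0#");
        PERMITTED.put('X', "-^0#");
        PERMITTED.put('z', "-^0");
        PERMITTED.put('Z', "-^0");
    }

    static void validate(String flags, char conversion) {
        String allowed = PERMITTED.get(conversion);
        if (allowed == null) {
            throw new IllegalArgumentException("unknown conversion type: %" + conversion);
        }
        for (char f : flags.toCharArray()) {
            if (allowed.indexOf(f) < 0) {
                throw new IllegalArgumentException("flag '" + f + "' not permitted with %" + conversion);
            }
        }
        // ASSUMPTION: left alignment and centering cannot both apply.
        if (flags.indexOf('-') >= 0 && flags.indexOf('^') >= 0) {
            throw new IllegalArgumentException("'-' and '^' are mutually exclusive");
        }
    }
}
```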

©1998-2004 Sharkysoft. All rights reserved.