jagg - DoubleDouble Precision

Floating Point Calculations

Most of jAgg's Aggregators perform calculations on numeric values. These values are generally represented as the Java built-in type double, which represents a wide range of possible values. However, many numbers cannot be represented exactly, due to the binary representation of floating-point numbers. Many numbers can only be approximated, using the binary representation that is the closest possible value to the number desired.

These approximations are usually insignificant, but they become problematic in a few ways. When an intermediate result is used to calculate the next result, usually in iterations involving a large set of figures, the error resulting from the approximations can become more signficant, i.e. floating-point errors are compounded. In addition, numerical instabilities in normally sound mathematical algorithms can render floating-point accuracy non-existent, such as when two nearly equal quantities are subtracted.

However, because jAgg supports arbitrarily large datasets to aggregate, these approximation errors can easily become significant. This can manifest itself in an answer that is "off" by a little bit. E.g. the answer should be 2.18, but the printed answer is 2.180000000000001.

Such errors can be minimized by using in-memory representations of numbers that have a higher precision. Java has two levels of precision for floating-point numbers, float and double, with double having more precision, and thus a far smaller approximation error. Arbitrary-precision libraries are available that can virtually eliminate these kinds of errors. However, they can suffer from large peformance penalties, especially when a high degree of precision is desired.

DoubleDouble Solution

The solution that jAgg employs is to use "Double-Double" precision. A DoubleDouble is an object that consists of two double instance variables - one of "high" significance, and one of "low" significance. Normally, one double contains 52 bits in its "signficand", plus the implicit "one", yielding 53 bits of precision. In a DoubleDouble, the "high" double represents the best double approximation to the desired number. To improve precision, the "low" double represents an adjustment to the total value of the "high" number, with 54 additional bits of precision (52 bits of the significand, plus its implicit "one" and the sign bit is also used here), yielding a total of 107 bits of precision. This greatly reduces, but does not eliminate, the problem of binary approximation of real numbers. However, in testing, this appears to be sufficient to yield highly accurate and precise double results for aggregation tasks.

All Aggregators that use numeric calculations employ DoubleDoubles behind the scenes to maintain a high level of precision and to minimize floating-point errors in calculations. Also, numerically stable algorithms are utilized wherever needed to maintain precision.

Additionally, some numeric Aggregators use other Aggregator results to calculate their own results. Normally, the result of a numeric Aggregator is a double, as a result of calling terminate. For some Aggregators to use other Aggregators in their calculations, they call terminateDoubleDouble, an internally used method that provides intermediate results at DoubleDouble precision.

Operations

The DoubleDouble class supplies many operations that represent mathematical operations on the represented value. Such operations can typically, but not always, handle both a double and another DoubleDouble. DoubleDoubles are not immutable; in fact, operations do not return separate DoubleDoubles -- they modify their own object. However, two constants are defined in the class which are immutable, NaN and ZERO. (Any operations that would modify those constants' contents throw UnsupportedOperationExceptions.)

Computational algorithms for DoubleDouble precision are based on "Algorithms for Quad-Double Precision Floating Point Arithmetic" by Hida, Li, and Bailey, 2000, Berkeley.

Here are the methods defined on the DoubleDouble class:

DoubleDouble() - Constructor with an initial value of zero.
DoubleDouble(double d) - Constructor that takes a normal double.
DoubleDouble(double hi, double lo) - Constructor that takes two doubles. If the two values "overlap" in precision, then they are normalized, so that the "low" value is no longer overlapping bits of precision with the "high" value, e.g. (1, 0.125) => (1.125, 0).
DoubleDouble(DoubleDouble dd) - A copy constructor.
reset() - Sets this DoubleDouble equal to zero.
doubleValue() - Returns the "high" double, which by itself is the best double approximation to this DoubleDouble's value.
getLow() - Returns the "low" double, which is the "correction" applied to the "high" double to arrive at the overall more precise DoubleDouble value.
isNaN() - Returns whether this DoubleDouble represents NaN - not a number.
addToSelf(DoubleDouble dd) - Add another DoubleDouble to this one, modifying this value.
addToSelf(double d) - Add a double to this value, modifying this value.
subtractFromSelf(DoubleDouble dd) - Subtract another DoubleDouble from this one, modifying this value.
subtractFromSelf(double d) - Subtract a double from this value, modifying this value.
negateSelf() - Negates this value. Both the "high" and "low" values are negated.
multiplySelfBy(DoubleDouble dd) - Multiply this value by another DoubleDouble, modifying this value.
multiplySelfBy(double d) - Multiply this value by a double, modifying this value.
squareSelf() - Multiply this value by itself, modifying this value.
divideSelfBy(DoubleDouble dd) - Divide this value by another DoubleDouble, modifying this value.
divideSelfBy(double d) - Divide this value by a double, modifying this value.
sqrtSelf() - Take the square root of this value, modifying this value.
powSelf(long exponent) - Raise this value to an integer power, modifying this value.
nthRootSelf(long n) - Take the nth root of this value, modifying this value.
compareTo(DoubleDouble other) - Compare to another DoubleDouble, to determine less than, equal to, or greater than. (DoubleDoubles are Comparable to each other.)

Overview

Quick User Guide

API

Aggregators

Super Aggregation

Analytics

Double-Double Precision

Multiset Discrimination

Project Documentation

Floating Point Calculations

DoubleDouble Solution

Operations