Custom Aggregators
The developer can create new aggregate functions by writing a class that implements
the AggregateFunction interface and its methods. Typically, this is done by
subclassing Aggregator and implementing its abstract methods.
The methods to implement are as follows:
- public AggregateFunction replicate() – Create an uninitialized copy of the
AggregateFunction that processes the same property, for the purposes of
parallel processing.
- public void init() – Initialize the aggregate function’s state here. E.g., an
average function initializes a sum variable and a count variable to zero.
Aggregate functions may be reused, so they may be able to simply clear their
internal state without instantiating new state objects.
- public void iterate(Object value) – Process the desired value into the
aggregation calculation. E.g., an average function adds the value of the object’s
property to the sum and increments its count. To read the proper property or
properties from the object, use Aggregator’s static “getValueFromProperty” method,
which accesses a MethodCache that caches Methods.
- public void merge(AggregateFunction agg) – During parallel processing, multiple
AggregateFunction objects may be processing different sections of the original
List of Object values. When an Aggregation object calls this method, it is
attempting to combine the states of two AggregateFunction objects. That is,
the internal state of this AggregateFunction needs to reflect the internal
state of the given AggregateFunction as well. E.g., an AvgAggregator would
verify that the given AggregateFunction is also an AvgAggregator, then add its
sum to its own sum and add its count to its own count.
- public Object terminate() – At this point, an entire set of values whose “group
by” properties compare equal has been processed. Calculate the aggregate result
and return it. E.g., an average function divides the sum by the count and
returns the average (or Double.NaN if the count is zero).
- public DoubleDouble terminateDoubleDouble() (Optional) – The AggregateFunction
interface doesn't declare this method, but the abstract class Aggregator
does. If the custom Aggregator uses DoubleDoubles internally, then it should
override this method so that other Aggregators can use the higher-precision
result internally. Typically, if this method is overridden, then terminate is
implemented by calling this method and returning the DoubleDouble result as a
Double.
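The lifecycle described by the methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's own code: the nested AggregateFunction interface is a local stand-in that only mirrors the method list above, and a hypothetical Function-based extractor replaces the property lookup, because the exact signature of "getValueFromProperty" is not given here. Skipping null values is also an assumption.

```java
import java.util.List;
import java.util.function.Function;

final class AvgSketch {
    /** Local stand-in mirroring the methods listed above; not the library's declaration. */
    public interface AggregateFunction {
        AggregateFunction replicate();
        void init();
        void iterate(Object value);
        void merge(AggregateFunction agg);
        Object terminate();
    }

    public static class AvgAggregator implements AggregateFunction {
        // Hypothetical replacement for the library's property lookup.
        private final Function<Object, Number> extractor;
        private double sum;
        private long count;

        public AvgAggregator(Function<Object, Number> extractor) {
            this.extractor = extractor;
        }

        // Same property (extractor), but fresh state that has not been initialized yet.
        @Override public AggregateFunction replicate() {
            return new AvgAggregator(extractor);
        }

        // Clear state in place so the object can be reused.
        @Override public void init() {
            sum = 0.0;
            count = 0;
        }

        @Override public void iterate(Object value) {
            Number n = extractor.apply(value);
            if (n != null) {            // assumption: null values are skipped
                sum += n.doubleValue();
                count++;
            }
        }

        // Fold the other aggregator's partial state into this one.
        @Override public void merge(AggregateFunction agg) {
            if (agg instanceof AvgAggregator) {
                AvgAggregator other = (AvgAggregator) agg;
                sum += other.sum;
                count += other.count;
            }
        }

        @Override public Object terminate() {
            return count == 0 ? Double.NaN : sum / count;
        }
    }

    /** Simulates two workers that each process half of the list and then merge. */
    public static double averageInTwoParts(List<Integer> values) {
        AggregateFunction first = new AvgAggregator(v -> (Number) v);
        AggregateFunction second = first.replicate();
        first.init();
        second.init();
        int mid = values.size() / 2;
        for (Object v : values.subList(0, mid)) first.iterate(v);
        for (Object v : values.subList(mid, values.size())) second.iterate(v);
        first.merge(second);
        return (Double) first.terminate();
    }

    public static void main(String[] args) {
        System.out.println(averageInTwoParts(List.of(1, 2, 3, 4))); // prints 2.5
    }
}
```

The helper averageInTwoParts only imitates what parallel processing would do: one replica per section of the list, then a single merge before terminate is called.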
In summary, subclasses of Aggregator implement the abstract methods "init", "iterate",
"merge", and "terminate", which do the actual aggregation. They also implement the
abstract method "replicate" to return a new, uninitialized Aggregator of the same
type. They override the method "terminateDoubleDouble" only if they use
DoubleDoubles internally.
The developer of a custom aggregate function may wish to look at the source code of
the built-in aggregate functions to get a better feel for what each of the above
methods accomplishes.
When creating an aggregator specification string for a custom AggregateFunction, use
the fully-qualified class name minus the "Aggregator" suffix, e.g.
"org.groupname.myproject.aggs.Custom(property1)".
A custom AggregateFunction should also define a constructor that takes a single
String argument, which stores the property or properties of the AggregateFunction,
e.g. "public CustomAggregator(String property)".
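The naming rule and the constructor convention can be illustrated with a small sketch. The helper methods classNameFor and propertyFor are hypothetical and written only for this example; they show how a specification string maps onto a class name and a property string, and are not the library's actual parser. The nested CustomAggregator shows only the constructor shape, not a full implementation.

```java
final class SpecSketch {
    /** Hypothetical custom aggregator showing only the constructor shape. */
    public static class CustomAggregator {
        private final String property;

        // Single String argument that stores the property (or properties).
        public CustomAggregator(String property) {
            this.property = property;
        }

        public String getProperty() {
            return property;
        }
    }

    /** Class name implied by a specification string: text before '(' plus "Aggregator". */
    public static String classNameFor(String spec) {
        int open = spec.indexOf('(');
        if (open < 0 || !spec.endsWith(")")) {
            throw new IllegalArgumentException("Expected Name(property): " + spec);
        }
        return spec.substring(0, open).trim() + "Aggregator";
    }

    /** Property text between the parentheses of a specification string. */
    public static String propertyFor(String spec) {
        int open = spec.indexOf('(');
        int close = spec.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Expected Name(property): " + spec);
        }
        return spec.substring(open + 1, close).trim();
    }

    public static void main(String[] args) {
        String spec = "org.groupname.myproject.aggs.Custom(property1)";
        System.out.println(classNameFor(spec)); // org.groupname.myproject.aggs.CustomAggregator
        CustomAggregator agg = new CustomAggregator(propertyFor(spec));
        System.out.println(agg.getProperty());  // property1
    }
}
```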
Multiple Properties
If the desired custom AggregateFunction needs to process multiple properties, then
it should subclass TwoPropAggregator or MultiPropAggregator. Those abstract
classes subclass Aggregator and provide the functionality to support two or more
property names.
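As an illustration of why an aggregate function might need two properties, consider a weighted average, which reads a value and a weight from each object. The sketch below is standalone and deliberately does not extend TwoPropAggregator, whose API is not described here; the class name, the two extractor fields, and the use of double[] rows as input objects are all assumptions made for the example.

```java
import java.util.function.ToDoubleFunction;

final class WeightedAvgSketch {
    private final ToDoubleFunction<Object> value;  // stands in for the first property
    private final ToDoubleFunction<Object> weight; // stands in for the second property
    private double weightedSum;
    private double weightTotal;

    WeightedAvgSketch(ToDoubleFunction<Object> value, ToDoubleFunction<Object> weight) {
        this.value = value;
        this.weight = weight;
    }

    void init() {
        weightedSum = 0.0;
        weightTotal = 0.0;
    }

    // Both properties are read from the same object on every call.
    void iterate(Object row) {
        double w = weight.applyAsDouble(row);
        weightedSum += value.applyAsDouble(row) * w;
        weightTotal += w;
    }

    void merge(WeightedAvgSketch other) {
        weightedSum += other.weightedSum;
        weightTotal += other.weightTotal;
    }

    Object terminate() {
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }

    /** Rows are {value, weight} pairs; (10*1 + 20*3) / (1 + 3) = 17.5. */
    static double demo() {
        WeightedAvgSketch agg = new WeightedAvgSketch(
                r -> ((double[]) r)[0],
                r -> ((double[]) r)[1]);
        agg.init();
        agg.iterate(new double[] {10.0, 1.0});
        agg.iterate(new double[] {20.0, 3.0});
        return (Double) agg.terminate();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 17.5
    }
}
```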