Custom Aggregators

Developers can create new aggregate functions by writing a class that implements the AggregateFunction interface and its methods. Typically, this is done by subclassing Aggregator and implementing its abstract methods. The methods to implement are as follows:

  1. public AggregateFunction replicate() – Create an uninitialized copy of the AggregateFunction that processes the same property, for the purposes of parallel processing.
  2. public void init() – Initialize the aggregate function’s state here. E.g., an average function initializes a sum variable and a count variable to zero. Aggregate functions may be reused, so they may be able to simply clear their internal state without instantiating new state objects.
  3. public void iterate(Object value) – Fold the given value into the aggregation calculation. E.g., an average function adds the value of the object’s property to the sum and increments its count. Use Aggregator’s static “getValueFromProperty” method to read the desired property or properties from the value. That method works through a MethodCache, which caches Methods.
  4. public void merge(AggregateFunction agg) – During parallel processing, multiple AggregateFunction objects may be processing different sections of the original List of Object values. When an Aggregation object calls this method, it is attempting to combine the states of two AggregateFunction objects. That is, the internal state of this AggregateFunction needs to reflect the internal state of the given AggregateFunction as well. E.g., an AvgAggregator would verify that the given AggregateFunction is also an AvgAggregator, then add its sum to its own sum and add its count to its own count.
  5. public Object terminate() – At this point, an entire set of values whose “group by” properties compare equal has been processed. Calculate the aggregate result and return it. E.g., an average function divides the sum by the count and returns the average (or Double.NaN if the count is zero).
  6. public DoubleDouble terminateDoubleDouble() (Optional) – The AggregateFunction interface doesn't declare this method, but the abstract subclass Aggregator does. If the custom Aggregator uses DoubleDoubles internally, then it should override this method so that the higher-precision result can be used internally by other Aggregators. Typically, if this method is overridden, then terminate is implemented by calling this method and returning the DoubleDouble result as a Double.

In summary, subclasses of Aggregator implement the abstract methods "init", "iterate", "merge", and "terminate", which perform the actual aggregation. They also implement the abstract method "replicate" to return a new, uninitialized Aggregator of the same type. They override the optional method "terminateDoubleDouble" only if they use DoubleDoubles internally.
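The lifecycle described above can be sketched as follows. This is a minimal, standalone illustration, not the library's own code: the AggregateFunction interface is re-declared locally as a stand-in with only the five methods listed above, SketchAvgAggregator and LifecycleDemo are hypothetical names, and the values passed to iterate are assumed to be Numbers rather than objects whose property would be read with getValueFromProperty.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the library's AggregateFunction interface, declared locally so
// this sketch compiles on its own. Only the five documented methods appear.
interface AggregateFunction {
    AggregateFunction replicate();
    void init();
    void iterate(Object value);
    void merge(AggregateFunction agg);
    Object terminate();
}

// Hypothetical average aggregator that follows the documented lifecycle.
class SketchAvgAggregator implements AggregateFunction {
    private final String property;
    private double sum;
    private long count;

    // One-argument String constructor that stores the property name.
    SketchAvgAggregator(String property) {
        this.property = property;
    }

    // Returns a fresh copy for the same property; the caller must init() it.
    public AggregateFunction replicate() {
        return new SketchAvgAggregator(property);
    }

    // Clears the state so the same instance can be reused.
    public void init() {
        sum = 0.0;
        count = 0;
    }

    // Assumption: the value is the number itself. A real Aggregator subclass
    // would read the property from the value via getValueFromProperty.
    public void iterate(Object value) {
        if (value instanceof Number) {
            sum += ((Number) value).doubleValue();
            count++;
        }
    }

    // Absorbs the state of another aggregator of the same type.
    public void merge(AggregateFunction agg) {
        if (agg instanceof SketchAvgAggregator) {
            SketchAvgAggregator other = (SketchAvgAggregator) agg;
            sum += other.sum;
            count += other.count;
        }
    }

    // Returns the average, or Double.NaN if nothing was processed.
    public Object terminate() {
        return count == 0 ? Double.NaN : sum / count;
    }
}

class LifecycleDemo {
    // Simulates two workers that each process half of the list, then merges.
    static Object average(List<? extends Number> values) {
        AggregateFunction first = new SketchAvgAggregator("value");
        AggregateFunction second = first.replicate();
        first.init();
        second.init();
        int mid = values.size() / 2;
        for (Number n : values.subList(0, mid)) first.iterate(n);
        for (Number n : values.subList(mid, values.size())) second.iterate(n);
        first.merge(second);
        return first.terminate();
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, 3, 6))); // prints 3.0
    }
}
```

The demo splits the input by hand only to show where replicate and merge fit. In the library, the Aggregation object decides when those methods are called.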

Developers of custom aggregate functions may wish to look at the source code for the built-in aggregate functions to get a better feel for what each of the above methods accomplishes.

When creating an aggregator specification string for a custom AggregateFunction, use the fully-qualified class name minus the "Aggregator" suffix, e.g. "org.groupname.myproject.aggs.Custom(property1)".

A custom AggregateFunction should also define a constructor that takes a single String argument and stores the property or properties of the AggregateFunction, e.g. "public CustomAggregator(String property)".
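To make the naming convention and the constructor requirement concrete, the following sketch shows one way a specification string could be mapped to a class name and a property, and then instantiated through the one-argument String constructor. It is an illustration under stated assumptions and not the library's actual parser: SpecSketch, parse, and instantiate are hypothetical names, and the CustomAggregator class here is a bare placeholder in the default package.

```java
import java.lang.reflect.Constructor;

// Placeholder for a custom aggregator; only the constructor matters here.
class CustomAggregator {
    private final String property;

    public CustomAggregator(String property) {
        this.property = property;
    }

    public String getProperty() {
        return property;
    }
}

// Hypothetical helper, not part of the library.
class SpecSketch {
    // Splits "pkg.Name(prop)" into { "pkg.NameAggregator", "prop" }.
    static String[] parse(String spec) {
        int open = spec.indexOf('(');
        int close = spec.lastIndexOf(')');
        if (open <= 0 || close != spec.length() - 1) {
            throw new IllegalArgumentException("Malformed specification: " + spec);
        }
        // The specification omits the "Aggregator" suffix, so restore it.
        String className = spec.substring(0, open).trim() + "Aggregator";
        String property = spec.substring(open + 1, close).trim();
        return new String[] { className, property };
    }

    // Loads the class by name and calls its public (String) constructor.
    static Object instantiate(String spec) throws ReflectiveOperationException {
        String[] parts = parse(spec);
        Constructor<?> ctor = Class.forName(parts[0]).getConstructor(String.class);
        return ctor.newInstance(parts[1]);
    }

    public static void main(String[] args) throws Exception {
        // "Custom(property1)" resolves to the placeholder class above.
        CustomAggregator agg = (CustomAggregator) instantiate("Custom(property1)");
        System.out.println(agg.getProperty());
    }
}
```

The sketch suggests why both conventions matter: a lookup of this kind fails if the class name does not end in "Aggregator" or if the public String constructor is missing.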

Multiple Properties

If the desired custom AggregateFunction needs to process multiple properties, then it should subclass TwoPropAggregator or MultiPropAggregator. Those abstract classes subclass Aggregator and provide the functionality to support two or more property names.