Custom Analytics

The developer can create a new analytic function by subclassing Aggregator and implementing AnalyticFunction. The methods to implement are as follows:

  1. public AggregateFunction replicate() – Create an uninitialized copy of the AnalyticFunction that processes the same property, for the purposes of parallel processing.
  2. public void init() – Initialize the analytic's state here. E.g., an average analytic initializes a sum variable and a count variable to zero. Analytic objects may be reused, so an implementation can often just clear its internal state instead of instantiating new state objects.
  3. public void iterate(Object value) – Process the desired value into the analytic calculation, sliding the end of the window forward. E.g., an average analytic will add the object value's property to the sum, and also increment its count. Use Aggregator's static "getValueFromProperty" method to read the proper property or properties from the value; it goes through a MethodCache, which caches Methods.
  4. public void merge(AggregateFunction agg) – During parallel processing, multiple AnalyticFunction objects may be processing different sections of the original List of Object values. When an Analytic object calls this method, it is attempting to combine the states of two AnalyticFunction objects. That is, the internal state of this AnalyticFunction needs to reflect the internal state of the given AnalyticFunction as well. E.g., an AvgAggregator would verify that the given AggregateFunction is also an AvgAggregator, then add its sum to its own sum and add its count to its own count.
  5. public void delete(Object value) – Remove the desired value from the analytic calculation. This method is called when the beginning of the window slides forward. E.g., an average analytic will subtract the object value's property from the sum, and also decrement its count. As with "iterate", use Aggregator's static "getValueFromProperty" method to read the proper property or properties from the value.
  6. public Object terminate() – When the window matches up with a row's desired window, according to the window clause, this method is called. Calculate the analytic result and return it. E.g., an average analytic divides the sum by the count and returns the average (or Double.NaN if the count is zero).
  7. public DoubleDouble terminateDoubleDouble() (Optional) – If the custom analytic uses DoubleDoubles internally, then it should override this method so that the higher precision result can be used internally by other analytics. Typically, if this method is overridden, then terminate is implemented by calling this method and returning the DoubleDouble result as a Double.
  8. public boolean takesWindowClause() – If the custom analytic function can handle any user-given window clause, then return true. Otherwise, if it must use its own window clause and reject any user-given window clause, then return false. E.g., an average analytic can handle any user-given window clause, so it returns true.
  9. public WindowClause getWindowClause() – If the custom analytic function doesn't handle user-given window clauses (takesWindowClause() returns false), then this method returns the WindowClause to be used in the analytic calculation. E.g., an average analytic doesn't supply its own window clause, so it returns null.

In summary, implementations of AnalyticFunction supply "init", "iterate", "merge", "delete", and "terminate" to do the actual analytic calculation, and "replicate" to return a new, uninitialized AnalyticFunction of the same type. They override "terminateDoubleDouble" only if they use DoubleDoubles internally.
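
The lifecycle above can be sketched for a sliding average as follows. This is a minimal illustration, not jAgg code: SlidingAnalytic, AvgSketch, and AvgSketchDemo are hypothetical stand-ins so that the example compiles on its own, and values are passed in directly rather than being read from a property with "getValueFromProperty".

```java
// Hypothetical stand-in for the lifecycle described above; NOT the jAgg interface.
interface SlidingAnalytic {
    SlidingAnalytic replicate();
    void init();
    void iterate(Object value);
    void merge(SlidingAnalytic other);
    void delete(Object value);
    Object terminate();
}

// Sliding average: keeps only a running sum and a count.
final class AvgSketch implements SlidingAnalytic {
    private double sum;
    private long count;

    // Return a fresh, uninitialized copy for parallel processing.
    public SlidingAnalytic replicate() { return new AvgSketch(); }

    // Reset state; called before each reuse.
    public void init() { sum = 0.0; count = 0; }

    // The end of the window moved forward: add the new value.
    public void iterate(Object value) {
        if (value instanceof Number) {
            sum += ((Number) value).doubleValue();
            count++;
        }
    }

    // Fold in the state of another instance of the same type.
    public void merge(SlidingAnalytic other) {
        if (other instanceof AvgSketch) {
            AvgSketch o = (AvgSketch) other;
            sum += o.sum;
            count += o.count;
        }
    }

    // The beginning of the window moved forward: remove the old value.
    public void delete(Object value) {
        if (value instanceof Number) {
            sum -= ((Number) value).doubleValue();
            count--;
        }
    }

    // Produce the result for the current window.
    public Object terminate() { return count == 0 ? Double.NaN : sum / count; }
}

// Drives the sketch the way a window of fixed width would.
final class AvgSketchDemo {
    static double[] movingAverage(double[] data, int width) {
        AvgSketch avg = new AvgSketch();
        avg.init();
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            avg.iterate(data[i]);
            if (i >= width) {
                avg.delete(data[i - width]);
            }
            out[i] = (Double) avg.terminate();
        }
        return out;
    }
}
```

Because "delete" undoes exactly what "iterate" did, the window can slide without recomputing the sum from scratch for every row.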

The custom analytic developer may wish to look at the source code for the built-in analytics to get a better feel for what each of the above methods accomplishes.

When creating an analytic specification string for a custom AnalyticFunction, use the fully-qualified class name minus the "Aggregator", "AnalyticAggregator", or "Analytic" suffix, e.g. "org.groupname.myproject.aggs.Custom(property1)". Any partition clause, order by clause, or window clause can be appended.
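
To make the naming rule concrete, here is a small sketch that derives the specification string from a class name. SpecNames and its methods are hypothetical helpers written for this illustration; they are not part of jAgg and do not describe how jAgg itself resolves names. The only assumption beyond the rule stated above is that the longest suffix is tried first.

```java
// Hypothetical helper illustrating the naming rule; not part of jAgg.
final class SpecNames {
    // Longest suffix first, so "AnalyticAggregator" is not cut down to
    // "...Analytic" by matching the shorter "Aggregator" suffix.
    private static final String[] SUFFIXES = {
        "AnalyticAggregator", "Aggregator", "Analytic"
    };

    // Strip one recognized suffix from a fully-qualified class name.
    static String specName(String className) {
        for (String suffix : SUFFIXES) {
            if (className.endsWith(suffix) && className.length() > suffix.length()) {
                return className.substring(0, className.length() - suffix.length());
            }
        }
        return className;
    }

    // Build a specification string for a single property.
    static String spec(String className, String property) {
        return specName(className) + "(" + property + ")";
    }
}
```

For example, SpecNames.spec("org.groupname.myproject.aggs.CustomAnalytic", "property1") yields "org.groupname.myproject.aggs.Custom(property1)".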

A custom AnalyticFunction should also define a constructor that takes a single String argument naming the property or properties of the AnalyticFunction, e.g. "public CustomAnalytic(String property)".

Multiple Properties

If the desired custom analytic function needs to process multiple properties, then it should subclass TwoPropAggregator or MultiPropAggregator. Those abstract classes subclass Aggregator and provide the functionality to support two or more property names.

Reusing Existing Classes

The developer can change an existing custom Aggregator to become an AnalyticFunction by implementing AnalyticFunction and adding a "delete" method. Most built-in Aggregators implement AnalyticFunction and can be used as-is as analytic functions.

The developer may need to implement additional behind-the-scenes functionality to properly implement AnalyticFunction. In that case, the developer may subclass an existing Aggregator, having that subclass implement AnalyticFunction. By convention, such subclasses have names that end with "AnalyticAggregator", e.g. MaxAnalyticAggregator and MinAnalyticAggregator. Max and min need this treatment because they must keep track of all values in the window to be able to determine the actual max/min after a value is deleted.
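
The following sketch shows why a running maximum alone is not enough once values can be deleted. It is not jAgg's MaxAnalyticAggregator; WindowMaxSketch is a hypothetical class, and the choice of a TreeMap of value counts is an assumption made for this illustration.

```java
import java.util.TreeMap;

// Hypothetical sketch of a max that supports deletion; not jAgg's implementation.
final class WindowMaxSketch {
    // Each distinct value in the window, mapped to how many times it occurs.
    private final TreeMap<Double, Integer> counts = new TreeMap<>();

    void init() { counts.clear(); }

    // The end of the window moved forward: record one more occurrence.
    void iterate(double value) { counts.merge(value, 1, Integer::sum); }

    // The beginning of the window moved forward: drop one occurrence,
    // removing the key entirely when the last occurrence leaves.
    void delete(double value) {
        counts.computeIfPresent(value, (k, c) -> c == 1 ? null : c - 1);
    }

    // Largest value still in the window, or null if the window is empty.
    Double terminate() { return counts.isEmpty() ? null : counts.lastKey(); }
}
```

If only the current maximum were stored, deleting that maximum would leave no way to find the next largest value; keeping every value in the window makes that lookup possible.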

The developer may choose to implement AnalyticFunction directly, for analytic functions that don't make sense as aggregate functions. Some of jAgg's built-in analytics use this technique: DenseRank, Lag, Lead, Rank, and RowNumber.

Many of jAgg's built-in analytics depend on the results of other analytic functions that are processed with different window clauses. Different window clauses mean that the values cannot be calculated at the same time. These analytic functions implement DependentAnalyticFunction, which allows an analytic value to be calculated based on the results of multiple analytic functions. The DependentAnalyticFunction interface extends AnalyticFunction and declares the following methods:

  • int getNumDependentFunctions() - Return the number of analytic functions on which this dependent analytic function depends.
  • AnalyticFunction getAnalyticFunction(int index) - Given an index, return a specific AnalyticFunction on which this dependent analytic function depends.
  • WindowClause getWindowClause(int index) - Given an index, return a specific WindowClause that corresponds to the same-indexed AnalyticFunction returned by getAnalyticFunction.
  • void setValue(int index, Object value) - Given an index, store the result of the same-indexed AnalyticFunction, to be used in a calculation by terminate. The Analytic class triggers the call to this method when it detects a DependentAnalyticFunction, before calling terminate.

Typically, a DependentAnalyticFunction implementation subclasses AbstractDependentAnalyticFunction, which supplies common functionality. The "init", "iterate", "merge", and "delete" methods do nothing; each AnalyticFunction on which an AbstractDependentAnalyticFunction depends is processed separately. The "takesWindowClause()" method is implemented as returning false, and "getWindowClause()" returns the window "range()", so that no values for the AbstractDependentAnalyticFunction will be determined until the entire partition has been processed. It implements "setValue" by storing the value in a Map. It defines the following method, so that concrete subclasses may access these values:

  • protected final Object getValue(int index)

This method is used when implementing terminate to access the values for the terminate calculation.
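
The setValue/getValue pattern can be sketched as follows. DependentSketch and RatioSketch are hypothetical stand-ins that mirror only the method names described above; they are not jAgg classes, and the ratio calculation is an illustrative choice rather than the implementation of any built-in analytic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the value-storing base class; not jAgg code.
abstract class DependentSketch {
    // Results of the functions this one depends on, keyed by index.
    private final Map<Integer, Object> values = new HashMap<>();

    abstract int getNumDependentFunctions();

    // Called by the framework with each dependency's result before terminate.
    void setValue(int index, Object value) { values.put(index, value); }

    // Lets concrete subclasses read a stored result inside terminate.
    protected final Object getValue(int index) { return values.get(index); }

    abstract Object terminate();
}

// Combines two dependency results: index 0 is a part, index 1 is a total.
final class RatioSketch extends DependentSketch {
    int getNumDependentFunctions() { return 2; }

    Object terminate() {
        double part = ((Number) getValue(0)).doubleValue();
        double total = ((Number) getValue(1)).doubleValue();
        return total == 0.0 ? Double.NaN : part / total;
    }
}
```

The subclass does no accumulation of its own; all of its work happens in terminate, using values that were computed separately under their own window clauses.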

The rest of jAgg's built-in analytics subclass AbstractDependentAnalyticFunction: CumeDist, Ntile, PercentRank, and RatioToReport.