Custom Analytics
The developer can create new analytic functions that subclass Aggregator and implement its
abstract methods. The methods to implement are as follows:
- public AggregateFunction replicate() – Create an uninitialized copy of the AnalyticFunction
that processes the same property, for the purposes of parallel processing.
- public void init() – Initialize the analytic’s state here. E.g., an average
analytic initializes a sum variable and a count variable to zero. Analytics
may be reused, so they may be able to simply clear their internal state without
instantiating new state objects.
- public void iterate(Object value) – Process the desired value into the
analytic calculation, sliding the end of the window forward. E.g., an average
analytic will add the object value’s property to the sum, and also increment
its count. Use Aggregator's static "getValueFromProperty" method to access the
proper property or properties; it relies on a MethodCache, which caches Methods.
- public void merge(AggregateFunction agg) – During parallel processing, multiple
AnalyticFunction objects may be processing different sections of the original List of
Object values. When an Analytic object calls this method, it is attempting
to combine the states of two AnalyticFunction objects. That is, the internal state
of this AnalyticFunction needs to reflect the internal state of the given AnalyticFunction
as well. E.g., an AvgAggregator would verify that the given AggregateFunction is also
an AvgAggregator, then add its sum to its own sum and add its count to its own
count.
- public void delete(Object value) – Remove the desired value from the analytic
calculation. This method is called when the beginning of the window slides
forward. E.g., an average
analytic will subtract the object value’s property from the sum, and also decrement
its count. Use Aggregator's static "getValueFromProperty" method to access the
proper property or properties; it relies on a MethodCache, which caches Methods.
- public Object terminate() – When the window matches up with a row's desired
window, according to the window clause, this method is called. Calculate the
analytic result and return it. E.g., an average analytic divides the sum by
the count and returns the average (or Double.NaN if the count is zero).
- public DoubleDouble terminateDoubleDouble() (Optional) – If the custom analytic
uses DoubleDoubles internally, then it should override this method
so that the higher-precision result can be used internally by other analytics.
Typically, if this method is overridden, then terminate is
implemented by calling this method and returning the DoubleDouble
result as a Double.
- public boolean takesWindowClause() – If the custom analytic function can handle
any user-given window clause, then return true. Otherwise, if it must
use its own window clause and reject any user-given window clause, then return
false. E.g., an average analytic can handle any user-given window
clause, so it returns true.
- public WindowClause getWindowClause() – If the custom analytic function doesn't
handle user-given window clauses (takesWindowClause() returns
false), then this method returns the WindowClause to
be used in the analytic calculation. E.g., an average analytic doesn't supply
its own window clause, so it returns null.
Implementations of AnalyticFunction implement the abstract methods "init", "iterate",
"merge", "delete", and "terminate" that do the actual analytics calculation. They
also implement the abstract method "replicate" to return a new AnalyticFunction of
the same type, but un-initialized. They optionally override the method
"terminateDoubleDouble", but only if they internally use DoubleDoubles.
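The life cycle described above can be sketched with a small, self-contained
example. This is only an illustration under stated assumptions: SketchAnalytic,
AvgSketch, and slidingAverages are hypothetical stand-ins written for this page,
not jAgg's real types, and they take plain double values instead of reading a
property from an Object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the analytic life cycle; NOT jAgg's real interface.
interface SketchAnalytic {
    SketchAnalytic replicate();
    void init();
    void iterate(double value);
    void merge(SketchAnalytic other);
    void delete(double value);
    Object terminate();
}

// Average analytic that keeps a running sum and count.
class AvgSketch implements SketchAnalytic {
    private double sum;
    private long count;

    // Returns a new, uninitialized analytic of the same type.
    @Override public SketchAnalytic replicate() { return new AvgSketch(); }

    // Clears the state so the same object can be reused.
    @Override public void init() { sum = 0.0; count = 0; }

    // A value enters the window.
    @Override public void iterate(double value) { sum += value; count++; }

    // Folds the state of another analytic of the same type into this one.
    @Override public void merge(SketchAnalytic other) {
        if (!(other instanceof AvgSketch)) {
            throw new IllegalArgumentException("Can only merge another AvgSketch");
        }
        AvgSketch that = (AvgSketch) other;
        sum += that.sum;
        count += that.count;
    }

    // A value leaves the window.
    @Override public void delete(double value) { sum -= value; count--; }

    // Average of the values currently in the window, or NaN if there are none.
    @Override public Object terminate() {
        return count == 0 ? Double.NaN : sum / count;
    }

    // Drives the sketch over a trailing window holding at most `width` values.
    static List<Double> slidingAverages(double[] data, int width) {
        AvgSketch avg = new AvgSketch();
        avg.init();
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            avg.iterate(data[i]);              // end of the window slides forward
            if (i >= width) {
                avg.delete(data[i - width]);   // beginning of the window slides forward
            }
            results.add((Double) avg.terminate());
        }
        return results;
    }
}
```

With the data {1, 2, 3, 4} and a width of 2, slidingAverages yields 1.0, 1.5,
2.5, and 3.5. A real implementation would obtain each value through
getValueFromProperty, as described in the list above.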
The developer of a custom analytic may wish to look at the source code for built-in
analytics to get a better feel for what each of the above methods accomplishes.
When creating an analytic specification string for a custom AnalyticFunction, use the
fully-qualified class name minus the "Aggregator", "AnalyticAggregator", or
"Analytic" suffix, e.g. "org.groupname.myproject.aggs.Custom(property1)". Any
partition clause, order by clause, or window clause can be appended.
A custom AnalyticFunction should also define a one-argument String constructor to store
the property(ies) of the AnalyticFunction, e.g.
"public CustomAnalytic(String property)".
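The naming rule above can be illustrated from the caller's side. The helper
below is hypothetical and not part of jAgg; it only shows how a specification
string relates to a class name, assuming that at most one of the three suffixes
is removed.

```java
// Hypothetical helper illustrating the naming rule; not part of jAgg.
final class SpecNameSketch {
    // Longest suffix first, so "MaxAnalyticAggregator" is reduced to "Max"
    // rather than to "MaxAnalytic".
    private static final String[] SUFFIXES =
            {"AnalyticAggregator", "Aggregator", "Analytic"};

    private SpecNameSketch() { }

    // Builds "<class name minus suffix>(<property>)".
    static String toSpecification(String className, String property) {
        String name = className;
        for (String suffix : SUFFIXES) {
            if (name.endsWith(suffix) && name.length() > suffix.length()) {
                name = name.substring(0, name.length() - suffix.length());
                break; // strip only one suffix
            }
        }
        return name + "(" + property + ")";
    }
}
```

For the class name "org.groupname.myproject.aggs.CustomAnalytic" and the
property "property1", the helper produces
"org.groupname.myproject.aggs.Custom(property1)", matching the example above.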
Multiple Properties
If the desired custom analytic function needs to process multiple properties, then it
should subclass TwoPropAggregator or MultiPropAggregator. Those abstract classes
subclass Aggregator and provide the functionality to support two or more property
names.
Reusing Existing Classes
The developer can change an existing custom Aggregator to become an
AnalyticFunction by implementing AnalyticFunction and adding
a "delete" method. Most built-in Aggregators
implement AnalyticFunction and can be used as-is as analytic functions.
The developer may need to implement additional behind-the-scenes functionality to
properly implement AnalyticFunction. In that case, the developer may
subclass an existing Aggregator, having that subclass implement
AnalyticFunction. By convention, such subclasses have names that end with
"AnalyticAggregator", e.g. MaxAnalyticAggregator
and MinAnalyticAggregator. Max and min, for instance,
must keep track of all values in the window so that they can determine the new max/min
when a value is deleted.
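To see why a windowed max needs more state than a single running value, consider
the following minimal sketch. WindowMaxSketch is a hypothetical class written for
this page; jAgg's MaxAnalyticAggregator may well be implemented differently. It
keeps every value of the window in a sorted multiset so that the max is still
known after the current max is deleted.

```java
import java.util.TreeMap;

// Hypothetical sketch; jAgg's MaxAnalyticAggregator may be implemented differently.
class WindowMaxSketch {
    // Sorted multiset: value -> number of occurrences currently in the window.
    private final TreeMap<Double, Integer> counts = new TreeMap<>();

    void init() { counts.clear(); }

    // A value enters the window.
    void iterate(double value) { counts.merge(value, 1, Integer::sum); }

    // A value leaves the window; the entry is removed with its last occurrence.
    void delete(double value) {
        counts.computeIfPresent(value, (k, n) -> n == 1 ? null : n - 1);
    }

    // Largest value still in the window, or null if the window is empty.
    Double terminate() { return counts.isEmpty() ? null : counts.lastKey(); }
}
```

After iterating 3, 5, and 2, terminate returns 5.0; after deleting 5 it returns
3.0, which a single "current max" field could not have recovered.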
The developer may choose to implement AnalyticFunction directly, for
analytic functions that don't make sense as aggregate functions. Some of jAgg's
built-in analytics use this technique:
DenseRank, Lag,
Lead, Rank, and
RowNumber.
Many of jAgg's built-in analytics depend on the results of other analytic functions
that are processed with different window clauses. Because the window clauses differ,
the values cannot be calculated at the same time. These analytic functions implement
DependentAnalyticFunction, which allows an analytic value to be calculated
based on the results of multiple analytic functions. The
DependentAnalyticFunction interface extends AnalyticFunction
and declares the following methods:
- int getNumDependentFunctions() - Return the number of analytic
functions on which this dependent analytic function depends.
- AnalyticFunction getAnalyticFunction(int index) - Given an index,
return a specific AnalyticFunction on which this dependent analytic
function depends.
- WindowClause getWindowClause(int index) - Given an index, return
a specific WindowClause that corresponds to the same-indexed
AnalyticFunction returned by getAnalyticFunction.
- void setValue(int index, Object value) - Given an index, store the
result of the same-indexed AnalyticFunction, to be used in a
calculation by terminate. The Analytic class triggers
the call to this method when it detects a DependentAnalyticFunction,
before calling terminate.
Typically, a DependentAnalyticFunction implementation subclasses
AbstractDependentAnalyticFunction, which supplies common functionality.
The "init", "iterate", "merge", and "delete" methods do nothing; each
AnalyticFunction on which an AbstractDependentAnalyticFunction
depends is processed separately. The "takesWindowClause()" method returns
false, and "getWindowClause()" returns the window "range()", so
that no values for the AbstractDependentAnalyticFunction are
determined until the entire partition has been processed. The class implements
"setValue" by storing the value in a Map, and it defines the following method so that
concrete subclasses may access these values:
- protected final Object getValue(int index)
This method is used when implementing terminate to access the values
for the terminate calculation.
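The setValue/getValue contract described above can be sketched as follows.
RatioSketch is a hypothetical stand-in that does not extend jAgg's
AbstractDependentAnalyticFunction; it assumes two underlying analytic results,
a part at index 0 and a whole at index 1, and is not a description of how any
built-in analytic is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the setValue/getValue contract;
// it does not extend jAgg's AbstractDependentAnalyticFunction.
class RatioSketch {
    // Results of the underlying analytic functions, keyed by index.
    private final Map<Integer, Object> values = new HashMap<>();

    // Assumption for this sketch: index 0 is a part, index 1 is a whole.
    int getNumDependentFunctions() { return 2; }

    // Called with the result of the same-indexed underlying function,
    // before terminate is called.
    void setValue(int index, Object value) { values.put(index, value); }

    // Lets the terminate calculation read a stored result.
    protected final Object getValue(int index) { return values.get(index); }

    // Combines the stored results into the final value.
    Object terminate() {
        double part = ((Number) getValue(0)).doubleValue();
        double whole = ((Number) getValue(1)).doubleValue();
        return whole == 0.0 ? Double.NaN : part / whole;
    }
}
```

If setValue(0, 25.0) and setValue(1, 100.0) are called first, terminate returns
0.25.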
The rest of jAgg's built-in analytics subclass
AbstractDependentAnalyticFunction: CumeDist,
Ntile, PercentRank, and
RatioToReport.