News
jAgg 0.9.0 (Beta) Released
New in jAgg 0.9.0:
- Ticket #7. An aggregator specification string for ConcatAggregator
isn't working:
Aggregator agg = Aggregator.getAggregator("Concat(property, \",\")");
The one-arg constructor for ConcatAggregator has been fixed.
- Ticket #8. Create Analytic Functions. jAgg now supports analytic functions
that return a value for every row, depending on a dynamic window of values around
the current value. The AnalyticFunction interface has been created
to facilitate the creation of analytic functions. It extends the new
AggregateFunction interface, which has been pulled out of
Aggregator. Built-in analytic functions include CumeDist, DenseRank,
Lag, Lead, Ntile, PercentRank, Rank, RatioToReport, and RowNumber. All existing
built-in Aggregators can now be used as analytic functions too.
Internally, an AnalyticAggregator contains the
AnalyticFunction and has knowledge of the PartitionClause,
the OrderByClause, and the WindowClause. More details
can be found on the AnalyticFunctions page
and on the AnalyticAggregator page.
- Ticket #9. jAgg Exceptions. Prior to 0.9.0, jAgg threw Java built-in
exceptions such as IllegalArgumentException,
NoSuchMethodException, and UnsupportedOperationException.
Now, jAgg wraps these in its own defined exceptions, each of which is a
JaggException, which is a RuntimeException. If you are
catching exceptions coming from jAgg, please consider catching a
JaggException, which may be a subclass:
AggregatorCreationException, AnalyticCreationException,
ExpectedComparableException, ExpectedNumberException,
ParseException, or PropertyAccessException.
- Additionally, the package hierarchy is a bit more organized, with the addition
of the exception, math, model, and
util packages under net.sf.jagg.
View a history of all changes at the Change Log.
Overview
jAgg is a Java 5.0 API that supports “group by” operations on Lists of Java objects:
aggregate operations such as count, sum, max, min, avg, and many more. It allows such
"super aggregate" operations as rollups and cubes. It also supports custom aggregate
operations: one can create custom Aggregators to work with jAgg. In addition, jAgg
supports analytic functions, which calculate a value for every row based on a window of
values around the current row, and one can create custom analytic functions as well.
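To make the idea of a per-row window concrete, here is a minimal sketch in plain Java. It does not use the jAgg API; the class and method names are hypothetical and chosen only for illustration. Unlike an aggregate, which collapses many rows into one value, each method returns one value per input row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this does not use or mirror the jAgg API.
class RunningTotalSketch {

    // One output value per input row: the sum of all values from the first
    // row through the current row (a cumulative window).
    static List<Integer> runningTotal(List<Integer> values) {
        List<Integer> result = new ArrayList<>(values.size());
        int sum = 0;
        for (int value : values) {
            sum += value;
            result.add(sum);
        }
        return result;
    }

    // One output value per input row: the sum of the current row and up to
    // `preceding` rows before it (a sliding window).
    static List<Integer> slidingSum(List<Integer> values, int preceding) {
        List<Integer> result = new ArrayList<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            int from = Math.max(0, i - preceding);
            int sum = 0;
            for (int j = from; j <= i; j++) {
                sum += values.get(j);
            }
            result.add(sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> sales = Arrays.asList(10, 20, 30, 40);
        System.out.println(runningTotal(sales));  // [10, 30, 60, 100]
        System.out.println(slidingSum(sales, 1)); // [10, 30, 50, 70]
    }
}
```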
Introduction
Java
Today in Java there is no practical “group by” operation that imitates the corresponding
database functionality mandated by the SQL language. That is, we can’t take an arbitrary
List of Objects, group them according to specific object properties, and perform aggregate
operations on them. There are a few parts of Java that do begin to implement a little of
the desired functionality. Some of them follow here:
- The methods “Collections.min” and “Collections.max” iterate over a Collection, returning
the minimum or maximum, respectively, of the Collection. The objects in the Collection
must be Comparable.
- The method “Collections.sort” does provide a sorting capability, which is necessary
before object values can be aggregated.
- The “Collection” interface defines a “size” method that returns the count of items in a
Collection.
- Java Specification Request (JSR) 243 is Java Data Objects (JDO) 2.0. (JDO 1.0 was
specified by JSR 12). Its main focus is object persistence and object-relational
mapping. In the specification for JDO 2.2, chapter 14, a PersistenceManager offers the
Query capability to applications. This supplies built-in SQL-like syntax that offers
limited aggregate function capabilities, with average, count, max, min, and sum supported.
There is no known library that performs analytic functions in Java.
A programmer can always write specific code that loops over a List of Objects, extracts
the desired values, performs the aggregate calculations, and returns the aggregate result.
But such code is very likely to be tightly coupled to the programmer's existing object types.
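As an illustration of that coupling, here is a sketch of such hand-written code. The Employee type and its fields are hypothetical, invented for this example. The grouping property, the aggregated property, and the operation are all hard-coded, so each new summary requires another method like this one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HandWrittenGroupBy {

    // Sums salaries per department. The grouping property (department), the
    // aggregated property (salary), and the operation (sum) are all fixed.
    static Map<String, Double> totalSalaryByDepartment(List<Employee> employees) {
        Map<String, Double> totals = new TreeMap<>();
        for (Employee e : employees) {
            Double current = totals.get(e.department);
            totals.put(e.department, current == null ? e.salary : current + e.salary);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("Sales", 100.0),
                new Employee("IT", 150.0),
                new Employee("Sales", 50.0));
        System.out.println(totalSalaryByDepartment(staff)); // {IT=150.0, Sales=150.0}
    }
}

// Hypothetical domain type, introduced only for this illustration.
class Employee {
    final String department;
    final double salary;

    Employee(String department, double salary) {
        this.department = department;
        this.salary = salary;
    }
}
```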
Oracle
Oracle, a relational database that implements the SQL standard, supports many aggregate
functions beyond the five basic aggregate operations mentioned above, such as variance,
covariance, standard deviation, correlation, linear regression, and percentile.
Oracle also allows the database user to implement custom aggregate functions.
If a database programmer creates an Oracle object type with a few specific method names,
and associates this object type with the definition of a new function, then a new
aggregate function is created. The object type must define methods for initialization,
value iteration (processing the next row of input), merging (merging object state for
parallel processing), and termination (calculation of the final result). One can
even add an additional method for deletion (removal of a value from consideration),
which supports the aggregate function's use as an analytic function.
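The lifecycle described above can be sketched as a small Java class. This is a hypothetical analogue: it is neither Oracle's actual interface nor jAgg's Aggregator API, and the method names simply mirror the phases named in the text.

```java
// Hypothetical Java analogue of the aggregate lifecycle described above.
// It is neither Oracle's actual interface nor jAgg's Aggregator API.
class AverageState {
    private double sum;
    private long count;

    // Initialization: reset the state before any rows are processed.
    void init() {
        sum = 0.0;
        count = 0;
    }

    // Iteration: fold the next input value into the state.
    void iterate(double value) {
        sum += value;
        count++;
    }

    // Merging: combine state that was computed separately, for example by
    // another worker during parallel processing.
    void merge(AverageState other) {
        sum += other.sum;
        count += other.count;
    }

    // Deletion: remove a previously added value, so that a sliding window can
    // be updated without recomputing the whole aggregate.
    void delete(double value) {
        sum -= value;
        count--;
    }

    // Termination: compute the final result, or null if no values remain.
    Double terminate() {
        return count == 0 ? null : Double.valueOf(sum / count);
    }

    public static void main(String[] args) {
        AverageState left = new AverageState();
        left.init();
        left.iterate(2.0);
        left.iterate(4.0);

        AverageState right = new AverageState();
        right.init();
        right.iterate(9.0);

        left.merge(right);
        System.out.println(left.terminate()); // 5.0

        left.delete(2.0);
        System.out.println(left.terminate()); // 6.5
    }
}
```

The deletion step is what makes analytic use cheap in this sketch: as a window slides forward, the value leaving the window is deleted and the value entering it is iterated.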
Main Features
- Ability to apply "group by" functionality to an arbitrary List of Objects
- Specify "group by" properties by making List items Comparable or by supplying
a list of property names, like a SQL "group by"
- 20 built-in aggregators, including Sum, Count, Avg, Max, and Min
- Support for custom aggregators
- Parallel processing
- Multiset Discrimination as an alternative to sorting, to gather items with
identical properties prior to aggregation
- Super aggregate feature includes ability to create rollups and cubes
- Ability to apply analytic functions to an arbitrary List of Objects
- Each built-in aggregator can be used as an analytic function
- 9 built-in analytic-only functions, in addition to the built-in aggregators
that can be used as analytic functions
- Support for custom analytics
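For readers unfamiliar with super aggregates, the following sketch shows what a rollup computes, in plain Java. It does not use the jAgg API; the Sale type and the method names are hypothetical. As in SQL, a rolled-up column is represented by null in the result key.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of rollup semantics; not the jAgg API.
class RollupSketch {

    // Hypothetical row type: one sale of a product in a region.
    static final class Sale {
        final String region;
        final String product;
        final int amount;

        Sale(String region, String product, int amount) {
            this.region = region;
            this.product = product;
            this.amount = amount;
        }
    }

    // Sums amounts for a rollup over (region, product): every
    // (region, product) pair, every region alone, and the grand total.
    static Map<List<String>, Integer> rollupTotals(List<Sale> sales) {
        Map<List<String>, Integer> totals = new LinkedHashMap<>();
        for (Sale s : sales) {
            add(totals, Arrays.asList(s.region, s.product), s.amount);
            add(totals, Arrays.asList(s.region, (String) null), s.amount);
            add(totals, Arrays.asList((String) null, (String) null), s.amount);
        }
        return totals;
    }

    private static void add(Map<List<String>, Integer> totals, List<String> key, int amount) {
        Integer current = totals.get(key);
        totals.put(key, current == null ? amount : current + amount);
    }

    public static void main(String[] args) {
        Map<List<String>, Integer> totals = rollupTotals(Arrays.asList(
                new Sale("East", "A", 10),
                new Sale("East", "B", 20),
                new Sale("West", "A", 5)));
        System.out.println(totals.get(Arrays.asList("East", "B")));                    // 20
        System.out.println(totals.get(Arrays.asList("East", (String) null)));          // 30
        System.out.println(totals.get(Arrays.asList((String) null, (String) null)));   // 35
    }
}
```

A cube over the same two properties would additionally produce a total per product across all regions, that is, keys of the form (null, product).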
Use Case
What if a Java programmer obtains a List of Objects, from a database or another data
source, but wants to provide multiple or customizable views to summarize and/or break down
the data? The programmer does not want to go back to the database or data source for each
breakdown a user specifies. Such queries can be costly.
In this case, a mechanism that obtains the data once and then computes aggregate functions
in memory, in whatever manner is required, is more desirable.
Fully Dressed Use Case
Primary Actor: Statistical Analyzer
Stakeholders and Interests:
- Statistical Analyzer: Wants aggregate operations performed on a list of values,
without going to a database, or going back to a database from which the list of
values came.
Preconditions: A statistical analyzer has a List of values to analyze with one or more
aggregate operations. Built-in operations include, but are not limited to, standard
aggregate operations such as average, count, max, min, and sum.
Success guarantee: The aggregation engine generates correct values for each desired
aggregate operation, or it throws a RuntimeException that indicates why an operation
could not be performed.
Main Success Scenario:
1. A statistical analyzer has obtained a list of objects that contain one or more
sets of values, and knows which aggregate operations are to be performed.
2. The analyzer sends the list of objects, a list of "group-by" properties, and a
list of aggregate operations on specific properties to the jAgg API.
3. The jAgg API creates a shallow copy of the list of objects and sorts the list.
4. The API iterates through the sorted list, creating and using Aggregators to
obtain aggregate values.
5. The API wraps the "group-by" values and the aggregate values in a List of
AggregateValues and returns that list to the analyzer.
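The copy, sort, and iterate steps of this scenario can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not jAgg's implementation: the Order and GroupTotal types and the method name are hypothetical, the grouping property and the sum operation are fixed, and group-by values are assumed to be non-null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of the sort-then-aggregate flow; the names are
// invented here and do not reflect jAgg's internals.
class SortThenAggregateSketch {

    // Hypothetical input row.
    static final class Order {
        final String customer;
        final int quantity;

        Order(String customer, int quantity) {
            this.customer = customer;
            this.quantity = quantity;
        }
    }

    // Hypothetical result row: one group-by value plus its aggregate value.
    static final class GroupTotal {
        final String customer;
        final int totalQuantity;

        GroupTotal(String customer, int totalQuantity) {
            this.customer = customer;
            this.totalQuantity = totalQuantity;
        }

        @Override
        public String toString() {
            return customer + "=" + totalQuantity;
        }
    }

    static List<GroupTotal> sumQuantityByCustomer(List<Order> orders) {
        // Shallow copy, so that sorting does not reorder the caller's list.
        List<Order> sorted = new ArrayList<>(orders);
        Collections.sort(sorted, new Comparator<Order>() {
            @Override
            public int compare(Order a, Order b) {
                return a.customer.compareTo(b.customer);
            }
        });

        // Single pass: after sorting, rows of the same group are adjacent.
        List<GroupTotal> results = new ArrayList<>();
        String currentCustomer = null;
        int runningTotal = 0;
        for (Order o : sorted) {
            if (currentCustomer != null && !currentCustomer.equals(o.customer)) {
                // Group boundary: emit the finished group and start a new one.
                results.add(new GroupTotal(currentCustomer, runningTotal));
                runningTotal = 0;
            }
            currentCustomer = o.customer;
            runningTotal += o.quantity;
        }
        if (currentCustomer != null) {
            results.add(new GroupTotal(currentCustomer, runningTotal));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order("bob", 2), new Order("alice", 1), new Order("bob", 3));
        System.out.println(sumQuantityByCustomer(orders)); // [alice=1, bob=5]
    }
}
```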
Alternative Flows:
4a. A specified "group-by" or aggregation property name is invalid, a specified
property is inappropriate for a performed aggregation, or an Exception is generated
by an Aggregator. The absence of non-null values does not constitute an alternative flow.
- The aggregation engine throws a RuntimeException to indicate that an error
has occurred during the aggregation process, encapsulating any internally
thrown Exception as its cause.
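The error handling in this flow can be sketched as follows. The example assumes that properties are read through JavaBean-style getters via reflection, which is an assumption made for illustration. AggregationException is a hypothetical stand-in for a library-specific unchecked exception, and Item is a hypothetical bean; neither is part of jAgg.

```java
import java.lang.reflect.Method;

// Hypothetical illustration of wrapping an internal failure in an unchecked
// exception that keeps the original exception as its cause.
class PropertyAccessSketch {

    // Hypothetical stand-in for a library-specific RuntimeException.
    static class AggregationException extends RuntimeException {
        AggregationException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical bean with a single readable property, "price".
    static class Item {
        public int getPrice() {
            return 42;
        }
    }

    // Reads a property through its getter (assumes a non-empty property
    // name). Any reflective failure is rethrown as an unchecked exception
    // with the original exception preserved as the cause.
    static Object getProperty(Object target, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0))
                + property.substring(1);
        try {
            Method method = target.getClass().getMethod(getter);
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new AggregationException(
                    "Cannot read property \"" + property + "\"", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(getProperty(new Item(), "price")); // 42
        try {
            getProperty(new Item(), "colour");
        } catch (AggregationException e) {
            // The caller sees one exception type but can still inspect the cause.
            System.out.println(e.getCause().getClass().getSimpleName()); // NoSuchMethodException
        }
    }
}
```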
Technology and Data Variations List:
2a. The statistical analyzer may indicate that the aggregation engine should use one
or more custom Aggregator objects to generate custom aggregate values.