Grouping Sets

One of the ways that jAgg supports super aggregation is through grouping sets. Normally, jAgg calculates aggregate values based on all "group by" properties specified in the Builder object. Sometimes, however, different levels of aggregation are desired. Rollups and cubes cover the common cases; for more control over exactly which levels are calculated, one can state the grouping sets explicitly.

One can simply call jAgg again, with fewer properties, to obtain the desired slices, but that would mean a separate pass through the data for each level of super aggregation. With grouping sets, and super aggregation in general, jAgg reuses the Aggregators that calculated the original, normal aggregate values to calculate the new super aggregate values.

Understanding Grouping Sets

When grouping sets are specified, jAgg needs to know which properties define each slice. Each grouping set is given as a list of 0-based indices that refer to the original list of property names supplied to the Builder object. jAgg will create subtotals for every specified grouping set.

Cubes and rollups are special cases of grouping sets. Any property not named in the rollup or cube remains in every resulting grouping set. If there are four properties, then:

  • rollup({1, 2, 3}) = groupingSets({0, 1, 2, 3}, {0, 1, 2}, {0, 1}, {0})
  • rollup({2}) = groupingSets({0, 1, 2, 3}, {0, 1, 3})
  • rollup({0, 1, 2, 3}) = groupingSets({0, 1, 2, 3}, {0, 1, 2}, {0, 1}, {0}, {})
  • cube({0, 1, 2}) = groupingSets({0, 1, 2, 3}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}, {0, 3}, {1, 3}, {2, 3}, {3})
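The equivalences above can be reproduced with a small sketch in plain Java. This is only an illustration of the expansion rule, not jAgg's own code: the class GroupingSetExpansion and its methods rollup, cube, and unnamed are hypothetical names, and the order in which cube emits its grouping sets is an arbitrary choice that may differ from the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of jAgg: expands rollups and cubes into grouping sets.
final class GroupingSetExpansion {

    // Indices in [0, propertyCount) that are not named; these stay in every grouping set.
    private static TreeSet<Integer> unnamed(int propertyCount, List<Integer> named) {
        TreeSet<Integer> rest = new TreeSet<>();
        for (int i = 0; i < propertyCount; i++) {
            if (!named.contains(i)) {
                rest.add(i);
            }
        }
        return rest;
    }

    // A rollup drops its properties one at a time, from right to left.
    static List<List<Integer>> rollup(int propertyCount, List<Integer> rollupIndices) {
        List<List<Integer>> sets = new ArrayList<>();
        for (int keep = rollupIndices.size(); keep >= 0; keep--) {
            TreeSet<Integer> set = unnamed(propertyCount, rollupIndices);
            set.addAll(rollupIndices.subList(0, keep));
            sets.add(new ArrayList<>(set));
        }
        return sets;
    }

    // A cube keeps every possible subset of its properties; each bit mask picks one subset.
    static List<List<Integer>> cube(int propertyCount, List<Integer> cubeIndices) {
        List<List<Integer>> sets = new ArrayList<>();
        int n = cubeIndices.size();
        for (int mask = (1 << n) - 1; mask >= 0; mask--) {
            TreeSet<Integer> set = unnamed(propertyCount, cubeIndices);
            for (int bit = 0; bit < n; bit++) {
                if ((mask & (1 << bit)) != 0) {
                    set.add(cubeIndices.get(bit));
                }
            }
            sets.add(new ArrayList<>(set));
        }
        return sets;
    }

    public static void main(String[] args) {
        // prints [[0, 1, 2, 3], [0, 1, 2], [0, 1], [0]]
        System.out.println(rollup(4, Arrays.asList(1, 2, 3)));
        System.out.println(rollup(4, Arrays.asList(2)));
        System.out.println(cube(4, Arrays.asList(0, 1, 2)));
    }
}
```

A rollup over n properties yields n + 1 grouping sets, while a cube over n properties yields 2^n, which is why explicit grouping sets are useful when only a few of those levels are needed.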

In the following example, four property names are specified to jAgg. Without grouping sets, normal aggregation proceeds and produces these results:

List<String> properties = Arrays.asList("property1", "property2", "property3", "property4");
List<Aggregator> aggs = Arrays.asList(Aggregator.getAggregator("Sum(value)"));
Aggregation agg = new Aggregation.Builder()
   .setProperties(properties)
   .setAggregators(aggs)
   .build();
List<AggregateValue<TestRec>> aggValues = agg.groupBy(testRecords);
            
property1  property2  property3  property4  Sum(value)
A          1          red        true       2
A          1          red        false      3
A          1          green      true       6
A          1          green      false      10
A          2          red        true       102
A          2          red        false      103
A          2          green      true       106
A          2          green      false      110
B          1          red        true       1002
B          1          red        false      1003
B          1          green      true       1006
B          1          green      false      1010
B          2          red        true       1102
B          2          red        false      1103
B          2          green      true       1106
B          2          green      false      1110

Here are the new results when the following grouping sets are specified: {2, 3} and {0, 1}. Notice that the original set of "normal" aggregate values (grouping set {0, 1, 2, 3}) is no longer returned.

List<String> properties = Arrays.asList("property1", "property2", "property3", "property4");
List<Aggregator> aggs = Arrays.asList(Aggregator.getAggregator("Sum(value)"));
List<List<Integer>> groupingSets = Arrays.asList(
   Arrays.asList(2, 3),
   Arrays.asList(0, 1));
Aggregation agg = new Aggregation.Builder()
   .setProperties(properties)
   .setAggregators(aggs)
   .setGroupingSets(groupingSets)
   .build();
List<AggregateValue<TestRec>> aggValues = agg.groupBy(testRecords);
            
property1  property2  property3  property4  Sum(value)
                      red        true       2208
                      red        false      2212
                      green      true       2224
                      green      false      2240
A          1                                21
A          2                                421
B          1                                4021
B          2                                4421

Only those grouping sets specified are returned.

It is possible to specify multiple grouping sets at once, as long as there are no duplicate grouping sets.
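The subtotals in the table above can be checked by hand from the sixteen base-level sums. The following plain Java sketch does that by collapsing every property outside a grouping set to null and merging the base sums that share a key. It illustrates the idea of building super aggregates from already-computed base aggregates rather than rescanning raw records; it is not jAgg's internal implementation, and the class GroupingSetSubtotals and its subtotal method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of jAgg: derives grouping set subtotals from base-level sums.
final class GroupingSetSubtotals {

    // Base-level results from the first table: four property values, then Sum(value).
    static final Object[][] BASE = {
        {"A", 1, "red", true, 2},      {"A", 1, "red", false, 3},
        {"A", 1, "green", true, 6},    {"A", 1, "green", false, 10},
        {"A", 2, "red", true, 102},    {"A", 2, "red", false, 103},
        {"A", 2, "green", true, 106},  {"A", 2, "green", false, 110},
        {"B", 1, "red", true, 1002},   {"B", 1, "red", false, 1003},
        {"B", 1, "green", true, 1006}, {"B", 1, "green", false, 1010},
        {"B", 2, "red", true, 1102},   {"B", 2, "red", false, 1103},
        {"B", 2, "green", true, 1106}, {"B", 2, "green", false, 1110},
    };

    // Merges base sums whose property values agree on every index in the grouping set.
    static Map<List<Object>, Integer> subtotal(Object[][] baseRows, List<Integer> groupingSet) {
        Map<List<Object>, Integer> totals = new LinkedHashMap<>();
        for (Object[] row : baseRows) {
            int propertyCount = row.length - 1;
            List<Object> key = new ArrayList<>();
            for (int i = 0; i < propertyCount; i++) {
                // Properties outside the grouping set collapse to null ("all values").
                key.add(groupingSet.contains(i) ? row[i] : null);
            }
            totals.merge(key, (Integer) row[propertyCount], Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<List<Integer>> groupingSets = Arrays.asList(
            Arrays.asList(2, 3),
            Arrays.asList(0, 1));
        for (List<Integer> groupingSet : groupingSets) {
            for (Map.Entry<List<Object>, Integer> e : subtotal(BASE, groupingSet).entrySet()) {
                System.out.println(e.getKey() + " -> " + e.getValue());
            }
        }
    }
}
```

Merging base sums works here because Sum is additive. An aggregate such as an average would need to carry extra state (for example, a sum and a count) to be merged the same way.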

Identifying Grouping Sets

If a property represents "all values" in an aggregate value, then getPropertyValue returns null for that property. But what if null is an actual value of that property in the data? jAgg tells these cases apart with the methods isGrouping and getGroupingId.

  • isGrouping(int field) - Determines whether the property referenced by the given 0-based index represents "all values". If true, then getPropertyValue(field) returns null and this is a super aggregate value. If false, then getPropertyValue(field) can return any value, including null, and this aggregate value does not represent "all values" for this property.
  • isGrouping(String propertyName) - Determines whether the given property represents "all values". If true, then getPropertyValue(propertyName) returns null and this is a super aggregate value. If false, then getPropertyValue(propertyName) can return any value, including null, and this aggregate value does not represent "all values" for this property.
  • getGroupingId(List<?> fields) - Creates a distinct integer grouping set ID based on the referenced fields, which may be 0-based integer references or property name strings, or both. Every aggregate value that has the same properties representing "all values" has the same integer ID.

Here is the same example as above, with the results of these method calls included.

property1  property2  property3  property4  Sum(value)  isGrouping(0)  isGrouping(1)  isGrouping(2)  isGrouping(3)  getGroupingId({0, 1})  getGroupingId({0, 1, 2, 3})
                      red        true       2208        true           true           false          false          3                      12
                      red        false      2212        true           true           false          false          3                      12
                      green      true       2224        true           true           false          false          3                      12
                      green      false      2240        true           true           false          false          3                      12
A          1                                21          false          false          true           true           0                      3
A          2                                421         false          false          true           true           0                      3
B          1                                4021        false          false          true           true           0                      3
B          2                                4421        false          false          true           true           0                      3
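The grouping IDs in the table above are consistent with reading the isGrouping results for the referenced fields as a bit vector, with the first referenced field as the most significant bit. The sketch below reproduces the four IDs under that assumption. It is inferred from the table rather than taken from jAgg's source, and the class GroupingIdSketch, its groupingId method, and the flags array are hypothetical stand-ins for an aggregate value's isGrouping results.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not part of jAgg: builds a grouping ID from isGrouping-style flags.
final class GroupingIdSketch {

    // flags[i] stands in for isGrouping(i) on one aggregate value.
    static int groupingId(boolean[] flags, List<Integer> fields) {
        int id = 0;
        for (int field : fields) {
            // Shift left so the first referenced field ends up as the most significant bit.
            id = (id << 1) | (flags[field] ? 1 : 0);
        }
        return id;
    }

    public static void main(String[] args) {
        // Rows from grouping set {2, 3}: property1 and property2 represent "all values".
        boolean[] colorRow = {true, true, false, false};
        // Rows from grouping set {0, 1}: property3 and property4 represent "all values".
        boolean[] letterRow = {false, false, true, true};

        System.out.println(groupingId(colorRow, Arrays.asList(0, 1)));        // prints 3
        System.out.println(groupingId(colorRow, Arrays.asList(0, 1, 2, 3)));  // prints 12
        System.out.println(groupingId(letterRow, Arrays.asList(0, 1)));       // prints 0
        System.out.println(groupingId(letterRow, Arrays.asList(0, 1, 2, 3))); // prints 3
    }
}
```

Under this reading, the ID distinguishes grouping sets only with respect to the fields passed in, so referencing all properties gives every grouping set its own ID.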