When aggregating by a list of "group-by" properties, the developer has the option of specifying Multiset Discrimination instead of sorting, in order to gather together all items whose properties compare equal to each other prior to aggregation. Normally, sorting is used to achieve this aim. Java's built-in method "Collections.sort" is used to sort the input before aggregation. It is a merge sort, which means it runs in O(n log n) time. For a lot of purposes, this is fine, and additionally, it has the added benefit that the returned AggregateValues are placed in the returned List in sorted order. For those applications that need to aggregate a lot of data, and the returned order does not matter, Multiset Discrimination is the answer. While multiset discrimination may return results that are "out of order", it is able to place items whose properties compare equal adjacent to each other, just like sorting can. However, unlike sorting, multiset discrimination runs in O(n) time in the worst case.
All data can be eventually broken down to the bits and bytes of the data to analyze. At the base case, jAgg discriminates 16-bit values. It allocates an array "lists" of 216 Lists of items, and an array "used" of 216 integers to represent used indices in the array of Lists. Then it follows this algorithm:
A constant level of operations is performed for each element in the input List, so the running time of this algorithm is O(n).
Object types that implement the interface Discriminator are capable of multiset discrimination on a particular type.
Most object types can have more than 216 possible values. So jAgg performs multiset discrimination on those types hierarchically. It can't do its job on the entire value, so it does its job repeatedly, on labels. A label is a portion of a value that is used as the actual value during multiset discrimination. It performs multiset discrimination on the first label, or 16 bytes, of the value that it can, creating a set of equivalence classes. Then, on each equivalence class, it performs multiset discrimination on the next label, or 16 bytes, creating another set of equivalence classes. This process is repeated until the entire value is used up.
Objects that implement the Extractor interface are capable of taking an element, and extracting a label from that element for this purpose.
For primitive values such as bytes, chars, shorts, and booleans, only one pass is necessary. However, for other primitive types, such as integers, longs, floats, and doubles, multiple passes are necessary.
For more complex data types such as arrays, Strings, Collections, Dates, and Calendars, each element is considered, one at a time, until all elements are discriminated.
For this to work with a List of "group by" properties in jAgg, a PropertiesDiscriminator is employed that will perform multiset discrimination only on the properties specified.
Sometimes, null values are encountered when performing multiset discrimination. A NullDiscriminator is employed to separate all null values into their own equivalence class before passing on discrimination duties to another Discriminator.