kwarray.util_groups

Functions for partitioning numpy arrays into groups.

Module Contents

Functions

group_items(item_list, groupid_list, assume_sorted=False, axis=None)

Groups a list of items by group id.

group_indices(idx_to_groupid, assume_sorted=False)

Find unique items and the indices at which they appear in an array.

apply_grouping(items, groupxs, axis=0)

Applies a grouping produced by group_indices().

group_consecutive(arr, offset=1)

Returns lists of consecutive values. Implementation inspired by [3].

group_consecutive_indices(arr, offset=1)

Returns lists of indices pointing to consecutive values.

kwarray.util_groups.group_items(item_list, groupid_list, assume_sorted=False, axis=None)

Groups a list of items by group id.

Works like ubelt.group_items(), but with numpy optimizations. This can be quite a bit faster than using itertools.groupby() [1] [2].

In cases where there are many lists of items to group (think column-major data), consider using group_indices() and apply_grouping() instead.

Parameters
  • item_list (ndarray[T1]) – The input array of items to group.

  • groupid_list (ndarray[T2]) – Each item is an id corresponding to the item at the same position in item_list. For the fastest runtime, the input array must be numeric (ideally with integer types). This list must be 1-dimensional.

  • assume_sorted (bool, default=False) – If the input array is sorted, then setting this to True will avoid an unnecessary sorting operation and improve efficiency.

  • axis (int | None) – Group along this axis of item_list if it is n-dimensional.

Returns

mapping from groupids to corresponding items

Return type

Dict[T2, ndarray[T1]]

References

[1] http://stackoverflow.com/questions/4651683/

[2] numpy-grouping-using-itertools-groupby-performance

Example

>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import *  # NOQA
>>> items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> keys = np.array( [2, 2, 1, 1, 0, 1, 0, 1])
>>> grouped = group_items(items, keys)
>>> print(ub.repr2(grouped, nl=1, with_dtype=False))
{
    0: np.array([4, 6]),
    1: np.array([2, 3, 5, 7]),
    2: np.array([0, 1]),
}
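
The comparison with itertools.groupby() can be made concrete with a small sketch. The two functions below are hypothetical illustrations, not kwarray's implementation: the first is a pure-Python baseline built on itertools.groupby(), and the second uses one stable argsort and one split, which is one way a numpy-based grouping might avoid a Python-level loop over items. Both assume a non-empty, 1-dimensional key array.

```python
import itertools

import numpy as np


def group_items_groupby(items, keys):
    """Pure-Python baseline: sort (key, item) pairs, then use itertools.groupby."""
    pairs = sorted(zip(keys, items), key=lambda pair: pair[0])
    return {
        key: [item for _, item in group]
        for key, group in itertools.groupby(pairs, key=lambda pair: pair[0])
    }


def group_items_numpy(items, keys):
    """Hypothetical numpy variant (assumes at least one key)."""
    items = np.asarray(items)
    keys = np.asarray(keys)
    # A stable sort keeps the original order of items within each group.
    order = np.argsort(keys, kind='stable')
    sorted_keys = keys[order]
    # Positions where the sorted key changes mark the start of a new group.
    boundaries = np.flatnonzero(sorted_keys[1:] != sorted_keys[:-1]) + 1
    unique_keys = sorted_keys[np.r_[0, boundaries]]
    groups = np.split(items[order], boundaries)
    return dict(zip(unique_keys.tolist(), groups))


items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
keys = np.array([2, 2, 1, 1, 0, 1, 0, 1])

baseline = group_items_groupby(items.tolist(), keys.tolist())
grouped = group_items_numpy(items, keys)

# Both approaches produce the same mapping from key to items.
assert {k: v.tolist() for k, v in grouped.items()} == baseline
```

The numpy variant does its per-item work inside argsort, fancy indexing, and split, which is where a speedup over the pure-Python loop would come from on large inputs.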

kwarray.util_groups.group_indices(idx_to_groupid, assume_sorted=False)

Find unique items and the indices at which they appear in an array.

A common use case of this function is when you have a list of objects (often numeric but sometimes not) and an array of “group-ids” corresponding to that list of objects.

Using this function will return a list of indices that can be used in conjunction with apply_grouping() to group the elements. This is most useful when you have many lists (think column-major data) corresponding to the group-ids.

In cases where there is only one list of objects, or where knowing the indices doesn’t matter, consider using group_items() instead.

Parameters
  • idx_to_groupid (ndarray) – The input array, where each item is interpreted as a group id. For the fastest runtime, the input array must be numeric (ideally with integer types). If the type is non-numeric then the less efficient ubelt.group_items() is used.

  • assume_sorted (bool, default=False) – If the input array is sorted, then setting this to True will avoid an unnecessary sorting operation and improve efficiency.

Returns

(keys, groupxs):

  • keys (ndarray) – The unique elements of the input array, in order.

  • groupxs (List[ndarray]) – The corresponding list of index arrays. The i-th item is an array of the indices where keys[i] appears in the input array.

Return type

Tuple[ndarray, List[ndarray]]

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([1, 2, 3])
[
    np.array([1, 3, 5]),
    np.array([0, 2, 4, 6]),
    np.array([ 7,  8,  9, 10]),
]

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([[  24], [ 129], [ 659], [ 659], [ 24],
...       [659], [ 659], [ 822], [ 659], [ 659], [24]])
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> (keys, groupxs) = group_indices(idx_to_groupid.T[0])
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([ 24, 129, 659, 822])
[
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([True, True, False, True, False, False, True])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([False,  True])
[
    np.array([2, 4, 5]),
    np.array([0, 1, 3, 6]),
]

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = [('a', 'b'),  ('d', 'b'), ('a', 'b'), ('a', 'b')]
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
[
    ('a', 'b'),
    ('d', 'b'),
]
[
    np.array([0, 2, 3]),
    np.array([1]),
]
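
The "many lists" use case described above can be sketched as follows. This is a numpy-only illustration under stated assumptions: group_indices_sketch is a hypothetical stand-in (not kwarray's code) that assumes a non-empty, 1-dimensional, sortable id array, and the column names track_ids, scores, and frames are invented for the example. The point is that the sort happens once and the resulting index arrays are reused for every column.

```python
import numpy as np


def group_indices_sketch(groupids):
    """Return (keys, groupxs) for a non-empty 1-D id array (illustrative only)."""
    groupids = np.asarray(groupids)
    # Sort once; a stable sort keeps indices in increasing order within a group.
    order = np.argsort(groupids, kind='stable')
    sorted_ids = groupids[order]
    # A new group starts wherever the sorted id changes.
    boundaries = np.flatnonzero(sorted_ids[1:] != sorted_ids[:-1]) + 1
    keys = sorted_ids[np.r_[0, boundaries]]
    groupxs = np.split(order, boundaries)
    return keys, groupxs


# Hypothetical column-major table: several columns aligned with one id array.
track_ids = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
scores = np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9])
frames = np.arange(len(track_ids)) * 10

keys, groupxs = group_indices_sketch(track_ids)

# Reuse the same index arrays for each column instead of regrouping.
scores_by_track = [scores[xs] for xs in groupxs]
frames_by_track = [frames[xs] for xs in groupxs]
```

With the real functions, the two list comprehensions would correspond to two apply_grouping() calls that share one group_indices() result.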

kwarray.util_groups.apply_grouping(items, groupxs, axis=0)

Applies a grouping produced by group_indices().

Typically used in conjunction with group_indices().

Parameters
  • items (ndarray) – items to group

  • groupxs (List[ndarray[int]]) – groups of indices

  • axis (None|int, default=0) – the axis of items along which the indices in groupxs are applied

Returns

grouped items

Return type

List[ndarray]

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> from kwarray.util_groups import group_indices, apply_grouping
>>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
>>> items          = np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> grouped_items = apply_grouping(items, groupxs)
>>> result = str(grouped_items)
>>> print(result)
[array([8, 5, 6]), array([1, 5, 8, 7]), array([5, 3, 0, 9])]
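
The example above only exercises the default axis=0 on a 1-dimensional array. The sketch below illustrates what the axis parameter would mean for 2-dimensional items. It assumes that applying a group amounts to taking its indices along the given axis (here via ndarray.take). apply_grouping_sketch and the table data are hypothetical and are not taken from kwarray.

```python
import numpy as np


def apply_grouping_sketch(items, groupxs, axis=0):
    """Select each group's entries along ``axis`` (assumed behavior)."""
    items = np.asarray(items)
    return [items.take(xs, axis=axis) for xs in groupxs]


# Hypothetical data: 4 rows (samples) by 3 columns (features).
table = np.array([[0, 1, 2],
                  [10, 11, 12],
                  [20, 21, 22],
                  [30, 31, 32]])

row_groups = [np.array([0, 3]), np.array([1, 2])]  # indices into axis 0
col_groups = [np.array([0]), np.array([1, 2])]     # indices into axis 1

by_rows = apply_grouping_sketch(table, row_groups, axis=0)
by_cols = apply_grouping_sketch(table, col_groups, axis=1)

# Grouping rows keeps every column; grouping columns keeps every row.
assert by_rows[0].shape == (2, 3)
assert by_cols[1].shape == (4, 2)
```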

kwarray.util_groups.group_consecutive(arr, offset=1)

Returns lists of consecutive values. Implementation inspired by [3].

Parameters
  • arr (ndarray) – array of ordered values

  • offset (float, default=1) – any two adjacent values separated by this offset are grouped. In the default case, when offset=1, this groups increasing values like: 0, 1, 2. When offset is 0, it groups consecutive values that are the same, e.g.: 4, 4, 4.

Returns

a list of arrays that are the groups from the input

Return type

List[ndarray]

Notes

This is equivalent to, and faster than, using: apply_grouping(arr, group_consecutive_indices(arr))

References

[3] http://stackoverflow.com/questions/7352684/groups-consecutive-elements

Example

>>> import numpy as np
>>> from kwarray.util_groups import group_consecutive
>>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
>>> groups = group_consecutive(arr)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[1, 2, 3], [5, 6, 7, 8, 9, 10], [15], [99, 100, 101]]
>>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
>>> groups = group_consecutive(arr, offset=1)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[0], [0], [3], [0], [0], [7], [2, 3, 4], [4], [4], [1], [1]]
>>> groups = group_consecutive(arr, offset=0)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[0, 0], [3], [0, 0], [7], [2], [3], [4, 4, 4], [1, 1]]
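
The role of offset can be seen in a short numpy sketch of the diff-and-split idea commonly used for this problem. It is an illustration under the assumption that a new group starts wherever the step between neighboring values differs from offset. group_consecutive_sketch is a hypothetical name, and the code is not claimed to match kwarray's implementation.

```python
import numpy as np


def group_consecutive_sketch(arr, offset=1):
    """Split ``arr`` wherever the step between neighbors is not ``offset``."""
    arr = np.asarray(arr)
    # np.diff gives the step between each pair of neighbors; a step that
    # differs from ``offset`` means the right-hand value starts a new group.
    split_points = np.flatnonzero(np.diff(arr) != offset) + 1
    return np.split(arr, split_points)


arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])

increasing_runs = [g.tolist() for g in group_consecutive_sketch(arr, offset=1)]
constant_runs = [g.tolist() for g in group_consecutive_sketch(arr, offset=0)]
```

For this input the two calls reproduce the groupings shown in the example above for offset=1 and offset=0.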

kwarray.util_groups.group_consecutive_indices(arr, offset=1)

Returns lists of indices pointing to consecutive values.

Parameters
  • arr (ndarray) – array of ordered values

  • offset (float, default=1) – any two values separated by this offset are grouped.

Returns

groupxs: a list of index arrays, one per group of consecutive values

Return type

List[ndarray]

Example

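>>> import numpy as np
>>> from kwarray.util_groups import (
...     group_consecutive_indices, group_consecutive, apply_grouping)
>>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])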
>>> groupxs = group_consecutive_indices(arr)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0, 1, 2], [3, 4, 5, 6, 7, 8], [9], [10, 11, 12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
>>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
>>> groupxs = group_consecutive_indices(arr, offset=1)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0], [1], [2], [3], [4], [5], [6, 7, 8], [9], [10], [11], [12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
>>> groupxs = group_consecutive_indices(arr, offset=0)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0, 1], [2], [3, 4], [5], [6], [7], [8, 9, 10], [11, 12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 0), apply_grouping(arr, groupxs)))
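
One reason to ask for indices rather than values is that the same index groups can summarize the input or address other aligned arrays. The sketch below is a numpy-only illustration: consecutive_index_groups is a hypothetical stand-in for group_consecutive_indices(), and the frames data is invented. It condenses each run of consecutive values into an inclusive (first, last) pair.

```python
import numpy as np


def consecutive_index_groups(arr, offset=1):
    """Return index arrays, one per run of values stepping by ``offset``."""
    arr = np.asarray(arr)
    split_points = np.flatnonzero(np.diff(arr) != offset) + 1
    # Split the positions 0..n-1 at the same places the values would be split.
    return np.split(np.arange(len(arr)), split_points)


# Hypothetical sorted frame numbers with gaps between runs.
frames = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
groupxs = consecutive_index_groups(frames)

# Summarize each run as an inclusive (first, last) pair of values.
runs = [(int(frames[xs[0]]), int(frames[xs[-1]])) for xs in groupxs]
```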