kwarray.util_groups
Functions for partitioning numpy arrays into groups.
Module Contents
Functions
| group_items() | Groups a list of items by group id. |
| group_indices() | Find unique items and the indices at which they appear in an array. |
| apply_grouping() | Applies grouping from group_indices(). |
| group_consecutive() | Returns lists of consecutive values. Implementation inspired by [3]. |
| group_consecutive_indices() | Returns lists of indices pointing to consecutive values. |
- kwarray.util_groups.group_items(item_list, groupid_list, assume_sorted=False, axis=None)
Groups a list of items by group id.
Works like ubelt.group_items(), but with numpy optimizations. This can be quite a bit faster than using itertools.groupby() [1] [2].
In cases where there are many lists of items to group (think column-major data), consider using group_indices() and apply_grouping() instead.
- Parameters
item_list (ndarray[T1]) – The input array of items to group.
groupid_list (ndarray[T2]) – Each item is an id corresponding to the item at the same position in item_list. For the fastest runtime, the input array must be numeric (ideally with integer types). This list must be 1-dimensional.
assume_sorted (bool, default=False) – If the input array is sorted, then setting this to True will avoid an unnecessary sorting operation and improve efficiency.
axis (int | None) – Group along a particular axis of item_list if it is n-dimensional.
- Returns
mapping from groupids to corresponding items
- Return type
Dict[T2, ndarray[T1]]
References
Example
>>> from kwarray.util_groups import *  # NOQA
>>> import numpy as np
>>> import ubelt as ub
>>> items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> keys = np.array( [2, 2, 1, 1, 0, 1, 0, 1])
>>> grouped = group_items(items, keys)
>>> print(ub.repr2(grouped, nl=1, with_dtype=False))
{
    0: np.array([4, 6]),
    1: np.array([2, 3, 5, 7]),
    2: np.array([0, 1]),
}
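For comparison, the itertools.groupby() baseline mentioned above can be sketched in pure Python. The helper name group_items_baseline is hypothetical and not part of kwarray. It only illustrates the mapping that group_items() produces, using plain lists in place of ndarrays.

```python
from itertools import groupby
from operator import itemgetter


def group_items_baseline(item_list, groupid_list):
    """Hypothetical pure-Python baseline; not part of kwarray."""
    # groupby() only merges adjacent equal keys, so pair each item with its
    # group id and sort by id first. sorted() is stable, which keeps the
    # original order of items inside each group.
    pairs = sorted(zip(groupid_list, item_list), key=itemgetter(0))
    return {
        groupid: [item for _, item in group]
        for groupid, group in groupby(pairs, key=itemgetter(0))
    }


items = [0, 1, 2, 3, 4, 5, 6, 7]
keys = [2, 2, 1, 1, 0, 1, 0, 1]
grouped = group_items_baseline(items, keys)
print(grouped)
```

With the same inputs as the example above, this yields the same three groups, as lists rather than arrays.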
- kwarray.util_groups.group_indices(idx_to_groupid, assume_sorted=False)
Find unique items and the indices at which they appear in an array.
A common use case of this function is when you have a list of objects (often numeric but sometimes not) and an array of “group-ids” corresponding to that list of objects.
This function returns lists of indices that can be used in conjunction with apply_grouping() to group the elements. This is most useful when you have many lists (think column-major data) corresponding to the group-ids.
In cases where there is only one list of objects, or where knowing the indices doesn’t matter, consider using group_items() instead.
- Parameters
idx_to_groupid (ndarray) – The input array, where each item is interpreted as a group id. For the fastest runtime, the input array must be numeric (ideally with integer types). If the type is non-numeric then the less efficient ubelt.group_items() is used.
assume_sorted (bool, default=False) – If the input array is sorted, then setting this to True will avoid an unnecessary sorting operation and improve efficiency.
- Returns
- (keys, groupxs) -
- keys (ndarray):
The unique elements of the input array in order
- groupxs (List[ndarray]):
Corresponding list of indexes. The i-th item is an array indicating the indices where the item keys[i] appeared in the input array.
- Return type
Tuple[ndarray, List[ndarray]]
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([1, 2, 3])
[
    np.array([1, 3, 5]),
    np.array([0, 2, 4, 6]),
    np.array([ 7,  8,  9, 10]),
]
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([[ 24], [ 129], [ 659], [ 659], [ 24],
...                            [659], [ 659], [ 822], [ 659], [ 659], [24]])
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> (keys, groupxs) = group_indices(idx_to_groupid.T[0])
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([ 24, 129, 659, 822])
[
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = np.array([True, True, False, True, False, False, True])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
np.array([False, True])
[
    np.array([2, 4, 5]),
    np.array([0, 1, 3, 6]),
]
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import ubelt as ub
>>> from kwarray.util_groups import group_indices
>>> idx_to_groupid = [('a', 'b'), ('d', 'b'), ('a', 'b'), ('a', 'b')]
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> print(ub.repr2(keys, with_dtype=False))
>>> print(ub.repr2(groupxs, with_dtype=False))
[
    ('a', 'b'),
    ('d', 'b'),
]
[
    np.array([0, 2, 3]),
    np.array([1]),
]
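The numeric case can be understood through a common sort-based technique: stably argsort the group ids, find where the sorted ids change, and split the sort order at those points. The sketch below is an assumption about how such a grouping can be done with plain numpy, not kwarray's actual implementation, and the name group_indices_sketch is hypothetical.

```python
import numpy as np


def group_indices_sketch(idx_to_groupid):
    """Hypothetical sort-based sketch; not kwarray's actual implementation."""
    idx_to_groupid = np.asarray(idx_to_groupid)
    if idx_to_groupid.size == 0:
        return idx_to_groupid, []
    # A stable sort keeps the original order of indices within each group.
    sortx = np.argsort(idx_to_groupid, kind="stable")
    sorted_ids = idx_to_groupid[sortx]
    # Positions where the sorted id changes mark the start of a new group.
    boundaries = np.flatnonzero(sorted_ids[1:] != sorted_ids[:-1]) + 1
    groupxs = np.split(sortx, boundaries)
    # The first element of every group is its unique key.
    keys = sorted_ids[np.concatenate(([0], boundaries))]
    return keys, groupxs


keys, groupxs = group_indices_sketch([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
print(keys)
print(groupxs)
```

This only covers 1-dimensional, sortable input. The non-numeric fallback described above is not modeled here.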
- kwarray.util_groups.apply_grouping(items, groupxs, axis=0)
Applies grouping from group_indices().
Typically used in conjunction with group_indices().
- Parameters
items (ndarray) – items to group
groupxs (List[ndarray[int]]) – groups of indices
axis (None|int, default=0)
- Returns
grouped items
- Return type
List[ndarray]
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> from kwarray.util_groups import group_indices, apply_grouping
>>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
>>> items = np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9])
>>> (keys, groupxs) = group_indices(idx_to_groupid)
>>> grouped_items = apply_grouping(items, groupxs)
>>> result = str(grouped_items)
>>> print(result)
[array([8, 5, 6]), array([1, 5, 8, 7]), array([5, 3, 0, 9])]
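The column-major use case can be illustrated by reusing one set of index groups across several columns. In this sketch apply_grouping_sketch is a hypothetical stand-in that takes each index group along an axis with np.take(). The column names and values are made up for illustration, and the index groups are copied from the example above.

```python
import numpy as np


def apply_grouping_sketch(items, groupxs, axis=0):
    """Hypothetical stand-in: take each index group along `axis`."""
    return [np.take(items, xs, axis=axis) for xs in groupxs]


# Index groups as printed by the group_indices() example above.
groupxs = [np.array([1, 3, 5]), np.array([0, 2, 4, 6]), np.array([7, 8, 9, 10])]

# Two illustrative columns that share the same group ids.
columns = {
    "score": np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9]),
    "weight": np.arange(11) * 0.5,
}

# The indices are computed once and applied to every column.
grouped = {name: apply_grouping_sketch(col, groupxs) for name, col in columns.items()}
score_sums = [int(g.sum()) for g in grouped["score"]]
print(score_sums)
```

Computing the indices once and applying them to each column avoids repeating the sort for every column.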
- kwarray.util_groups.group_consecutive(arr, offset=1)
Returns lists of consecutive values. Implementation inspired by [3].
- Parameters
arr (ndarray) – array of ordered values
offset (float, default=1) – any two adjacent values separated by this offset are grouped. In the default case, when offset=1, this groups increasing values like: 0, 1, 2. When offset is 0 it groups consecutive values that are the same, e.g.: 4, 4, 4.
- Returns
a list of arrays that are the groups from the input
- Return type
List[ndarray]
Notes
This is equivalent to (and faster than) using: apply_grouping(data, group_consecutive_indices(data))
References
Example
>>> import numpy as np
>>> from kwarray.util_groups import group_consecutive
>>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
>>> groups = group_consecutive(arr)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[1, 2, 3], [5, 6, 7, 8, 9, 10], [15], [99, 100, 101]]
>>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
>>> groups = group_consecutive(arr, offset=1)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[0], [0], [3], [0], [0], [7], [2, 3, 4], [4], [4], [1], [1]]
>>> groups = group_consecutive(arr, offset=0)
>>> print('groups = {}'.format(list(map(list, groups))))
groups = [[0, 0], [3], [0, 0], [7], [2], [3], [4, 4, 4], [1, 1]]
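The role of offset can be made concrete with a short numpy sketch that splits the array wherever the step between neighbors differs from offset. This is an assumed way to get the behavior shown above, not necessarily how kwarray implements it, and group_consecutive_sketch is a hypothetical name.

```python
import numpy as np


def group_consecutive_sketch(arr, offset=1):
    """Hypothetical sketch: split wherever the step between neighbors != offset."""
    arr = np.asarray(arr)
    # Index i + 1 starts a new group when arr[i + 1] - arr[i] differs from offset.
    breaks = np.flatnonzero(np.diff(arr) != offset) + 1
    return np.split(arr, breaks)


arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
runs_up = [g.tolist() for g in group_consecutive_sketch(arr, offset=1)]
runs_same = [g.tolist() for g in group_consecutive_sketch(arr, offset=0)]
print(runs_up)
print(runs_same)
```

With offset=1 only the run 2, 3, 4 stays together. With offset=0 the repeated values are grouped instead.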
- kwarray.util_groups.group_consecutive_indices(arr, offset=1)
Returns lists of indices pointing to consecutive values.
- Parameters
arr (ndarray) – array of ordered values
offset (float, default=1) – any two values separated by this offset are grouped.
- Returns
groupxs: a list of indices
- Return type
List[ndarray]
SeeAlso:
Example
>>> import numpy as np
>>> from kwarray.util_groups import group_consecutive, group_consecutive_indices, apply_grouping
>>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
>>> groupxs = group_consecutive_indices(arr)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0, 1, 2], [3, 4, 5, 6, 7, 8], [9], [10, 11, 12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
>>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
>>> groupxs = group_consecutive_indices(arr, offset=1)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0], [1], [2], [3], [4], [5], [6, 7, 8], [9], [10], [11], [12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
>>> groupxs = group_consecutive_indices(arr, offset=0)
>>> print('groupxs = {}'.format(list(map(list, groupxs))))
groupxs = [[0, 1], [2], [3, 4], [5], [6], [7], [8, 9, 10], [11, 12]]
>>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 0), apply_grouping(arr, groupxs)))
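The relationship between the index groups and the value groups can be sketched the same way: compute the break points from the values, but split the positions 0..len(arr)-1 instead of the values. The function name group_consecutive_indices_sketch is hypothetical and the code is an illustration under that assumption, not kwarray's implementation.

```python
import numpy as np


def group_consecutive_indices_sketch(arr, offset=1):
    """Hypothetical sketch: same break points, but split positions, not values."""
    arr = np.asarray(arr)
    breaks = np.flatnonzero(np.diff(arr) != offset) + 1
    return np.split(np.arange(len(arr)), breaks)


arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
groupxs = group_consecutive_indices_sketch(arr)
# Indexing with each index group recovers the grouped values.
values = [arr[xs].tolist() for xs in groupxs]
print([xs.tolist() for xs in groupxs])
print(values)
```

Keeping the indices rather than the values is useful when the same consecutive runs need to be applied to other arrays of the same length.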