:py:mod:`kwarray.util_groups`
=============================

.. py:module:: kwarray.util_groups

.. autoapi-nested-parse::

   Functions for partitioning numpy arrays into groups.


Module Contents
---------------

Functions
~~~~~~~~~

.. autoapisummary::

   kwarray.util_groups.group_items
   kwarray.util_groups.group_indices
   kwarray.util_groups.apply_grouping
   kwarray.util_groups.group_consecutive
   kwarray.util_groups.group_consecutive_indices


.. py:function:: group_items(item_list, groupid_list, assume_sorted=False, axis=None)

   Groups a list of items by group id.

   Works like :func:`ubelt.group_items`, but with numpy optimizations.
   This can be quite a bit faster than using :func:`itertools.groupby` [1]_ [2]_.

   When there are many lists of items to group (think column-major data),
   consider using :func:`group_indices` and :func:`apply_grouping` instead.

   :Parameters:
       * **item_list** (*ndarray[T1]*) -- The input array of items to group.
       * **groupid_list** (*ndarray[T2]*) -- Each item is an id corresponding to the item at the same
         position in ``item_list``. For the fastest runtime, the input array should be numeric
         (ideally with integer types). This list must be 1-dimensional.
       * **assume_sorted** (*bool, default=False*) -- If the input array is already sorted, setting this
         to True skips an unnecessary sorting operation and improves efficiency.
       * **axis** (*int | None*) -- Group along a particular axis of ``item_list`` if it is n-dimensional.

   :returns: mapping from groupids to corresponding items
   :rtype: Dict[T2, ndarray[T1]]

   .. rubric:: References

   .. [1] http://stackoverflow.com/questions/4651683/
   .. [2] numpy-grouping-using-itertools-groupby-performance

   .. rubric:: Example

   >>> from kwarray.util_groups import *  # NOQA
   >>> items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
   >>> keys  = np.array([2, 2, 1, 1, 0, 1, 0, 1])
   >>> grouped = group_items(items, keys)
   >>> print(ub.repr2(grouped, nl=1, with_dtype=False))
   {
       0: np.array([4, 6]),
       1: np.array([2, 3, 5, 7]),
       2: np.array([0, 1]),
   }


.. py:function:: group_indices(idx_to_groupid, assume_sorted=False)

   Find unique items and the indices at which they appear in an array.

   A common use case is when you have a list of objects (often numeric, but
   sometimes not) and an array of "group-ids" corresponding to that list of
   objects.

   This function returns a list of indices that can be used in conjunction
   with :func:`apply_grouping` to group the elements. This is most useful
   when you have many lists (think column-major data) corresponding to the
   group-ids.

   When there is only one list of objects, or when the indices themselves do
   not matter, consider using :func:`group_items` instead.

   :Parameters:
       * **idx_to_groupid** (*ndarray*) -- The input array, where each item is interpreted as a group id.
         For the fastest runtime, the input array should be numeric (ideally with integer types).
         If the type is non-numeric, then the less efficient :func:`ubelt.group_items` is used.
       * **assume_sorted** (*bool, default=False*) -- If the input array is already sorted, setting this
         to True skips an unnecessary sorting operation and improves efficiency.

   :returns: A tuple ``(keys, groupxs)``:

             * keys (ndarray): The unique elements of the input array in order.
             * groupxs (List[ndarray]): Corresponding list of indexes. The i-th item is an array
               indicating the indices where the item ``keys[i]`` appeared in the input array.
   :rtype: Tuple[ndarray, List[ndarray]]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import numpy as np
   >>> import ubelt as ub
   >>> from kwarray.util_groups import group_indices
   >>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([1, 2, 3])
   [
       np.array([1, 3, 5]),
       np.array([0, 2, 4, 6]),
       np.array([ 7,  8,  9, 10]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import numpy as np
   >>> import ubelt as ub
   >>> from kwarray.util_groups import group_indices
   >>> idx_to_groupid = np.array([[  24], [ 129], [ 659], [ 659], [  24],
   ...       [ 659], [ 659], [ 822], [ 659], [ 659], [  24]])
   >>> # 2d arrays must be flattened before coming into this function so
   >>> # information is on the last axis
   >>> (keys, groupxs) = group_indices(idx_to_groupid.T[0])
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([ 24, 129, 659, 822])
   [
       np.array([ 0,  4, 10]),
       np.array([1]),
       np.array([2, 3, 5, 6, 8, 9]),
       np.array([7]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import numpy as np
   >>> import ubelt as ub
   >>> from kwarray.util_groups import group_indices
   >>> idx_to_groupid = np.array([True, True, False, True, False, False, True])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([False,  True])
   [
       np.array([2, 4, 5]),
       np.array([0, 1, 3, 6]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import ubelt as ub
   >>> from kwarray.util_groups import group_indices
   >>> idx_to_groupid = [('a', 'b'),  ('d', 'b'), ('a', 'b'), ('a', 'b')]
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   [
       ('a', 'b'),
       ('d', 'b'),
   ]
   [
       np.array([0, 2, 3]),
       np.array([1]),
   ]


.. py:function:: apply_grouping(items, groupxs, axis=0)

   Applies the grouping computed by :func:`group_indices` to an array of items.

   :Parameters:
       * **items** (*ndarray*) -- items to group
       * **groupxs** (*List[ndarray[int]]*) -- groups of indices
       * **axis** (*None|int, default=0*)

   :returns: grouped items
   :rtype: List[ndarray]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import numpy as np
   >>> from kwarray.util_groups import group_indices, apply_grouping
   >>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
   >>> items          = np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> grouped_items = apply_grouping(items, groupxs)
   >>> result = str(grouped_items)
   >>> print(result)
   [array([8, 5, 6]), array([1, 5, 8, 7]), array([5, 3, 0, 9])]


.. py:function:: group_consecutive(arr, offset=1)

   Returns lists of consecutive values. Implementation inspired by [3]_.

   :Parameters:
       * **arr** (*ndarray*) -- array of ordered values
       * **offset** (*float, default=1*) -- Any two adjacent values separated by this offset are grouped.
         In the default case, when offset=1, this groups increasing values like: 0, 1, 2.
         When offset is 0 it groups consecutive values that are the same, e.g.: 4, 4, 4.

   :returns: a list of arrays that are the groups from the input
   :rtype: List[ndarray]

   .. rubric:: Notes

   This is equivalent to, and faster than, calling::

       apply_grouping(data, group_consecutive_indices(data))

   .. rubric:: References

   .. [3] http://stackoverflow.com/questions/7352684/groups-consecutive-elements

   .. rubric:: Example

   >>> import numpy as np
   >>> from kwarray.util_groups import group_consecutive
   >>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
   >>> groups = group_consecutive(arr)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[1, 2, 3], [5, 6, 7, 8, 9, 10], [15], [99, 100, 101]]
   >>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
   >>> groups = group_consecutive(arr, offset=1)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[0], [0], [3], [0], [0], [7], [2, 3, 4], [4], [4], [1], [1]]
   >>> groups = group_consecutive(arr, offset=0)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[0, 0], [3], [0, 0], [7], [2], [3], [4, 4, 4], [1, 1]]


.. py:function:: group_consecutive_indices(arr, offset=1)

   Returns lists of indices pointing to consecutive values.

   :Parameters:
       * **arr** (*ndarray*) -- array of ordered values
       * **offset** (*float, default=1*) -- Any two adjacent values separated by this offset are grouped.

   :returns: groupxs: a list of index arrays, one per group
   :rtype: List[ndarray]

   SeeAlso:
       :func:`group_consecutive`
       :func:`apply_grouping`

   .. rubric:: Example

   >>> import numpy as np
   >>> from kwarray.util_groups import *  # NOQA
   >>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
   >>> groupxs = group_consecutive_indices(arr)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0, 1, 2], [3, 4, 5, 6, 7, 8], [9], [10, 11, 12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
   >>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
   >>> groupxs = group_consecutive_indices(arr, offset=1)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0], [1], [2], [3], [4], [5], [6, 7, 8], [9], [10], [11], [12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
   >>> groupxs = group_consecutive_indices(arr, offset=0)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0, 1], [2], [3, 4], [5], [6], [7], [8, 9, 10], [11, 12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 0), apply_grouping(arr, groupxs)))
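
The ``offset`` rule used by :func:`group_consecutive` and :func:`group_consecutive_indices` can be easier to follow when written as a plain loop. The sketch below is not the library's implementation (which operates on numpy arrays); ``consecutive_index_groups`` is a hypothetical helper name introduced only for illustration. It assumes a new group starts whenever the difference between an element and its predecessor is not exactly ``offset``, which is consistent with the documented examples.

```python
def consecutive_index_groups(values, offset=1):
    """Hypothetical pure-Python illustration of the documented grouping rule.

    Returns a list of index lists. Two neighbouring positions share a group
    only when values[i] - values[i - 1] == offset.
    """
    groups = []
    current = []
    for idx, value in enumerate(values):
        # Close the running group when the step from the previous value
        # differs from the requested offset.
        if current and value - values[idx - 1] != offset:
            groups.append(current)
            current = []
        current.append(idx)
    if current:
        groups.append(current)
    return groups


arr = [0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1]

# offset=0 groups runs of identical values.
groupxs = consecutive_index_groups(arr, offset=0)

# Applying the index groups back to the data recovers the value groups,
# mirroring what apply_grouping does with the indices.
grouped = [[arr[i] for i in idxs] for idxs in groupxs]

print(groupxs)
print(grouped)
```

Because the comparison is an exact equality, a floating point ``offset`` in this sketch only matches differences that are exactly representable; whether the library applies any tolerance is not stated in this reference.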