kwarray.util_numpy module

NumPy-specific extensions

kwarray.util_numpy.boolmask(indices, shape=None)

Constructs an array of booleans where an item is True if its position is in indices; otherwise it is False. This can be viewed as the inverse of numpy.where().

Parameters:

indices (NDArray) – list of integer indices
shape (int | tuple) – shape of the returned mask. If not specified, the minimal shape that incorporates all the indices is used. In general, it is best practice to always specify this argument.

Returns:

mask - mask[idx] is True if idx in indices

Return type:

NDArray[Any, Bool]

Example

>>> indices = [0, 1, 4]
>>> mask = boolmask(indices, shape=6)
>>> assert np.all(mask == [True, True, False, False, True, False])
>>> mask = boolmask(indices)
>>> assert np.all(mask == [True, True, False, False, True])

Example

>>> import kwarray
>>> import ubelt as ub  # NOQA
>>> indices = np.array([(0, 0), (1, 1), (2, 1)])
>>> shape = (3, 3)
>>> mask = kwarray.boolmask(indices, shape)
>>> result = ub.urepr(mask, with_dtype=0)
>>> print(result)
np.array([[ True, False, False],
          [False,  True, False],
          [False,  True, False]])
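
The relationship to numpy.where() can be illustrated with plain NumPy. The following is a minimal sketch, not kwarray's implementation. The helper name boolmask_sketch is hypothetical, and it assumes that 1-D indices are positions while 2-D indices hold one coordinate per row, as in the examples above.

```python
import numpy as np


def boolmask_sketch(indices, shape):
    """Hypothetical stand-in: True at the given positions, False elsewhere."""
    indices = np.asarray(indices, dtype=int)
    mask = np.zeros(shape, dtype=bool)
    if indices.ndim == 1:
        # 1-D case: each entry is a position along the only axis
        mask[indices] = True
    else:
        # N-D case (assumption): each row is one coordinate tuple
        mask[tuple(indices.T)] = True
    return mask


mask_1d = boolmask_sketch([0, 1, 4], 6)
# np.where recovers the positions that were used to build the mask
recovered_1d = np.where(mask_1d)[0]

coords = np.array([(0, 0), (1, 1), (2, 1)])
mask_2d = boolmask_sketch(coords, (3, 3))
# np.argwhere is the coordinate-per-row analogue of np.where
recovered_2d = np.argwhere(mask_2d)
```

Round-tripping through np.where or np.argwhere returns the indices in sorted order, so the original order of indices is not preserved.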

kwarray.util_numpy.iter_reduce_ufunc(ufunc, arrs, out=None, default=None)

Constant-memory iteration and reduction.

Applies ufunc from left to right over the input arrays.

Parameters:

ufunc (Callable) – called on each pair of consecutive ndarrays
arrs (Iterator[NDArray]) – iterator of ndarrays
default (object) – return value when iterator is empty

Returns:

if len(arrs) == 0, returns default; if len(arrs) == 1, returns arrs[0]; if len(arrs) >= 2, returns ufunc(…ufunc(ufunc(arrs[0], arrs[1]), arrs[2]),…arrs[n-1])

Return type:

NDArray

Example

>>> arr_list = [
...     np.array([0, 1, 2, 3, 8, 9]),
...     np.array([4, 1, 2, 3, 4, 5]),
...     np.array([0, 5, 2, 3, 4, 5]),
...     np.array([1, 1, 6, 3, 4, 5]),
...     np.array([0, 1, 2, 7, 4, 5])
... ]
>>> memory = np.array([9, 9, 9, 9, 9, 9])
>>> gen_memory = memory.copy()
>>> def arr_gen(arr_list, gen_memory):
...     for arr in arr_list:
...         gen_memory[:] = arr
...         yield gen_memory
>>> print('memory = %r' % (memory,))
>>> print('gen_memory = %r' % (gen_memory,))
>>> ufunc = np.maximum
>>> res1 = iter_reduce_ufunc(ufunc, iter(arr_list), out=None)
>>> res2 = iter_reduce_ufunc(ufunc, iter(arr_list), out=memory)
>>> res3 = iter_reduce_ufunc(ufunc, arr_gen(arr_list, gen_memory), out=memory)
>>> print('res1       = %r' % (res1,))
>>> print('res2       = %r' % (res2,))
>>> print('res3       = %r' % (res3,))
>>> print('memory     = %r' % (memory,))
>>> print('gen_memory = %r' % (gen_memory,))
>>> assert np.all(res1 == res2)
>>> assert np.all(res2 == res3)
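
The left-to-right reduction described above can be sketched as a fold that writes every intermediate result into one buffer. This is an illustrative sketch under stated assumptions, not the library's code: fold_ufunc_sketch is a hypothetical name, and for a single input it returns a copy equal to arrs[0] rather than the object itself.

```python
from functools import reduce

import numpy as np


def fold_ufunc_sketch(ufunc, arrs, out=None, default=None):
    """Hypothetical left fold that accumulates into a single buffer."""
    arrs = iter(arrs)
    try:
        first = next(arrs)
    except StopIteration:
        return default  # empty iterator
    if out is None:
        out = first.copy()  # allocate the accumulator once
    else:
        out[:] = first  # reuse the caller's buffer
    for arr in arrs:
        # Write the result back into ``out`` so no new array is allocated
        ufunc(out, arr, out=out)
    return out


arr_list = [
    np.array([0, 1, 2, 3, 8, 9]),
    np.array([4, 1, 2, 3, 4, 5]),
    np.array([0, 5, 2, 3, 4, 5]),
]
buffer = np.empty(6, dtype=int)
folded = fold_ufunc_sketch(np.maximum, iter(arr_list), out=buffer)
# The same reduction with functools.reduce allocates a new array per step
reference = reduce(np.maximum, arr_list)
empty_result = fold_ufunc_sketch(np.maximum, iter([]), default=None)
```

Because the accumulator is separate from the arrays being consumed, the fold also works when the iterator keeps overwriting one array, as arr_gen does in the example above.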

kwarray.util_numpy.isect_flags(arr, other)

Check which items in an array intersect with another set of items

Parameters:

arr (NDArray) – items to check
other (Iterable) – items to check if they exist in arr

Returns:

booleans corresponding to arr, indicating whether each item in arr is also contained in other.

Return type:

NDArray

Example

>>> arr = np.array([
>>>     [1, 2, 3, 4],
>>>     [5, 6, 3, 4],
>>>     [1, 1, 3, 4],
>>> ])
>>> other = np.array([1, 4, 6])
>>> mask = isect_flags(arr, other)
>>> print(mask)
[[ True False False  True]
 [False  True False  True]
 [ True  True False  True]]
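
For comparison, NumPy's own np.isin produces the same flags for this input. The sketch below uses only NumPy; whether isect_flags is implemented this way is not stated here, so treat the equivalence as an observation about this example rather than a guarantee.

```python
import numpy as np

arr = np.array([
    [1, 2, 3, 4],
    [5, 6, 3, 4],
    [1, 1, 3, 4],
])
other = np.array([1, 4, 6])

# Elementwise membership test; the result has the same shape as ``arr``
flags = np.isin(arr, other)

# A typical use: keep only the items of ``arr`` that also appear in ``other``
shared = arr[flags]
```

Indexing with the flags flattens the result, so shared is a 1-D array in row-major order.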

kwarray.util_numpy.atleast_nd(arr, n, front=False)

View inputs as arrays with at least n dimensions.

Parameters:

arr (ArrayLike) – An array-like object. Non-array inputs are converted to arrays. Arrays that already have n or more dimensions are preserved.
n (int) – number of dimensions to ensure
front (bool) – if True, new dimensions are added to the front of the array; otherwise they are added to the back. Defaults to False.

Returns:

An array with ndim >= n. Copies are avoided where possible, and views with n or more dimensions are returned. For example, with n=3 and front=False, a 1-D array of shape (N,) becomes a view of shape (N, 1, 1), and a 2-D array of shape (M, N) becomes a view of shape (M, N, 1).

Return type:

NDArray

See also

numpy.atleast_1d, numpy.atleast_2d, numpy.atleast_3d

Example

>>> n = 2
>>> arr = np.array([1, 1, 1])
>>> arr_ = atleast_nd(arr, n)
>>> import ubelt as ub  # NOQA
>>> result = ub.urepr(arr_.tolist(), nl=0)
>>> print(result)
[[1], [1], [1]]

Example

>>> n = 4
>>> arr1 = [1, 1, 1]
>>> arr2 = np.array(0)
>>> arr3 = np.array([[[[[1]]]]])
>>> arr1_ = atleast_nd(arr1, n)
>>> arr2_ = atleast_nd(arr2, n)
>>> arr3_ = atleast_nd(arr3, n)
>>> import ubelt as ub  # NOQA
>>> result1 = ub.urepr(arr1_.tolist(), nl=0)
>>> result2 = ub.urepr(arr2_.tolist(), nl=0)
>>> result3 = ub.urepr(arr3_.tolist(), nl=0)
>>> result = '\n'.join([result1, result2, result3])
>>> print(result)
[[[[1]]], [[[1]]], [[[1]]]]
[[[[0]]]]
[[[[[1]]]]]

Note

Extensive benchmarks are in kwarray/dev/bench_atleast_nd.py

These demonstrate that this function is statistically faster than the numpy variants, although the difference is small. On average this function takes 480ns versus numpy which takes 790ns.
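
The effect of n and front on the output shape can be sketched in a few lines of NumPy. This is a hypothetical re-implementation for illustration (atleast_nd_sketch is not part of kwarray), assuming the only operation needed is padding the shape with length-1 axes.

```python
import numpy as np


def atleast_nd_sketch(arr, n, front=False):
    """Hypothetical re-implementation: pad the shape with length-1 axes."""
    arr = np.asanyarray(arr)
    missing = n - arr.ndim
    if missing <= 0:
        return arr  # already has n or more dimensions
    pad = (1,) * missing
    new_shape = pad + arr.shape if front else arr.shape + pad
    # For a contiguous input, adding length-1 axes yields a view, not a copy
    return arr.reshape(new_shape)


vec = np.array([1, 1, 1])
back_view = atleast_nd_sketch(vec, 3)                # shape (3, 1, 1)
front_view = atleast_nd_sketch(vec, 3, front=True)   # shape (1, 1, 3)
scalar_view = atleast_nd_sketch(np.array(0), 4)      # shape (1, 1, 1, 1)
```

Note the difference from numpy.atleast_3d, which places a 1-D input in the middle axis, giving shape (1, N, 1).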

kwarray.util_numpy.argmaxima(arr, num, axis=None, ordered=True)

Returns the indices of the top num maximum values.

This can be significantly faster than using argsort.

Parameters:

arr (NDArray) – input array
num (int) – number of maximum indices to return
axis (int | None) – axis to find maxima over. If None, this is equivalent to using arr.ravel().
ordered (bool) – if False, returns the maximum elements in an arbitrary order; otherwise they are in descending order. (Setting this to False is a bit faster.)

Todo

[ ] if num is None, return arg for all values equal to the maximum

Return type:

NDArray

Example

>>> # Test cases with axis=None
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argmaxima(arr, num)
>>>     idxs2 = argmaxima(arr, num, ordered=False)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[::-1][:len(idxs)])), 'ordered=True must return in order'
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'

Example

>>> # Test cases with axis
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argmaxima(arr, num, axis=axis)
>>>         idxs2 = argmaxima(arr, num, ordered=False, axis=axis)
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
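
The speed advantage over argsort typically comes from partial selection: only the num selected items need to be sorted, not the whole array. The sketch below shows that idea for the flattened (axis=None) case using np.argpartition. It is an assumption that argmaxima works along these lines, and top_k_indices is a hypothetical name.

```python
import numpy as np


def top_k_indices(arr, num, ordered=True):
    """Hypothetical 1-D sketch: indices of the ``num`` largest values."""
    flat = np.asarray(arr).ravel()
    num = min(num, flat.size)
    if num <= 0:
        return np.empty(0, dtype=int)
    # argpartition moves the ``num`` largest items into the last ``num``
    # slots without fully sorting, leaving them in arbitrary order
    idxs = np.argpartition(flat, -num)[-num:]
    if ordered:
        # Sort only the ``num`` selected items, largest first
        idxs = idxs[np.argsort(flat[idxs])[::-1]]
    return idxs


scores = np.array([3, 9, 1, 7, 5])
top2 = top_k_indices(scores, 2)
top2_any_order = top_k_indices(scores, 2, ordered=False)
```

With ordered=False the final sort is skipped entirely, which is why that setting can be slightly faster.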

kwarray.util_numpy.argminima(arr, num, axis=None, ordered=True)

Returns the indices of the top num minimum values.

This can be significantly faster than using argsort.

Parameters:

arr (NDArray) – input array
num (int) – number of minimum indices to return
axis (int | None) – axis to find minima over. If None, this is equivalent to using arr.ravel().
ordered (bool) – if False, returns the minimum elements in an arbitrary order; otherwise they are in ascending order. (Setting this to False is a bit faster.)

Example

>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argminima(arr, num)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[:len(idxs)])), 'ordered=True must return in order'
>>>     idxs2 = argminima(arr, num, ordered=False)
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'

Example

>>> # Test cases with axis
>>> from kwarray.util_numpy import *  # NOQA
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> # make a unique array so we can check argmax consistency
>>> arr = np.arange(3 * 5 * 7)
>>> np.random.shuffle(arr)
>>> arr = arr.reshape(3, 5, 7)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argminima(arr, num, axis=axis)
>>>         idxs2 = argminima(arr, num, ordered=False, axis=axis)
>>>         print('idxs = {!r}'.format(idxs))
>>>         print('idxs2 = {!r}'.format(idxs2))
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
>>>         # Check that argminima agrees with argmaxima on the negated array
>>>         idxs3 = argmaxima(-arr, num, axis=axis)
>>>         assert np.all(idxs3 == idxs)

Example

>>> arr = np.arange(20).reshape(4, 5) % 6
>>> argminima(arr, axis=1, num=2, ordered=False)
>>> argminima(arr, axis=1, num=2, ordered=True)
>>> argmaxima(-arr, axis=1, num=2, ordered=True)
>>> argmaxima(-arr, axis=1, num=2, ordered=False)
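
The last example prints no expected output, so the following plain-NumPy sketch spells out what the documented semantics imply for the same array. It does not call kwarray; the reference values come from np.argsort and np.argpartition, on the assumption that argminima follows the ordered and unordered behaviour described in the parameters above.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5) % 6
# arr is:
# [[0 1 2 3 4]
#  [5 0 1 2 3]
#  [4 5 0 1 2]
#  [3 4 5 0 1]]
num = 2

# Ordered reference: full argsort along axis 1, then keep the first ``num``
ordered = np.argsort(arr, axis=1)[:, :num]

# Unordered reference: argpartition guarantees only that the first ``num``
# columns hold the ``num`` smallest items of each row, in some order
unordered = np.argpartition(arr, num, axis=1)[:, :num]

# The largest items of the negated array are the smallest items of ``arr``
via_negation = np.argsort(-arr, axis=1)[:, ::-1][:, :num]
```

Each row of this array has distinct values, so the ordered result is unambiguous; with ties, different methods may return different but equally valid indices.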

kwarray.util_numpy.unique_rows(arr, ordered=False, return_index=False)

Like numpy.unique, but operates on rows.

Parameters:

arr (NDArray) – must be a contiguous C style array
ordered (bool) – if True, keeps the relative ordering of the rows
return_index (bool) – if True, also returns the indices of the unique rows in arr

References

https://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array

Example

>>> import kwarray
>>> from kwarray.util_numpy import *  # NOQA
>>> rng = kwarray.ensure_rng(0)
>>> arr = rng.randint(0, 2, size=(22, 3))
>>> arr_unique = unique_rows(arr)
>>> print('arr_unique = {!r}'.format(arr_unique))
>>> arr_unique, idxs = unique_rows(arr, return_index=True, ordered=True)
>>> assert np.all(arr[idxs] == arr_unique)
>>> print('arr_unique = {!r}'.format(arr_unique))
>>> print('idxs = {!r}'.format(idxs))
>>> arr_unique, idxs = unique_rows(arr, return_index=True, ordered=False)
>>> assert np.all(arr[idxs] == arr_unique)
>>> print('arr_unique = {!r}'.format(arr_unique))
>>> print('idxs = {!r}'.format(idxs))
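
A comparable result can be obtained with np.unique and its axis argument. The sketch below is a plain-NumPy illustration rather than kwarray's method, and it assumes that "keeps relative ordering" means rows are returned in order of first appearance.

```python
import numpy as np

arr = np.array([
    [1, 0],
    [0, 1],
    [1, 0],
    [0, 0],
    [0, 1],
])

# np.unique with axis=0 treats each row as one item; rows come back sorted
uniq_sorted, first_idx = np.unique(arr, axis=0, return_index=True)

# Sorting the first-occurrence indices restores the order of appearance
keep = np.sort(first_idx)
uniq_in_order = arr[keep]
```

In both variants, indexing arr with the returned indices reproduces the unique rows, which is the same invariant the doctest above asserts.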

kwarray.util_numpy.arglexmax(keys, multi=False)

Find the index of the maximum element in a sequence of keys.

Parameters:

keys (tuple) – a k-tuple of k arrays, each of length N. Like np.lexsort, the last key in the sequence is used for the primary sort order, the second-to-last key for the secondary sort order, and so on.
multi (bool) – if True, returns all indices that share the max value

Returns:

either the index or list of indices

Return type:

int | NDArray[Any, Int]

Example

>>> k, N = 100, 100
>>> rng = np.random.RandomState(0)
>>> keys = [(rng.rand(N) * N).astype(int) for _ in range(k)]
>>> multi_idx = arglexmax(keys, multi=True)
>>> idxs = np.lexsort(keys)
>>> assert sorted(idxs[::-1][:len(multi_idx)]) == sorted(multi_idx)

Benchmark:

>>> import ubelt as ub
>>> k, N = 100, 100
>>> rng = np.random
>>> keys = [(rng.rand(N) * N).astype(int) for _ in range(k)]
>>> for timer in ub.Timerit(100, bestof=10, label='arglexmax'):
>>>     with timer:
>>>         arglexmax(keys)
>>> for timer in ub.Timerit(100, bestof=10, label='lexsort'):
>>>     with timer:
>>>         np.lexsort(keys)[-1]
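
The key ordering can be made concrete with a small sketch that finds the lexicographic maximum without sorting. This is one possible approach, offered as an assumption rather than a description of arglexmax itself. The helper lexmax_candidates is hypothetical and always returns every tied index, similar in spirit to multi=True.

```python
import numpy as np


def lexmax_candidates(keys):
    """Hypothetical sketch: indices that are maximal under lexsort order."""
    candidates = np.arange(len(keys[-1]))
    # Start with the primary (last) key and narrow the candidates key by key
    for key in reversed(keys):
        vals = np.asarray(key)[candidates]
        candidates = candidates[vals == vals.max()]
        if len(candidates) == 1:
            break  # a unique maximum has been found
    return candidates


keys = [
    np.array([5, 1, 9, 2, 0]),  # secondary key, used to break ties
    np.array([3, 7, 7, 1, 7]),  # primary key (last in the sequence)
]
winners = lexmax_candidates(keys)
# A full lexsort gives the same answer but sorts every item first
via_lexsort = np.lexsort(keys)[-1]
```

Here the primary key has a three-way tie at 7 (indices 1, 2 and 4), and the secondary key selects index 2.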

kwarray.util_numpy.generalized_logistic(x, floor=0, capacity=1, C=1, y_intercept=None, Q=None, growth=1, v=1)

A generalization of the logistic / sigmoid functions that allows for flexible specification of an S-shaped curve.

This is also known as a “Richards curve” [WikiRichardsCurve].

Parameters:

x (NDArray) – input x coordinates
floor (float) – the lower (left) asymptote. (Also called A in some texts). Defaults to 0.
capacity (float) – the carrying capacity. When C=1, this is the upper (right) asymptote. (Also called K in some texts). Defaults to 1.
C (float) – Has influence on the upper asymptote. Defaults to 1. This is typically not modified.
y_intercept (float | None) – specify where the y intercept is at x=0. Mutually exclusive with Q.
Q (float | None) – related to the value of the function at x=0. Mutually exclusive with y_intercept. Defaults to 1.
growth (float) – the growth rate (also called B in some texts). Defaults to 1.
v (float) – Positive number that influences near which asymptote the growth occurs. Defaults to 1.

Returns:

the values for each input

Return type:

NDArray

References

[WikiRichardsCurve]

https://en.wikipedia.org/wiki/Generalised_logistic_function

Example

>>> from kwarray.util_numpy import *  # NOQA
>>> # xdoctest: +REQUIRES(module:pandas)
>>> import pandas as pd
>>> import ubelt as ub
>>> x = np.linspace(-3, 3, 30)
>>> basis = {
>>>     # 'y_intercept': [0.1, 0.5, 0.8, -1],
>>>     # 'y_intercept': [0.1, 0.5, 0.8],
>>>     'v': [0.5, 1.0, 2.0],
>>>     'growth': [-1, 0, 2],
>>> }
>>> grid = list(ub.named_product(basis))
>>> datas = []
>>> for params in grid:
>>>     y = generalized_logistic(x, **params)
>>>     data = pd.DataFrame({'x': x, 'y': y})
>>>     key = ub.urepr(params, compact=1)
>>>     data['key'] = key
>>>     for k, v in params.items():
>>>         data[k] = v
>>>     datas.append(data)
>>> all_data = pd.concat(datas).reset_index()
>>> # xdoctest: +REQUIRES(--show)
>>> # xdoctest: +REQUIRES(module:kwplot)
>>> import kwplot
>>> plt = kwplot.autoplt()
>>> sns = kwplot.autosns()
>>> plt.gca().cla()
>>> sns.lineplot(data=all_data, x='x', y='y', hue='growth', size='v')
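
For reference, the formula given in [WikiRichardsCurve] is Y(x) = A + (K - A) / (C + Q * exp(-B * x)) ** (1 / ν). The sketch below writes it out directly, using the correspondence stated in the parameter list (floor is A, capacity is K, growth is B, v is ν). The function names are hypothetical, and solving the formula at x=0 for Q is only one assumed way that a y_intercept could be translated into Q.

```python
import numpy as np


def richards_curve(x, A=0.0, K=1.0, C=1.0, Q=1.0, B=1.0, nu=1.0):
    """Generalised logistic function in the notation of [WikiRichardsCurve]."""
    x = np.asarray(x, dtype=float)
    return A + (K - A) / (C + Q * np.exp(-B * x)) ** (1.0 / nu)


def q_from_y_intercept(y0, A=0.0, K=1.0, C=1.0, nu=1.0):
    """Solve the formula at x=0 for Q (an assumed way to honor y_intercept)."""
    return ((K - A) / (y0 - A)) ** nu - C


x = np.linspace(-3, 3, 7)
# With all defaults the curve reduces to the standard sigmoid
y_default = richards_curve(x)
sigmoid = 1.0 / (1.0 + np.exp(-x))

# Choosing Q so that the curve passes through y=0.2 at x=0
Q = q_from_y_intercept(0.2)
y_at_zero = richards_curve(0.0, Q=Q)
```

Since one value of Q corresponds to one value at x=0, specifying both would over-determine the curve, which is consistent with the two arguments being mutually exclusive.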

kwarray.util_numpy.equal_with_nan(a1, a2)

NumPy's array_equal accepts equal_nan=True, but it returns a single boolean for the whole array; this function performs the comparison elementwise.

Parameters:

a1 (ArrayLike) – input array
a2 (ArrayLike) – input array

Example

>>> import kwarray
>>> a1 = np.array([
>>>     [np.nan, 0, np.nan],
>>>     [np.nan, 0, 0],
>>>     [np.nan, 1, 0],
>>>     [np.nan, 1, np.nan],
>>> ])
>>> a2 = np.array([np.nan, 0, np.nan])
>>> flags = kwarray.equal_with_nan(a1, a2)
>>> assert np.array_equal(flags, np.array([
>>>     [ True, False,  True],
>>>     [ True, False, False],
>>>     [ True,  True, False],
>>>     [ True,  True,  True]
>>> ]))
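
The NumPy behaviour that motivates this function can be seen in a short sketch. It concerns NumPy only and does not call kwarray; the arrays a and b are illustrative, and equal_nan requires NumPy 1.19 or later.

```python
import numpy as np

a = np.array([np.nan, 1.0])
b = np.array([np.nan, 1.0])

# The == operator is elementwise, but NaN never compares equal to NaN
plain = (a == b)

# array_equal can treat NaNs as equal, but collapses to one boolean
whole = np.array_equal(a, b, equal_nan=True)
whole_strict = np.array_equal(a, b)
```

Neither call gives a per-element answer that accounts for NaN positions, which is the gap an elementwise NaN-aware comparison is meant to fill.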