The kwarray module implements a small set of pure-Python extensions to
numpy and torch, along with a few select algorithms. Each module contains a
module-level docstring that gives a rough idea of the utilities in that module,
and each function or class contains a docstring with more details and
examples.
KWarray is part of Kitware’s computer vision Python suite.
The API defines classmethods that work on both Tensors and ndarrays. As
such, the user can simply use kwarray.ArrayAPI.<funcname> and it will
return the expected result for both Tensor and ndarray types.
However, this is inefficient because it requires checking the type of
the input on every API call. Therefore it is recommended that you use the
ArrayAPI.coerce() function, which takes as input the data you want to
operate on. It performs the type check once, and then returns another
object that defines an identical API, but specific to the given data
type. This means that type checks can be skipped on future calls of the
specific implementation. See the examples for more details.
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> import numpy as np
>>> take = kwarray.ArrayAPI.take
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> indices = [1, 3, 5, 7, 11, 13, 17, 21]
>>> idxs0 = [1, 3, 5, 7]
>>> idxs1 = [1, 3, 5, 7, 11]
>>> assert np.allclose(take(np_data, indices), take(pt_data, indices))
>>> assert np.allclose(take(np_data, idxs0, 0), take(pt_data, idxs0, 0))
>>> assert np.allclose(take(np_data, idxs1, 1), take(pt_data, idxs1, 1))
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> import numpy as np
>>> compress = kwarray.ArrayAPI.compress
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> flags = (np_data % 2 == 0).ravel()
>>> f0 = (np_data % 2 == 0)[:, 0]
>>> f1 = (np_data % 2 == 0)[0, :]
>>> assert np.allclose(compress(np_data, flags), compress(pt_data, flags))
>>> assert np.allclose(compress(np_data, f0, 0), compress(pt_data, f0, 0))
>>> assert np.allclose(compress(np_data, f1, 1), compress(pt_data, f1, 1))
Example
>>> # Use ArrayAPI to coerce an identical API that doesn't do type checks
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> import numpy as np
>>> np_data = np.arange(0, 15).reshape(3, 5)
>>> pt_data = torch.LongTensor(np_data)
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on torch Tensors.
>>> impl = kwarray.ArrayAPI.coerce(pt_data)
>>> flat_data = impl.view(pt_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on numpy ndarrays.
>>> impl = kwarray.ArrayAPI.coerce(np_data)
>>> flat_data = impl.view(np_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
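The coerce pattern above can be sketched in plain Python. The mock below is an illustration only (it handles only numpy and only a single method), not kwarray's actual implementation, which also dispatches on torch Tensors and covers many functions:

```python
import numpy as np

class NumpyImpl:
    """Backend whose methods assume numpy inputs, so no per-call type checks."""
    @staticmethod
    def take(data, indices, axis=None):
        return np.take(data, indices, axis=axis)

def coerce_impl(data):
    """Perform the type check once and return a type-specific implementation.
    (A real version would also dispatch on torch.Tensor.)"""
    if isinstance(data, np.ndarray):
        return NumpyImpl
    raise TypeError('unsupported type: {}'.format(type(data)))

np_data = np.arange(0, 15).reshape(3, 5)
impl = coerce_impl(np_data)                 # one type check here...
taken = impl.take(np_data, [0, 2], axis=1)  # ...none on subsequent calls
```

The design choice is the same as in the documented API: pay the isinstance cost once, then hold a reference to a backend whose methods assume the type.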
The API is restricted to facilitate speed tradeoffs.
Note
Assumes underlying data is Dict[list|ndarray]. If the data is known
to be a Dict[ndarray] use DataFrameArray instead, which has faster
implementations for some operations.
Note
pandas.DataFrame is slow. DataFrameLight is faster.
It is a tad more restrictive though.
>>> # xdoc: +REQUIRES(--bench)
>>> from kwarray.dataframe_light import *  # NOQA
>>> import ubelt as ub
>>> df_light = DataFrameLight._demodata(num=1000)
>>> df_heavy = df_light.pandas()
>>> ti = ub.Timerit(21, bestof=3, verbose=2, unit='ms')
>>> ti.reset('light').call(lambda: list(df_light.iterrows()))
>>> ti.reset('heavy').call(lambda: list(df_heavy.iterrows()))
>>> # xdoctest: +IGNORE_WANT
Timed light for: 21 loops, best of 3
    time per loop: best=0.834 ms, mean=0.850 ± 0.0 ms
Timed heavy for: 21 loops, best of 3
    time per loop: best=45.007 ms, mean=45.633 ± 0.5 ms
>>> import kwarray
>>> import numpy as np
>>> rng = kwarray.ensure_rng(0)
>>> items = [rng.rand(rng.randint(0, 10)) for _ in range(10)]
>>> self = kwarray.FlatIndexer.fromlist(items)
>>> index = np.arange(0, len(self))
>>> outer, inner = self.unravel(index)
>>> recon = self.ravel(outer, inner)
>>> # This check is only possible because index is an arange
>>> check1 = np.hstack(list(map(sorted, kwarray.group_indices(outer)[1])))
>>> check2 = np.hstack(kwarray.group_consecutive_indices(inner))
>>> assert np.all(check1 == index)
>>> assert np.all(check2 == index)
>>> assert np.all(index == recon)
Track mean, std, min, and max values over time with constant memory.
Dynamically records per-element array statistics, which can be summarized
per-element, across channels, or globally.
Todo
[ ] This may need a few API tweaks and good documentation
Example
>>> import kwarray
>>> import ubelt as ub
>>> import numpy as np
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> img = np.dstack([ch1, ch2])
>>> run.update(np.dstack([ch1, ch2]))
>>> run.update(np.dstack([ch1 + 1, ch2]))
>>> run.update(np.dstack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.urepr(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.urepr(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.urepr(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.urepr(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.urepr(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.urepr(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.urepr(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
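The constant-memory bookkeeping behind a running-statistics object can be sketched with per-element accumulators: a running sum, a running sum of squares, and elementwise min/max. This is a simplified stand-in for RunningStats (no axis handling, no weights, no nan policies):

```python
import numpy as np

class MiniRunningStats:
    """Track per-element mean/std/min/max in O(1) memory (illustrative sketch)."""
    def __init__(self):
        self.n = 0

    def update(self, data):
        data = np.asarray(data, dtype=float)
        if self.n == 0:
            # lazily allocate accumulators matching the element shape
            self.raw_total = np.zeros_like(data)
            self.raw_squares = np.zeros_like(data)
            self.raw_min = np.full_like(data, np.inf)
            self.raw_max = np.full_like(data, -np.inf)
        self.n += 1
        self.raw_total += data
        self.raw_squares += data ** 2
        np.minimum(self.raw_min, data, out=self.raw_min)
        np.maximum(self.raw_max, data, out=self.raw_max)

    def summarize(self):
        mean = self.raw_total / self.n
        # variance from sum of squares; clamp tiny negatives from float error
        var = np.maximum(self.raw_squares / self.n - mean ** 2, 0)
        return {'mean': mean, 'std': np.sqrt(var),
                'min': self.raw_min, 'max': self.raw_max}

run = MiniRunningStats()
for offset in range(3):
    run.update(np.array([[0.0, 1.0], [3.0, 4.0]]) + offset)
stats = run.summarize()
```

Note the sum-of-squares formulation is the simplest constant-memory approach; numerically robust implementations often prefer Welford-style updates.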
Parameters:
nan_policy (str) – indicates how nan values are handled
if “omit” - set the weights of nan items to zero.
if “propogate” - propagate nans.
if “raise” - raise a ValueError if nans are given.
check_weights (bool):
if True, we check the weights for zeros (which can also
implicitly occur when data has nans). Disabling this check will
result in faster computation, but it is your responsibility to
ensure all data passed to update is valid.
Compute summary statistics across one or more dimensions
Parameters:
axis (int | List[int] | None | NoParamType) – axis or axes to summarize over.
If None, all axes are summarized.
If ub.NoParam, no axes are summarized and the current result is
returned.
keepdims (bool, default=True) – if False removes the dimensions that are summarized over
Returns:
containing minimum, maximum, mean, std, etc..
Return type:
Dict
Raises:
NoSupportError – if update was never called with valid data
Example
>>> # Test to make sure summarize works across different shapes
>>> base = np.array([1, 1, 1, 1, 0, 0, 0, 1])
>>> run0 = RunningStats()
>>> for _ in range(3):
>>>     run0.update(base.reshape(8, 1))
>>> run1 = RunningStats()
>>> for _ in range(3):
>>>     run1.update(base.reshape(4, 2))
>>> run2 = RunningStats()
>>> for _ in range(3):
>>>     run2.update(base.reshape(2, 2, 2))
>>> #
>>> # Summarizing over everything should be exactly the same
>>> s0N = run0.summarize(axis=None, keepdims=0)
>>> s1N = run1.summarize(axis=None, keepdims=0)
>>> s2N = run2.summarize(axis=None, keepdims=0)
>>> #assert ub.util_indexable.indexable_allclose(s0N, s1N, rel_tol=0.0, abs_tol=0.0)
>>> #assert ub.util_indexable.indexable_allclose(s1N, s2N, rel_tol=0.0, abs_tol=0.0)
>>> assert s0N['mean'] == 0.625
Slide a window of a certain shape over an array with a larger shape.
This can be used for iterating over a grid of sub-regions of 2d-images,
3d-volumes, or any n-dimensional array.
Yields slices of shape window that can be used to index into an array
with shape shape via numpy / torch fancy indexing. This allows for fast
iteration over subregions of a larger image. Because we generate the
grid-basis using only shapes, the larger image does not need to be in
memory as long as its width/height/depth/etc. are known.
Parameters:
shape (Tuple[int, …]) – shape of source array to slide across.
window (Tuple[int, …]) – shape of window that will be slid over the
larger image.
overlap (float, default=0) – a number between 0 and 1 indicating the
fraction of overlap that parts will have. Specifying this is
mutually exclusive with stride. Must be 0 <= overlap < 1.
stride (int, default=None) – the number of cells (pixels) moved on each
step of the window. Mutually exclusive with overlap.
keepbound (bool, default=False) – if True, a non-uniform stride will be
taken to ensure that the right / bottom of the image is returned as
a slice if needed. Such a slice will not obey the overlap
constraints. (Defaults to False)
allow_overshoot (bool, default=False) – if False, we will raise an
error if the window doesn’t slide perfectly over the input shape.
Variables:
basis_shape – shape of the grid corresponding to the number of steps the
sliding window will take.
basis_slices – slices that will be taken in every dimension.
Yields:
Tuple[slice, …] –
slices used for numpy indexing; the number of slices
in the tuple equals the number of dimensions.
Note
For each dimension, we generate a basis (which defines a grid), and we
slide over that basis.
Todo
[ ] have an option that is allowed to go outside of the window bounds
on the right and bottom when the slider overshoots.
>>> # Test shapes that dont fit
>>> # When the window is bigger than the shape, the left-aligned slices
>>> # are returned.
>>> self = SlidingWindow((3, 3), (12, 12), allow_overshoot=True, keepbound=True)
>>> print(list(self))
[(slice(0, 12, None), slice(0, 12, None))]
>>> print(list(SlidingWindow((3, 3), None, allow_overshoot=True, keepbound=True)))
[(slice(0, 3, None), slice(0, 3, None))]
>>> print(list(SlidingWindow((3, 3), (None, 2), allow_overshoot=True, keepbound=True)))
[(slice(0, 3, None), slice(0, 2, None)), (slice(0, 3, None), slice(1, 3, None))]
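The slices-from-shapes idea above can be sketched in a few lines. This is a simplified illustration under stated assumptions (explicit integer strides, no keepbound or overshoot handling), not the SlidingWindow implementation:

```python
import itertools

def sliding_slices(shape, window, stride=None):
    """Yield tuples of slices that tile `shape` with windows of `window`.

    Illustrative sketch: stride defaults to the window (non-overlapping),
    and windows that would overshoot the shape are simply not generated.
    """
    if stride is None:
        stride = window
    # one basis of start positions per dimension (the "grid-basis")
    axis_starts = [range(0, dim - win + 1, st)
                   for dim, win, st in zip(shape, window, stride)]
    for starts in itertools.product(*axis_starts):
        yield tuple(slice(st, st + win) for st, win in zip(starts, window))

slices = list(sliding_slices((4, 6), (2, 3)))
```

Because only shapes are consulted, the array being tiled never needs to exist in memory, matching the motivation described above.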
>>> from kwarray.util_slider import *  # NOQA
>>> import sys
>>> # Build a high resolution image and slice it into chips
>>> highres = np.random.rand(5, 200, 200).astype(np.float32)
>>> target_shape = (1, 50, 50)
>>> slider = SlidingWindow(highres.shape, target_shape, overlap=(0, .5, .5))
>>> # Show how Stitcher can be used to reconstruct the original image
>>> stitcher = Stitcher(slider.input_shape)
>>> for sl in list(slider):
...     chip = highres[sl]
...     stitcher.add(sl, chip)
>>> assert stitcher.weights.max() == 4, 'some parts should be processed 4 times'
>>> recon = stitcher.finalize()
Example
>>> from kwarray.util_slider import *  # NOQA
>>> import sys
>>> # Demo stitching 3 patterns where one has nans
>>> pat1 = np.full((32, 32), fill_value=0.2)
>>> pat2 = np.full((32, 32), fill_value=0.4)
>>> pat3 = np.full((32, 32), fill_value=0.8)
>>> pat1[:, 16:] = 0.6
>>> pat2[16:, :] = np.nan
>>> # Test with nan_policy=omit
>>> stitcher = Stitcher(shape=(32, 64), nan_policy='omit')
>>> stitcher[0:32, 0:32](pat1)
>>> stitcher[0:32, 16:48](pat2)
>>> stitcher[0:32, 33:64](pat3[:, 1:])
>>> final1 = stitcher.finalize()
>>> # Test with nan_policy=propogate
>>> stitcher = Stitcher(shape=(32, 64), nan_policy='propogate')
>>> stitcher[0:32, 0:32](pat1)
>>> stitcher[0:32, 16:48](pat2)
>>> stitcher[0:32, 33:64](pat3[:, 1:])
>>> final2 = stitcher.finalize()
>>> # Checks
>>> assert np.isnan(final1).sum() == 16, 'should only contain nan where no data was stitched'
>>> assert np.isnan(final2).sum() == 512, 'should contain nan wherever a nan was stitched'
>>> # xdoctest: +REQUIRES(--show)
>>> # xdoctest: +REQUIRES(module:kwplot)
>>> import kwplot
>>> import kwimage
>>> kwplot.autompl()
>>> kwplot.imshow(pat1, title='pat1', pnum=(3, 3, 1))
>>> kwplot.imshow(kwimage.nodata_checkerboard(pat2, square_shape=1), title='pat2 (has nans)', pnum=(3, 3, 2))
>>> kwplot.imshow(pat3, title='pat3', pnum=(3, 3, 3))
>>> kwplot.imshow(kwimage.nodata_checkerboard(final1, square_shape=1), title='stitched (nan_policy=omit)', pnum=(3, 1, 2))
>>> kwplot.imshow(kwimage.nodata_checkerboard(final2, square_shape=1), title='stitched (nan_policy=propogate)', pnum=(3, 1, 3))
Example
>>> # Example of weighted stitching
>>> # xdoctest: +REQUIRES(module:kwimage)
>>> from kwarray.util_slider import *  # NOQA
>>> import kwimage
>>> import kwarray
>>> import ubelt as ub
>>> import sys
>>> data = kwimage.Mask.demo().data.astype(np.float32)
>>> data_dims = data.shape
>>> window_dims = (8, 8)
>>> # We are going to slide a window over the data, do some processing
>>> # and then stitch it all back together. There are a few ways we
>>> # can do it. Lets demo the params.
>>> basis = {
>>>     # Vary the overlap of the slider
>>>     'overlap': (0, 0.5, .9),
>>>     # Vary if we are using weighted stitching or not
>>>     'weighted': ['none', 'gauss'],
>>>     'keepbound': [True, False]
>>> }
>>> results = []
>>> gauss_weights = kwimage.gaussian_patch(window_dims)
>>> gauss_weights = kwimage.normalize(gauss_weights)
>>> for params in ub.named_product(basis):
>>>     if params['weighted'] == 'none':
>>>         weights = None
>>>     elif params['weighted'] == 'gauss':
>>>         weights = gauss_weights
>>>     # Build the slider and stitcher
>>>     slider = kwarray.SlidingWindow(
>>>         data_dims, window_dims, overlap=params['overlap'],
>>>         allow_overshoot=True,
>>>         keepbound=params['keepbound'])
>>>     stitcher = kwarray.Stitcher(data_dims)
>>>     # Loop over the regions
>>>     for sl in list(slider):
>>>         chip = data[sl]
>>>         # This is our dummy function for this example.
>>>         predicted = np.ones_like(chip) * chip.sum() / chip.size
>>>         stitcher.add(sl, predicted, weight=weights)
>>>     final = stitcher.finalize()
>>>     results.append({
>>>         'final': final,
>>>         'params': params,
>>>     })
>>> # xdoctest: +REQUIRES(--show)
>>> # xdoctest: +REQUIRES(module:kwplot)
>>> import kwplot
>>> kwplot.autompl()
>>> pnum_ = kwplot.PlotNums(nCols=3, nSubplots=len(results) + 2)
>>> kwplot.imshow(data, pnum=pnum_(), title='input image')
>>> kwplot.imshow(gauss_weights, pnum=pnum_(), title='Gaussian weights')
>>> pnum_()
>>> for result in results:
>>>     param_key = ub.urepr(result['params'], compact=1)
>>>     final = result['final']
>>>     canvas = kwarray.normalize(final)
>>>     canvas = kwimage.fill_nans_with_checkers(canvas)
>>>     kwplot.imshow(canvas, pnum=pnum_(), title=param_key)
Parameters:
shape (tuple) – dimensions of the large image that will be created from
the smaller pixels or patches.
device (str | int | torch.device) – default is ‘numpy’, but if given as a torch device, then
underlying operations will be done with torch tensors instead.
dtype (str) – the datatype to use in the underlying accumulator.
nan_policy (str) – if omit, check for nans and convert any to zero weight items in
stitching.
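The accumulation scheme behind stitching can be sketched in a few lines: keep a running weighted sum alongside a weight image, and divide at the end. This is a minimal illustration, not kwarray.Stitcher itself (no device, dtype, or nan_policy handling):

```python
import numpy as np

class MiniStitcher:
    """Accumulate weighted patches and average on finalize (illustrative sketch)."""
    def __init__(self, shape):
        self.sums = np.zeros(shape, dtype=np.float64)
        self.weights = np.zeros(shape, dtype=np.float64)

    def add(self, index, patch, weight=1.0):
        # overlapping contributions simply accumulate
        self.sums[index] += weight * patch
        self.weights[index] += weight

    def finalize(self):
        # regions never touched stay nan instead of dividing by zero
        out = np.full(self.sums.shape, np.nan)
        seen = self.weights > 0
        out[seen] = self.sums[seen] / self.weights[seen]
        return out

stitcher = MiniStitcher((4,))
stitcher.add(np.s_[0:3], np.array([1.0, 1.0, 1.0]))
stitcher.add(np.s_[1:4], np.array([3.0, 3.0, 3.0]))
final = stitcher.finalize()
```

Here positions 1 and 2 receive two overlapping contributions and are averaged, which is exactly why the weights array in the earlier doctest peaks at the overlap count.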
Find the index of the maximum element in a sequence of keys.
Parameters:
keys (tuple) – a k-tuple of k N-dimensional arrays.
Like np.lexsort the last key in the sequence is used for the
primary sort order, the second-to-last key for the secondary sort
order, and so on.
multi (bool) – if True, returns all indices that share the max value
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of maximum indices to return
axis (int | None) – axis to find maxima over. If None this is equivalent
to using arr.ravel().
ordered (bool) – if False, returns the maximum elements in an arbitrary
order, otherwise they are in descending order. (Setting this to
false is a bit faster).
Todo
[ ] if num is None, return arg for all values equal to the maximum
Returns:
NDArray
Example
>>> # Test cases with axis=None
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argmaxima(arr, num)
>>>     idxs2 = argmaxima(arr, num, ordered=False)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[::-1][:len(idxs)])), 'ordered=True must return in order'
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argmaxima(arr, num, axis=axis)
>>>         idxs2 = argmaxima(arr, num, ordered=False, axis=axis)
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
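The reason this beats a full argsort is partial selection: np.argpartition finds the top num elements in O(n), and only that small block is then sorted if ordering is requested. A sketch of the flat (axis=None) case, offered as an illustration of the technique rather than kwarray's implementation:

```python
import numpy as np

def topk_indices(arr, num, ordered=True):
    """Indices of the `num` largest elements of a flattened array (sketch)."""
    arr = np.asarray(arr).ravel()
    if num <= 0:
        return np.empty(0, dtype=int)
    if num >= arr.size:
        idxs = np.argsort(arr)[::-1]
    else:
        # O(n) partial selection; only the selected block gets sorted
        idxs = np.argpartition(arr, -num)[-num:]
        if ordered:
            idxs = idxs[np.argsort(arr[idxs])[::-1]]
    return idxs[:num]

arr = np.array([5, 1, 9, 3, 7])
top2 = topk_indices(arr, 2)
```

For num much smaller than the array size, the partition step dominates and the sort over just num elements is negligible.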
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of minimum indices to return
axis (int|None) – axis to find minima over.
If None this is equivalent to using arr.ravel().
ordered (bool) – if False, returns the minimum elements in an arbitrary
order, otherwise they are in ascending order. (Setting this to
false is a bit faster).
Example
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argminima(arr, num)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[:len(idxs)])), 'ordered=True must return in order'
>>>     idxs2 = argminima(arr, num, ordered=False)
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> from kwarray.util_numpy import *  # NOQA
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> # make a unique array so we can check argmax consistency
>>> arr = np.arange(3 * 5 * 7)
>>> np.random.shuffle(arr)
>>> arr = arr.reshape(3, 5, 7)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argminima(arr, num, axis=axis)
>>>         idxs2 = argminima(arr, num, ordered=False, axis=axis)
>>>         print('idxs = {!r}'.format(idxs))
>>>         print('idxs2 = {!r}'.format(idxs2))
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
>>>         # Check if argmin agrees with -argmax
>>>         idxs3 = argmaxima(-arr, num, axis=axis)
>>>         assert np.all(idxs3 == idxs)
arr (ArrayLike) – An array-like object. Non-array inputs are converted to arrays.
Arrays that already have n or more dimensions are preserved.
n (int) – number of dimensions to ensure
front (bool) – if True, new dimensions are added to the front of the array;
otherwise they are added to the back. Defaults to False.
Returns:
An array with a.ndim>=n. Copies are avoided where possible,
and views with three or more dimensions are returned. For example,
a 1-D array of shape (N,) becomes a view of shape
(1,N,1), and a 2-D array of shape (M,N) becomes a view
of shape (M,N,1).
Extensive benchmarks are in
kwarray/dev/bench_atleast_nd.py
These demonstrate that this function is statistically faster than the
numpy variants, although the difference is small. On average this
function takes 480ns versus numpy which takes 790ns.
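Mechanically, ensuring n dimensions is a reshape with singleton axes. The sketch below simply appends (or prepends) the new axes; note this is a simplification, and the (1, N, 1) placement described above for the 1-D case follows numpy.atleast_3d-style conventions that this sketch does not reproduce:

```python
import numpy as np

def atleast_nd_sketch(arr, n, front=False):
    """Pad an array's shape with singleton dims until it has >= n dims (sketch)."""
    arr = np.asarray(arr)
    extra = n - arr.ndim
    if extra <= 0:
        return arr
    pad = (1,) * extra
    new_shape = pad + arr.shape if front else arr.shape + pad
    # reshape returns a view when possible, so no data is copied
    return arr.reshape(new_shape)

a = np.arange(6).reshape(2, 3)
b = atleast_nd_sketch(a, 4)
c = atleast_nd_sketch(a, 4, front=True)
```

The speed advantage over the numpy variants comes from avoiding their per-case dispatch; a single reshape is all the work that is actually needed.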
Constructs an array of booleans where an item is True if its position is in
indices otherwise it is False. This can be viewed as the inverse of
numpy.where().
Parameters:
indices (NDArray) – list of integer indices
shape (int | tuple) – length of the returned list. If not specified
the minimal possible shape to incorporate all the indices is used.
In general, it is best practice to always specify this argument.
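For 1-D indices, the idea reduces to allocating a boolean array and setting the listed positions, giving the inverse of numpy.where(). A minimal sketch, not the library function:

```python
import numpy as np

def boolmask_sketch(indices, shape=None):
    """Build a boolean mask that is True at the given 1-D indices (sketch)."""
    indices = np.asarray(indices)
    if shape is None:
        # minimal shape that can hold every index (best to pass shape explicitly)
        shape = int(indices.max()) + 1 if indices.size else 0
    mask = np.zeros(shape, dtype=bool)
    mask[indices] = True
    return mask

mask = boolmask_sketch([0, 3], shape=5)
```

Round-tripping through np.where recovers the original indices, which is the inverse relationship described above.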
Embeds a “padded-slice” inside the known data dimensions.
Returns the valid data portion of the slice with extra padding for regions
outside of the available dimensions.
Given slices for each dimension, the image dimensions, and a padding, get the
corresponding slice from the image and any extra padding needed to achieve
the requested window size.
Todo
[ ] Add the option to return the inverse slice
Parameters:
slices (Tuple[slice, …]) – a tuple of slices to apply to the data dimensions.
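In one dimension, the computation reduces to clipping the requested slice to the valid range and recording how much was clipped as padding. A 1-D sketch under stated assumptions (explicit integer start/stop, no extra pad argument); the real function handles full tuples of slices:

```python
def embed_slice_1d(sl, dim):
    """Clip a possibly out-of-bounds slice to [0, dim) and report the padding
    needed to keep the requested window size (1-D illustrative sketch)."""
    start, stop = sl.start, sl.stop  # assumed to be explicit integers
    low_pad = max(0, -start)         # amount requested before index 0
    high_pad = max(0, stop - dim)    # amount requested past the end
    data_slice = slice(max(0, start), min(dim, stop))
    return data_slice, (low_pad, high_pad)

data_slice, padding = embed_slice_1d(slice(-2, 3), dim=10)
```

Slicing the data with data_slice and then padding by the returned amounts reproduces the originally requested window size.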
This function is useful for ensuring that your code uses a controlled
internal random state that is independent of other modules.
If the input is None, then a global random state is returned.
If the input is a numeric value, then that is used as a seed to construct a
random state.
If the input is a random number generator, then another random number
generator with the same state is returned. Depending on the api, this
random state is either returned as-is or used to construct an equivalent
random state with the requested api.
Parameters:
rng (int | float | None | numpy.random.RandomState | random.Random) – if None, then defaults to the global rng. Otherwise this can
be an integer or a RandomState class. Defaults to the global
random.
api (str) – specify the type of random number
generator to use. This can either be ‘numpy’ for a
numpy.random.RandomState object or ‘python’ for a
random.Random object. Defaults to numpy.
Returns:
rng - either a numpy or python random number generator, depending
on the setting of api.
>>> import random
>>> import numpy as np
>>> num = 4
>>> print('--- Python as PYTHON ---')
>>> py_rng = random.Random(0)
>>> pp_nums = [py_rng.random() for _ in range(num)]
>>> print(pp_nums)
>>> print('--- Numpy as PYTHON ---')
>>> np_rng = ensure_rng(random.Random(0), api='numpy')
>>> np_nums = [np_rng.rand() for _ in range(num)]
>>> print(np_nums)
>>> print('--- Numpy as NUMPY ---')
>>> np_rng = np.random.RandomState(seed=0)
>>> nn_nums = [np_rng.rand() for _ in range(num)]
>>> print(nn_nums)
>>> print('--- Python as NUMPY ---')
>>> py_rng = ensure_rng(np.random.RandomState(seed=0), api='python')
>>> pn_nums = [py_rng.random() for _ in range(num)]
>>> print(pn_nums)
>>> assert np_nums == pp_nums
>>> assert pn_nums == nn_nums
Example
>>> # Test that random modules can be coerced
>>> import random
>>> import numpy as np
>>> ensure_rng(random, api='python')
>>> ensure_rng(random, api='numpy')
>>> ensure_rng(np.random, api='python')
>>> ensure_rng(np.random, api='numpy')
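The coercion rules described above reduce to a small dispatch. This is a simplified sketch: the real function also converts between the numpy and python generator APIs and returns a shared module-global state for None:

```python
import random
import numpy as np

def ensure_rng_sketch(rng=None, api='numpy'):
    """Coerce None / seed / generator into a random number generator (sketch)."""
    if isinstance(rng, (int, float)):
        # a numeric value seeds a fresh generator of the requested api
        seed = int(rng)
        return np.random.RandomState(seed) if api == 'numpy' else random.Random(seed)
    if rng is None:
        # the real function returns a shared global state instead of a fresh one
        return np.random.RandomState() if api == 'numpy' else random.Random()
    # already a generator: pass through (the real function can also convert apis)
    return rng

a = ensure_rng_sketch(0).rand()
b = ensure_rng_sketch(0).rand()
```

The payoff is the one stated in the text above: a function can accept a seed, an existing generator, or nothing, and still maintain a controlled internal random state.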
Finds robust normalization statistics for a set of scalar observations.
The idea is to estimate “fence” parameters: minimum and maximum values
where anything below / above these values is likely an outlier. For
non-linear normalization schemes we can also estimate a likely middle and
extent of the data.
Parameters:
data (ndarray) – a 1D numpy array where invalid data has already been removed
params (str | dict) – normalization params.
When passed as a dictionary valid params are:
scaling (str):
This is the “mode” that will be used in the final
normalization. Defaults to ‘linear’. Can also be ‘sigmoid’.
extrema (str):
The method for determining what the extrema are.
Can be “quantile” for strict quantile clipping.
Can be “adaptive-quantile” for an IQR-like adjusted quantile method.
Can be “tukey” or “IQR” for an exact IQR method.
low (float): This is the low quantile for likely inliers.
mid (float): This is the middle quantile for likely inliers.
high (float): This is the high quantile for likely inliers.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.util_robust import *  # NOQA
>>> from kwarray.distributions import Mixture
>>> import ubelt as ub
>>> # A random mixture distribution for testing
>>> data = Mixture.random(6).sample(3000)
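The simplest "quantile" extrema method amounts to reading low/high quantiles off the data and using them as the fences. A hedged sketch of that one method only; the adaptive-quantile and IQR variants described above add outlier-aware adjustments that this omits:

```python
import numpy as np

def quantile_fences(data, low=0.01, high=0.99):
    """Estimate robust min/max 'fences' from quantiles (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    min_val, max_val = np.quantile(data, [low, high])
    return {'min_val': min_val, 'max_val': max_val}

# 98 well-behaved values plus two extreme outliers
data = np.concatenate([np.linspace(0, 1, 98), [50.0, -50.0]])
fences = quantile_fences(data)
```

Clipping to these fences before a linear or sigmoid rescale is what keeps a handful of extreme values from dominating the normalization.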
Returns lists of consecutive values. Implementation inspired by [3].
Parameters:
arr (NDArray) – array of ordered values
offset (float, default=1) – any two values separated by this offset are grouped. In the
default case, when offset=1, this groups increasing values like: 0,
1, 2. When offset is 0 it groups consecutive values that are the
same, e.g.: 4, 4, 4.
Returns:
a list of arrays that are the groups from the input
Return type:
List[NDArray]
Note
This is equivalent (and faster) to using:
apply_grouping(data, group_consecutive_indices(data))
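The grouping logic itself can be sketched directly with numpy: differences between neighbors that do not equal the offset mark group boundaries, and np.split does the rest. A sketch of the idea, not the library function:

```python
import numpy as np

def group_consecutive_sketch(arr, offset=1):
    """Split an ordered array wherever the gap between neighbors != offset."""
    arr = np.asarray(arr)
    # indices where a new group starts (one past each boundary)
    split_points = np.nonzero(np.diff(arr) != offset)[0] + 1
    return np.split(arr, split_points)

groups = group_consecutive_sketch(np.array([1, 2, 3, 7, 8, 12]))
```

With offset=0 the same code groups runs of repeated values, matching the parameter description above.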
Find unique items and the indices at which they appear in an array.
A common use case of this function is when you have a list of objects
(often numeric but sometimes not) and an array of “group-ids” corresponding
to that list of objects.
Using this function will return a list of indices that can be used in
conjunction with apply_grouping() to group the elements. This is
most useful when you have many lists (think column-major data)
corresponding to the group-ids.
In cases where there is only one list of objects or knowing the indices
doesn’t matter, then consider using group_items() instead.
Parameters:
idx_to_groupid (NDArray) – The input array, where each item is interpreted as a group id.
For the fastest runtime, the input array must be numeric (ideally
with integer types). If the type is non-numeric then the less
efficient ubelt.group_items() is used.
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency.
Defaults to False.
Returns:
(keys, groupxs) -
keys (NDArray):
The unique elements of the input array in order
groupxs (List[NDArray]):
Corresponding list of indexes. The i-th item is an array
indicating the indices where the item key[i] appeared in
the input array.
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import kwarray
>>> import ubelt as ub
>>> import numpy as np
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> idx_to_groupid = np.array([[24], [129], [659], [659], [24],
...                            [659], [659], [822], [659], [659], [24]]).T[0]
>>> (keys, groupxs) = kwarray.group_indices(idx_to_groupid)
>>> # Different versions of numpy may produce different orderings
>>> # so normalize these to make test output consistent
>>> #[gxs.sort() for gxs in groupxs]
>>> print('keys = ' + ub.urepr(keys, with_dtype=False))
>>> print('groupxs = ' + ub.urepr(groupxs, with_dtype=False))
keys = np.array([ 24, 129, 659, 822])
groupxs = [
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]
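The core trick behind fast index grouping is a single stable argsort: once indices are sorted by group id, each group occupies a contiguous block. A simplified sketch assuming a non-empty numeric input, not kwarray's implementation:

```python
import numpy as np

def group_indices_sketch(idx_to_groupid):
    """Return unique keys and, per key, the indices where each occurs (sketch)."""
    idx_to_groupid = np.asarray(idx_to_groupid)
    sortx = np.argsort(idx_to_groupid, kind='stable')
    sorted_ids = idx_to_groupid[sortx]
    # boundaries where the group id changes in the sorted order
    bounds = np.nonzero(np.diff(sorted_ids))[0] + 1
    groupxs = np.split(sortx, bounds)
    keys = sorted_ids[np.r_[0, bounds]]
    return keys, groupxs

keys, groupxs = group_indices_sketch(np.array([2, 1, 2, 3, 1]))
```

Because the heavy lifting is one argsort plus one diff, this stays fast even for many groups, which is what makes the column-major use case described above practical.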
In cases where there are many lists of items to group (think column-major
data), consider using group_indices() and apply_grouping()
instead.
Parameters:
item_list (NDArray) – The input array of items to group.
Extended typing NDArray[Any,VT]
groupid_list (NDArray) – Each item is an id corresponding to the item at the same position
in item_list. For the fastest runtime, the input array must be
numeric (ideally with integer types). This list must be
1-dimensional.
Extended typing NDArray[Any,KT]
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency. Defaults
to False.
axis (int | None) – Group along a particular axis in items if it is n-dimensional.
Returns:
mapping from groupids to corresponding items.
Extended typing Dict[KT,NDArray[Any,VT]].
>>> # xdoctest: +REQUIRES(module:scipy)
>>> # Costs to match item i in set1 with item j in set2.
>>> value = np.array([
>>>     [9, 2, 1, 3],
>>>     [4, 1, 5, 5],
>>>     [9, 9, 2, 4],
>>>     [-1, -1, -1, -1],
>>> ])
>>> ret = maxvalue_assignment(value)
>>> # Note, depending on the scipy version the assignment might change
>>> # but the value should always be the same.
>>> print('Total value: {}'.format(ret[1]))
Total value: 23.0
>>> print('Assignment: {}'.format(ret[0]))  # xdoc: +IGNORE_WANT
Assignment: [(0, 0), (1, 3), (2, 1)]
Finds the minimum cost assignment based on a NxM cost matrix, subject to
the constraint that each row can match at most one column and each column
can match at most one row. Any pair with a cost of infinity will not be
assigned.
Parameters:
cost (ndarray) – NxM matrix, cost[i, j] is the cost to match i and j
Returns:
tuple containing a list of assignment of rows
and columns, and the total cost of the assignment.
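For tiny matrices, the problem can be solved by exhaustive search, which makes the constraints concrete: each row takes at most one column, infinite pairs are never assigned. This brute-force sketch assumes no more rows than columns; real implementations use the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment):

```python
import itertools
import math

def mincost_assignment_bruteforce(cost):
    """Exhaustive min-cost assignment for tiny cost matrices (illustrative)."""
    n_rows, n_cols = len(cost), len(cost[0])
    assert n_rows <= n_cols, 'sketch assumes no more rows than columns'
    best_total, best_pairs = math.inf, []
    for cols in itertools.permutations(range(n_cols), n_rows):
        # an assignment touching an infinite-cost pair is invalid
        if any(math.isinf(cost[r][c]) for r, c in enumerate(cols)):
            continue
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best_total:
            best_total = total
            best_pairs = [(r, c) for r, c in enumerate(cols)]
    return best_pairs, best_total

assignment, total = mincost_assignment_bruteforce([[9.0, 2.0], [4.0, 1.0]])
```

This enumerates all column permutations, so it is only usable for pedagogy; the Hungarian algorithm achieves the same optimum in polynomial time.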
Normalizes input values based on a specified scheme.
The default behavior is a linear normalization between 0.0 and 1.0 based on
the min/max values of the input. Parameters can be specified to achieve
more general contrast stretching or signal rebalancing. Implements the
linear and sigmoid normalization methods described in [WikiNorm].
Parameters:
arr (NDArray) – array to normalize, usually an image
out (NDArray | None) – output array. Note, that we will create an
internal floating point copy for integer computations.
mode (str) – either linear or sigmoid.
alpha (float) – Only used if mode=sigmoid. Division factor
(pre-sigmoid). If unspecified computed as:
max(abs(old_min-beta),abs(old_max-beta))/6.212606.
Note this parameter is sensitive to if the input is a float or
uint8 image.
beta (float) – subtractive factor (pre-sigmoid). This should be the
intensity of the most interesting bits of the image, i.e. bring
them to the center (0) of the distribution.
Defaults to (max-min)/2. Note this parameter is sensitive
to if the input is a float or uint8 image.
min_val – inputs lower than this minimum value are clipped
max_val – inputs higher than this maximum value are clipped.
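The default linear mode amounts to a min/max rescale. A sketch of that mode only, as an assumption-labeled illustration (the real function additionally supports the sigmoid mode, alpha/beta, clipping, and in-place output described above):

```python
import numpy as np

def normalize_linear(arr, new_min=0.0, new_max=1.0):
    """Linear min/max normalization (sketch of the default mode)."""
    arr = np.asarray(arr, dtype=np.float32)
    old_min, old_max = arr.min(), arr.max()
    # guard against a constant input, which would divide by zero
    extent = max(old_max - old_min, 1e-12)
    return (arr - old_min) / extent * (new_max - new_min) + new_min

out = normalize_linear(np.array([10.0, 20.0, 30.0]))
```

The sigmoid mode described above replaces this straight line with a squashing function centered at beta and scaled by alpha, which compresses outliers instead of letting them set the extremes.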
Allows slices with out-of-bound coordinates. Any out of bounds coordinate
will be sampled via padding.
Parameters:
data (Sliceable) – data to slice into. Any channels must be the last dimension.
slices (slice | Tuple[slice, …]) – slice for each dimensions
ndim (int) – number of spatial dimensions
pad (List[int|Tuple]) – additional padding of the slice
padkw (Dict) – if unspecified defaults to {'mode':'constant'}
return_info (bool, default=False) – if True, return extra information
about the transform.
Note
Negative slices have a different meaning here then they usually do.
Normally, they indicate a wrap-around or a reversed stride, but here
they index into out-of-bounds space (which depends on the pad mode).
For example a slice of -2:1 literally samples two pixels to the left of
the data and one pixel from the data, so you get two padded values and
one data value.
SeeAlso:
embed_slice - finds the embedded slice and padding
Returns:
data_sliced: subregion of the input data (possibly with padding,
depending on if the original slice went out of bounds)
Tuple[Sliceable, Dict] :
data_sliced : as above
transform : information on how to return to the original coordinates
Currently a dict containing:
st_dims: a list indicating the low and high space-time
coordinate values of the returned data slice.
The structure of this dictionary may change in the future
Yields num combinations of length size from items in random order
Parameters:
items (List) – pool of items to choose from
size (int) – Number of items in each combination
num (int | None) – Number of combinations to generate. If None, generate them all.
rng (int | float | None | numpy.random.RandomState | random.Random) – seed or random number generator. Defaults to the global state
of the python random module.
Yields:
Tuple – a random combination of items of length size.
Yields num items from the cartesian product of items in a random order.
Parameters:
items (List[Sequence]) – items to get the cartesian product of, packed in a list or tuple.
(note this deviates from the api of itertools.product())
num (int | None) – maximum number of items to generate. If None, generate them all
rng (int | float | None | numpy.random.RandomState | random.Random) – Seed or random number generator. Defaults to the global state
of the python random module.
Normalize data intensities using heuristics to help put sensor data with
extremely high or low contrast into a visible range.
This function is designed with an emphasis on getting something that is
reasonable for visualization.
Todo
[x] Move to kwarray and renamed to robust_normalize?
[ ] Support for M-estimators?
Parameters:
imdata (ndarray) – raw intensity data
return_info (bool) – if True, return information about the chosen normalization
heuristic.
params (str | dict) – Can contain keys, low, high, or mid, scaling, extrema
e.g. {‘low’: 0.1, ‘mid’: 0.8, ‘high’: 0.9, ‘scaling’: ‘sigmoid’}
See documentation in find_robust_normalizers().
axis (None | int) – The axis to normalize over, if unspecified, normalize jointly
nodata (None | int) – A value representing nodata to leave unchanged during
normalization, for example 0
dtype (type) – can be float32 or float64
mask (ndarray | None) – A mask indicating what pixels are valid and what pixels should be
considered nodata. Mutually exclusive with nodata argument.
A mask value of 1 indicates a VALID pixel. A mask value of 0
indicates an INVALID pixel.
Note this is the opposite of a masked array.
Returns:
a floating point array with values between 0 and 1.
if return_info is specified, also returns extra data
Finds a feasible solution to the minimum weight maximum value set cover.
The quality and runtime of the solution will depend on the backend
algorithm selected.
Parameters:
candidate_sets_dict (Dict[KT, List[VT]]) – a dictionary where keys are the candidate set ids and each value is
a candidate cover set.
items (Optional[VT]) – the set of all items to be covered;
if not specified, it is inferred from the candidate cover sets
set_weights (Optional[Dict[KT, float]]) – maps candidate set ids to a cost for using this candidate cover in
the solution. If not specified the weight of each candidate cover
defaults to 1.
item_values (Optional[Dict[VT, float]]) – maps each item to a value we get for returning this item in the
solution. If not specified the value of each item defaults to 1.
max_weight (Optional[float]) – if specified, the total cost of the
returned cover is constrained to be less than this number.
algo (str) – specifies which algorithm to use. Can either be
‘approx’ for the greedy solution or ‘exact’ for the globally
optimal solution. Note the ‘exact’ algorithm solves an
integer-linear-program, which can be very slow and requires
the pulp package to be installed.
Returns:
a subdict of candidate_sets_dict containing the chosen solution.
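The greedy ‘approx’ backend can be sketched roughly as follows (the helper name greedy_set_cover is hypothetical, unit item values are assumed, and the max_weight constraint is omitted for brevity):

```python
def greedy_set_cover(candidate_sets_dict, set_weights=None):
    # Greedy approximation: repeatedly take the candidate set with the
    # best (newly covered items) / weight ratio until nothing new is
    # coverable. This is the classic ln(n)-approximation strategy.
    set_weights = set_weights or {k: 1.0 for k in candidate_sets_dict}
    uncovered = set().union(*candidate_sets_dict.values())
    solution = {}
    while uncovered:
        best = max(
            (k for k in candidate_sets_dict if k not in solution),
            key=lambda k: len(uncovered & set(candidate_sets_dict[k])) / set_weights[k],
            default=None)
        if best is None or not uncovered & set(candidate_sets_dict[best]):
            break  # no remaining set covers anything new
        solution[best] = candidate_sets_dict[best]
        uncovered -= set(candidate_sets_dict[best])
    return solution

cover = greedy_set_cover({'a': [1, 2, 3], 'b': [3, 4], 'c': [4, 5, 6]})
# 'a' and 'c' cover all items, so the redundant 'b' is never chosen
```

The ‘exact’ backend instead encodes the same objective as an integer linear program, which is optimal but can be much slower.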
The difference between this function and
numpy.random.standard_normal() is that we use float32 arrays in the
backend instead of float64. Halving the amount of bits that need to be
manipulated can significantly reduce the execution time, and 32-bit
precision is often good enough.
Parameters:
size (int | Tuple[int, …]) – shape of the returned ndarray
mean (float, default=0) – mean of the normal distribution
std (float, default=1) – standard deviation of the normal distribution
rng (numpy.random.RandomState) – underlying random state
Returns:
normally distributed random numbers with chosen size.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> import scipy
>>> import scipy.stats
>>> pts = 1000
>>> # Our numbers are normally distributed with high probability
>>> rng = np.random.RandomState(28041990)
>>> ours_a = standard_normal32(pts, rng=rng)
>>> ours_b = standard_normal32(pts, rng=rng) + 2
>>> ours = np.concatenate((ours_a, ours_b))  # numerical stability?
>>> p = scipy.stats.normaltest(ours)[1]
>>> print('Probability our data is non-normal is: {:.4g}'.format(p))
Probability our data is non-normal is: 1.573e-14
>>> rng = np.random.RandomState(28041990)
>>> theirs_a = rng.standard_normal(pts)
>>> theirs_b = rng.standard_normal(pts) + 2
>>> theirs = np.concatenate((theirs_a, theirs_b))
>>> p = scipy.stats.normaltest(theirs)[1]
>>> print('Probability their data is non-normal is: {:.4g}'.format(p))
Probability their data is non-normal is: 3.272e-11
>>> # Test even and odd numbers of points
>>> assert standard_normal32(3).shape == (3,)
>>> assert standard_normal32(2).shape == (2,)
>>> assert standard_normal32(1).shape == (1,)
>>> assert standard_normal32(0).shape == (0,)
>>> assert standard_normal32((3, 1)).shape == (3, 1)
>>> assert standard_normal32((3, 0)).shape == (3, 0)
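For comparison, numpy's newer Generator API can also draw float32 normals directly via a dtype argument, which illustrates the same precision/speed trade-off (this is a standalone numpy illustration, not how kwarray implements it internally):

```python
import numpy as np

rng = np.random.default_rng(0)
# Generator.standard_normal accepts a dtype argument, so float32
# samples are produced without a float64 intermediate array.
samples32 = rng.standard_normal(10000, dtype=np.float32)
samples64 = rng.standard_normal(10000)  # default is float64

assert samples32.dtype == np.float32
assert samples64.dtype == np.float64
# Both draws should be approximately standard normal
```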
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float) – Lower boundary of the output interval. All values generated will
be greater than or equal to low. Defaults to 0.
high (float) – Upper boundary of the output interval. All values generated will
be less than high. Defaults to 1.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
dtype (type) – either np.float32 or np.float64. Defaults to float32
rng (numpy.random.RandomState) – underlying random state
Returns:
uniformly distributed random numbers with chosen size and dtype
Extended typing: NDArray[Literal[size], Literal[dtype]]
Return type:
ndarray
Benchmark
>>> from timerit import Timerit
>>> import kwarray
>>> size = (300, 300, 3)
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float32'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         ours = standard_normal(size, rng=rng, dtype=np.float32)
>>> # Timed best=4.705 ms, mean=4.75 ± 0.085 ms for dtype=np.float32
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float64'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         theirs = standard_normal(size, rng=rng, dtype=np.float64)
>>> # Timed best=9.327 ms, mean=9.794 ± 0.4 ms for dtype=np.float64
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float, default=0.0) – Lower boundary of the output interval. All values generated will
be greater than or equal to low.
high (float, default=1.0) – Upper boundary of the output interval. All values generated will
be less than high.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
Returns:
uniformly distributed random numbers with chosen size.
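A float32 uniform draw over [low, high) can be sketched by scaling unit samples, using the dtype support in numpy's Generator API (the helper name uniform32_sketch is hypothetical, scalar low/high are assumed, and broadcasting of array-valued low/high is not handled):

```python
import numpy as np

def uniform32_sketch(low=0.0, high=1.0, size=None, rng=None):
    # Draw float32 samples in the half-open interval [low, high).
    rng = np.random.default_rng() if rng is None else rng
    # Generator.random supports dtype=np.float32 natively
    unit = rng.random(size=size, dtype=np.float32)
    # Affine rescale of [0, 1) onto [low, high), staying in float32
    return np.float32(low) + unit * np.float32(high - low)

rng = np.random.default_rng(0)
samples = uniform32_sketch(-2.0, 3.0, size=1000, rng=rng)
# every sample is >= -2.0 and strictly < 3.0, with float32 dtype
```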