The kwarray module implements a small set of pure-python extensions to
numpy and torch, along with a few select algorithms. Each module contains a
module-level docstring that gives a rough idea of the utilities in that module,
and each function or class contains a docstring with more details and
examples.
KWarray is part of Kitware’s computer vision Python suite.
Finds the minimum cost assignment based on a NxM cost matrix, subject to
the constraint that each row can match at most one column and each column
can match at most one row. Any pair with a cost of infinity will not be
assigned.
Parameters:
cost (ndarray) – NxM matrix, cost[i, j] is the cost to match i and j
Returns:
a tuple containing the list of row-column assignment pairs
and the total cost of the assignment.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> # Costs to match item i in set1 with item j in set2.
>>> value = np.array([
>>>     [9, 2, 1, 3],
>>>     [4, 1, 5, 5],
>>>     [9, 9, 2, 4],
>>>     [-1, -1, -1, -1],
>>> ])
>>> ret = maxvalue_assignment(value)
>>> # Note, depending on the scipy version the assignment might change
>>> # but the value should always be the same.
>>> print('Total value: {}'.format(ret[1]))
Total value: 23.0
>>> print('Assignment: {}'.format(ret[0]))  # xdoc: +IGNORE_WANT
Assignment: [(0, 0), (1, 3), (2, 1)]
Finds a feasible solution to the minimum weight maximum value set cover.
The quality and runtime of the solution will depend on the backend
algorithm selected.
Parameters:
candidate_sets_dict (Dict[KT, List[VT]]) – a dictionary where keys are the candidate set ids and each value is
a candidate cover set.
items (Optional[VT]) – the set of all items to be covered,
if not specified, it is inferred from the candidate cover sets
set_weights (Optional[Dict[KT, float]]) – maps candidate set ids to a cost for using this candidate cover in
the solution. If not specified the weight of each candidate cover
defaults to 1.
item_values (Optional[Dict[VT, float]]) – maps each item to a value we get for returning this item in the
solution. If not specified the value of each item defaults to 1.
max_weight (Optional[float]) – if specified, the total cost of the
returned cover is constrained to be less than this number.
algo (str) – specifies which algorithm to use. Can either be
‘approx’ for the greedy solution or ‘exact’ for the globally
optimal solution. Note the ‘exact’ algorithm solves an
integer-linear-program, which can be very slow and requires
the pulp package to be installed.
Returns:
a subdict of candidate_sets_dict containing the chosen solution.
The approximation guarantees depend on the specification of set weights and
item values.
Running time:
N = number of universe items
C = number of candidate covering sets
Worst case running time is: O(C^2 * CN)
(this comes from a simple analysis; the true big-O bound may be tighter)
Set Cover: log(len(items) + 1) approximation algorithm
Weighted Maximum Cover: 1 - 1/e == .632 approximation algorithm
Generalized maximum coverage is not implemented [WikiMaxCov].
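For concreteness, here is a minimal sketch of the interface described above. It assumes the top-level kwarray.setcover entry point and uses made-up candidate sets; the exact cover chosen by the greedy backend may differ.
>>> import kwarray
>>> candidate_sets_dict = {
>>>     'a': [1, 2, 3, 8, 9, 0],
>>>     'b': [1, 2, 3, 4, 5],
>>>     'c': [4, 5, 7],
>>>     'd': [5, 6, 7],
>>>     'e': [6, 7, 8, 9, 0],
>>> }
>>> # With default unit weights and values, the greedy 'approx' backend
>>> # repeatedly picks the candidate covering the most uncovered items.
>>> cover = kwarray.setcover(candidate_sets_dict, algo='approx')
>>> print(sorted(cover.keys()))  # a subset of the candidate ids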
The ArrayAPI is a common API that works exactly the same on both torch.Tensors
and numpy.ndarrays.
The ArrayAPI is a combination of efficiency and convenience. It is convenient
because you can just use an operation directly; it will type check the data
and apply the appropriate method. But it is also efficient because it can be
used with minimal type checking by accessing a type-specific backend.
For example, you can do:
impl = kwarray.ArrayAPI.coerce(data)
And then impl will give you direct access to the appropriate methods without
any type checking overhead, e.g. impl.<op-you-want>(data).
But you can also do kwarray.ArrayAPI.<op-you-want>(data) on anything and it
will do type checking and then do the operation you want.
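As a small sketch of both styles (the data here is hypothetical; the operations mirror the documented take and view methods):
>>> import numpy as np
>>> import kwarray
>>> data = np.arange(12).reshape(3, 4)
>>> # Convenient style: type checks happen on every call
>>> rows = kwarray.ArrayAPI.take(data, [0, 2], 0)
>>> # Efficient style: the type check happens once in coerce
>>> impl = kwarray.ArrayAPI.coerce(data)
>>> flat = impl.view(data, -1)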
Idea:
Perhaps we could separate this into its own python package (maybe called
“onearray”), where the module itself behaves like the ArrayAPI. Design
goals are to provide easy to use (as drop-in as possible) replacements for
torch or numpy function calls. It has to have near-zero overhead, or at
least a way to make that happen.
In modern versions of torch and numpy, if there are multiple maximum
values the index of the first instance is returned. This is not true in
older versions of torch. I’m unsure when this guarantee was added
to numpy.
Example
>>> # xdoctest: +REQUIRES(module:torch)
>>> from kwarray.arrayapi import *  # NOQA
>>> from kwarray.arrayapi import ArrayAPI
>>> import torch
>>> data = 1 / (1 + (torch.arange(12) - 6).view(3, 4) ** 2)
>>> ArrayAPI.max_argmax(data)
(tensor(1...), tensor(6))
>>> # When the values are all the same, there doesn't seem
>>> # to be a reliable spec on which one is returned first.
>>> np.ones(10).argmax()  # xdoctest: +IGNORE_WANT
0
>>> # Newer versions of torch (e.g. 1.12.0)
>>> torch.ones(10).argmax()  # xdoctest: +IGNORE_WANT
tensor(0)
>>> # Older versions of torch (e.g 1.6.0)
>>> torch.ones(10).argmax()  # xdoctest: +IGNORE_WANT
tensor(9)
In modern versions of torch and numpy, if there are multiple minimum
values the index of the first instance is returned. This is not true in
older versions of torch. I’m unsure when this guarantee was added
to numpy.
Example
>>> # xdoctest: +REQUIRES(module:torch)
>>> from kwarray.arrayapi import *  # NOQA
>>> from kwarray.arrayapi import ArrayAPI
>>> import torch
>>> data = (torch.arange(12) - 6).view(3, 4) ** 2
>>> ArrayAPI.min_argmin(data)
(tensor(0), tensor(6))
>>> # Issue demo:
>>> # When the values are all the same, there doesn't seem
>>> # to be a reliable spec on which one is returned first.
>>> np.ones(10).argmin()  # xdoctest: +IGNORE_WANT
0
>>> # Newer versions of torch (e.g. 1.12.0)
>>> torch.ones(10).argmin()  # xdoctest: +IGNORE_WANT
tensor(0)
>>> # Older versions of torch (e.g 1.6.0)
>>> torch.ones(10).argmin()  # xdoctest: +IGNORE_WANT
tensor(9)
Stack arrays in sequence horizontally (column wise).
This is equivalent to concatenation along the second axis, except for 1-D
arrays where it concatenates along the first axis. Rebuilds arrays divided
by hsplit.
This function makes most sense for arrays with up to 3 dimensions. For
instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions concatenate, stack and
block provide more general stacking and concatenation operations.
Parameters:
tup (sequence of ndarrays) – The arrays must have the same shape along all but the second axis,
except 1-D arrays which can be any length.
dtype (str or dtype) – If provided, the destination array will have this dtype. Cannot be
provided together with out.
.. versionadded:: 1.24
casting ({‘no’, ‘equiv’, ‘safe’, ‘same_kind’, ‘unsafe’}, optional) – Controls what kind of data casting may occur. Defaults to ‘same_kind’.
.. versionadded:: 1.24
Returns:
stacked – The array formed by stacking the given arrays.
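A small example of both the 1-D and 2-D behavior:
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> b = np.array([4, 5, 6])
>>> np.hstack((a, b))                    # 1-D inputs join along the first axis
array([1, 2, 3, 4, 5, 6])
>>> np.hstack((a[:, None], b[:, None]))  # 2-D inputs join along the second axis
array([[1, 4],
       [2, 5],
       [3, 6]])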
This is equivalent to concatenation along the first axis after 1-D arrays
of shape (N,) have been reshaped to (1,N). Rebuilds arrays divided by
vsplit.
This function makes most sense for arrays with up to 3 dimensions. For
instance, for pixel-data with a height (first axis), width (second axis),
and r/g/b channels (third axis). The functions concatenate, stack and
block provide more general stacking and concatenation operations.
np.row_stack is an alias for vstack. They are the same function.
Parameters:
tup (sequence of ndarrays) – The arrays must have the same shape along all but the first axis.
1-D arrays must have the same length.
dtype (str or dtype) – If provided, the destination array will have this dtype. Cannot be
provided together with out.
.. versionadded:: 1.24
casting ({‘no’, ‘equiv’, ‘safe’, ‘same_kind’, ‘unsafe’}, optional) – Controls what kind of data casting may occur. Defaults to ‘same_kind’.
.. versionadded:: 1.24
Returns:
stacked – The array formed by stacking the given arrays, will be at least 2-D.
Replace NaN with zero and infinity with large finite numbers (default
behaviour) or with the numbers defined by the user using the nan,
posinf and/or neginf keywords.
If x is inexact, NaN is replaced by zero or by the user defined value in
nan keyword, infinity is replaced by the largest finite floating point
values representable by x.dtype or by the user defined value in
posinf keyword and -infinity is replaced by the most negative finite
floating point values representable by x.dtype or by the user defined
value in neginf keyword.
For complex dtypes, the above is applied to each of the real and
imaginary components of x separately.
If x is not inexact, then no replacements are made.
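For example:
>>> import numpy as np
>>> x = np.array([np.nan, np.inf, -np.inf, 1.5])
>>> np.nan_to_num(x)  # fills with 0.0 and the largest / smallest finite floats
>>> np.nan_to_num(x, nan=0.0, posinf=1e6, neginf=-1e6)  # user-specified fills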
Parameters:
x (scalar or array_like) – Input data.
copy (bool, optional) – Whether to create a copy of x (True) or to replace values
in-place (False). The in-place operation only occurs if
casting to an array does not require a copy.
Default is True.
New in version 1.13.
nan (int, float, optional) – Value to be used to fill NaN values. If no value is passed
then NaN values will be replaced with 0.0.
New in version 1.17.
posinf (int, float, optional) – Value to be used to fill positive infinity values. If no value is
passed then positive infinity values will be replaced with a very
large number.
New in version 1.17.
neginf (int, float, optional) – Value to be used to fill negative infinity values. If no value is
passed then negative infinity values will be replaced with a very
small (or negative) number.
New in version 1.17.
Returns:
out – x, with the non-finite values replaced. If copy is False, this may
be x itself.
Return type:
ndarray
See also
isinf
Shows which elements are positive or negative infinity.
isneginf
Shows which elements are negative infinity.
isposinf
Shows which elements are positive infinity.
isnan
Shows which elements are Not a Number (NaN).
isfinite
Shows which elements are finite (not NaN, not infinity)
Notes
NumPy uses the IEEE Standard for Binary Floating-Point for Arithmetic
(IEEE 754). This means that Not a Number is not equivalent to infinity.
True if two arrays have the same shape and elements, False otherwise.
Parameters:
a1, a2 (array_like) – Input arrays.
equal_nan (bool) – Whether to compare NaN’s as equal. If the dtype of a1 and a2 is
complex, values will be considered equal if either the real or the
imaginary component of a given value is nan.
Test whether any array element along a given axis evaluates to True.
Returns a single boolean if axis is None.
Parameters:
a (array_like) – Input array or object that can be converted to an array.
axis (None or int or tuple of ints, optional) – Axis or axes along which a logical OR reduction is performed.
The default (axis=None) is to perform a logical OR over all
the dimensions of the input array. axis may be negative, in
which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a reduction is performed on multiple
axes, instead of a single axis or all the axes as before.
out (ndarray, optional) – Alternate output array in which to place the result. It must have
the same shape as the expected output and its type is preserved
(e.g., if it is of type float, then it will remain so, returning
1.0 for True and 0.0 for False, regardless of the type of a).
See Output type determination for more details.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be
passed through to the any method of sub-classes of
ndarray, however any non-default value will be. If the
sub-class’ method does not implement keepdims any
exceptions will be raised.
where (array_like of bool, optional) – Elements to include in checking for any True values.
See ~numpy.ufunc.reduce for details.
New in version 1.20.0.
Returns:
any – A new boolean or ndarray is returned unless out is specified,
in which case a reference to out is returned.
>>> o = np.array(False)
>>> z = np.any([-1, 4, 5], out=o)
>>> z, o
(array(True), array(True))
>>> # Check now that z is a reference to o
>>> z is o
True
>>> id(z), id(o)  # identity of z and o
(191614240, 191614240)
Test whether all array elements along a given axis evaluate to True.
Parameters:
a (array_like) – Input array or object that can be converted to an array.
axis (None or int or tuple of ints, optional) – Axis or axes along which a logical AND reduction is performed.
The default (axis=None) is to perform a logical AND over all
the dimensions of the input array. axis may be negative, in
which case it counts from the last to the first axis.
New in version 1.7.0.
If this is a tuple of ints, a reduction is performed on multiple
axes, instead of a single axis or all the axes as before.
out (ndarray, optional) – Alternate output array in which to place the result.
It must have the same shape as the expected output and its
type is preserved (e.g., if dtype(out) is float, the result
will consist of 0.0’s and 1.0’s). See Output type determination for more
details.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left
in the result as dimensions with size one. With this option,
the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be
passed through to the all method of sub-classes of
ndarray, however any non-default value will be. If the
sub-class’ method does not implement keepdims any
exceptions will be raised.
where (array_like of bool, optional) – Elements to include in checking for all True values.
See ~numpy.ufunc.reduce for details.
New in version 1.20.0.
Returns:
all – A new boolean or array is returned unless out is specified,
in which case a reference to out is returned.
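For example:
>>> import numpy as np
>>> np.all([[True, False], [True, True]])
False
>>> np.all([[True, False], [True, True]], axis=0)
array([ True, False])
>>> np.all([-1, 4, 5])  # nonzero values are treated as True
True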
order ({‘C’, ‘F’, ‘A’, ‘K’}, optional) – Controls the memory layout of the copy. ‘C’ means C-order,
‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous,
‘C’ otherwise. ‘K’ means match the layout of a as closely
as possible. (Note that this function and ndarray.copy() are very
similar, but have different default values for their order=
arguments.)
subok (bool, optional) – If True, then sub-classes will be passed-through, otherwise the
returned array will be forced to be a base-class array (defaults to False).
New in version 1.19.0.
Returns:
arr – Array interpretation of a.
Return type:
ndarray
See also
ndarray.copy
Preferred method for creating an array copy
Notes
This is equivalent to:
>>> np.array(a,copy=True)
Examples
Create an array x, with a reference y and a copy z:
>>> x = np.array([1, 2, 3])
>>> y = x
>>> z = np.copy(x)
Note that, when we modify x, y changes, but not z:
>>> x[0] = 10
>>> x[0] == y[0]
True
>>> x[0] == z[0]
False
Note that np.copy clears the previously set WRITEABLE=False flag.
Note that np.copy is a shallow copy and will not copy object
elements within arrays. This is mainly important for arrays
containing Python objects. The new array will contain the
same object which may lead to surprises if that object can
be modified (is mutable):
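For example (a short sketch of the shallow-copy behavior with an object array):
>>> a = np.array([1, 'm', [2, 3, 4]], dtype=object)
>>> b = np.copy(a)
>>> b[2][0] = 10  # mutating the shared list is visible through both arrays
>>> a
array([1, 'm', list([10, 3, 4])], dtype=object)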
Return the indices of the elements that are non-zero.
Returns a tuple of arrays, one for each dimension of a,
containing the indices of the non-zero elements in that
dimension. The values in a are always tested and returned in
row-major, C-style order.
To group the indices by element, rather than dimension, use argwhere,
which returns a row for each non-zero element.
Note
When called on a zero-d array or scalar, nonzero(a) is treated
as nonzero(atleast_1d(a)).
Deprecated since version 1.17.0: Use atleast_1d explicitly if this behavior is deliberate.
Parameters:
a (array_like) – Input array.
Returns:
tuple_of_arrays – Indices of elements that are non-zero.
Return indices that are non-zero in the flattened version of the input array.
ndarray.nonzero
Equivalent ndarray method.
count_nonzero
Counts the number of non-zero elements in the input array.
Notes
While the nonzero values can be obtained with a[nonzero(a)], it is
recommended to use x[x.astype(bool)] or x[x!=0] instead, which
will correctly handle 0-d arrays.
A common use for nonzero is to find the indices of an array, where
a condition is True. Given an array a, the condition a > 3 is a
boolean array and since False is interpreted as 0, np.nonzero(a > 3)
yields the indices of a where the condition is true.
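For example:
>>> a = np.array([[3, 0, 0], [0, 4, 0], [5, 6, 0]])
>>> np.nonzero(a)
(array([0, 1, 2, 2]), array([0, 1, 0, 1]))
>>> a[np.nonzero(a)]
array([3, 4, 5, 6])
>>> np.nonzero(a > 3)  # indices where the condition holds
(array([1, 2, 2]), array([1, 0, 1]))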
Given an interval, values outside the interval are clipped to
the interval edges. For example, if an interval of [0,1]
is specified, values smaller than 0 become 0, and values larger
than 1 become 1.
Equivalent to but faster than np.minimum(a_max,np.maximum(a,a_min)).
No check is performed to ensure a_min<a_max.
Parameters:
a (array_like) – Array containing elements to clip.
a_min, a_max (array_like or None) – Minimum and maximum value. If None, clipping is not performed on
the corresponding edge. Only one of a_min and a_max may be
None. Both are broadcast against a.
out (ndarray, optional) – The results will be placed in this array. It may be the input
array for in-place clipping. out must be of the right shape
to hold the output. Its type is preserved.
**kwargs – For other keyword-only arguments, see the
ufunc docs.
New in version 1.17.0.
Returns:
clipped_array – An array with the elements of a, but where values
< a_min are replaced with a_min, and those > a_max
with a_max.
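For example:
>>> a = np.arange(10)
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8])
>>> np.clip(a, None, 3)  # clip only on the upper edge
array([0, 1, 2, 3, 3, 3, 3, 3, 3, 3])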
The API defines classmethods that work on both Tensors and ndarrays. As
such the user can simply use kwarray.ArrayAPI.<funcname> and it will
return the expected result for both Tensor and ndarray types.
However, this is inefficient because it requires us to check the type of
the input for every API call. Therefore it is recommended that you use the
ArrayAPI.coerce() function, which takes as input the data you want to
operate on. It performs the type check once, and then returns another
object that defines with an identical API, but specific to the given data
type. This means that we can ignore type checks on future calls of the
specific implementation. See examples for more details.
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> take = kwarray.ArrayAPI.take
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> indices = [1, 3, 5, 7, 11, 13, 17, 21]
>>> idxs0 = [1, 3, 5, 7]
>>> idxs1 = [1, 3, 5, 7, 11]
>>> assert np.allclose(take(np_data, indices), take(pt_data, indices))
>>> assert np.allclose(take(np_data, idxs0, 0), take(pt_data, idxs0, 0))
>>> assert np.allclose(take(np_data, idxs1, 1), take(pt_data, idxs1, 1))
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> compress = kwarray.ArrayAPI.compress
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> flags = (np_data % 2 == 0).ravel()
>>> f0 = (np_data % 2 == 0)[:, 0]
>>> f1 = (np_data % 2 == 0)[0, :]
>>> assert np.allclose(compress(np_data, flags), compress(pt_data, flags))
>>> assert np.allclose(compress(np_data, f0, 0), compress(pt_data, f0, 0))
>>> assert np.allclose(compress(np_data, f1, 1), compress(pt_data, f1, 1))
Example
>>> # Use ArrayAPI to coerce an identical API that doesnt do type checks
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> np_data = np.arange(0, 15).reshape(3, 5)
>>> pt_data = torch.LongTensor(np_data)
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on torch Tensors.
>>> impl = kwarray.ArrayAPI.coerce(pt_data)
>>> flat_data = impl.view(pt_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on numpy ndarrays.
>>> impl = kwarray.ArrayAPI.coerce(np_data)
>>> flat_data = impl.view(np_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
The API is restricted to facilitate speed tradeoffs
Note
Assumes underlying data is Dict[list|ndarray]. If the data is known
to be a Dict[ndarray] use DataFrameArray instead, which has faster
implementations for some operations.
Note
pandas.DataFrame is slow. DataFrameLight is faster.
It is a tad more restrictive though.
>>> # xdoc: +REQUIRES(--bench)
>>> from kwarray.dataframe_light import *  # NOQA
>>> import ubelt as ub
>>> df_light = DataFrameLight._demodata(num=1000)
>>> df_heavy = df_light.pandas()
>>> ti = ub.Timerit(21, bestof=3, verbose=2, unit='ms')
>>> ti.reset('light').call(lambda: list(df_light.iterrows()))
>>> ti.reset('heavy').call(lambda: list(df_heavy.iterrows()))
>>> # xdoctest: +IGNORE_WANT
Timed light for: 21 loops, best of 3
    time per loop: best=0.834 ms, mean=0.850 ± 0.0 ms
Timed heavy for: 21 loops, best of 3
    time per loop: best=45.007 ms, mean=45.633 ± 0.5 ms
[DiscVsCont] notes that there are only 3 types of random variables:
discrete, continuous, or mixed, and that these types are mutually exclusive.
Note
When inheriting from this class, you typically do not need to define
an __init__ method. Instead, overwrite the __params__ class
attribute with an OrderedDict[str, Value] to indicate what the
signature of the __init__ method should be. This allows for (1)
concise expression of new distributions and (2) for new distributions
to inherit a random classmethod that works according to constraints
specified in each parameter Value.
If you do overwrite __init__, be sure to call super().
Contains a set of distributions with associated weights. Sampling is done
by first choosing a distribution with probability proportional to its
weighting, and then sampling from the chosen distribution.
In general, a mixture model generates data by first sampling a latent
variable z, and then sampling the observables x from a distribution that
depends on z, i.e. p(z, x) = p(z) p(x | z) [GrosseMixture] [StephensMixture].
Parameters:
pdfs (List[Distribution]) – list of distributions
weights (List[float]) – corresponding weights of each distribution
rng (np.random.RandomState) – seed random number generator
>>> # In this example we create a bimodal mixture of normals
>>> from kwarray.distributions import *  # NOQA
>>> pdfs = [Normal(mean=10, std=2), Normal(18, 2)]
>>> self = Mixture(pdfs)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
>>> kwplot.show_if_requested()
Sampling from a mixture of k distributions with weights w_k is
equivalent to picking a distribution with probability w_k, and then
sampling from the picked distribution.
SOuser6655984 <https://stackoverflow.com/a/47762586/887074>
A distribution generated by composing different base distributions or
numbers (which are considered as constant distributions).
Given the operation and its arguments, the sampling process of a “Composed”
distribution will sample from each of the operands, and then apply the
operation to the sampled points. For instance if we add two Normal
distributions, this will first sample from each distribution and then add
the results.
Note
This is not the same as mixing distributions!
Variables:
self.operation (Function) – operation (add / sub / mult / div) to perform on operands
self.operands (Sequence[Distribution | Number]) – arguments passed to operation
Example
>>> # In this example you can see that the sum of two Normal random
>>> # variables is also normal
>>> from kwarray.distributions import *  # NOQA
>>> operands = [Normal(mean=10, std=2), Normal(15, 2)]
>>> operation = np.add
>>> self = Composed(operation, operands)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(1000, bins=100)
Example
>>> # Binary operations result in composed distributions
>>> # We can make a (bounded) exponential distribution using a uniform
>>> from kwarray.distributions import *  # NOQA
>>> X = Uniform(.001, 7)
>>> lam = .7
>>> e = np.exp(1)
>>> self = lam * e ** (-lam * X)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(5000, bins=100)
Parameters:
operation (Any) – no help given. Defaults to None.
The exponential distribution is the probability distribution of the time
between events in a Poisson point process, i.e., a process in which events
occur continuously and independently at a constant average rate [1].
The Bernoulli distribution is the discrete probability distribution of a
random variable which takes the value 1 with probability p and the value
0 with probability q = 1 - p.
The Binomial distribution represents the discrete probabilities of
obtaining some number of successes in n “binary-experiments” each with a
probability of success p and a probability of failure 1 - p.
Iterate over the definitions with __params__ defined and dynamically add
relevant information to their docstrings. We should modify this so it can
rewrite the docstrings statically. I don’t like dynamic docstrings at
runtime.
>>> # Show the results of the docstring formatting
>>> from kwarray import distributions
>>> candidates = []
>>> for val in distributions.__dict__.values():
>>>     if hasattr(val, '__params__') and val.__params__ is not NotImplemented:
>>>         candidates.append(val)
>>> for val in candidates:
>>>     print('======')
>>>     print(val)
>>>     print('-----')
>>>     print(val.__doc__)
>>>     print('======')
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float) – Lower boundary of the output interval. All values generated will
be greater than or equal to low. Defaults to 0.
high (float) – Upper boundary of the output interval. All values generated will
be less than high. Default to 1.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
dtype (type) – either np.float32 or np.float64. Defaults to float32
rng (numpy.random.RandomState) – underlying random state
Returns:
uniformly distributed random numbers with chosen size and dtype
Extended typing NDArray[Literal[size],Literal[dtype]]
Return type:
ndarray
Benchmark
>>> from timerit import Timerit
>>> import kwarray
>>> size = (300, 300, 3)
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float32'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         ours = standard_normal(size, rng=rng, dtype=np.float32)
>>> # Timed best=4.705 ms, mean=4.75 ± 0.085 ms for dtype=np.float32
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float64'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         theirs = standard_normal(size, rng=rng, dtype=np.float64)
>>> # Timed best=9.327 ms, mean=9.794 ± 0.4 ms for dtype=np.float64
The difference between this function and
numpy.random.standard_normal() is that we use float32 arrays in the
backend instead of float64. Halving the amount of bits that need to be
manipulated can significantly reduce the execution time, and 32-bit
precision is often good enough.
Parameters:
size (int | Tuple[int, …]) – shape of the returned ndarray
mean (float, default=0) – mean of the normal distribution
std (float, default=1) – standard deviation of the normal distribution
rng (numpy.random.RandomState) – underlying random state
Returns:
normally distributed random numbers with chosen size.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> import scipy
>>> import scipy.stats
>>> pts = 1000
>>> # Our numbers are normally distributed with high probability
>>> rng = np.random.RandomState(28041990)
>>> ours_a = standard_normal32(pts, rng=rng)
>>> ours_b = standard_normal32(pts, rng=rng) + 2
>>> ours = np.concatenate((ours_a, ours_b))  # numerical stability?
>>> p = scipy.stats.normaltest(ours)[1]
>>> print('Probability our data is non-normal is: {:.4g}'.format(p))
Probability our data is non-normal is: 1.573e-14
>>> rng = np.random.RandomState(28041990)
>>> theirs_a = rng.standard_normal(pts)
>>> theirs_b = rng.standard_normal(pts) + 2
>>> theirs = np.concatenate((theirs_a, theirs_b))
>>> p = scipy.stats.normaltest(theirs)[1]
>>> print('Probability their data is non-normal is: {:.4g}'.format(p))
Probability their data is non-normal is: 3.272e-11
>>> # Test an even and odd numbers of points
>>> assert standard_normal32(3).shape == (3,)
>>> assert standard_normal32(2).shape == (2,)
>>> assert standard_normal32(1).shape == (1,)
>>> assert standard_normal32(0).shape == (0,)
>>> assert standard_normal32((3, 1)).shape == (3, 1)
>>> assert standard_normal32((3, 0)).shape == (3, 0)
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float, default=0.0) – Lower boundary of the output interval. All values generated will
be greater than or equal to low.
high (float, default=1.0) – Upper boundary of the output interval. All values generated will
be less than high.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
Returns:
uniformly distributed random numbers with chosen size.
Currently just defines “stats_dict”, which is a nice way to gather multiple
numeric statistics (e.g. max, min, median, mode, arithmetic-mean,
geometric-mean, standard-deviation, etc…) about data in an array.
Track mean, std, min, and max values over time with constant memory.
Dynamically records per-element array statistics and can summarize them
per-element, across channels, or globally.
Todo
[ ] This may need a few API tweaks and good documentation
Example
>>> import kwarray
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> img = np.dstack([ch1, ch2])
>>> run.update(np.dstack([ch1, ch2]))
>>> run.update(np.dstack([ch1 + 1, ch2]))
>>> run.update(np.dstack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.urepr(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.urepr(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.urepr(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.urepr(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.urepr(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.urepr(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.urepr(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
Parameters:
nan_policy (str) – indicates how we will handle nan values
if “omit” - set weights of nan items to zero.
if “propogate” - propagate nans.
if “raise” - raise a ValueError if nans are given.
check_weights (bool) – if True, we check the weights for zeros (which can also
implicitly occur when data has nans). Disabling this check will
result in faster computation, but it is your responsibility to
ensure all data passed to update is valid.
Compute summary statistics across one or more dimensions.
Parameters:
axis (int | List[int] | None | NoParamType) – axis or axes to summarize over;
if None, all axes are summarized;
if ub.NoParam, no axes are summarized and the current result is
returned.
keepdims (bool, default=True) – if False removes the dimensions that are summarized over
Returns:
containing minimum, maximum, mean, std, etc..
Return type:
Dict
Raises:
NoSupportError – if update was never called with valid data
Example
>>> # Test to make sure summarize works across different shapes
>>> base = np.array([1, 1, 1, 1, 0, 0, 0, 1])
>>> run0 = RunningStats()
>>> for _ in range(3):
>>>     run0.update(base.reshape(8, 1))
>>> run1 = RunningStats()
>>> for _ in range(3):
>>>     run1.update(base.reshape(4, 2))
>>> run2 = RunningStats()
>>> for _ in range(3):
>>>     run2.update(base.reshape(2, 2, 2))
>>> #
>>> # Summarizing over everything should be exactly the same
>>> s0N = run0.summarize(axis=None, keepdims=0)
>>> s1N = run1.summarize(axis=None, keepdims=0)
>>> s2N = run2.summarize(axis=None, keepdims=0)
>>> #assert ub.util_indexable.indexable_allclose(s0N, s1N, rel_tol=0.0, abs_tol=0.0)
>>> #assert ub.util_indexable.indexable_allclose(s1N, s2N, rel_tol=0.0, abs_tol=0.0)
>>> assert s0N['mean'] == 0.625
means (array) – means[i] is the mean of the ith entry to combine
stds (array) – stds[i] is the std of the ith entry to combine
nums (array | None) – nums[i] is the number of samples in the ith entry to combine.
if None, assumes sample sizes are infinite.
axis (int | Tuple[int] | None) – axis to combine the statistics over
keepdims (bool) – if True return arrays with the same number of dimensions they were
given in.
bessel (int) – Set to 1 to enable Bessel correction to unbias the combined std
estimate. Only disable if you have the true population means, or
you think you know what you are doing.
>>> # xdoctest: +REQUIRES(env:SHOW_SYMPY)
>>> # What about the case where we don't know population size of the
>>> # estimates. We could treat it as a fixed number, or perhaps take the
>>> # limit as n -> infinity.
>>> import sympy
>>> import sympy as sym
>>> from sympy import symbols, sqrt, limit, IndexedBase, summation
>>> from sympy import Indexed, Idx, symbols
>>> means = IndexedBase('m')
>>> stds = IndexedBase('s')
>>> nums = IndexedBase('n')
>>> i = symbols('i', cls=Idx)
>>> k = symbols('k', cls=Idx)
>>> #
>>> combo_mean = symbols('C')
>>> #
>>> bessel = 1
>>> total = summation(nums[i], (i, 1, k))
>>> combo_mean_expr = summation(nums[i] * means[i], (i, 1, k)) / total
>>> p1 = summation((nums[i] - bessel) * stds[i], (i, 1, k))
>>> p2 = summation(nums[i] * ((means[i] - combo_mean) ** 2), (i, 1, k))
>>> #
>>> combo_std_expr = sqrt((p1 + p2) / (total - bessel))
>>> print('------------------------------------')
>>> print('General Combined Mean / Std Formulas')
>>> print('C = combined mean')
>>> print('S = combined std')
>>> print('------------------------------------')
>>> print(ub.hzcat(['C = ', sym.pretty(combo_mean_expr, use_unicode=True, use_unicode_sqrt_char=True)]))
>>> print(ub.hzcat(['S = ', sym.pretty(combo_std_expr, use_unicode=True, use_unicode_sqrt_char=True)]))
>>> print('')
>>> print('---------')
>>> print('Now assuming all sample sizes are the same constant value N')
>>> print('---------')
>>> # Now assume all n[i] = N (i.e. a constant value)
>>> N = symbols('N')
>>> combo_mean_const_n_expr = combo_mean_expr.copy().xreplace({nums[i]: N})
>>> combo_std_const_n_expr = combo_std_expr.copy().xreplace({nums[i]: N})
>>> p1_const_n = p1.copy().xreplace({nums[i]: N})
>>> p2_const_n = p2.copy().xreplace({nums[i]: N})
>>> total_const_n = total.copy().xreplace({nums[i]: N})
>>> #
>>> print(ub.hzcat(['C = ', sym.pretty(combo_mean_const_n_expr, use_unicode=True, use_unicode_sqrt_char=True)]))
>>> print(ub.hzcat(['S = ', sym.pretty(combo_std_const_n_expr, use_unicode=True, use_unicode_sqrt_char=True)]))
>>> #
>>> print('')
>>> print('---------')
>>> print('Take the limit as N -> infinity')
>>> print('---------')
>>> #
>>> # Limit doesnt directly but we can break it into parts
>>> lim_C = limit(combo_mean_const_n_expr, N, float('inf'))
>>> lim_p1 = limit(p1_const_n / (total_const_n - bessel), N, float('inf'))
>>> lim_p2 = limit(p2_const_n / (total_const_n - bessel), N, float('inf'))
>>> lim_expr = sym.sqrt(lim_p1 + lim_p2)
>>> print(ub.hzcat(['lim(C, N->inf) = ', sym.pretty(lim_C)]))
>>> print(ub.hzcat(['lim(S, N->inf) = ', sym.pretty(lim_expr)]))
In cases where there are many lists of items to group (think column-major
data), consider using group_indices() and apply_grouping()
instead.
Parameters:
item_list (NDArray) – The input array of items to group.
Extended typing NDArray[Any,VT]
groupid_list (NDArray) – Each item is an id corresponding to the item at the same position
in item_list. For the fastest runtime, the input array must be
numeric (ideally with integer types). This list must be
1-dimensional.
Extended typing NDArray[Any,KT]
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency. Defaults
to False.
axis (int | None) – Group along a particular axis in items if it is n-dimensional.
Returns:
mapping from groupids to corresponding items.
Extended typing Dict[KT,NDArray[Any,VT]].
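A hedged sketch of the call pattern described above (the exact ordering of the returned dictionary may vary):
>>> import numpy as np
>>> import kwarray
>>> items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
>>> groupids = np.array([2, 1, 2, 1, 2, 1, 0, 1])
>>> grouped = kwarray.group_items(items, groupids)
>>> print(grouped)  # e.g. {0: array([6]), 1: array([1, 3, 5, 7]), 2: array([0, 2, 4])}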
Find unique items and the indices at which they appear in an array.
A common use case of this function is when you have a list of objects
(often numeric but sometimes not) and an array of “group-ids” corresponding
to that list of objects.
Using this function will return a list of indices that can be used in
conjunction with apply_grouping() to group the elements. This is
most useful when you have many lists (think column-major data)
corresponding to the group-ids.
In cases where there is only one list of objects, or knowing the indices
doesn’t matter, consider using group_items() instead.
Parameters:
idx_to_groupid (NDArray) – The input array, where each item is interpreted as a group id.
For the fastest runtime, the input array must be numeric (ideally
with integer types). If the type is non-numeric then the less
efficient ubelt.group_items() is used.
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency.
Defaults to False.
Returns:
(keys, groupxs) -
keys (NDArray):
The unique elements of the input array in order
groupxs (List[NDArray]):
Corresponding list of indexes. The i-th item is an array
indicating the indices where the item key[i] appeared in
the input array.
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import kwarray
>>> import ubelt as ub
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> idx_to_groupid = np.array([[24], [129], [659], [659], [24],
...                            [659], [659], [822], [659], [659], [24]]).T[0]
>>> (keys, groupxs) = kwarray.group_indices(idx_to_groupid)
>>> # Different versions of numpy may produce different orderings
>>> # so normalize these to make test output consistent
>>> #[gxs.sort() for gxs in groupxs]
>>> print('keys = ' + ub.urepr(keys, with_dtype=False))
>>> print('groupxs = ' + ub.urepr(groupxs, with_dtype=False))
keys = np.array([ 24, 129, 659, 822])
groupxs = [
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]
Returns lists of consecutive values. Implementation inspired by [3].
Parameters:
arr (NDArray) – array of ordered values
offset (float, default=1) – any two values separated by this offset are grouped. In the
default case, when offset=1, this groups increasing values like: 0,
1, 2. When offset is 0 it groups consecutive values that are the
same, e.g.: 4, 4, 4.
Returns:
a list of arrays that are the groups from the input
Return type:
List[NDArray]
Note
This is equivalent (and faster) to using:
apply_grouping(data, group_consecutive_indices(data))
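A small sketch of the grouping behavior described above (the output is shown as a comment because it is illustrative):
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15])
>>> groups = kwarray.group_consecutive(arr, offset=1)
>>> print(groups)  # [array([1, 2, 3]), array([ 5,  6,  7,  8,  9, 10]), array([15])]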
>>> import kwarray
>>> rng = kwarray.ensure_rng(0)
>>> items = [rng.rand(rng.randint(0, 10)) for _ in range(10)]
>>> self = kwarray.FlatIndexer.fromlist(items)
>>> index = np.arange(0, len(self))
>>> outer, inner = self.unravel(index)
>>> recon = self.ravel(outer, inner)
>>> # This check is only possible because index is an arange
>>> check1 = np.hstack(list(map(sorted, kwarray.group_indices(outer)[1])))
>>> check2 = np.hstack(kwarray.group_consecutive_indices(inner))
>>> assert np.all(check1 == index)
>>> assert np.all(check2 == index)
>>> assert np.all(index == recon)
Constructs an array of booleans where an item is True if its position is in
indices otherwise it is False. This can be viewed as the inverse of
numpy.where().
Parameters:
indices (NDArray) – list of integer indices
shape (int | tuple) – length of the returned list. If not specified,
the minimal possible shape to incorporate all the indices is used.
In general, it is best practice to always specify this argument.
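For example (a minimal sketch):
>>> import kwarray
>>> print(kwarray.boolmask([1, 3, 4], 6))
[False  True False  True  True False]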
kwarray.util_numpy.atleast_nd(arr, n, front=False)
View inputs as arrays with at least n dimensions.
Parameters:
arr (ArrayLike) – An array-like object. Non-array inputs are converted to arrays.
Arrays that already have n or more dimensions are preserved.
n (int) – number of dimensions to ensure
front (bool) – if True new dimensions are added to the front of the array.
otherwise they are added to the back. Defaults to False.
Returns:
An array with a.ndim>=n. Copies are avoided where possible,
and views with three or more dimensions are returned. For example,
a 1-D array of shape (N,) becomes a view of shape
(1,N,1), and a 2-D array of shape (M,N) becomes a view
of shape (M,N,1).
Extensive benchmarks are in
kwarray/dev/bench_atleast_nd.py
These demonstrate that this function is statistically faster than the
numpy variants, although the difference is small. On average this
function takes 480ns versus numpy which takes 790ns.
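A hedged sketch of typical usage (the exact placement of the new singleton axes follows the front parameter described above):
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([1, 2, 3])
>>> arr3 = kwarray.atleast_nd(arr, n=3)
>>> assert arr3.ndim == 3
>>> # With front=True the new singleton dimensions are prepended
>>> assert kwarray.atleast_nd(arr, n=3, front=True).shape[0] == 1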
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of maximum indices to return
axis (int | None) – axis to find maxima over. If None this is equivalent
to using arr.ravel().
ordered (bool) – if False, returns the maximum elements in an arbitrary
order, otherwise they are in descending order. (Setting this to
False is a bit faster.)
Todo
[ ] if num is None, return arg for all values equal to the maximum
Returns:
NDArray
Example
>>> # Test cases with axis=None
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argmaxima(arr, num)
>>>     idxs2 = argmaxima(arr, num, ordered=False)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[::-1][:len(idxs)])), 'ordered=True must return in order'
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argmaxima(arr, num, axis=axis)
>>>         idxs2 = argmaxima(arr, num, ordered=False, axis=axis)
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of minimum indices to return
axis (int|None) – axis to find minima over.
If None this is equivalent to using arr.ravel().
ordered (bool) – if False, returns the minimum elements in an arbitrary
order, otherwise they are in ascending order. (Setting this to
false is a bit faster).
Example
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argminima(arr, num)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[:len(idxs)])), 'ordered=True must return in order'
>>>     idxs2 = argminima(arr, num, ordered=False)
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> from kwarray.util_numpy import *  # NOQA
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> # make a unique array so we can check argmax consistency
>>> arr = np.arange(3 * 5 * 7)
>>> np.random.shuffle(arr)
>>> arr = arr.reshape(3, 5, 7)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argminima(arr, num, axis=axis)
>>>         idxs2 = argminima(arr, num, ordered=False, axis=axis)
>>>         print('idxs = {!r}'.format(idxs))
>>>         print('idxs2 = {!r}'.format(idxs2))
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
>>>         # Check if argmin agrees with -argmax
>>>         idxs3 = argmaxima(-arr, num, axis=axis)
>>>         assert np.all(idxs3 == idxs)
Find the index of the maximum element in a sequence of keys.
Parameters:
keys (tuple) – a k-tuple of k N-dimensional arrays.
Like np.lexsort the last key in the sequence is used for the
primary sort order, the second-to-last key for the secondary sort
order, and so on.
multi (bool) – if True, returns all indices that share the max value
Handles interchange between different random number generators (numpy,
python, torch, …). Also defines useful random iterator functions and
ensure_rng().
If the input is a number, it returns a seeded random number generator. If it is
None, it returns whatever the system level RNG is. If the input is an existing
RNG, it returns it without changing it. It also has the ability to switch
between Python’s random module RNG and numpy’s np.random RNG (it can translate
the internal state between the two).
When I write randomized functions / classes, a coding pattern I like is to
have a default keyword argument rng=None. Then kwarray.ensure_rng coerces
whatever the input is into a random.Random() or
numpy.random.RandomState() object.
Then if this random function calls any other random function, it passes the
coerced rng to all other subfunctions. This ensures that seeding the RNG at
the top level produces a completely deterministic process.
Yields num combinations of length size from items in random order
Parameters:
items (List) – pool of items to choose from
size (int) – Number of items in each combination
num (int | None) – Number of combinations to generate. If None, generate them all.
rng (int | float | None | numpy.random.RandomState | random.Random) – seed or random number generator. Defaults to the global state
of the python random module.
Yields:
Tuple – a random combination of items of length size.
Yields num items from the cartesian product of items in a random order.
Parameters:
items (List[Sequence]) – items to get the cartesian product of, packed in a list or tuple.
(note this deviates from the api of itertools.product())
num (int | None) – maximum number of items to generate. If None, generate them all.
rng (int | float | None | numpy.random.RandomState | random.Random) – Seed or random number generator. Defaults to the global state
of the python random module.
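A hedged sketch of both iterators described above (the exact items yielded depend on the seed):
>>> import kwarray
>>> rng = kwarray.ensure_rng(0)
>>> combos = list(kwarray.random_combinations([1, 2, 3, 4], size=2, num=3, rng=rng))
>>> prods = list(kwarray.random_product([[1, 2, 3], ['a', 'b']], num=4, rng=rng))
>>> assert len(combos) == 3 and len(prods) == 4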
This function is useful for ensuring that your code uses a controlled
internal random state that is independent of other modules.
If the input is None, then a global random state is returned.
If the input is a numeric value, then that is used as a seed to construct a
random state.
If the input is a random number generator, then another random number
generator with the same state is returned. Depending on the api, this
random state is either returned as-is, or used to construct an equivalent
random state with the requested api.
Parameters:
rng (int | float | None | numpy.random.RandomState | random.Random) – if None, then defaults to the global rng. Otherwise this can
be an integer or a RandomState class. Defaults to the global
random.
api (str) – specify the type of random number
generator to use. This can either be ‘numpy’ for a
numpy.random.RandomState object or ‘python’ for a
random.Random object. Defaults to numpy.
Returns:
rng - either a numpy or python random number generator, depending
on the setting of api.
>>> num = 4
>>> print('--- Python as PYTHON ---')
>>> py_rng = random.Random(0)
>>> pp_nums = [py_rng.random() for _ in range(num)]
>>> print(pp_nums)
>>> print('--- Numpy as PYTHON ---')
>>> np_rng = ensure_rng(random.Random(0), api='numpy')
>>> np_nums = [np_rng.rand() for _ in range(num)]
>>> print(np_nums)
>>> print('--- Numpy as NUMPY---')
>>> np_rng = np.random.RandomState(seed=0)
>>> nn_nums = [np_rng.rand() for _ in range(num)]
>>> print(nn_nums)
>>> print('--- Python as NUMPY---')
>>> py_rng = ensure_rng(np.random.RandomState(seed=0), api='python')
>>> pn_nums = [py_rng.random() for _ in range(num)]
>>> print(pn_nums)
>>> assert np_nums == pp_nums
>>> assert pn_nums == nn_nums
Example
>>> # Test that random modules can be coerced
>>> import random
>>> import numpy as np
>>> ensure_rng(random, api='python')
>>> ensure_rng(random, api='numpy')
>>> ensure_rng(np.random, api='python')
>>> ensure_rng(np.random, api='numpy')
Finds robust normalization statistics for a set of scalar observations.
The idea is to estimate “fence” parameters: minimum and maximum values
where anything under / above these values are likely outliers. For
non-linear normalization schemes we can also estimate a likely middle and
extent of the data.
Parameters:
data (ndarray) – a 1D numpy array where invalid data has already been removed
params (str | dict) – normalization params.
When passed as a dictionary valid params are:
scaling (str):
This is the “mode” that will be used in the final
normalization. Currently it has no impact on how the
extrema are estimated. Defaults to ‘linear’. Can also be ‘sigmoid’.
extrema (str):
The method for determining what the extrema are.
Can be “quantile” for strict quantile clipping
Can be “adaptive-quantile” for an IQR-like adjusted quantile method.
Can be “tukey” or “IQR” for an exact IQR method.
low (float): This is the low quantile for likely inliers.
mid (float): This is the middle quantile for likely inliers.
high (float): This is the high quantile for likely inliers.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.util_robust import *  # NOQA
>>> from kwarray.distributions import Mixture
>>> import ubelt as ub
>>> # A random mixture distribution for testing
>>> data = Mixture.random(6).sample(3000)
One might wonder where the 1.5 in the above interval comes from – Paul
Velleman, a statistician at Cornell University, was a student of John
Tukey, who invented this test for outliers. He wondered the same thing.
When he asked Tukey, “Why 1.5?”, Tukey answered, “Because 1 is too small
and 2 is too large.” [OxfordShapeSpread].
Normalize data intensities using heuristics to help put sensor data with
extremely high or low contrast into a visible range.
This function is designed with an emphasis on getting something that is
reasonable for visualization.
Todo
[x] Move to kwarray and renamed to robust_normalize?
[ ] Support for M-estimators?
Parameters:
imdata (ndarray) – raw intensity data
return_info (bool) – if True, return information about the chosen normalization
heuristic.
params (str | dict) – Can contain the keys low, mid, high, scaling, and extrema,
e.g. {'low': 0.1, 'mid': 0.8, 'high': 0.9, 'scaling': 'sigmoid'}.
See documentation in find_robust_normalizers().
axis (None | int) – The axis to normalize over, if unspecified, normalize jointly
nodata (None | int) – A value representing nodata to leave unchanged during
normalization, for example 0
dtype (type) – can be float32 or float64
mask (ndarray | None) – A mask indicating what pixels are valid and what pixels should be
considered nodata. Mutually exclusive with nodata argument.
A mask value of 1 indicates a VALID pixel. A mask value of 0
indicates an INVALID pixel.
Note this is the opposite of a masked array.
Returns:
a floating point array with values between 0 and 1.
if return_info is specified, also returns extra data
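A hedged sketch of typical usage, with hypothetical data containing a single extreme outlier:
>>> import numpy as np
>>> import kwarray
>>> rng = kwarray.ensure_rng(0)
>>> imdata = rng.rand(64, 64)   # mostly small values
>>> imdata[0, 0] = 1000         # plus one extreme outlier
>>> norm = kwarray.robust_normalize(imdata)
>>> assert norm.min() >= 0 and norm.max() <= 1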
Normalizes input values based on a specified scheme.
The default behavior is a linear normalization between 0.0 and 1.0 based on
the min/max values of the input. Parameters can be specified to achieve
more general contrast stretching or signal rebalancing. Implements the
linear and sigmoid normalization methods described in [WikiNorm].
Parameters:
arr (NDArray) – array to normalize, usually an image
out (NDArray | None) – output array. Note, that we will create an
internal floating point copy for integer computations.
mode (str) – either linear or sigmoid.
alpha (float) – Only used if mode=sigmoid. Division factor
(pre-sigmoid). If unspecified computed as:
max(abs(old_min-beta),abs(old_max-beta))/6.212606.
Note this parameter is sensitive to if the input is a float or
uint8 image.
beta (float) – subtractive factor (pre-sigmoid). This should be the
intensity of the most interesting bits of the image, i.e. bring
them to the center (0) of the distribution.
Defaults to (max-min)/2. Note this parameter is sensitive
to if the input is a float or uint8 image.
min_val – inputs lower than this minimum value are clipped
max_val – inputs higher than this maximum value are clipped.
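A minimal sketch of the default linear mode:
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([10., 20., 30., 40.])
>>> print(kwarray.normalize(arr))  # values linearly stretched onto [0, 1]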
Allows slices with out-of-bound coordinates. Any out of bounds coordinate
will be sampled via padding.
Parameters:
data (Sliceable) – data to slice into. Any channels must be the last dimension.
slices (slice | Tuple[slice, …]) – slice for each dimensions
ndim (int) – number of spatial dimensions
pad (List[int|Tuple]) – additional padding of the slice
padkw (Dict) – if unspecified defaults to {'mode':'constant'}
return_info (bool, default=False) – if True, return extra information
about the transform.
Note
Negative slices have a different meaning here than they usually do.
Normally, they indicate a wrap-around or a reversed stride, but here
they index into out-of-bounds space (which depends on the pad mode).
For example a slice of -2:1 literally samples two pixels to the left of
the data and one pixel from the data, so you get two padded values and
one data value.
SeeAlso:
embed_slice - finds the embedded slice and padding
Returns:
data_sliced: subregion of the input data (possibly with padding,
depending on if the original slice went out of bounds)
Tuple[Sliceable, Dict] :
data_sliced : as above
transform : information on how to return to the original coordinates
Currently a dict containing:
st_dims: a list indicating the low and high space-time
coordinate values of the returned data slice.
The structure of this dictionary may change in the future
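A minimal sketch (with the default constant-zero padding):
>>> import numpy as np
>>> import kwarray
>>> data = np.arange(5)
>>> # The slice starts 2 cells to the left of the data, so two padded
>>> # zeros are prepended before the in-bounds values.
>>> print(kwarray.padded_slice(data, slice(-2, 3)))
[0 0 0 1 2]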
Alternative to numpy pad with different short-cut semantics for
the “pad_width” argument.
Unlike numpy pad, you must specify a (start, stop) tuple for each
dimension. The shortcut is that you only need to specify this for the
leading dimensions. Any unspecified trailing dimension will get an implicit
(0, 0) padding.
Embeds a “padded-slice” inside known data dimension.
Returns the valid data portion of the slice with extra padding for regions
outside of the available dimension.
Given a slice for each dimension, the image dimensions, and a padding, get the
corresponding slice from the image and any extra padding needed to achieve
the requested window size.
Todo
[ ] Add the option to return the inverse slice
Parameters:
slices (Tuple[slice, …]) – a tuple of slices to apply to the data, one for each data dimension.
The SlidingWindow generates a grid of slices over an
numpy.ndarray(), which can then be used to compute on subsets of the
data. The Stitcher can then take these results and recombine them into
a final result that matches the larger array.
Slide a window of a certain shape over an array with a larger shape.
This can be used for iterating over a grid of sub-regions of 2d-images,
3d-volumes, or any n-dimensional array.
Yields slices of shape window that can be used to index into an array
with shape shape via numpy / torch fancy indexing. This allows for
fast iteration over subregions of a larger image. Because we generate a
grid-basis using only shapes, the larger image does not need to be in
memory as long as its shape (width/height/depth/etc.) is known.
Parameters:
shape (Tuple[int, …]) – shape of source array to slide across.
window (Tuple[int, …]) – shape of window that will be slid over the
larger image.
overlap (float, default=0) – a number between 0 and 1 indicating the
fraction of overlap that parts will have. Specifying this is
mutually exclusive with stride. Must be 0 <= overlap < 1.
stride (int, default=None) – the number of cells (pixels) moved on each
step of the window. Mutually exclusive with overlap.
keepbound (bool, default=False) – if True, a non-uniform stride will be
taken to ensure that the right / bottom of the image is returned as
a slice if needed. Such a slice will not obey the overlap
constraints. (Defaults to False)
allow_overshoot (bool, default=False) – if False, we will raise an
error if the window doesn’t slide perfectly over the input shape.
Variables:
basis_shape – shape of the grid corresponding to the number of strides the sliding window will take.
basis_slices – slices that will be taken in every dimension.
Yields:
Tuple[slice, …] –
slices used for numpy indexing; the number of slices
in the tuple is equal to the number of dimensions being sliced over.
Note
For each dimension, we generate a basis (which defines a grid), and we
slide over that basis.
Todo
[ ] have an option that is allowed to go outside of the window bounds
on the right and bottom when the slider overshoots.
>>> # Test shapes that dont fit
>>> # When the window is bigger than the shape, the left-aligned slices
>>> # are returned.
>>> self = SlidingWindow((3, 3), (12, 12), allow_overshoot=True, keepbound=True)
>>> print(list(self))
[(slice(0, 12, None), slice(0, 12, None))]
>>> print(list(SlidingWindow((3, 3), None, allow_overshoot=True, keepbound=True)))
[(slice(0, 3, None), slice(0, 3, None))]
>>> print(list(SlidingWindow((3, 3), (None, 2), allow_overshoot=True, keepbound=True)))
[(slice(0, 3, None), slice(0, 2, None)), (slice(0, 3, None), slice(1, 3, None))]
>>> from kwarray.util_slider import *  # NOQA
>>> import sys
>>> # Build a high resolution image and slice it into chips
>>> highres = np.random.rand(5, 200, 200).astype(np.float32)
>>> target_shape = (1, 50, 50)
>>> slider = SlidingWindow(highres.shape, target_shape, overlap=(0, .5, .5))
>>> # Show how Stitcher can be used to reconstruct the original image
>>> stitcher = Stitcher(slider.input_shape)
>>> for sl in list(slider):
...     chip = highres[sl]
...     stitcher.add(sl, chip)
>>> assert stitcher.weights.max() == 4, 'some parts should be processed 4 times'
>>> recon = stitcher.finalize()
Example
>>> from kwarray.util_slider import *  # NOQA
>>> import sys
>>> # Demo stitching 3 patterns where one has nans
>>> pat1 = np.full((32, 32), fill_value=0.2)
>>> pat2 = np.full((32, 32), fill_value=0.4)
>>> pat3 = np.full((32, 32), fill_value=0.8)
>>> pat1[:, 16:] = 0.6
>>> pat2[16:, :] = np.nan
>>> # Test with nan_policy=omit
>>> stitcher = Stitcher(shape=(32, 64), nan_policy='omit')
>>> stitcher[0:32, 0:32](pat1)
>>> stitcher[0:32, 16:48](pat2)
>>> stitcher[0:32, 33:64](pat3[:, 1:])
>>> final1 = stitcher.finalize()
>>> # Test with nan_policy=propogate
>>> stitcher = Stitcher(shape=(32, 64), nan_policy='propogate')
>>> stitcher[0:32, 0:32](pat1)
>>> stitcher[0:32, 16:48](pat2)
>>> stitcher[0:32, 33:64](pat3[:, 1:])
>>> final2 = stitcher.finalize()
>>> # Checks
>>> assert np.isnan(final1).sum() == 16, 'only should contain nan where no data was stitched'
>>> assert np.isnan(final2).sum() == 512, 'should contain nan wherever a nan was stitched'
>>> # xdoctest: +REQUIRES(--show)
>>> # xdoctest: +REQUIRES(module:kwplot)
>>> import kwplot
>>> import kwimage
>>> kwplot.autompl()
>>> kwplot.imshow(pat1, title='pat1', pnum=(3, 3, 1))
>>> kwplot.imshow(kwimage.nodata_checkerboard(pat2, square_shape=1), title='pat2 (has nans)', pnum=(3, 3, 2))
>>> kwplot.imshow(pat3, title='pat3', pnum=(3, 3, 3))
>>> kwplot.imshow(kwimage.nodata_checkerboard(final1, square_shape=1), title='stitched (nan_policy=omit)', pnum=(3, 1, 2))
>>> kwplot.imshow(kwimage.nodata_checkerboard(final2, square_shape=1), title='stitched (nan_policy=propogate)', pnum=(3, 1, 3))
Example
>>> # Example of weighted stitching
>>> # xdoctest: +REQUIRES(module:kwimage)
>>> from kwarray.util_slider import *  # NOQA
>>> import kwimage
>>> import kwarray
>>> import sys
>>> data = kwimage.Mask.demo().data.astype(np.float32)
>>> data_dims = data.shape
>>> window_dims = (8, 8)
>>> # We are going to slide a window over the data, do some processing
>>> # and then stitch it all back together. There are a few ways we
>>> # can do it. Lets demo the params.
>>> basis = {
>>>     # Vary the overlap of the slider
>>>     'overlap': (0, 0.5, .9),
>>>     # Vary if we are using weighted stitching or not
>>>     'weighted': ['none', 'gauss'],
>>>     'keepbound': [True, False]
>>> }
>>> results = []
>>> gauss_weights = kwimage.gaussian_patch(window_dims)
>>> gauss_weights = kwimage.normalize(gauss_weights)
>>> for params in ub.named_product(basis):
>>>     if params['weighted'] == 'none':
>>>         weights = None
>>>     elif params['weighted'] == 'gauss':
>>>         weights = gauss_weights
>>>     # Build the slider and stitcher
>>>     slider = kwarray.SlidingWindow(
>>>         data_dims, window_dims, overlap=params['overlap'],
>>>         allow_overshoot=True,
>>>         keepbound=params['keepbound'])
>>>     stitcher = kwarray.Stitcher(data_dims)
>>>     # Loop over the regions
>>>     for sl in list(slider):
>>>         chip = data[sl]
>>>         # This is our dummy function for this example.
>>>         predicted = np.ones_like(chip) * chip.sum() / chip.size
>>>         stitcher.add(sl, predicted, weight=weights)
>>>     final = stitcher.finalize()
>>>     results.append({
>>>         'final': final,
>>>         'params': params,
>>>     })
>>> # xdoctest: +REQUIRES(--show)
>>> # xdoctest: +REQUIRES(module:kwplot)
>>> import kwplot
>>> kwplot.autompl()
>>> pnum_ = kwplot.PlotNums(nCols=3, nSubplots=len(results) + 2)
>>> kwplot.imshow(data, pnum=pnum_(), title='input image')
>>> kwplot.imshow(gauss_weights, pnum=pnum_(), title='Gaussian weights')
>>> pnum_()
>>> for result in results:
>>>     param_key = ub.urepr(result['params'], compact=1)
>>>     final = result['final']
>>>     canvas = kwarray.normalize(final)
>>>     canvas = kwimage.fill_nans_with_checkers(canvas)
>>>     kwplot.imshow(canvas, pnum=pnum_(), title=param_key)
Parameters:
shape (tuple) – dimensions of the large image that will be created from
the smaller pixels or patches.
device (str | int | torch.device) – default is ‘numpy’, but if given as a torch device, then
underlying operations will be done with torch tensors instead.
dtype (str) – the datatype to use in the underlying accumulator.
nan_policy (str) – if omit, check for nans and convert any to zero weight items in
stitching.
step (int, default=None) – the length of each step / distance between
slices
start (int, default=0) – starting point (in most cases set this to 0)
keepbound (bool) – if True, a non-uniform step will be taken to ensure
that the right / bottom of the image is returned as a slice if
needed. Such a slice will not obey the overlap constraints.
(Defaults to False)
check (bool) – if True an error will be raised if the window does not
cover the entire extent from start to stop, even if keepbound is
True.
The API defines classmethods that work on both Tensors and ndarrays. As
such the user can simply use kwarray.ArrayAPI.<funcname> and it will
return the expected result for both Tensor and ndarray types.
However, this is inefficient because it requires us to check the type of
the input for every API call. Therefore it is recommended that you use the
ArrayAPI.coerce() function, which takes as input the data you want to
operate on. It performs the type check once, and then returns another
object with an identical API, but specific to the given data
type. This means that type checks can be skipped on future calls of the
specific implementation. See the examples for more details.
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> take = kwarray.ArrayAPI.take
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> indices = [1, 3, 5, 7, 11, 13, 17, 21]
>>> idxs0 = [1, 3, 5, 7]
>>> idxs1 = [1, 3, 5, 7, 11]
>>> assert np.allclose(take(np_data, indices), take(pt_data, indices))
>>> assert np.allclose(take(np_data, idxs0, 0), take(pt_data, idxs0, 0))
>>> assert np.allclose(take(np_data, idxs1, 1), take(pt_data, idxs1, 1))
Example
>>> # Use the easy-to-use, but inefficient array api
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> compress = kwarray.ArrayAPI.compress
>>> np_data = np.arange(0, 143).reshape(11, 13)
>>> pt_data = torch.LongTensor(np_data)
>>> flags = (np_data % 2 == 0).ravel()
>>> f0 = (np_data % 2 == 0)[:, 0]
>>> f1 = (np_data % 2 == 0)[0, :]
>>> assert np.allclose(compress(np_data, flags), compress(pt_data, flags))
>>> assert np.allclose(compress(np_data, f0, 0), compress(pt_data, f0, 0))
>>> assert np.allclose(compress(np_data, f1, 1), compress(pt_data, f1, 1))
Example
>>> # Use ArrayAPI to coerce an identical API that doesn't do type checks
>>> # xdoctest: +REQUIRES(module:torch)
>>> import kwarray
>>> import torch
>>> np_data = np.arange(0, 15).reshape(3, 5)
>>> pt_data = torch.LongTensor(np_data)
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on torch Tensors.
>>> impl = kwarray.ArrayAPI.coerce(pt_data)
>>> flat_data = impl.view(pt_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
>>> # The new ``impl`` object has the same API as ArrayAPI, but works
>>> # specifically on numpy ndarrays.
>>> impl = kwarray.ArrayAPI.coerce(np_data)
>>> flat_data = impl.view(np_data, -1)
>>> print('flat_data = {!r}'.format(flat_data))
flat_data = array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
The API is restricted to facilitate speed tradeoffs
Note
Assumes underlying data is Dict[list|ndarray]. If the data is known
to be a Dict[ndarray] use DataFrameArray instead, which has faster
implementations for some operations.
Note
pandas.DataFrame is slow. DataFrameLight is faster.
It is a tad more restrictive though.
>>> # xdoc: +REQUIRES(--bench)
>>> from kwarray.dataframe_light import *  # NOQA
>>> import ubelt as ub
>>> df_light = DataFrameLight._demodata(num=1000)
>>> df_heavy = df_light.pandas()
>>> ti = ub.Timerit(21, bestof=3, verbose=2, unit='ms')
>>> ti.reset('light').call(lambda: list(df_light.iterrows()))
>>> ti.reset('heavy').call(lambda: list(df_heavy.iterrows()))
>>> # xdoctest: +IGNORE_WANT
Timed light for: 21 loops, best of 3
    time per loop: best=0.834 ms, mean=0.850 ± 0.0 ms
Timed heavy for: 21 loops, best of 3
    time per loop: best=45.007 ms, mean=45.633 ± 0.5 ms
>>> import kwarray
>>> rng = kwarray.ensure_rng(0)
>>> items = [rng.rand(rng.randint(0, 10)) for _ in range(10)]
>>> self = kwarray.FlatIndexer.fromlist(items)
>>> index = np.arange(0, len(self))
>>> outer, inner = self.unravel(index)
>>> recon = self.ravel(outer, inner)
>>> # This check is only possible because index is an arange
>>> check1 = np.hstack(list(map(sorted, kwarray.group_indices(outer)[1])))
>>> check2 = np.hstack(kwarray.group_consecutive_indices(inner))
>>> assert np.all(check1 == index)
>>> assert np.all(check2 == index)
>>> assert np.all(index == recon)
Track mean, std, min, and max values over time with constant memory.
Dynamically records per-element array statistics and can summarize them
per-element, across channels, or globally.
Todo
[ ] This may need a few API tweaks and good documentation
Example
>>> import kwarray
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> img = np.dstack([ch1, ch2])
>>> run.update(np.dstack([ch1, ch2]))
>>> run.update(np.dstack([ch1 + 1, ch2]))
>>> run.update(np.dstack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.urepr(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.urepr(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.urepr(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.urepr(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.urepr(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.urepr(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.urepr(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
Parameters:
nan_policy (str) – indicates how nan values are handled:
if “omit”, set the weights of nan items to zero.
if “propogate”, propagate nans.
if “raise”, raise a ValueError if nans are given.
check_weights (bool):
if True, we check the weights for zeros (which can also
implicitly occur when data has nans). Disabling this check will
result in faster computation, but it is your responsibility to
ensure all data passed to update is valid.
Compute summary statistics across one or more dimensions.
Parameters:
axis (int | List[int] | None | NoParamType) – axis or axes to summarize over.
If None, all axes are summarized.
If ub.NoParam, no axes are summarized and the current result is
returned.
keepdims (bool, default=True) – if False removes the dimensions that are summarized over
Returns:
a dictionary containing the minimum, maximum, mean, std, etc.
Return type:
Dict
Raises:
NoSupportError – if update was never called with valid data
Example
>>> # Test to make sure summarize works across different shapes
>>> base = np.array([1, 1, 1, 1, 0, 0, 0, 1])
>>> run0 = RunningStats()
>>> for _ in range(3):
>>>     run0.update(base.reshape(8, 1))
>>> run1 = RunningStats()
>>> for _ in range(3):
>>>     run1.update(base.reshape(4, 2))
>>> run2 = RunningStats()
>>> for _ in range(3):
>>>     run2.update(base.reshape(2, 2, 2))
>>> #
>>> # Summarizing over everything should be exactly the same
>>> s0N = run0.summarize(axis=None, keepdims=0)
>>> s1N = run1.summarize(axis=None, keepdims=0)
>>> s2N = run2.summarize(axis=None, keepdims=0)
>>> #assert ub.util_indexable.indexable_allclose(s0N, s1N, rel_tol=0.0, abs_tol=0.0)
>>> #assert ub.util_indexable.indexable_allclose(s1N, s2N, rel_tol=0.0, abs_tol=0.0)
>>> assert s0N['mean'] == 0.625
Find the index of the maximum element in a sequence of keys.
Parameters:
keys (tuple) – a k-tuple of k N-dimensional arrays.
Like np.lexsort the last key in the sequence is used for the
primary sort order, the second-to-last key for the secondary sort
order, and so on.
multi (bool) – if True, returns all indices that share the max value
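Example (a minimal sketch, assuming this entry corresponds to kwarray.arglexmax; the function name is an assumption):
>>> import numpy as np
>>> import kwarray
>>> primary = np.array([3, 2, 3, 1])
>>> secondary = np.array([0, 9, 5, 4])
>>> # The last key is the primary sort order, as in np.lexsort.
>>> # The tie on the primary key (indices 0 and 2) is broken by the
>>> # secondary key, so index 2 wins.
>>> idx = kwarray.arglexmax((secondary, primary))
>>> assert idx == 2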
Returns the indices of the num largest elements of an array.
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of maximum indices to return
axis (int | None) – axis to find maxima over. If None this is equivalent
to using arr.ravel().
ordered (bool) – if False, returns the maximum elements in an arbitrary
order, otherwise they are in descending order. (Setting this to
false is a bit faster).
Todo
[ ] if num is None, return arg for all values equal to the maximum
Returns:
NDArray
Example
>>> # Test cases with axis=None
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argmaxima(arr, num)
>>>     idxs2 = argmaxima(arr, num, ordered=False)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[::-1][:len(idxs)])), 'ordered=True must return in order'
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argmaxima(arr, num, axis=axis)
>>>         idxs2 = argmaxima(arr, num, ordered=False, axis=axis)
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
Returns the indices of the num smallest elements of an array.
This can be significantly faster than using argsort.
Parameters:
arr (NDArray) – input array
num (int) – number of minimum indices to return
axis (int|None) – axis to find minima over.
If None this is equivalent to using arr.ravel().
ordered (bool) – if False, returns the minimum elements in an arbitrary
order, otherwise they are in ascending order. (Setting this to
false is a bit faster).
Example
>>> arr = (np.random.rand(100) * 100).astype(int)
>>> for num in range(0, len(arr) + 1):
>>>     idxs = argminima(arr, num)
>>>     assert np.all(arr[idxs] == np.array(sorted(arr)[:len(idxs)])), 'ordered=True must return in order'
>>>     idxs2 = argminima(arr, num, ordered=False)
>>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'
Example
>>> # Test cases with axis
>>> from kwarray.util_numpy import *  # NOQA
>>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
>>> # make a unique array so we can check argmax consistency
>>> arr = np.arange(3 * 5 * 7)
>>> np.random.shuffle(arr)
>>> arr = arr.reshape(3, 5, 7)
>>> for axis in range(len(arr.shape)):
>>>     for num in range(0, len(arr) + 1):
>>>         idxs = argminima(arr, num, axis=axis)
>>>         idxs2 = argminima(arr, num, ordered=False, axis=axis)
>>>         print('idxs = {!r}'.format(idxs))
>>>         print('idxs2 = {!r}'.format(idxs2))
>>>         assert idxs.shape[axis] == num
>>>         assert idxs2.shape[axis] == num
>>>         # Check if argmin agrees with -argmax
>>>         idxs3 = argmaxima(-arr, num, axis=axis)
>>>         assert np.all(idxs3 == idxs)
Expands an array so that it has at least n dimensions.
Parameters:
arr (ArrayLike) – An array-like object. Non-array inputs are converted to arrays.
Arrays that already have n or more dimensions are preserved.
n (int) – number of dimensions to ensure
front (bool) – if True new dimensions are added to the front of the array,
otherwise they are added to the back. Defaults to False.
Returns:
An array with a.ndim >= n. Copies are avoided where possible,
and views with n or more dimensions are returned. For example, with
n=3 a 1-D array of shape (N,) becomes a view of shape
(N,1,1), and a 2-D array of shape (M,N) becomes a view
of shape (M,N,1).
Extensive benchmarks are in
kwarray/dev/bench_atleast_nd.py
These demonstrate that this function is statistically faster than the
numpy variants, although the difference is small. On average this
function takes 480ns versus numpy which takes 790ns.
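Example (a minimal sketch, assuming the signature atleast_nd(arr, n, front=False) implied by the parameters above):
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([1, 2, 3])
>>> # Ensure at least 3 dimensions; where the new axes go depends on front.
>>> assert kwarray.atleast_nd(arr, 3).ndim == 3
>>> assert kwarray.atleast_nd(arr, 3, front=True).ndim == 3
>>> # Arrays that already have enough dimensions are preserved.
>>> assert kwarray.atleast_nd(np.zeros((2, 2, 2)), 3).shape == (2, 2, 2)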
Constructs an array of booleans where an item is True if its position is in
indices otherwise it is False. This can be viewed as the inverse of
numpy.where().
Parameters:
indices (NDArray) – list of integer indices
shape (int | tuple) – length of the returned list. If not specified
the minimal possible shape to incorporate all the indices is used.
In general, it is best practice to always specify this argument.
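Example (a minimal sketch of the described behavior, assuming this entry corresponds to kwarray.boolmask):
>>> import numpy as np
>>> import kwarray
>>> mask = kwarray.boolmask([1, 3], 5)
>>> assert mask.tolist() == [False, True, False, True, False]
>>> # numpy.where inverts the operation, recovering the indices.
>>> assert np.where(mask)[0].tolist() == [1, 3]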
Embeds a “padded-slice” inside the known data dimensions.
Returns the valid data portion of the slice with extra padding for regions
outside of the available dimension.
Given a slice for each dimension, the image dimensions, and a padding amount,
get the corresponding slice from the image and any extra padding needed to
achieve the requested window size.
Todo
[ ] Add the option to return the inverse slice
Parameters:
slices (Tuple[slice, …]) – a tuple of slices to apply to the data dimensions.
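For illustration, a minimal sketch. It assumes the remaining arguments (not listed above) are the data dimensions and an optional padding, i.e. embed_slice(slices, data_dims, pad=None), and that the return value is the in-bounds slice together with the extra padding needed per dimension; verify against your kwarray version:
>>> import kwarray
>>> # Request a window that starts two rows above the data.
>>> slices = (slice(-2, 3), slice(0, 4))
>>> data_dims = (5, 5)
>>> data_slice, extra_padding = kwarray.embed_slice(slices, data_dims)
>>> # The valid portion of the slice is clipped to the data bounds;
>>> # extra_padding records how much was clipped on each side.
>>> assert data_slice[0].start == 0
>>> assert extra_padding[0][0] == 2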
This function is useful for ensuring that your code uses a controlled
internal random state that is independent of other modules.
If the input is None, then a global random state is returned.
If the input is a numeric value, then that is used as a seed to construct a
random state.
If the input is a random number generator, then another random number
generator with the same state is returned. Depending on the api, this
random state is either returned as-is, or used to construct an equivalent
random state with the requested api.
Parameters:
rng (int | float | None | numpy.random.RandomState | random.Random) – if None, then defaults to the global rng. Otherwise this can
be an integer or a RandomState class. Defaults to the global
random.
api (str) – specify the type of random number
generator to use. This can either be ‘numpy’ for a
numpy.random.RandomState object or ‘python’ for a
random.Random object. Defaults to numpy.
Returns:
rng - either a numpy or python random number generator, depending
on the setting of api.
>>> num = 4
>>> print('--- Python as PYTHON ---')
>>> py_rng = random.Random(0)
>>> pp_nums = [py_rng.random() for _ in range(num)]
>>> print(pp_nums)
>>> print('--- Numpy as PYTHON ---')
>>> np_rng = ensure_rng(random.Random(0), api='numpy')
>>> np_nums = [np_rng.rand() for _ in range(num)]
>>> print(np_nums)
>>> print('--- Numpy as NUMPY---')
>>> np_rng = np.random.RandomState(seed=0)
>>> nn_nums = [np_rng.rand() for _ in range(num)]
>>> print(nn_nums)
>>> print('--- Python as NUMPY---')
>>> py_rng = ensure_rng(np.random.RandomState(seed=0), api='python')
>>> pn_nums = [py_rng.random() for _ in range(num)]
>>> print(pn_nums)
>>> assert np_nums == pp_nums
>>> assert pn_nums == nn_nums
Example
>>> # Test that random modules can be coerced
>>> import random
>>> import numpy as np
>>> ensure_rng(random, api='python')
>>> ensure_rng(random, api='numpy')
>>> ensure_rng(np.random, api='python')
>>> ensure_rng(np.random, api='numpy')
Finds robust normalization statistics for a set of scalar observations.
The idea is to estimate “fence” parameters: minimum and maximum values
where anything under / above these values are likely outliers. For
non-linear normalization schemes we can also estimate a likely middle and
extent of the data.
Parameters:
data (ndarray) – a 1D numpy array where invalid data has already been removed
params (str | dict) – normalization params.
When passed as a dictionary valid params are:
scaling (str):
This is the “mode” that will be used in the final
normalization. Currently it has no impact on the computed
statistics. Defaults to ‘linear’. Can also be ‘sigmoid’.
extrema (str):
The method for determining what the extrema are.
Can be “quantile” for strict quantile clipping
Can be “adaptive-quantile” for an IQR-like adjusted quantile method.
Can be “tukey” or “IQR” for an exact IQR method.
low (float): This is the low quantile for likely inliers.
mid (float): This is the middle quantile for likely inliers.
high (float): This is the high quantile for likely inliers.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.util_robust import *  # NOQA
>>> from kwarray.distributions import Mixture
>>> import ubelt as ub
>>> # A random mixture distribution for testing
>>> data = Mixture.random(6).sample(3000)
Returns lists of consecutive values. Implementation inspired by [3].
Parameters:
arr (NDArray) – array of ordered values
offset (float, default=1) – any two values separated by this offset are grouped. In the
default case, when offset=1, this groups increasing values like: 0,
1, 2. When offset is 0 it groups consecutive values that are the
same, e.g.: 4, 4, 4.
Returns:
a list of arrays that are the groups from the input
Return type:
List[NDArray]
Note
This is equivalent (and faster) to using:
apply_grouping(data, group_consecutive_indices(data))
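Example (a small sketch of the default offset=1 behavior, assuming this entry corresponds to kwarray.group_consecutive):
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([1, 2, 3, 7, 8, 20])
>>> groups = kwarray.group_consecutive(arr)
>>> assert [g.tolist() for g in groups] == [[1, 2, 3], [7, 8], [20]]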
Find unique items and the indices at which they appear in an array.
A common use case of this function is when you have a list of objects
(often numeric but sometimes not) and an array of “group-ids” corresponding
to that list of objects.
Using this function will return a list of indices that can be used in
conjunction with apply_grouping() to group the elements. This is
most useful when you have many lists (think column-major data)
corresponding to the group-ids.
In cases where there is only one list of objects or knowing the indices
doesn’t matter, then consider using group_items() instead.
Parameters:
idx_to_groupid (NDArray) – The input array, where each item is interpreted as a group id.
For the fastest runtime, the input array must be numeric (ideally
with integer types). If the type is non-numeric then the less
efficient ubelt.group_items() is used.
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency.
Defaults to False.
Returns:
(keys, groupxs) -
keys (NDArray):
The unique elements of the input array in order
groupxs (List[NDArray]):
Corresponding list of indexes. The i-th item is an array
indicating the indices where the item key[i] appeared in
the input array.
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import kwarray
>>> import ubelt as ub
>>> # 2d arrays must be flattened before coming into this function so
>>> # information is on the last axis
>>> idx_to_groupid = np.array([[24], [129], [659], [659], [24],
...                            [659], [659], [822], [659], [659], [24]]).T[0]
>>> (keys, groupxs) = kwarray.group_indices(idx_to_groupid)
>>> # Different versions of numpy may produce different orderings
>>> # so normalize these to make test output consistent
>>> #[gxs.sort() for gxs in groupxs]
>>> print('keys = ' + ub.urepr(keys, with_dtype=False))
>>> print('groupxs = ' + ub.urepr(groupxs, with_dtype=False))
keys = np.array([ 24, 129, 659, 822])
groupxs = [
    np.array([ 0,  4, 10]),
    np.array([1]),
    np.array([2, 3, 5, 6, 8, 9]),
    np.array([7]),
]
Groups a list of items by a corresponding list of group ids.
In cases where there are many lists of items to group (think column-major
data), consider using group_indices() and apply_grouping()
instead.
Parameters:
item_list (NDArray) – The input array of items to group.
Extended typing NDArray[Any,VT]
groupid_list (NDArray) – Each item is an id corresponding to the item at the same position
in item_list. For the fastest runtime, the input array must be
numeric (ideally with integer types). This list must be
1-dimensional.
Extended typing NDArray[Any,KT]
assume_sorted (bool) – If the input array is sorted, then setting this to True will avoid
an unnecessary sorting operation and improve efficiency. Defaults
to False.
axis (int | None) – Group along a particular axis in items if it is n-dimensional.
Returns:
mapping from groupids to corresponding items.
Extended typing Dict[KT,NDArray[Any,VT]].
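Example (a minimal sketch, assuming this entry corresponds to kwarray.group_items):
>>> import numpy as np
>>> import kwarray
>>> item_list = np.array([1.0, 2.0, 3.0, 4.0])
>>> groupid_list = np.array([0, 1, 0, 1])
>>> groups = kwarray.group_items(item_list, groupid_list)
>>> # Each group id maps to the items that share it.
>>> assert sorted(v.tolist() for v in groups.values()) == [[1.0, 3.0], [2.0, 4.0]]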
>>> # xdoctest: +REQUIRES(module:scipy)
>>> # Costs to match item i in set1 with item j in set2.
>>> value = np.array([
>>>     [9, 2, 1, 3],
>>>     [4, 1, 5, 5],
>>>     [9, 9, 2, 4],
>>>     [-1, -1, -1, -1],
>>> ])
>>> ret = maxvalue_assignment(value)
>>> # Note, depending on the scipy version the assignment might change
>>> # but the value should always be the same.
>>> print('Total value: {}'.format(ret[1]))
Total value: 23.0
>>> print('Assignment: {}'.format(ret[0]))  # xdoc: +IGNORE_WANT
Assignment: [(0, 0), (1, 3), (2, 1)]
Finds the minimum cost assignment based on a NxM cost matrix, subject to
the constraint that each row can match at most one column and each column
can match at most one row. Any pair with a cost of infinity will not be
assigned.
Parameters:
cost (ndarray) – NxM matrix, cost[i, j] is the cost to match i and j
Returns:
tuple containing a list of assignment of rows
and columns, and the total cost of the assignment.
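Example (a small sketch mirroring the maxvalue example above; assumes this entry corresponds to kwarray.mincost_assignment):
>>> # xdoctest: +REQUIRES(module:scipy)
>>> import numpy as np
>>> import kwarray
>>> cost = np.array([
>>>     [9, 2, 1, 3],
>>>     [4, 1, 5, 5],
>>>     [9, 9, 2, 4],
>>> ])
>>> assignment, total_cost = kwarray.mincost_assignment(cost)
>>> # Row 0 -> col 2, row 1 -> col 1, row 2 -> col 3 gives the minimum cost.
>>> assert total_cost == 6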
Normalizes input values based on a specified scheme.
The default behavior is a linear normalization between 0.0 and 1.0 based on
the min/max values of the input. Parameters can be specified to achieve
more general contrast stretching or signal rebalancing. Implements the
linear and sigmoid normalization methods described in [WikiNorm].
Parameters:
arr (NDArray) – array to normalize, usually an image
out (NDArray | None) – output array. Note, that we will create an
internal floating point copy for integer computations.
mode (str) – either linear or sigmoid.
alpha (float) – Only used if mode=sigmoid. Division factor
(pre-sigmoid). If unspecified computed as:
max(abs(old_min-beta),abs(old_max-beta))/6.212606.
Note this parameter is sensitive to whether the input is a float or
uint8 image.
beta (float) – subtractive factor (pre-sigmoid). This should be the
intensity of the most interesting bits of the image, i.e. bring
them to the center (0) of the distribution.
Defaults to (max-min)/2. Note this parameter is sensitive
to whether the input is a float or uint8 image.
min_val – inputs lower than this minimum value are clipped
max_val – inputs higher than this maximum value are clipped.
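Example (a small sketch of the default linear mode, which stretches the input to span [0, 1]):
>>> import numpy as np
>>> import kwarray
>>> arr = np.array([10.0, 15.0, 20.0])
>>> out = kwarray.normalize(arr)
>>> assert np.isclose(out.min(), 0.0)
>>> assert np.isclose(out.max(), 1.0)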
Allows slices with out-of-bound coordinates. Any out of bounds coordinate
will be sampled via padding.
Parameters:
data (Sliceable) – data to slice into. Any channels must be the last dimension.
slices (slice | Tuple[slice, …]) – a slice for each dimension
ndim (int) – number of spatial dimensions
pad (List[int|Tuple]) – additional padding of the slice
padkw (Dict) – if unspecified defaults to {'mode':'constant'}
return_info (bool, default=False) – if True, return extra information
about the transform.
Note
Negative slices have a different meaning here than they usually do.
Normally, they indicate a wrap-around or a reversed stride, but here
they index into out-of-bounds space (which depends on the pad mode).
For example a slice of -2:1 literally samples two pixels to the left of
the data and one pixel from the data, so you get two padded values and
one data value.
SeeAlso:
embed_slice - finds the embedded slice and padding
Returns:
data_sliced: subregion of the input data (possibly with padding,
depending on if the original slice went out of bounds)
Tuple[Sliceable, Dict] :
data_sliced : as above
transform : information on how to return to the original coordinates
Currently a dict containing:
st_dims: a list indicating the low and high space-time
coordinate values of the returned data slice.
The structure of this dictionary may change in the future
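Example (a minimal sketch; with the default constant padding, out-of-bounds samples become zeros):
>>> import numpy as np
>>> import kwarray
>>> data = np.arange(5)
>>> # The slice starts two positions "before" the data, so the result is
>>> # two padded zeros followed by data[0:3].
>>> out = kwarray.padded_slice(data, (slice(-2, 3),))
>>> assert out.tolist() == [0, 0, 0, 1, 2]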
Yields num combinations of length size from items in random order
Parameters:
items (List) – pool of items to choose from
size (int) – Number of items in each combination
num (int | None) – Number of combinations to generate. If None, generate them all.
rng (int | float | None | numpy.random.RandomState | random.Random) – seed or random number generator. Defaults to the global state
of the python random module.
Yields:
Tuple – a random combination of items of length size.
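Example (a short sketch with a fixed seed; assumes this entry corresponds to kwarray.random_combinations):
>>> import kwarray
>>> combos = list(kwarray.random_combinations([1, 2, 3, 4], size=2, num=3, rng=0))
>>> assert len(combos) == 3
>>> assert all(len(c) == 2 for c in combos)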
Yields num items from the cartesian product of items in a random order.
Parameters:
items (List[Sequence]) – items to get the cartesian product of, packed in a list or tuple.
(note this deviates from the api of itertools.product())
num (int | None) – maximum number of items to generate. If None, generate them all.
rng (int | float | None | numpy.random.RandomState | random.Random) – Seed or random number generator. Defaults to the global state
of the python random module.
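Example (a short sketch; note that items is a list of sequences, assumed to correspond to kwarray.random_product):
>>> import kwarray
>>> pairs = list(kwarray.random_product([[1, 2, 3], ['a', 'b']], num=4, rng=0))
>>> assert len(pairs) == 4
>>> assert all(x in [1, 2, 3] and y in ['a', 'b'] for x, y in pairs)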
Normalize data intensities using heuristics to help put sensor data with
extremely high or low contrast into a visible range.
This function is designed with an emphasis on getting something that is
reasonable for visualization.
Todo
[x] Move to kwarray and renamed to robust_normalize?
[ ] Support for M-estimators?
Parameters:
imdata (ndarray) – raw intensity data
return_info (bool) – if True, return information about the chosen normalization
heuristic.
params (str | dict) – Can contain the keys: low, mid, high, scaling, extrema.
e.g. {‘low’: 0.1, ‘mid’: 0.8, ‘high’: 0.9, ‘scaling’: ‘sigmoid’}
See documentation in find_robust_normalizers().
axis (None | int) – The axis to normalize over, if unspecified, normalize jointly
nodata (None | int) – A value representing nodata to leave unchanged during
normalization, for example 0
dtype (type) – can be float32 or float64
mask (ndarray | None) – A mask indicating what pixels are valid and what pixels should be
considered nodata. Mutually exclusive with nodata argument.
A mask value of 1 indicates a VALID pixel. A mask value of 0
indicates an INVALID pixel.
Note this is the opposite of a masked array.
Returns:
a floating point array with values between 0 and 1.
if return_info is specified, also returns extra data
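Example (a small sketch: a few large outliers should not dominate the scaling; assumes this entry corresponds to kwarray.robust_normalize):
>>> # xdoctest: +REQUIRES(module:scipy)
>>> import numpy as np
>>> import kwarray
>>> imdata = np.random.RandomState(0).rand(100).astype(np.float32)
>>> imdata[:3] = 1000  # inject a few extreme outliers
>>> out = kwarray.robust_normalize(imdata)
>>> assert out.dtype.kind == 'f'
>>> assert out.min() >= 0 and out.max() <= 1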
Finds a feasible solution to the minimum weight maximum value set cover.
The quality and runtime of the solution will depend on the backend
algorithm selected.
Parameters:
candidate_sets_dict (Dict[KT, List[VT]]) – a dictionary where keys are the candidate set ids and each value is
a candidate cover set.
items (Optional[VT]) – the set of all items to be covered,
if not specified, it is inferred from the candidate cover sets
set_weights (Optional[Dict[KT, float]]) – maps candidate set ids to a cost for using this candidate cover in
the solution. If not specified the weight of each candidate cover
defaults to 1.
item_values (Optional[Dict[VT, float]]) – maps each item to a value we get for returning this item in the
solution. If not specified the value of each item defaults to 1.
max_weight (Optional[float]) – if specified, the total cost of the
returned cover is constrained to be less than this number.
algo (str) – specifies which algorithm to use. Can either be
‘approx’ for the greedy solution or ‘exact’ for the globally
optimal solution. Note the ‘exact’ algorithm solves an
integer-linear-program, which can be very slow and requires
the pulp package to be installed.
Returns:
a subdict of candidate_sets_dict containing the chosen solution.
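Example (a small sketch of the greedy ‘approx’ backend, assuming this entry corresponds to kwarray.setcover):
>>> import kwarray
>>> candidate_sets_dict = {
>>>     'a': [1, 2, 3, 8, 9, 0],
>>>     'b': [1, 2, 3, 4, 5],
>>>     'c': [4, 5, 7],
>>>     'd': [5, 6, 7],
>>>     'e': [6, 7, 8, 9, 0],
>>> }
>>> cover = kwarray.setcover(candidate_sets_dict, algo='approx')
>>> covered = {item for v in cover.values() for item in v}
>>> all_items = {item for v in candidate_sets_dict.values() for item in v}
>>> # The greedy cover uses a subset of the candidates but covers every item.
>>> assert covered == all_items
>>> assert set(cover).issubset(set(candidate_sets_dict))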
The difference between this function and
numpy.random.standard_normal() is that we use float32 arrays in the
backend instead of float64. Halving the amount of bits that need to be
manipulated can significantly reduce the execution time, and 32-bit
precision is often good enough.
Parameters:
size (int | Tuple[int, …]) – shape of the returned ndarray
mean (float, default=0) – mean of the normal distribution
std (float, default=1) – standard deviation of the normal distribution
rng (numpy.random.RandomState) – underlying random state
Returns:
normally distributed random numbers with chosen size.
>>> # xdoctest: +REQUIRES(module:scipy)
>>> import scipy
>>> import scipy.stats
>>> pts = 1000
>>> # Our numbers are normally distributed with high probability
>>> rng = np.random.RandomState(28041990)
>>> ours_a = standard_normal32(pts, rng=rng)
>>> ours_b = standard_normal32(pts, rng=rng) + 2
>>> ours = np.concatenate((ours_a, ours_b))  # numerical stability?
>>> p = scipy.stats.normaltest(ours)[1]
>>> print('Probability our data is non-normal is: {:.4g}'.format(p))
Probability our data is non-normal is: 1.573e-14
>>> rng = np.random.RandomState(28041990)
>>> theirs_a = rng.standard_normal(pts)
>>> theirs_b = rng.standard_normal(pts) + 2
>>> theirs = np.concatenate((theirs_a, theirs_b))
>>> p = scipy.stats.normaltest(theirs)[1]
>>> print('Probability their data is non-normal is: {:.4g}'.format(p))
Probability their data is non-normal is: 3.272e-11
>>> # Test even and odd numbers of points
>>> assert standard_normal32(3).shape == (3,)
>>> assert standard_normal32(2).shape == (2,)
>>> assert standard_normal32(1).shape == (1,)
>>> assert standard_normal32(0).shape == (0,)
>>> assert standard_normal32((3, 1)).shape == (3, 1)
>>> assert standard_normal32((3, 0)).shape == (3, 0)
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float) – Lower boundary of the output interval. All values generated will
be greater than or equal to low. Defaults to 0.
high (float) – Upper boundary of the output interval. All values generated will
be less than high. Default to 1.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
dtype (type) – either np.float32 or np.float64. Defaults to float32
rng (numpy.random.RandomState) – underlying random state
Returns:
uniformly distributed random numbers with chosen size and dtype
Extended typing NDArray[Literal[size],Literal[dtype]]
Return type:
ndarray
Benchmark
>>> from timerit import Timerit
>>> import kwarray
>>> size = (300, 300, 3)
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float32'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         ours = standard_normal(size, rng=rng, dtype=np.float32)
>>> # Timed best=4.705 ms, mean=4.75 ± 0.085 ms for dtype=np.float32
>>> for timer in Timerit(100, bestof=10, label='dtype=np.float64'):
>>>     rng = kwarray.ensure_rng(0)
>>>     with timer:
>>>         theirs = standard_normal(size, rng=rng, dtype=np.float64)
>>> # Timed best=9.327 ms, mean=9.794 ± 0.4 ms for dtype=np.float64
Draws float32 samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval
[low,high) (includes low, but excludes high).
Parameters:
low (float, default=0.0) – Lower boundary of the output interval. All values generated will
be greater than or equal to low.
high (float, default=1.0) – Upper boundary of the output interval. All values generated will
be less than high.
size (int | Tuple[int, …] | None) – Output shape. If the given shape is, e.g., (m,n,k), then
m*n*k samples are drawn. If size is None (default),
a single value is returned if low and high are both scalars.
Otherwise, np.broadcast(low,high).size samples are drawn.
Returns:
uniformly distributed random numbers with chosen size.
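Example (a short sketch checking the dtype and the half-open interval; assumes this entry corresponds to kwarray.uniform32):
>>> import numpy as np
>>> import kwarray
>>> samples = kwarray.uniform32(low=0, high=1, size=1000)
>>> assert samples.dtype == np.float32
>>> assert samples.shape == (1000,)
>>> assert samples.min() >= 0 and samples.max() < 1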