`kwarray.util_averages`

Defines `stats_dict`, a convenient way to gather multiple numeric statistics (e.g. max, min, median, mode, arithmetic mean, geometric mean, standard deviation, etc.) about the data in an array, and `RunningStats`, which accumulates per-element statistics incrementally.

Module Contents

Classes

RunningStats

Dynamically records per-element array statistics and can summarize them

Functions

`stats_dict`(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)	Describe statistics about an input array
`_gmean`(a, axis=0, dtype=None, clobber=False)	Compute the geometric mean along the specified axis.

Attributes

torch

kwarray.util_averages.torch

kwarray.util_averages.stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

Describe statistics about an input array

Parameters

inputs (ArrayLike) – set of values to get statistics of
axis (int) – if inputs is an ndarray, the axis along which statistics are computed
nan (bool) – report number of nan items
sum (bool) – report sum of values
extreme (bool) – report min and max values
n_extreme (bool) – report how many times the min and max values occur
median (bool) – report median
size (bool) – report array size
shape (bool) – report array shape

Returns

stats – dictionary of common numpy statistics (e.g. min, max, mean, std, nMin, nMax, shape); which keys are present depends on the flags above

Return type

collections.OrderedDict

See also: scipy.stats.describe
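For intuition about what the default flags report, the sketch below recomputes a similar summary with plain NumPy. It is not kwarray's implementation: `describe_array` is a hypothetical name, only a subset of the flags is mirrored, the standard deviation is assumed to be the population form (NumPy's default `ddof=0`), and `nMin`/`nMax` are assumed to count how often the extreme values occur, following the key names in the Returns section.

```python
from collections import OrderedDict

import numpy as np


def describe_array(inputs, axis=None, n_extreme=False, median=False):
    """Hypothetical stats_dict-style summary (illustration only, not kwarray's code)."""
    arr = np.asarray(inputs)
    stats = OrderedDict()
    stats['mean'] = arr.mean(axis=axis)
    stats['std'] = arr.std(axis=axis)  # assumed population std (ddof=0)
    stats['min'] = arr.min(axis=axis)
    stats['max'] = arr.max(axis=axis)
    if n_extreme:
        # keepdims=True lets the comparison broadcast for any axis, including None
        is_min = arr == arr.min(axis=axis, keepdims=True)
        is_max = arr == arr.max(axis=axis, keepdims=True)
        stats['nMin'] = is_min.sum(axis=axis)
        stats['nMax'] = is_max.sum(axis=axis)
    if median:
        stats['med'] = np.median(arr, axis=axis)
    stats['shape'] = arr.shape
    return stats


stats = describe_array([[1, 5], [3, 5], [1, 2]], axis=0, n_extreme=True, median=True)
# stats['nMin'] -> array([2, 1]): column 0 has two 1s, column 1 has one 2
```

Reducing along `axis=0` of a `(3, 2)` array yields one value per column for every statistic, matching the shape of the first example above.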

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> from kwarray.util_averages import *  # NOQA
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.rand(10, 2).astype(np.float32)
>>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
>>> print(result)
{
    'mean': np.array([ 0.5206,  0.6425], dtype=np.float32),
    'std': np.array([ 0.2854,  0.2517], dtype=np.float32),
    'min': np.array([ 0.0202,  0.0871], dtype=np.float32),
    'max': np.array([ 0.9637,  0.9256], dtype=np.float32),
    'med': np.array([0.5584, 0.6805], dtype=np.float32),
    'shape': (10, 2),
}

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> from kwarray.util_averages import stats_dict
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
>>> inputs[4] = np.nan
>>> stats = stats_dict(inputs, axis=axis, nan=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
>>> print(result)
{mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}
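In the example above the mean and std stay finite even though one entry is NaN, which suggests NaN-aware reductions. A minimal sketch of that idea, assuming NumPy's `nan*` functions and counting NaNs with `np.isnan`; `nan_aware_summary` is a hypothetical helper, and whether `stats_dict` works exactly this way is not stated here.

```python
import numpy as np


def nan_aware_summary(values):
    """Hypothetical helper: summary statistics that skip NaN entries."""
    arr = np.asarray(values, dtype=float)
    return {
        'mean': np.nanmean(arr),
        'std': np.nanstd(arr),    # population std over the non-NaN entries
        'min': np.nanmin(arr),
        'max': np.nanmax(arr),
        'num_nan': int(np.isnan(arr).sum()),  # how many entries were skipped
        'shape': arr.shape,
    }


summary = nan_aware_summary([1.0, np.nan, 3.0])
# summary['mean'] -> 2.0, summary['num_nan'] -> 1
```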

kwarray.util_averages._gmean(a, axis=0, dtype=None, clobber=False)

Compute the geometric mean along the specified axis.

Modification of the SciPy `scipy.stats.gmean` function to be more memory efficient.

Example

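>>> import numpy as np
>>> from kwarray.util_averages import _gmean
>>> rng = np.random.RandomState(0)
>>> C, H, W = 8, 32, 32
>>> axis = 0
>>> a = [rng.rand(C, H, W).astype(np.float16),
>>>      rng.rand(C, H, W).astype(np.float16)]
>>> # Geometric mean across the two arrays (the list is treated as axis 0)
>>> result = _gmean(a, axis=axis)
>>> assert result.shape == (C, H, W)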
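The geometric mean of positive values equals `exp(mean(log(a)))`, which avoids overflow from multiplying many numbers together. One way to save memory is to write the logarithms into the input buffer instead of allocating a new array; the sketch below assumes that is roughly what `clobber=True` enables, but that is an assumption. `gmean_logspace` is a hypothetical name, not kwarray's function.

```python
import numpy as np


def gmean_logspace(a, axis=0, dtype=None, clobber=False):
    """Hypothetical sketch: geometric mean computed as exp(mean(log(a)))."""
    can_reuse = (clobber and dtype is None and isinstance(a, np.ndarray)
                 and np.issubdtype(a.dtype, np.floating))
    if can_reuse:
        # Overwrite the caller's float array with its logs (no extra allocation)
        log_a = np.log(a, out=a)
    else:
        # Convert lists / apply dtype, then take logs in a new array
        log_a = np.log(np.asarray(a, dtype=dtype))
    return np.exp(log_a.mean(axis=axis))


vals = np.array([[1.0, 4.0], [4.0, 9.0]])
# gmean_logspace(vals, axis=0) is approximately [2., 6.] (sqrt(1*4), sqrt(4*9))
```

The trade-off is that with in-place logs the caller's array no longer holds the original values after the call.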

class kwarray.util_averages.RunningStats(run)

Bases: ubelt.NiceRepr

Dynamically records per-element array statistics and can summarize them per-element, across channels, or globally.

Todo

[ ] This may need a few API tweaks and good documentation

Example

>>> import numpy as np
>>> import ubelt as ub
>>> import kwarray
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> # Stack channels first so each update has shape (C, H, W) = (2, 2, 2)
>>> run.update(np.stack([ch1, ch2]))
>>> run.update(np.stack([ch1 + 1, ch2]))
>>> run.update(np.stack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=0, keepdims=False), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(1, 2), keepdims=False), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=False), nl=2, precision=3))
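The core idea of recording "per-element" statistics is to keep one accumulator per array element and fold each new array into it. A minimal sketch, assuming unweighted updates and a single scalar update count; `TinyRunningStats` is a hypothetical, simplified class and not kwarray's `RunningStats`.

```python
import numpy as np


class TinyRunningStats:
    """Hypothetical sketch of per-element running statistics (not kwarray's class)."""

    def __init__(self):
        self.n = 0          # number of updates seen (one scalar count)
        self.total = None   # per-element running sum
        self.min = None     # per-element running minimum
        self.max = None     # per-element running maximum

    def update(self, data):
        data = np.asarray(data, dtype=float)
        if self.total is None:
            # The first update fixes the per-element shape
            self.total = np.zeros_like(data)
            self.min = data.copy()
            self.max = data.copy()
        else:
            np.minimum(self.min, data, out=self.min)
            np.maximum(self.max, data, out=self.max)
        self.total += data
        self.n += 1

    def current(self):
        # Per-element summary; nothing is reduced over the data axes
        return {'mean': self.total / self.n, 'min': self.min, 'max': self.max}


ch1 = np.array([[0, 1], [3, 4]])
ch2 = np.zeros((2, 2))
stats = TinyRunningStats()
for offset in range(3):
    stats.update(np.stack([ch1 + offset, ch2]))
# stats.current()['mean'][0] equals ch1 + 1
```

Memory use stays constant in the number of updates, since only the accumulators are stored, never the individual arrays.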

__nice__(self)

property shape(run)

update(run, data, weights=1)

Updates statistics across all data dimensions on a per-element basis

Example

>>> import numpy as np
>>> import kwarray
>>> data = np.full((7, 5), fill_value=1.3)
>>> weights = np.ones((7, 5), dtype=np.float32)
>>> run = kwarray.RunningStats()
>>> run.update(data, weights=1)
>>> run.update(data, weights=weights)
>>> # Zero out roughly half of the weights (seeded for reproducibility)
>>> rng = np.random.RandomState(0)
>>> weights[rng.rand(*weights.shape) > 0.5] = 0
>>> run.update(data, weights=weights)
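With per-element `weights`, a natural accumulation scheme adds `w * x` to the running total, `w * x**2` to the running sum of squares, and `w` to a per-element count, so a weight of 0 leaves that element's statistics unchanged. This scheme is an assumption for illustration, not confirmed by the text above; `weighted_update` is a hypothetical helper.

```python
import numpy as np


def weighted_update(state, data, weights=1):
    """Hypothetical helper: fold one weighted batch into per-element accumulators."""
    data = np.asarray(data, dtype=float)
    # A scalar weight is broadcast to every element
    w = np.broadcast_to(np.asarray(weights, dtype=float), data.shape)
    if state is None:
        state = {'total': np.zeros(data.shape),
                 'squares': np.zeros(data.shape),
                 'n': np.zeros(data.shape)}
    state['total'] += w * data
    state['squares'] += w * data ** 2
    state['n'] += w  # per-element effective count
    return state


data = np.full((7, 5), fill_value=1.3)
weights = np.ones((7, 5))
weights[0, :] = 0  # the first row does not count in the second batch
state = weighted_update(None, data, weights=1)
state = weighted_update(state, data, weights=weights)
mean = state['total'] / state['n']  # still 1.3 everywhere
```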

_sumsq_std(run, total, squares, n)

Sum of squares method to compute standard deviation
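The sum-of-squares method recovers the standard deviation from running totals alone: with `total = sum(x)`, `squares = sum(x**2)` and count `n`, the variance is `squares / n - (total / n) ** 2`. A minimal sketch, assuming the population form (no Bessel correction); `sumsq_std` is a hypothetical name, not the method's actual body.

```python
import numpy as np


def sumsq_std(total, squares, n):
    """Hypothetical sketch: population std from sum, sum of squares, and count."""
    mean = total / n
    var = squares / n - mean ** 2
    # Cancellation can make var slightly negative; clamp before the sqrt
    return np.sqrt(np.maximum(var, 0))


x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(sumsq_std(x.sum(), (x ** 2).sum(), len(x)))  # 2.0, same as np.std(x)
```

The formula needs only three accumulators, which is why it suits incremental updates, but subtracting two large nearly equal numbers can lose precision; Welford's algorithm is a common, more stable alternative.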

summarize(run, axis=None, keepdims=True)

Compute summary statistics across one or more dimensions

Parameters

axis (int | List[int] | None | ub.NoParam) – axis or axes to summarize over. If None, all axes are summarized. If ub.NoParam, no axes are summarized and the current per-element result is returned.
keepdims (bool, default=True) – if False, removes the dimensions that are summarized over

Returns

dictionary containing the minimum, maximum, mean, std, etc.

Return type

Dict
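Because per-element accumulators are plain sums, summarizing over an axis can be done by summing the accumulators (total, sum of squares, count) over that axis before forming the mean and std. Averaging the per-element means instead would be wrong whenever elements have different counts. A minimal sketch under that assumption; `summarize_sums` is a hypothetical helper, not kwarray's code.

```python
import numpy as np


def summarize_sums(total, squares, n, axis=None, keepdims=True):
    """Hypothetical sketch: reduce per-element sums over `axis`, then derive stats."""
    total = np.sum(total, axis=axis, keepdims=keepdims)
    squares = np.sum(squares, axis=axis, keepdims=keepdims)
    n = np.sum(n, axis=axis, keepdims=keepdims)
    mean = total / n
    std = np.sqrt(np.maximum(squares / n - mean ** 2, 0))
    return {'mean': mean, 'std': std, 'n': n}


# Element A saw the values [1, 3]; element B saw only [5]
total = np.array([4.0, 5.0])
squares = np.array([10.0, 25.0])
n = np.array([2.0, 1.0])
res = summarize_sums(total, squares, n)
# res['mean'] -> array([3.]), the mean of [1, 3, 5], not (2 + 5) / 2
```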

current(run)

Returns the current statistics on a per-element basis (not summarized over any axis)

Todo

[X] I want this method and summarize to be unified somehow.
I don’t know how to parameterize it, because axis=None usually means summarize over everything, and I need a way to encode “summarize over nothing but the sequence dimension” (which was given incrementally by the update function), which is what this function does.
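The Todo asks for an `axis` value meaning "summarize over nothing", distinct from `None` ("summarize over everything"); `summarize` resolves this with the `ub.NoParam` sentinel. The generic sentinel pattern is sketched below; `NO_AXES` and `reduce_axes` are hypothetical names, and the reduction is just `np.mean`.

```python
import numpy as np

# Unique sentinel object: unlike None, it cannot be confused with a real axis value
NO_AXES = object()


def reduce_axes(arr, axis=None):
    """Hypothetical helper mirroring the None / "nothing" distinction.

    axis=None reduces over every axis; axis=NO_AXES reduces over none of them.
    """
    arr = np.asarray(arr)
    if axis is NO_AXES:
        return arr
    return np.mean(arr, axis=axis)


a = np.arange(6.0).reshape(2, 3)
# reduce_axes(a) -> 2.5, reduce_axes(a, axis=0) -> [1.5, 2.5, 3.5]
```

Comparing with `is` against a private object guarantees the sentinel never collides with any legitimate argument, which is why this pattern is preferred over overloading `None`.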
