kwarray.util_averages

Defines “stats_dict”, a convenient way to gather multiple numeric statistics (e.g. max, min, median, mode, arithmetic mean, geometric mean, standard deviation, etc.) about data in an array, and “RunningStats”, which records such statistics incrementally.

Module Contents

Classes

RunningStats

Dynamically records per-element array statistics and can summarize them

Functions

stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

Describe statistics about an input array

_gmean(a, axis=0, dtype=None, clobber=False)

Compute the geometric mean along the specified axis.

Attributes

torch

kwarray.util_averages.torch
kwarray.util_averages.stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

Describe statistics about an input array

Parameters
  • inputs (ArrayLike) – set of values to get statistics of

  • axis (int) – if inputs is an ndarray, the axis along which the statistics are computed

  • nan (bool) – report number of nan items

  • sum (bool) – report sum of values

  • extreme (bool) – report min and max values

  • n_extreme (bool) – report how often the extreme values occur (nMin, nMax)

  • median (bool) – report median

  • size (bool) – report array size

  • shape (bool) – report array shape

Returns

stats: dictionary of common numpy statistics

(min, max, mean, std, nMin, nMax, shape)

Return type

collections.OrderedDict

SeeAlso:

scipy.stats.describe

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> from kwarray.util_averages import *  # NOQA
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.rand(10, 2).astype(np.float32)
>>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
>>> print(result)
{
    'mean': np.array([ 0.5206,  0.6425], dtype=np.float32),
    'std': np.array([ 0.2854,  0.2517], dtype=np.float32),
    'min': np.array([ 0.0202,  0.0871], dtype=np.float32),
    'max': np.array([ 0.9637,  0.9256], dtype=np.float32),
    'med': np.array([0.5584, 0.6805], dtype=np.float32),
    'shape': (10, 2),
}

Example

>>> # xdoctest: +IGNORE_WHITESPACE
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
>>> inputs[4] = np.nan
>>> stats = stats_dict(inputs, axis=axis, nan=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
>>> print(result)
{mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}
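
The nan=True behaviour in the second example can be approximated in plain Python. The sketch below is not kwarray's implementation: describe_with_nan is a hypothetical helper, and it assumes that NaN entries are excluded before the statistics are computed and that std is the population standard deviation (NumPy's default). The example output suggests both, but the text does not confirm either.

```python
import math


def describe_with_nan(values):
    """Hypothetical helper: summarize a flat list while skipping NaN."""
    # Assumption: NaN entries are dropped before computing statistics.
    valid = [float(v) for v in values if not math.isnan(v)]
    n = len(valid)
    mean = sum(valid) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in valid) / n)
    return {
        'mean': mean,
        'std': std,
        'min': min(valid),
        'max': max(valid),
        'num_nan': len(values) - n,  # how many entries were NaN
        'shape': (len(values),),     # shape of the original flat input
    }


stats = describe_with_nan([1.0, 2.0, float('nan'), 3.0])
print(stats)
```

Note that shape still counts the NaN entry, matching the (100,) shape reported in the example above. The sketch raises ZeroDivisionError if every entry is NaN.
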
kwarray.util_averages._gmean(a, axis=0, dtype=None, clobber=False)

Compute the geometric mean along the specified axis.

Modification of the scikit-learn method to be more memory efficient

Example

>>> rng = np.random.RandomState(0)
>>> C, H, W = 8, 32, 32
>>> axis = 0
>>> a = [rng.rand(C, H, W).astype(np.float16),
>>>      rng.rand(C, H, W).astype(np.float16)]
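
The example above only constructs inputs and does not show the computation itself. As a reminder of what a geometric mean along an axis computes, here is a minimal pure-Python sketch that works in log space, exp(mean(log(x))), so the running product cannot overflow. gmean_axis0 is a hypothetical name; it is not kwarray's _gmean and makes no attempt to reproduce its memory optimizations.

```python
import math


def gmean_axis0(rows):
    """Hypothetical helper: geometric mean down the columns of a 2D list.

    All values must be strictly positive, because the log is taken.
    """
    n = len(rows)
    ncols = len(rows[0])
    result = []
    for j in range(ncols):
        # Sum logs instead of multiplying values to avoid overflow.
        log_sum = sum(math.log(row[j]) for row in rows)
        result.append(math.exp(log_sum / n))
    return result


gm = gmean_axis0([[1.0, 4.0], [4.0, 9.0]])
print(gm)
```

Each column is reduced independently, which corresponds to axis=0 for a 2D array: the first column reduces to sqrt(1 * 4) and the second to sqrt(4 * 9), up to floating-point rounding.
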
class kwarray.util_averages.RunningStats(run)

Bases: ubelt.NiceRepr

Dynamically records per-element array statistics and can summarize them per-element, across channels, or globally.

Todo

  • [ ] This may need a few API tweaks and good documentation

Example

>>> import kwarray
>>> import numpy as np
>>> import ubelt as ub
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> img = np.dstack([ch1, ch2])
>>> run.update(img)
>>> run.update(np.dstack([ch1 + 1, ch2]))
>>> run.update(np.dstack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
__nice__(self)
property shape(run)
update(run, data, weights=1)

Updates statistics across all data dimensions on a per-element basis

Example

>>> import kwarray
>>> import numpy as np
>>> data = np.full((7, 5), fill_value=1.3)
>>> weights = np.ones((7, 5), dtype=np.float32)
>>> run = kwarray.RunningStats()
>>> run.update(data, weights=1)
>>> run.update(data, weights=weights)
>>> rng = np.random
>>> weights[rng.rand(*weights.shape) > 0.5] = 0
>>> run.update(data, weights=weights)
_sumsq_std(run, total, squares, n)

Sum of squares method to compute standard deviation
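
A sum-of-squares standard deviation needs only three running quantities: the sum of the values, the sum of their squares, and the count (or total weight). The sketch below illustrates that idea in pure Python. sumsq_std is a hypothetical stand-in, not kwarray's _sumsq_std, and the ddof parameter is an assumption: the text does not say whether kwarray divides by n or by n - 1.

```python
import math


def sumsq_std(total, squares, n, ddof=0):
    """Hypothetical helper: standard deviation from running sums.

    total   -- sum of the observed values
    squares -- sum of the squared values
    n       -- number of observations (or total weight)
    ddof    -- 0 for population std, 1 for sample std (an assumption;
               the convention kwarray uses is not stated in the text)
    """
    numer = squares - (total ** 2) / n
    # Guard against tiny negative values caused by rounding.
    var = max(numer, 0.0) / (n - ddof)
    return math.sqrt(var)


# Accumulate the three running quantities one value at a time.
total = squares = n = 0.0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    total += x
    squares += x * x
    n += 1

std_pop = sumsq_std(total, squares, n)
std_sample = sumsq_std(total, squares, n, ddof=1)
print(std_pop, std_sample)
```

Because only the sums are stored, new data can be folded in without revisiting old data. The trade-off is that subtracting two large, nearly equal sums can lose precision when the mean is large relative to the spread.
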

summarize(run, axis=None, keepdims=True)

Compute summary statistics across one or more dimensions

Parameters
  • axis (int | List[int] | None | ub.NoParam) – axis or axes to summarize over. If None, all axes are summarized. If ub.NoParam, no axes are summarized and the current result is returned.

  • keepdims (bool, default=True) – if False removes the dimensions that are summarized over

Returns

A dictionary containing the minimum, maximum, mean, std, etc.

Return type

Dict

current(run)

Returns current statistics on a per-element basis (not summarized over any axis)

Todo

  • [X] I want this method and summarize to be unified somehow.

    I don’t know how to parameterize it because axis=None usually means summarize over everything, and I need a way to encode “summarize over nothing but the ‘sequence’ dimension” (which was given incrementally by the update function), which is what this function does.