kwarray.util_averages
Currently just defines “stats_dict”, which is a nice way to gather multiple numeric statistics (e.g. max, min, median, mode, arithmetic-mean, geometric-mean, standard-deviation, etc…) about data in an array.
Module Contents¶
Classes¶
RunningStats | Dynamically records per-element array statistics and can summarize them per-element, across channels, or globally.
Functions¶
stats_dict | Describe statistics about an input array
_gmean | Compute the geometric mean along the specified axis.
Attributes¶
- kwarray.util_averages.torch¶
- kwarray.util_averages.stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)¶
Describe statistics about an input array
- Parameters
inputs (ArrayLike) – set of values to get statistics of
axis (int) – if inputs is ndarray then this specifies the axis
nan (bool) – report number of nan items
sum (bool) – report sum of values
extreme (bool) – report min and max values
n_extreme (bool) – report extreme value frequencies
median (bool) – report median
size (bool) – report array size
shape (bool) – report array shape
- Returns
- stats: dictionary of common numpy statistics
(min, max, mean, std, nMin, nMax, shape)
- Return type
- SeeAlso:
scipy.stats.describe
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> from kwarray.util_averages import *  # NOQA
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.rand(10, 2).astype(np.float32)
>>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
>>> print(result)
{
    'mean': np.array([ 0.5206, 0.6425], dtype=np.float32),
    'std': np.array([ 0.2854, 0.2517], dtype=np.float32),
    'min': np.array([ 0.0202, 0.0871], dtype=np.float32),
    'max': np.array([ 0.9637, 0.9256], dtype=np.float32),
    'med': np.array([0.5584, 0.6805], dtype=np.float32),
    'shape': (10, 2),
}
Example
>>> # xdoctest: +IGNORE_WHITESPACE
>>> import numpy as np
>>> from kwarray.util_averages import stats_dict
>>> axis = 0
>>> rng = np.random.RandomState(0)
>>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
>>> inputs[4] = np.nan
>>> stats = stats_dict(inputs, axis=axis, nan=True)
>>> import ubelt as ub  # NOQA
>>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
>>> print(result)
{mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}
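The second example reports a finite mean and std even though the input contains a NaN, which suggests NaN entries are skipped when nan=True. A minimal sketch of that idea in plain NumPy follows. The helper name describe_sketch and its exact set of keys are assumptions for illustration; this is not the kwarray implementation.

```python
import numpy as np


def describe_sketch(values, axis=None):
    """Hypothetical helper for illustration; not the kwarray implementation."""
    values = np.asarray(values, dtype=np.float64)
    return {
        # nan-aware reductions skip NaN entries instead of propagating them
        'mean': np.nanmean(values, axis=axis),
        'std': np.nanstd(values, axis=axis),
        'min': np.nanmin(values, axis=axis),
        'max': np.nanmax(values, axis=axis),
        # number of NaN entries that the reductions above skipped
        'num_nan': np.isnan(values).sum(axis=axis),
        'shape': values.shape,
    }


stats = describe_sketch([1.0, 2.0, np.nan, 4.0], axis=0)
print(stats)
```

Here the mean is taken over the three finite values only, and num_nan records how many entries were excluded.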
- kwarray.util_averages._gmean(a, axis=0, dtype=None, clobber=False)¶
Compute the geometric mean along the specified axis.
Modification of the scikit-learn method to be more memory efficient
- Example
>>> import numpy as np
>>> rng = np.random.RandomState(0)
>>> C, H, W = 8, 32, 32
>>> axis = 0
>>> a = [rng.rand(C, H, W).astype(np.float16),
>>>      rng.rand(C, H, W).astype(np.float16)]
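For reference, a geometric mean is commonly computed in the log domain as exp(mean(log(a))), which avoids overflowing the running product. The sketch below shows only that general technique. The name gmean_sketch is hypothetical, and the float64 upcast is an assumption made here for accuracy; it does not reproduce the memory-saving behaviour or the clobber option of _gmean.

```python
import numpy as np


def gmean_sketch(a, axis=0):
    """Geometric mean via exp(mean(log(a))); hypothetical helper."""
    # Upcast so low-precision inputs (e.g. float16) do not lose accuracy
    a = np.asarray(a, dtype=np.float64)
    return np.exp(np.log(a).mean(axis=axis))


values = np.array([[1.0, 2.0], [4.0, 8.0], [16.0, 32.0]])
result = gmean_sketch(values, axis=0)
print(result)
```

Each column holds three values whose product is a perfect cube (64 and 512), so the result is approximately 4 and 8, up to floating-point rounding.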
- class kwarray.util_averages.RunningStats(run)¶
Bases:
ubelt.NiceRepr
Dynamically records per-element array statistics and can summarize them per-element, across channels, or globally.
Todo
[ ] This may need a few API tweaks and good documentation
Example
>>> import kwarray
>>> import numpy as np
>>> import ubelt as ub
>>> run = kwarray.RunningStats()
>>> ch1 = np.array([[0, 1], [3, 4]])
>>> ch2 = np.zeros((2, 2))
>>> img = np.dstack([ch1, ch2])
>>> run.update(np.dstack([ch1, ch2]))
>>> run.update(np.dstack([ch1 + 1, ch2]))
>>> run.update(np.dstack([ch1 + 2, ch2]))
>>> # No marginalization
>>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
>>> # Average over channels (keeps spatial dims separate)
>>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=0), nl=2, precision=3))
>>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
>>> # Average over spatial dims (keeps channels separate)
>>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(1, 2)), nl=2, precision=3))
>>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
>>> # Average over all dims
>>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
>>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
- __nice__(self)¶
- property shape(run)¶
- update(run, data, weights=1)¶
Updates statistics across all data dimensions on a per-element basis
Example
>>> import kwarray
>>> import numpy as np
>>> data = np.full((7, 5), fill_value=1.3)
>>> weights = np.ones((7, 5), dtype=np.float32)
>>> run = kwarray.RunningStats()
>>> run.update(data, weights=1)
>>> run.update(data, weights=weights)
>>> rng = np.random
>>> weights[rng.rand(*weights.shape) > 0.5] = 0
>>> run.update(data, weights=weights)
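One plausible way to keep per-element statistics incrementally is to accumulate three weighted sums for every array position: the weights, the values, and the squared values. The sketch below illustrates that bookkeeping only. The class RunningSumsSketch and its attribute names are hypothetical and are not taken from RunningStats.

```python
import numpy as np


class RunningSumsSketch:
    """Hypothetical accumulator; names are illustrative only."""

    def __init__(self):
        self.n = 0.0        # per-element sum of weights
        self.total = 0.0    # per-element weighted sum of values
        self.squares = 0.0  # per-element weighted sum of squared values

    def update(self, data, weights=1):
        data = np.asarray(data, dtype=np.float64)
        # Broadcasting lets a scalar weight apply to every element
        weights = np.broadcast_to(
            np.asarray(weights, dtype=np.float64), data.shape)
        self.n = self.n + weights
        self.total = self.total + data * weights
        self.squares = self.squares + (data ** 2) * weights

    def mean(self):
        # Assumes update() was called at least once; elements that never
        # received any weight divide zero by zero and come out as NaN
        with np.errstate(divide='ignore', invalid='ignore'):
            return self.total / self.n


acc = RunningSumsSketch()
data = np.full((7, 5), fill_value=1.3)
mask = np.zeros((7, 5))
mask[0, 0] = 1.0
acc.update(data, weights=1)     # every element counted once
acc.update(data, weights=mask)  # only element (0, 0) counted again
print(acc.n[0, 0], acc.n[1, 1])
```

A zero weight leaves an element's sums untouched, so elements can be observed a different number of times while sharing one accumulator.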
- _sumsq_std(run, total, squares, n)¶
Sum of squares method to compute standard deviation
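The sum-of-squares method relies on the identity sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n, so the standard deviation can be recovered from running totals without revisiting the data. The sketch below is a minimal illustration. The name sumsq_std_sketch and the ddof parameter are assumptions added here, since this page does not state which normalization (n or n - 1) the method uses.

```python
import numpy as np


def sumsq_std_sketch(total, squares, n, ddof=0):
    """Hypothetical helper; ddof is an assumption, not part of the API."""
    # Sum of squared deviations expressed through the accumulators:
    # sum((x - mean)^2) = sum(x^2) - (sum(x))^2 / n
    numer = squares - (total ** 2) / n
    # Rounding can push a tiny variance slightly below zero
    numer = np.maximum(numer, 0)
    return np.sqrt(numer / (n - ddof))


x = np.array([1.0, 2.0, 3.0, 4.0])
std_population = sumsq_std_sketch(x.sum(), (x ** 2).sum(), len(x))
std_sample = sumsq_std_sketch(x.sum(), (x ** 2).sum(), len(x), ddof=1)
print(std_population, std_sample)
```

This formulation is cheap but can lose precision when the mean is large relative to the spread, which is why the numerator is clipped at zero before the square root.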
- summarize(run, axis=None, keepdims=True)¶
Compute summary statistics across one or more dimensions
- Parameters
axis (int | List[int] | None | ub.NoParam) – axis or axes to summarize over. If None, all axes are summarized. If ub.NoParam, no axes are summarized and the current result is returned.
keepdims (bool, default=True) – if False removes the dimensions that are summarized over
- Returns
a dictionary containing the minimum, maximum, mean, std, etc.
- Return type
Dict
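The axis and keepdims arguments follow the usual NumPy reduction conventions. As a rough sketch of how a summary over axes could be formed from per-element accumulators, the example below pools a weighted total and a count over the requested axes and divides once. The function pooled_mean_sketch and its inputs are hypothetical, and the ub.NoParam case is not modelled.

```python
import numpy as np


def pooled_mean_sketch(total, n, axis=None, keepdims=True):
    """Hypothetical helper; not the RunningStats.summarize implementation."""
    # Pool the accumulators over the requested axes, then divide once
    total_sum = np.sum(total, axis=axis, keepdims=keepdims)
    n_sum = np.sum(n, axis=axis, keepdims=keepdims)
    return total_sum / n_sum


# Per-element sums for a (2, 2, 2) array where each element was seen twice
total = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
n = np.full((2, 2, 2), 2.0)

kept = pooled_mean_sketch(total, n, axis=(0, 1), keepdims=True)
dropped = pooled_mean_sketch(total, n, axis=(0, 1), keepdims=False)
overall = pooled_mean_sketch(total, n, axis=None, keepdims=False)
print(kept.shape, dropped.shape, overall)
```

With keepdims=True the reduced axes remain with length one, so the result still broadcasts against the original array; with keepdims=False they are removed.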
- current(run)¶
Returns current statistics on a per-element basis (not summarized over any axis)
Todo
- [X] I want this method and summarize to be unified somehow.
I don’t know how to parametrize it because axis=None usually means summarize over everything, and I need a way to encode “summarize over nothing but the sequence dimension” (which was given incrementally by the update function), which is what this function does.