:py:mod:`kwarray.util_averages`
===============================

.. py:module:: kwarray.util_averages

.. autoapi-nested-parse::

   Defines ``stats_dict``, a convenient way to gather multiple numeric
   statistics (e.g. max, min, median, mode, arithmetic-mean, geometric-mean,
   standard-deviation, etc...) about data in an array, and ``RunningStats``,
   which accumulates such statistics incrementally.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   kwarray.util_averages.RunningStats


Functions
~~~~~~~~~

.. autoapisummary::

   kwarray.util_averages.stats_dict
   kwarray.util_averages._gmean


Attributes
~~~~~~~~~~

.. autoapisummary::

   kwarray.util_averages.torch


.. py:data:: torch


.. py:function:: stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

   Describe statistics about an input array

   :Parameters:
       * **inputs** (*ArrayLike*) -- set of values to get statistics of
       * **axis** (*int*) -- if ``inputs`` is an ndarray, the axis along which statistics are computed
       * **nan** (*bool*) -- report number of nan items
       * **sum** (*bool*) -- report sum of values
       * **extreme** (*bool*) -- report min and max values
       * **n_extreme** (*bool*) -- report extreme value frequencies
       * **median** (*bool*) -- report median
       * **size** (*bool*) -- report array size
       * **shape** (*bool*) -- report array shape

   :returns:
       stats: dictionary of common numpy statistics
       (min, max, mean, std, nMin, nMax, shape)
   :rtype: collections.OrderedDict

   SeeAlso:
       scipy.stats.describe

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> from kwarray.util_averages import *  # NOQA
   >>> import numpy as np
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.rand(10, 2).astype(np.float32)
   >>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
   >>> print(result)
   {
       'mean': np.array([ 0.5206,  0.6425], dtype=np.float32),
       'std': np.array([ 0.2854,  0.2517], dtype=np.float32),
       'min': np.array([ 0.0202,  0.0871], dtype=np.float32),
       'max': np.array([ 0.9637,  0.9256], dtype=np.float32),
       'med': np.array([0.5584, 0.6805], dtype=np.float32),
       'shape': (10, 2),
   }

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> from kwarray.util_averages import *  # NOQA
   >>> import numpy as np
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
   >>> inputs[4] = np.nan
   >>> stats = stats_dict(inputs, axis=axis, nan=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
   >>> print(result)
   {mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}


.. py:function:: _gmean(a, axis=0, dtype=None, clobber=False)

   Compute the geometric mean along the specified axis.

   Modification of the scikit-learn method to be more memory efficient

   .. rubric:: Example

   >>> import numpy as np
   >>> rng = np.random.RandomState(0)
   >>> C, H, W = 8, 32, 32
   >>> axis = 0
   >>> a = [rng.rand(C, H, W).astype(np.float16),
   >>>      rng.rand(C, H, W).astype(np.float16)]


.. py:class:: RunningStats(run)

   Bases: :py:obj:`ubelt.NiceRepr`

   Dynamically records per-element array statistics and can summarize them
   per-element, across channels, or globally.

   .. todo::

      - [ ] This may need a few API tweaks and good documentation

   .. rubric:: Example

   >>> import kwarray
   >>> import numpy as np
   >>> import ubelt as ub
   >>> run = kwarray.RunningStats()
   >>> ch1 = np.array([[0, 1], [3, 4]])
   >>> ch2 = np.zeros((2, 2))
   >>> img = np.dstack([ch1, ch2])
   >>> run.update(img)
   >>> run.update(np.dstack([ch1 + 1, ch2]))
   >>> run.update(np.dstack([ch1 + 2, ch2]))
   >>> # No marginalization
   >>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
   >>> # Average over channels (keeps spatial dims separate)
   >>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=0), nl=2, precision=3))
   >>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
   >>> # Average over spatial dims (keeps channels separate)
   >>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(1, 2)), nl=2, precision=3))
   >>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
   >>> # Average over all dims
   >>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
   >>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=0), nl=2, precision=3))

   .. py:method:: __nice__(self)


   .. py:method:: shape(run)
      :property:


   .. py:method:: update(run, data, weights=1)

      Updates statistics across all data dimensions on a per-element basis

      .. rubric:: Example

      >>> import kwarray
      >>> import numpy as np
      >>> data = np.full((7, 5), fill_value=1.3)
      >>> weights = np.ones((7, 5), dtype=np.float32)
      >>> run = kwarray.RunningStats()
      >>> run.update(data, weights=1)
      >>> run.update(data, weights=weights)
      >>> rng = np.random
      >>> weights[rng.rand(*weights.shape) > 0.5] = 0
      >>> run.update(data, weights=weights)


   .. py:method:: _sumsq_std(run, total, squares, n)

      Sum of squares method to compute standard deviation


   .. py:method:: summarize(run, axis=None, keepdims=True)

      Compute summary statistics across one or more dimensions

      :Parameters:
          * **axis** (*int | List[int] | None | ub.NoParam*) -- axis or axes to summarize over. If None, all axes are summarized. If ub.NoParam, no axes are summarized and the current per-element result is returned.
          * **keepdims** (*bool, default=True*) -- if False, removes the dimensions that are summarized over

      :returns: dictionary containing minimum, maximum, mean, std, etc.
      :rtype: Dict


   .. py:method:: current(run)

      Returns current statistics on a per-element basis
      (not summarized over any axis)

      .. todo::

         - [X] I want this method and summarize to be unified somehow.
           I don't know how to parameterize it because axis=None usually
           means summarize over everything, and I need a way to encode
           "summarize over nothing but the sequence dimension" (which was
           given incrementally by the update function), which is what
           this function does.
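
The sum-of-squares bookkeeping that ``update`` and ``_sumsq_std`` describe can be illustrated with a minimal standalone sketch. This is not the kwarray implementation: ``RunningScalarStats`` is a hypothetical helper that tracks a single scalar stream instead of a per-element array, and it assumes the population form of the variance (no Bessel correction), since this page does not state which form the library uses. The same three accumulators (weight sum, weighted total, weighted sum of squares) would apply elementwise to arrays.

```python
import math


class RunningScalarStats:
    """Hypothetical helper: weighted running statistics of a scalar stream."""

    def __init__(self):
        self.n = 0.0        # sum of the weights seen so far
        self.total = 0.0    # weighted sum of the values
        self.squares = 0.0  # weighted sum of the squared values
        self.minimum = math.inf
        self.maximum = -math.inf

    def update(self, value, weight=1.0):
        # Only the three accumulators (plus min/max) are stored, so the
        # memory cost does not grow with the number of updates.
        self.n += weight
        self.total += value * weight
        self.squares += (value ** 2) * weight
        if weight > 0:
            # Zero-weight observations do not affect the extremes.
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)

    def summarize(self):
        if self.n <= 0:
            raise ValueError('no weighted observations recorded')
        mean = self.total / self.n
        # Assumption: population variance, E[x^2] - E[x]^2, clamped at
        # zero to absorb floating point rounding error.
        var = max(self.squares / self.n - mean ** 2, 0.0)
        return {
            'n': self.n,
            'mean': mean,
            'std': math.sqrt(var),
            'min': self.minimum,
            'max': self.maximum,
        }


run = RunningScalarStats()
for value in [1.0, 2.0, 3.0]:
    run.update(value)
run.update(10.0, weight=0.0)  # zero weight: leaves every statistic unchanged
summary = run.summarize()
print(summary)
```

Because only running sums are kept, the summary can be requested at any point between updates. The trade-off of the sum-of-squares form is numerical: subtracting two large, nearly equal quantities can lose precision, which is why the sketch clamps the variance at zero.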