:py:mod:`kwarray.util_averages`
===============================

.. py:module:: kwarray.util_averages

.. autoapi-nested-parse::

   Currently defines :py:func:`stats_dict`, a convenient way to gather multiple
   numeric statistics (e.g. max, min, median, mode, arithmetic mean,
   geometric mean, standard deviation, etc.) about data in an array, and
   :py:class:`RunningStats`, which accumulates per-element statistics over
   incremental updates.


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   kwarray.util_averages.RunningStats


Functions
~~~~~~~~~

.. autoapisummary::

   kwarray.util_averages.stats_dict
   kwarray.util_averages._gmean


Attributes
~~~~~~~~~~

.. autoapisummary::

   kwarray.util_averages.torch


.. py:data:: torch
   

.. py:function:: stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

   Describe statistics about an input array.

   :Parameters: * **inputs** (*ArrayLike*) -- set of values to get statistics of
                * **axis** (*int*) -- if ``inputs`` is an ndarray, the axis along
                  which statistics are computed; if None, statistics are computed
                  over all values
                * **nan** (*bool*) -- report the number of nan items
                * **sum** (*bool*) -- report the sum of values
                * **extreme** (*bool*) -- report min and max values
                * **n_extreme** (*bool*) -- report how many entries equal the min and max values
                * **median** (*bool*) -- report the median
                * **shape** (*bool*) -- report the array shape
                * **size** (*bool*) -- report the array size

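   :returns: stats -- dictionary of common numpy statistics
             (for example ``mean``, ``std``, ``min``, ``max``, ``nMin``,
             ``nMax``, ``med``, ``num_nan``, and ``shape``). Which keys are
             present depends on the flags passed in.
   :rtype: collections.OrderedDict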

   .. seealso:: ``scipy.stats.describe``
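
   For orientation, the core of ``stats_dict`` maps onto ordinary NumPy
   reductions. The following is a minimal sketch, not the library's code:
   ``describe_sketch`` is a hypothetical helper, and it assumes that
   ``nan=True`` switches to NumPy's NaN-aware ``nan*`` reductions and that
   ``nMin`` / ``nMax`` count how many entries equal the minimum / maximum
   along ``axis``.

   .. code-block:: python

      import numpy as np


      def describe_sketch(inputs, axis=None, nan=False):
          """Hypothetical stand-in for a subset of stats_dict."""
          arr = np.asarray(inputs, dtype=np.float64)
          # Choose NaN-aware reductions when nan=True (an assumption)
          if nan:
              mean, std, lo, hi = np.nanmean, np.nanstd, np.nanmin, np.nanmax
          else:
              mean, std, lo, hi = np.mean, np.std, np.min, np.max
          stats = {
              'mean': mean(arr, axis=axis),
              'std': std(arr, axis=axis),
              'min': lo(arr, axis=axis),
              'max': hi(arr, axis=axis),
          }
          # keepdims=True lets the extremes broadcast back against arr
          stats['nMin'] = np.sum(arr == lo(arr, axis=axis, keepdims=True), axis=axis)
          stats['nMax'] = np.sum(arr == hi(arr, axis=axis, keepdims=True), axis=axis)
          if nan:
              stats['num_nan'] = int(np.isnan(arr).sum())
          stats['shape'] = arr.shape
          return stats


      data = np.array([[1.0, 5.0], [1.0, 2.0], [3.0, 5.0]])
      print(describe_sketch(data, axis=0))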

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> from kwarray.util_averages import *  # NOQA
   >>> import numpy as np
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.rand(10, 2).astype(np.float32)
   >>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
   >>> print(result)
   {
       'mean': np.array([ 0.5206,  0.6425], dtype=np.float32),
       'std': np.array([ 0.2854,  0.2517], dtype=np.float32),
       'min': np.array([ 0.0202,  0.0871], dtype=np.float32),
       'max': np.array([ 0.9637,  0.9256], dtype=np.float32),
       'med': np.array([0.5584, 0.6805], dtype=np.float32),
       'shape': (10, 2),
   }

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import numpy as np
   >>> from kwarray.util_averages import stats_dict
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
   >>> inputs[4] = np.nan
   >>> stats = stats_dict(inputs, axis=axis, nan=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
   >>> print(result)
   {mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}


.. py:function:: _gmean(a, axis=0, dtype=None, clobber=False)

   Compute the geometric mean along the specified axis.

   This is a modification of ``scipy.stats.gmean`` that is more memory efficient.

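   .. rubric:: Example

   >>> import numpy as np
   >>> from kwarray.util_averages import _gmean
   >>> rng = np.random.RandomState(0)
   >>> C, H, W = 8, 32, 32
   >>> axis = 0
   >>> a = [rng.rand(C, H, W).astype(np.float16),
   >>>      rng.rand(C, H, W).astype(np.float16)]
   >>> result = _gmean(a, axis=axis)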
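
   A geometric mean can be written as ``exp(mean(log(a)))``, and writing the
   logarithms and the result into existing buffers is one way to save memory.
   A minimal NumPy sketch of that identity, assuming strictly positive inputs;
   ``gmean_sketch`` is a hypothetical name, and its ``clobber`` branch
   (overwriting a floating-point input array with its own logarithms) is an
   assumption about what the real ``clobber`` flag does.

   .. code-block:: python

      import numpy as np


      def gmean_sketch(a, axis=0, clobber=False):
          """Geometric mean as exp(mean(log(a))); assumes all entries are > 0."""
          a = np.asarray(a)
          if clobber and np.issubdtype(a.dtype, np.floating):
              # Reuse the input buffer for the logs (this overwrites ``a``)
              log_a = np.log(a, out=a)
          else:
              log_a = np.log(a.astype(np.float64))
          # asarray keeps a 0-d result usable as an ``out`` buffer
          mean_log = np.asarray(log_a.mean(axis=axis))
          # Exponentiate in place instead of allocating another array
          return np.exp(mean_log, out=mean_log)


      print(gmean_sketch([[1.0, 2.0], [4.0, 8.0]], axis=0))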


.. py:class:: RunningStats()

   Bases: :py:obj:`ubelt.NiceRepr`

   Dynamically records per-element array statistics and can summarize them
   per-element, across channels, or globally.

   .. todo:: - [ ] This may need a few API tweaks and good documentation

   .. rubric:: Example

   >>> import kwarray
   >>> import numpy as np
   >>> import ubelt as ub
   >>> run = kwarray.RunningStats()
   >>> ch1 = np.array([[0, 1], [3, 4]])
   >>> ch2 = np.zeros((2, 2))
   >>> # np.dstack gives each frame an (H, W, C) layout
   >>> img = np.dstack([ch1, ch2])
   >>> run.update(img)
   >>> run.update(np.dstack([ch1 + 1, ch2]))
   >>> run.update(np.dstack([ch1 + 2, ch2]))
   >>> # No marginalization
   >>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
   >>> # Average over channels, axis 2 (keeps spatial dims separate)
   >>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=2), nl=2, precision=3))
   >>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=2, keepdims=0), nl=2, precision=3))
   >>> # Average over spatial dims, axes 0 and 1 (keeps channels separate)
   >>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(0, 1)), nl=2, precision=3))
   >>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(0, 1), keepdims=0), nl=2, precision=3))
   >>> # Average over all dims
   >>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
   >>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=0), nl=2, precision=3))
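
   The running means in this example should agree with a batch computation
   over the same three frames stacked along a new leading "sequence" axis.
   This NumPy cross-check does not call ``RunningStats``; it assumes equal
   weights for every update and the ``(H, W, C)`` layout produced by
   ``np.dstack``, so channels are the last axis of each frame and the spatial
   dimensions are the first two.

   .. code-block:: python

      import numpy as np

      ch1 = np.array([[0, 1], [3, 4]])
      ch2 = np.zeros((2, 2))
      # Stack the three updates along a new leading sequence axis: (N, H, W, C)
      seq = np.stack([np.dstack([ch1 + k, ch2]) for k in range(3)])

      per_element_mean = seq.mean(axis=0)      # (H, W, C): no marginalization
      channel_mean = seq.mean(axis=(0, 3))     # (H, W): averaged over channels
      spatial_mean = seq.mean(axis=(0, 1, 2))  # (C,): averaged over H and W
      global_mean = seq.mean()                 # scalar: averaged over everything
      print(per_element_mean[..., 0])
      print(channel_mean, spatial_mean, global_mean)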

   .. py:method:: __nice__(self)


   .. py:method:: shape(run)
      :property:


   .. py:method:: update(run, data, weights=1)

      Updates statistics across all data dimensions on a per-element basis

      .. rubric:: Example

      >>> import kwarray
      >>> import numpy as np
      >>> data = np.full((7, 5), fill_value=1.3)
      >>> weights = np.ones((7, 5), dtype=np.float32)
      >>> run = kwarray.RunningStats()
      >>> run.update(data, weights=1)
      >>> run.update(data, weights=weights)
      >>> # Zero out roughly half of the weights at random
      >>> rng = np.random.RandomState(0)
      >>> weights[rng.rand(*weights.shape) > 0.5] = 0
      >>> run.update(data, weights=weights)
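
      A per-element running update only needs a few accumulators. This sketch
      is hypothetical (``RunningSketch`` is not part of kwarray) and assumes
      the statistics are kept as a weighted count, weighted sum, weighted sum
      of squares, and running min / max, with zero-weight entries left out of
      the extremes.

      .. code-block:: python

         import numpy as np


         class RunningSketch:
             """Hypothetical per-element accumulator (not kwarray's implementation)."""

             def __init__(self):
                 self.n = self.total = self.squares = None
                 self.mini = self.maxi = None

             def update(self, data, weights=1):
                 data = np.asarray(data, dtype=np.float64)
                 # Broadcast scalar (or lower-rank) weights to the data shape
                 w = np.broadcast_to(np.asarray(weights, dtype=np.float64), data.shape)
                 if self.n is None:
                     self.n = np.zeros(data.shape)
                     self.total = np.zeros(data.shape)
                     self.squares = np.zeros(data.shape)
                     self.mini = np.full(data.shape, np.inf)
                     self.maxi = np.full(data.shape, -np.inf)
                 self.n += w
                 self.total += w * data
                 self.squares += w * data ** 2
                 # Only entries with nonzero weight can change the extremes
                 has_w = w > 0
                 self.mini = np.where(has_w, np.minimum(self.mini, data), self.mini)
                 self.maxi = np.where(has_w, np.maximum(self.maxi, data), self.maxi)

             def mean(self):
                 # Per-element weighted mean (NaN where the total weight is 0)
                 return self.total / self.n

      Keeping sums rather than means makes each update a handful of
      element-wise operations, and the sums can later be pooled over any axis.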


   .. py:method:: _sumsq_std(run, total, squares, n)

      Sum of squares method to compute standard deviation
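
      The sum-of-squares method recovers the standard deviation from three
      running quantities: ``total`` (the sum of values), ``squares`` (the sum
      of squared values), and the count ``n``, using the identity
      ``n * squares - total ** 2 == n ** 2 * variance`` (population variance).
      A minimal sketch, where ``sumsq_std`` is a hypothetical name and the
      ``ddof`` switch between population and sample estimates is an
      assumption, since this page does not say which one ``_sumsq_std``
      returns.

      .. code-block:: python

         import numpy as np


         def sumsq_std(total, squares, n, ddof=0):
             """Std from running sums: total = sum(x), squares = sum(x**2), n = count."""
             total = np.asarray(total, dtype=np.float64)
             squares = np.asarray(squares, dtype=np.float64)
             n = np.asarray(n, dtype=np.float64)
             # n * sum(x**2) - sum(x)**2 equals n**2 times the population variance
             numer = n * squares - total ** 2
             denom = n * (n - ddof)
             # Clip tiny negative values caused by floating-point cancellation
             return np.sqrt(np.maximum(numer, 0) / denom)


         x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
         print(sumsq_std(x.sum(), (x ** 2).sum(), x.size))  # 2.0 (population std)

      This formula can lose precision through cancellation when the mean is
      large relative to the spread; Welford's algorithm is the usual, more
      numerically stable alternative.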


   .. py:method:: summarize(run, axis=None, keepdims=True)

      Compute summary statistics across one or more dimensions.

      :Parameters: * **axis** (*int | List[int] | None | ub.NoParam*) -- axis or axes to summarize over.
                     If None, all axes are summarized.
                     If ub.NoParam, no axes are summarized and the current
                     per-element result is returned.
                   * **keepdims** (*bool, default=True*) -- if False, removes the dimensions that are summarized over

      :returns: a dictionary containing the minimum, maximum, mean, std, etc.
      :rtype: Dict
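
      Summarizing over an axis amounts to pooling the per-element accumulators
      along that axis before deriving the statistics. A minimal sketch under
      stated assumptions: ``summarize_sketch`` and the ``NO_REDUCE`` sentinel
      (standing in for ``ub.NoParam``) are hypothetical, the inputs are
      per-element counts, sums, sums of squares, minima, and maxima, and the
      reported ``std`` is the population (``ddof=0``) estimate.

      .. code-block:: python

         import numpy as np

         NO_REDUCE = object()  # hypothetical stand-in for ub.NoParam


         def summarize_sketch(n, total, squares, mini, maxi, axis=None, keepdims=True):
             """Pool per-element accumulators over ``axis``, then derive statistics."""
             if axis is not NO_REDUCE:
                 # Summing counts and sums pools every element along ``axis``
                 n = np.sum(n, axis=axis, keepdims=keepdims)
                 total = np.sum(total, axis=axis, keepdims=keepdims)
                 squares = np.sum(squares, axis=axis, keepdims=keepdims)
                 mini = np.min(mini, axis=axis, keepdims=keepdims)
                 maxi = np.max(maxi, axis=axis, keepdims=keepdims)
             mean = total / n
             std = np.sqrt(np.maximum(squares / n - mean ** 2, 0))
             return {'mean': mean, 'std': std, 'min': mini, 'max': maxi, 'n': n}


         # Three (2, 2) frames accumulated per element, then pooled over the row axis
         frames = np.array([[[0., 1.], [3., 4.]],
                            [[1., 2.], [4., 5.]],
                            [[2., 3.], [5., 6.]]])
         acc = dict(n=np.full((2, 2), 3.0), total=frames.sum(axis=0),
                    squares=(frames ** 2).sum(axis=0),
                    mini=frames.min(axis=0), maxi=frames.max(axis=0))
         print(summarize_sketch(**acc, axis=0, keepdims=False)['mean'])  # [2.5 3.5]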


   .. py:method:: current(run)

      Returns the current statistics on a per-element basis
      (not summarized over any axis).

      .. todo::

         - [X] I want this method and summarize to be unified somehow.
           I don't know how to parameterize it because axis=None usually
           means summarize over everything, and I need a way to encode
           "summarize over nothing but the sequence dimension" (which was
           given incrementally by the update function), which is what
           this function does.