:py:mod:`kwarray`
=================

.. py:module:: kwarray

.. autoapi-nested-parse::

   The ``kwarray`` module implements a small set of pure-python extensions to
   numpy and torch.


Submodules
----------
.. toctree::
   :titlesonly:
   :maxdepth: 1

   algo_assignment/index.rst
   algo_setcover/index.rst
   arrayapi/index.rst
   dataframe_light/index.rst
   distributions/index.rst
   fast_rand/index.rst
   util_averages/index.rst
   util_groups/index.rst
   util_misc/index.rst
   util_numpy/index.rst
   util_random/index.rst
   util_slices/index.rst
   util_slider/index.rst
   util_torch/index.rst


Package Contents
----------------

Classes
~~~~~~~

.. autoapisummary::

   kwarray.ArrayAPI
   kwarray.DataFrameArray
   kwarray.DataFrameLight
   kwarray.LocLight
   kwarray.RunningStats
   kwarray.FlatIndexer
   kwarray.SlidingWindow
   kwarray.Stitcher


Functions
~~~~~~~~~

.. autoapisummary::

   kwarray.dtype_info
   kwarray.maxvalue_assignment
   kwarray.mincost_assignment
   kwarray.mindist_assignment
   kwarray.setcover
   kwarray.standard_normal
   kwarray.standard_normal32
   kwarray.standard_normal64
   kwarray.uniform
   kwarray.uniform32
   kwarray.stats_dict
   kwarray.apply_grouping
   kwarray.group_consecutive
   kwarray.group_consecutive_indices
   kwarray.group_indices
   kwarray.group_items
   kwarray.arglexmax
   kwarray.argmaxima
   kwarray.argminima
   kwarray.atleast_nd
   kwarray.boolmask
   kwarray.isect_flags
   kwarray.iter_reduce_ufunc
   kwarray.normalize
   kwarray.ensure_rng
   kwarray.random_combinations
   kwarray.random_product
   kwarray.seed_global
   kwarray.shuffle
   kwarray.embed_slice
   kwarray.padded_slice
   kwarray.one_hot_embedding
   kwarray.one_hot_lookup


.. py:class:: ArrayAPI

   Bases: :py:obj:`object`

   Compatability API between torch and numpy.

   The API defines classmethods that work on both Tensors and ndarrays.  As
   such the user can simply use ``kwarray.ArrayAPI.<funcname>`` and it will
   return the expected result for both Tensor and ndarray types.

   However, this is inefficient because it requires us to check the type of
   the input for every API call. Therefore it is recommended that you use the
   :func:`ArrayAPI.coerce` function, which takes as input the data you want to
   operate on. It performs the type check once, and then returns another
   object that defines with an identical API, but specific to the given data
   type. This means that we can ignore type checks on future calls of the
   specific implementation. See examples for more details.


   .. rubric:: Example

   >>> # Use the easy-to-use, but inefficient array api
   >>> # xdoctest: +REQUIRES(module:torch)
   >>> take = ArrayAPI.take
   >>> np_data = np.arange(0, 143).reshape(11, 13)
   >>> pt_data = torch.LongTensor(np_data)
   >>> indices = [1, 3, 5, 7, 11, 13, 17, 21]
   >>> idxs0 = [1, 3, 5, 7]
   >>> idxs1 = [1, 3, 5, 7, 11]
   >>> assert np.allclose(take(np_data, indices), take(pt_data, indices))
   >>> assert np.allclose(take(np_data, idxs0, 0), take(pt_data, idxs0, 0))
   >>> assert np.allclose(take(np_data, idxs1, 1), take(pt_data, idxs1, 1))

   .. rubric:: Example

   >>> # Use the easy-to-use, but inefficient array api
   >>> # xdoctest: +REQUIRES(module:torch)
   >>> compress = ArrayAPI.compress
   >>> np_data = np.arange(0, 143).reshape(11, 13)
   >>> pt_data = torch.LongTensor(np_data)
   >>> flags = (np_data % 2 == 0).ravel()
   >>> f0 = (np_data % 2 == 0)[:, 0]
   >>> f1 = (np_data % 2 == 0)[0, :]
   >>> assert np.allclose(compress(np_data, flags), compress(pt_data, flags))
   >>> assert np.allclose(compress(np_data, f0, 0), compress(pt_data, f0, 0))
   >>> assert np.allclose(compress(np_data, f1, 1), compress(pt_data, f1, 1))

   .. rubric:: Example

   >>> # Use ArrayAPI to coerce an identical API that doesnt do type checks
   >>> # xdoctest: +REQUIRES(module:torch)
   >>> import kwarray
   >>> np_data = np.arange(0, 15).reshape(3, 5)
   >>> pt_data = torch.LongTensor(np_data)
   >>> # The new ``impl`` object has the same API as ArrayAPI, but works
   >>> # specifically on torch Tensors.
   >>> impl = kwarray.ArrayAPI.coerce(pt_data)
   >>> flat_data = impl.view(pt_data, -1)
   >>> print('flat_data = {!r}'.format(flat_data))
   flat_data = tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
   >>> # The new ``impl`` object has the same API as ArrayAPI, but works
   >>> # specifically on numpy ndarrays.
   >>> impl = kwarray.ArrayAPI.coerce(np_data)
   >>> flat_data = impl.view(np_data, -1)
   >>> print('flat_data = {!r}'.format(flat_data))
   flat_data = array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

   .. py:attribute:: _torch
      

   .. py:attribute:: _numpy
      

   .. py:attribute:: take
      

   .. py:attribute:: compress
      

   .. py:attribute:: repeat
      

   .. py:attribute:: tile
      

   .. py:attribute:: view
      

   .. py:attribute:: numel
      

   .. py:attribute:: atleast_nd
      

   .. py:attribute:: full_like
      

   .. py:attribute:: ones_like
      

   .. py:attribute:: zeros_like
      

   .. py:attribute:: empty_like
      

   .. py:attribute:: sum
      

   .. py:attribute:: argmax
      

   .. py:attribute:: argsort
      

   .. py:attribute:: max
      

   .. py:attribute:: maximum
      

   .. py:attribute:: minimum
      

   .. py:attribute:: matmul
      

   .. py:attribute:: astype
      

   .. py:attribute:: nonzero
      

   .. py:attribute:: nan_to_num
      

   .. py:attribute:: tensor
      

   .. py:attribute:: numpy
      

   .. py:attribute:: tolist
      

   .. py:attribute:: asarray
      

   .. py:attribute:: asarray
      

   .. py:attribute:: T
      

   .. py:attribute:: transpose
      

   .. py:attribute:: contiguous
      

   .. py:attribute:: pad
      

   .. py:attribute:: dtype_kind
      

   .. py:attribute:: max_argmax
      

   .. py:attribute:: any
      

   .. py:attribute:: all
      

   .. py:attribute:: log2
      

   .. py:attribute:: log
      

   .. py:attribute:: copy
      

   .. py:attribute:: iceil
      

   .. py:attribute:: ifloor
      

   .. py:attribute:: floor
      

   .. py:attribute:: ceil
      

   .. py:attribute:: round
      

   .. py:attribute:: iround
      

   .. py:attribute:: clip
      

   .. py:attribute:: softmax
      

   .. py:method:: impl(data)
      :staticmethod:

      Returns a namespace suitable for operating on the input data type

      :Parameters: **data** (*ndarray | Tensor*) -- data to be operated on


   .. py:method:: coerce(data)
      :staticmethod:

      Coerces some form of inputs into an array api (either numpy or torch).


   .. py:method:: cat(datas, *args, **kwargs)


   .. py:method:: hstack(datas, *args, **kwargs)


   .. py:method:: vstack(datas, *args, **kwargs)


.. py:function:: dtype_info(dtype)

   :Parameters: **dtype** (*type*) -- a numpy, torch, or python numeric data type

   :returns: an iinfo of finfo structure depending on the input type.
   :rtype: struct

   .. rubric:: References

   https://higra.readthedocs.io/en/stable/_modules/higra/hg_utils.html#dtype_info

   .. rubric:: Example

   >>> from kwarray.arrayapi import *  # NOQA
   >>> results = []
   >>> results += [dtype_info(float)]
   >>> results += [dtype_info(int)]
   >>> results += [dtype_info(complex)]
   >>> results += [dtype_info(np.float32)]
   >>> results += [dtype_info(np.int32)]
   >>> results += [dtype_info(np.uint32)]
   >>> if hasattr(np, 'complex256'):
   >>>     results += [dtype_info(np.complex256)]
   >>> if torch is not None:
   >>>     results += [dtype_info(torch.float32)]
   >>>     results += [dtype_info(torch.int64)]
   >>>     results += [dtype_info(torch.complex64)]
   >>> for info in results:
   >>>     print('info = {!r}'.format(info))
   >>> for info in results:
   >>>     print('info.bits = {!r}'.format(info.bits))


.. py:function:: maxvalue_assignment(value)

   Finds the maximum value assignment based on a NxM value matrix. Any pair
   with a non-positive value will not be assigned.

   :Parameters: **value** (*ndarray*) -- NxM matrix, value[i, j] is the value of matching i and j

   :returns:

             tuple containing a list of assignment of rows
                 and columns, and the total value of the assignment.
   :rtype: Tuple[list, float]

   CommandLine:
       xdoctest -m ~/code/kwarray/kwarray/algo_assignment.py maxvalue_assignment

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:scipy)
   >>> # Costs to match item i in set1 with item j in set2.
   >>> value = np.array([
   >>>     [9, 2, 1, 3],
   >>>     [4, 1, 5, 5],
   >>>     [9, 9, 2, 4],
   >>>     [-1, -1, -1, -1],
   >>> ])
   >>> ret = maxvalue_assignment(value)
   >>> # Note, depending on the scipy version the assignment might change
   >>> # but the value should always be the same.
   >>> print('Total value: {}'.format(ret[1]))
   Total value: 23.0
   >>> print('Assignment: {}'.format(ret[0]))  # xdoc: +IGNORE_WANT
   Assignment: [(0, 0), (1, 3), (2, 1)]

   >>> ret = maxvalue_assignment(np.array([[np.inf]]))
   >>> print('Assignment: {}'.format(ret[0]))
   >>> print('Total value: {}'.format(ret[1]))
   Assignment: [(0, 0)]
   Total value: inf

   >>> ret = maxvalue_assignment(np.array([[0]]))
   >>> print('Assignment: {}'.format(ret[0]))
   >>> print('Total value: {}'.format(ret[1]))
   Assignment: []
   Total value: 0


.. py:function:: mincost_assignment(cost)

   Finds the minimum cost assignment based on a NxM cost matrix, subject to
   the constraint that each row can match at most one column and each column
   can match at most one row. Any pair with a cost of infinity will not be
   assigned.

   :Parameters: **cost** (*ndarray*) -- NxM matrix, cost[i, j] is the cost to match i and j

   :returns:

             tuple containing a list of assignment of rows
                 and columns, and the total cost of the assignment.
   :rtype: Tuple[list, float]

   CommandLine:
       xdoctest -m ~/code/kwarray/kwarray/algo_assignment.py mincost_assignment


   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:scipy)
   >>> # Costs to match item i in set1 with item j in set2.
   >>> cost = np.array([
   >>>     [9, 2, 1, 9],
   >>>     [4, 1, 5, 5],
   >>>     [9, 9, 2, 4],
   >>> ])
   >>> ret = mincost_assignment(cost)
   >>> print('Assignment: {}'.format(ret[0]))
   >>> print('Total cost: {}'.format(ret[1]))
   Assignment: [(0, 2), (1, 1), (2, 3)]
   Total cost: 6

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:scipy)
   >>> cost = np.array([
   >>>     [0, 0, 0, 0],
   >>>     [4, 1, 5, -np.inf],
   >>>     [9, 9, np.inf, 4],
   >>>     [9, -2, np.inf, 4],
   >>> ])
   >>> ret = mincost_assignment(cost)
   >>> print('Assignment: {}'.format(ret[0]))
   >>> print('Total cost: {}'.format(ret[1]))
   Assignment: [(0, 2), (1, 3), (2, 0), (3, 1)]
   Total cost: -inf

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:scipy)
   >>> cost = np.array([
   >>>     [0, 0, 0, 0],
   >>>     [4, 1, 5, -3],
   >>>     [1, 9, np.inf, 0.1],
   >>>     [np.inf, np.inf, np.inf, 100],
   >>> ])
   >>> ret = mincost_assignment(cost)
   >>> print('Assignment: {}'.format(ret[0]))
   >>> print('Total cost: {}'.format(ret[1]))
   Assignment: [(0, 2), (1, 1), (2, 0), (3, 3)]
   Total cost: 102.0


.. py:function:: mindist_assignment(vecs1, vecs2, p=2)

   Finds minimum cost assignment between two sets of D dimensional vectors.

   :Parameters: * **vecs1** (*np.ndarray*) -- NxD array of vectors representing items in vecs1
                * **vecs2** (*np.ndarray*) -- MxD array of vectors representing items in vecs2
                * **p** (*float*) -- L-p norm to use. Default is 2 (aka Eucliedean)

   :returns:

             tuple containing assignments of rows in vecs1 to
                 rows in vecs2, and the total distance between assigned pairs.
   :rtype: Tuple[list, float]

   .. rubric:: Notes

   Thin wrapper around mincost_assignment

   CommandLine:
       xdoctest -m ~/code/kwarray/kwarray/algo_assignment.py mindist_assignment

   CommandLine:
       xdoctest -m ~/code/kwarray/kwarray/algo_assignment.py mindist_assignment

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:scipy)
   >>> # Rows are detections in img1, cols are detections in img2
   >>> rng = np.random.RandomState(43)
   >>> vecs1 = rng.randint(0, 10, (5, 2))
   >>> vecs2 = rng.randint(0, 10, (7, 2))
   >>> ret = mindist_assignment(vecs1, vecs2)
   >>> print('Total error: {:.4f}'.format(ret[1]))
   Total error: 8.2361
   >>> print('Assignment: {}'.format(ret[0]))  # xdoc: +IGNORE_WANT
   Assignment: [(0, 0), (1, 3), (2, 5), (3, 2), (4, 6)]


.. py:function:: setcover(candidate_sets_dict, items=None, set_weights=None, item_values=None, max_weight=None, algo='approx')

   Finds a feasible solution to the minimum weight maximum value set cover.
   The quality and runtime of the solution will depend on the backend
   algorithm selected.

   :Parameters: * **candidate_sets_dict** (*Dict[Hashable, List[Hashable]]*) -- a dictionary where keys are the candidate set ids and each value is
                  a candidate cover set.
                * **items** (*Hashable, optional*) -- the set of all items to be covered,
                  if not specified, it is infered from the candidate cover sets
                * **set_weights** (*Dict, optional*) -- maps candidate set ids to a cost
                  for using this candidate cover in the solution. If not specified
                  the weight of each candiate cover defaults to 1.
                * **item_values** (*Dict, optional*) -- maps each item to a value we get for
                  returning this item in the solution. If not specified the value
                  of each item defaults to 1.
                * **max_weight** (*float*) -- if specified, the total cost of the returned cover
                  is constrained to be less than this number.
                * **algo** (*str*) -- specifies which algorithm to use. Can either be
                  'approx' for the greedy solution or 'exact' for the globally
                  optimal solution. Note the 'exact' algorithm solves an
                  integer-linear-program, which can be very slow and requires
                  the `pulp` package to be installed.

   :returns: a subdict of candidate_sets_dict containing the chosen solution.
   :rtype: Dict

   .. rubric:: Example

   >>> candidate_sets_dict = {
   >>>     'a': [1, 2, 3, 8, 9, 0],
   >>>     'b': [1, 2, 3, 4, 5],
   >>>     'c': [4, 5, 7],
   >>>     'd': [5, 6, 7],
   >>>     'e': [6, 7, 8, 9, 0],
   >>> }
   >>> greedy_soln = setcover(candidate_sets_dict, algo='greedy')
   >>> print('greedy_soln = {}'.format(ub.repr2(greedy_soln, nl=0)))
   greedy_soln = {'a': [1, 2, 3, 8, 9, 0], 'c': [4, 5, 7], 'd': [5, 6, 7]}
   >>> # xdoc: +REQUIRES(module:pulp)
   >>> exact_soln = setcover(candidate_sets_dict, algo='exact')
   >>> print('exact_soln = {}'.format(ub.repr2(exact_soln, nl=0)))
   exact_soln = {'b': [1, 2, 3, 4, 5], 'e': [6, 7, 8, 9, 0]}


.. py:class:: DataFrameArray(data=None, columns=None)

   Bases: :py:obj:`DataFrameLight`

   DataFrameLight assumes the backend is a Dict[list]
   DataFrameArray assumes the backend is a Dict[ndarray]

   Take and compress are much faster, but extend and union are slower

   .. py:method:: __normalize__(self)

      Try to convert input data to Dict[ndarray]


   .. py:method:: extend(self, other)

      Extend ``self`` inplace using another dataframe array

      :Parameters: **other** (*DataFrameLight | dict[str, Sequence]*) -- values to concat to end of this object

      .. note:: Not part of the pandas API

      .. rubric:: Example

      >>> self = DataFrameLight(columns=['foo', 'bar'])
      >>> other = {'foo': [0], 'bar': [1]}
      >>> self.extend(other)
      >>> assert len(self) == 1


   .. py:method:: compress(self, flags, inplace=False)

      NOTE: NOT A PART OF THE PANDAS API


   .. py:method:: take(self, indices, inplace=False)

      Return the elements in the given *positional* indices along an axis.

      :Parameters: **inplace** (*bool*) -- NOT PART OF PANDAS API

      .. rubric:: Notes

      assumes axis=0

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> indices = [0, 2, 3]
      >>> sub1 = df_light.take(indices)
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> sub2 = df_heavy.take(indices)
      >>> assert np.all(sub1 == sub2)


.. py:class:: DataFrameLight(data=None, columns=None)

   Bases: :py:obj:`ubelt.NiceRepr`

   Implements a subset of the pandas.DataFrame API

   The API is restricted to facilitate speed tradeoffs

   .. rubric:: Notes

   Assumes underlying data is Dict[list|ndarray]. If the data is known
   to be a Dict[ndarray] use DataFrameArray instead, which has faster
   implementations for some operations.

   .. rubric:: Notes

   pandas.DataFrame is slow. DataFrameLight is faster.
   It is a tad more restrictive though.

   .. rubric:: Example

   >>> self = DataFrameLight({})
   >>> print('self = {!r}'.format(self))
   >>> self = DataFrameLight({'a': [0, 1, 2], 'b': [2, 3, 4]})
   >>> print('self = {!r}'.format(self))
   >>> item = self.iloc[0]
   >>> print('item = {!r}'.format(item))

   Benchmark:
       >>> # BENCHMARK
       >>> # xdoc: +REQUIRES(--bench)
       >>> from kwarray.dataframe_light import *  # NOQA
       >>> import ubelt as ub
       >>> NUM = 1000
       >>> print('NUM = {!r}'.format(NUM))
       >>> # to_dict conversions
       >>> print('==============')
       >>> print('====== to_dict conversions =====')
       >>> _keys = ['list', 'dict', 'series', 'split', 'records', 'index']
       >>> results = []
       >>> df = DataFrameLight._demodata(num=NUM).pandas()
       >>> ti = ub.Timerit(verbose=False, unit='ms')
       >>> for key in _keys:
       >>>     result = ti.reset(key).call(lambda: df.to_dict(orient=key))
       >>>     results.append((result.mean(), result.report()))
       >>> key = 'series+numpy'
       >>> result = ti.reset(key).call(lambda: {k: v.values for k, v in df.to_dict(orient='series').items()})
       >>> results.append((result.mean(), result.report()))
       >>> print('\n'.join([t[1] for t in sorted(results)]))
       >>> print('==============')
       >>> print('====== DFLight Conversions =======')
       >>> ti = ub.Timerit(verbose=True, unit='ms')
       >>> key = 'self.pandas'
       >>> self = DataFrameLight(df)
       >>> ti.reset(key).call(lambda: self.pandas())
       >>> key = 'light-from-pandas'
       >>> ti.reset(key).call(lambda: DataFrameLight(df))
       >>> key = 'light-from-dict'
       >>> ti.reset(key).call(lambda: DataFrameLight(self._data))
       >>> print('==============')
       >>> print('====== BENCHMARK: .LOC[] =======')
       >>> ti = ub.Timerit(num=20, bestof=4, verbose=True, unit='ms')
       >>> df_light = DataFrameLight._demodata(num=NUM)
       >>> # xdoctest: +REQUIRES(module:pandas)
       >>> df_heavy = df_light.pandas()
       >>> series_data = df_heavy.to_dict(orient='series')
       >>> list_data = df_heavy.to_dict(orient='list')
       >>> np_data = {k: v.values for k, v in df_heavy.to_dict(orient='series').items()}
       >>> for timer in ti.reset('DF-heavy.iloc'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             df_heavy.iloc[i]
       >>> for timer in ti.reset('DF-heavy.loc'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             df_heavy.iloc[i]
       >>> for timer in ti.reset('dict[SERIES].loc'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             {key: series_data[key].loc[i] for key in series_data.keys()}
       >>> for timer in ti.reset('dict[SERIES].iloc'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             {key: series_data[key].iloc[i] for key in series_data.keys()}
       >>> for timer in ti.reset('dict[SERIES][]'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             {key: series_data[key][i] for key in series_data.keys()}
       >>> for timer in ti.reset('dict[NDARRAY][]'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             {key: np_data[key][i] for key in np_data.keys()}
       >>> for timer in ti.reset('dict[list][]'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             {key: list_data[key][i] for key in np_data.keys()}
       >>> for timer in ti.reset('DF-Light.iloc/loc'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             df_light.iloc[i]
       >>> for timer in ti.reset('DF-Light._getrow'):
       >>>     with timer:
       >>>         for i in range(NUM):
       >>>             df_light._getrow(i)
       NUM = 1000
       ==============
       ====== to_dict conversions =====
       Timed best=0.022 ms, mean=0.022 ± 0.0 ms for series
       Timed best=0.059 ms, mean=0.059 ± 0.0 ms for series+numpy
       Timed best=0.315 ms, mean=0.315 ± 0.0 ms for list
       Timed best=0.895 ms, mean=0.895 ± 0.0 ms for dict
       Timed best=2.705 ms, mean=2.705 ± 0.0 ms for split
       Timed best=5.474 ms, mean=5.474 ± 0.0 ms for records
       Timed best=7.320 ms, mean=7.320 ± 0.0 ms for index
       ==============
       ====== DFLight Conversions =======
       Timed best=1.798 ms, mean=1.798 ± 0.0 ms for self.pandas
       Timed best=0.064 ms, mean=0.064 ± 0.0 ms for light-from-pandas
       Timed best=0.010 ms, mean=0.010 ± 0.0 ms for light-from-dict
       ==============
       ====== BENCHMARK: .LOC[] =======
       Timed best=101.365 ms, mean=101.564 ± 0.2 ms for DF-heavy.iloc
       Timed best=102.038 ms, mean=102.273 ± 0.2 ms for DF-heavy.loc
       Timed best=29.357 ms, mean=29.449 ± 0.1 ms for dict[SERIES].loc
       Timed best=21.701 ms, mean=22.014 ± 0.3 ms for dict[SERIES].iloc
       Timed best=11.469 ms, mean=11.566 ± 0.1 ms for dict[SERIES][]
       Timed best=0.807 ms, mean=0.826 ± 0.0 ms for dict[NDARRAY][]
       Timed best=0.478 ms, mean=0.492 ± 0.0 ms for dict[list][]
       Timed best=0.969 ms, mean=0.994 ± 0.0 ms for DF-Light.iloc/loc
       Timed best=0.760 ms, mean=0.776 ± 0.0 ms for DF-Light._getrow


   .. py:method:: iloc(self)
      :property:


   .. py:method:: values(self)
      :property:


   .. py:method:: loc(self)
      :property:


   .. py:method:: __eq__(self, other)

      .. rubric:: Example

      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> self = DataFrameLight._demodata(num=7)
      >>> other = self.pandas()
      >>> assert np.all(self == other)


   .. py:method:: to_string(self, *args, **kwargs)


   .. py:method:: to_dict(self, orient='dict', into=dict)

      Convert the data frame into a dictionary.

      :Parameters: * **orient** (*str*) -- Currently naitively suports orient in
                     {'dict', 'list'}, otherwise we fallback to pandas conversion
                     and call its to_dict method.
                   * **into** (*type*) -- type of dictionary to transform into

      :returns: dict

      .. rubric:: Example

      >>> from kwarray.dataframe_light import *  # NOQA
      >>> self = DataFrameLight._demodata(num=7)
      >>> print(self.to_dict(orient='dict'))
      >>> print(self.to_dict(orient='list'))


   .. py:method:: pandas(self)

      Convert back to pandas if you need the full API

      .. rubric:: Example

      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_light = DataFrameLight._demodata(num=7)
      >>> df_heavy = df_light.pandas()
      >>> got = DataFrameLight(df_heavy)
      >>> assert got._data == df_light._data


   .. py:method:: _pandas(self)

      Deprecated, use self.pandas instead


   .. py:method:: _demodata(cls, num=7)
      :classmethod:

      .. rubric:: Example

      >>> self = DataFrameLight._demodata(num=7)
      >>> print('self = {!r}'.format(self))
      >>> other = DataFrameLight._demodata(num=11)
      >>> print('other = {!r}'.format(other))
      >>> both = self.union(other)
      >>> print('both = {!r}'.format(both))
      >>> assert both is not self
      >>> assert other is not self


   .. py:method:: __nice__(self)


   .. py:method:: __len__(self)


   .. py:method:: __contains__(self, item)


   .. py:method:: __normalize__(self)

      Try to convert input data to Dict[List]


   .. py:method:: columns(self)
      :property:


   .. py:method:: sort_values(self, key, inplace=False, ascending=True)


   .. py:method:: keys(self)


   .. py:method:: _getrow(self, index)


   .. py:method:: _getcol(self, key)


   .. py:method:: _getcols(self, keys)


   .. py:method:: get(self, key, default=None)

      Get item for given key. Returns default value if not found.


   .. py:method:: clear(self)

      Removes all rows inplace


   .. py:method:: __getitem__(self, key)

      .. note:: only handles the case where key is a single column name.

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> sub1 = df_light['bar']
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> sub2 = df_heavy['bar']
      >>> assert np.all(sub1 == sub2)


   .. py:method:: __setitem__(self, key, value)

      .. note::

         only handles the case where key is a single column name. and value
         is an array of all the values to set.

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> value = [2] * len(df_light)
      >>> df_light['bar'] = value
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> df_heavy['bar'] = value
      >>> assert np.all(df_light == df_heavy)


   .. py:method:: compress(self, flags, inplace=False)

      NOTE: NOT A PART OF THE PANDAS API


   .. py:method:: take(self, indices, inplace=False)

      Return the elements in the given *positional* indices along an axis.

      :Parameters: **inplace** (*bool*) -- NOT PART OF PANDAS API

      .. rubric:: Notes

      assumes axis=0

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> indices = [0, 2, 3]
      >>> sub1 = df_light.take(indices)
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> sub2 = df_heavy.take(indices)
      >>> assert np.all(sub1 == sub2)


   .. py:method:: copy(self)


   .. py:method:: extend(self, other)

      Extend ``self`` inplace using another dataframe array

      :Parameters: **other** (*DataFrameLight | dict[str, Sequence]*) -- values to concat to end of this object

      .. note:: Not part of the pandas API

      .. rubric:: Example

      >>> self = DataFrameLight(columns=['foo', 'bar'])
      >>> other = {'foo': [0], 'bar': [1]}
      >>> self.extend(other)
      >>> assert len(self) == 1


   .. py:method:: union(self, *others)

      .. note:: Note part of the pandas API


   .. py:method:: concat(cls, others)
      :classmethod:


   .. py:method:: from_pandas(cls, df)
      :classmethod:


   .. py:method:: from_dict(cls, records)
      :classmethod:


   .. py:method:: reset_index(self, drop=False)

      noop for compatability, the light version doesnt store an index


   .. py:method:: groupby(self, by=None, *args, **kwargs)

      Group rows by the value of a column. Unlike pandas this simply
      returns a zip object. To ensure compatiability call list on the
      result of groupby.

      :Parameters: * **by** (*str*) -- column name to group by
                   * **\*args** -- if specified, the dataframe is coerced to pandas
                   * **\*kwargs** -- if specified, the dataframe is coerced to pandas

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> res1 = list(df_light.groupby('bar'))
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> res2 = list(df_heavy.groupby('bar'))
      >>> assert len(res1) == len(res2)
      >>> assert all([np.all(a[1] == b[1]) for a, b in zip(res1, res2)])

      Ignore:
          >>> self = DataFrameLight._demodata(num=1000)
          >>> args = ['cx']
          >>> self['cx'] = (np.random.rand(len(self)) * 10).astype(np.int)
          >>> # As expected, our custom restricted implementation is faster
          >>> # than pandas
          >>> ub.Timerit(100).call(lambda: dict(list(self.pandas().groupby('cx')))).print()
          >>> ub.Timerit(100).call(lambda: dict(self.groupby('cx'))).print()


   .. py:method:: rename(self, mapper=None, columns=None, axis=None, inplace=False)

      Rename the columns (index renaming is not supported)

      .. rubric:: Example

      >>> df_light = DataFrameLight._demodata(num=7)
      >>> mapper = {'foo': 'fi'}
      >>> res1 = df_light.rename(columns=mapper)
      >>> res3 = df_light.rename(mapper, axis=1)
      >>> # xdoctest: +REQUIRES(module:pandas)
      >>> df_heavy = df_light.pandas()
      >>> res2 = df_heavy.rename(columns=mapper)
      >>> res4 = df_heavy.rename(mapper, axis=1)
      >>> assert np.all(res1 == res2)
      >>> assert np.all(res3 == res2)
      >>> assert np.all(res3 == res4)


   .. py:method:: iterrows(self)

      Iterate over rows as (index, Dict) pairs.

      :Yields: *Tuple[int, Dict]* -- the index and a dictionary representing a row

      .. rubric:: Example

      >>> from kwarray.dataframe_light import *  # NOQA
      >>> self = DataFrameLight._demodata(num=3)
      >>> print(ub.repr2(list(self.iterrows())))
      [
          (0, {'bar': 0, 'baz': 2.73, 'foo': 0}),
          (1, {'bar': 1, 'baz': 2.73, 'foo': 0}),
          (2, {'bar': 2, 'baz': 2.73, 'foo': 0}),
      ]

      Benchmark:
          >>> # xdoc: +REQUIRES(--bench)
          >>> from kwarray.dataframe_light import *  # NOQA
          >>> import ubelt as ub
          >>> df_light = DataFrameLight._demodata(num=1000)
          >>> df_heavy = df_light.pandas()
          >>> ti = ub.Timerit(21, bestof=3, verbose=2, unit='ms')
          >>> ti.reset('light').call(lambda: list(df_light.iterrows()))
          >>> ti.reset('heavy').call(lambda: list(df_heavy.iterrows()))
          >>> # xdoctest: +IGNORE_WANT
          Timed light for: 21 loops, best of 3
              time per loop: best=0.834 ms, mean=0.850 ± 0.0 ms
          Timed heavy for: 21 loops, best of 3
              time per loop: best=45.007 ms, mean=45.633 ± 0.5 ms


.. py:class:: LocLight(parent)

   Bases: :py:obj:`object`

   .. py:method:: __getitem__(self, index)


.. py:function:: standard_normal(size, mean=0, std=1, dtype=float, rng=np.random)

   Draw samples from a standard Normal distribution with a specified mean and
   standard deviation.

   :Parameters: * **size** (*int | Tuple[int, *int]*) -- shape of the returned ndarray
                * **mean** (*float, default=0*) -- mean of the normal distribution
                * **std** (*float, default=1*) -- standard deviation of the normal distribution
                * **dtype** (*type*) -- either np.float32 or np.float64
                * **rng** (*numpy.random.RandomState*) -- underlying random state

   :returns: normally distributed random numbers with chosen dtype
   :rtype: ndarray[dtype]

   Benchmark:
       >>> from timerit import Timerit
       >>> import kwarray
       >>> size = (300, 300, 3)
       >>> for timer in Timerit(100, bestof=10, label='dtype=np.float32'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         ours = standard_normal(size, rng=rng, dtype=np.float32)
       >>> # Timed best=4.705 ms, mean=4.75 ± 0.085 ms for dtype=np.float32
       >>> for timer in Timerit(100, bestof=10, label='dtype=np.float64'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         theirs = standard_normal(size, rng=rng, dtype=np.float64)
       >>> # Timed best=9.327 ms, mean=9.794 ± 0.4 ms for rng.np.float64


.. py:function:: standard_normal32(size, mean=0, std=1, rng=np.random)

   Fast normally distributed random variables using the Box–Muller transform

   The difference between this function and
   :func:`numpy.random.standard_normal` is that we use float32 arrays in the
   backend instead of float64.  Halving the amount of bits that need to be
   manipulated can significantly reduce the execution time, and 32-bit
   precision is often good enough.

   :Parameters: * **size** (*int | Tuple[int, *int]*) -- shape of the returned ndarray
                * **mean** (*float, default=0*) -- mean of the normal distribution
                * **std** (*float, default=1*) -- standard deviation of the normal distribution
                * **rng** (*numpy.random.RandomState*) -- underlying random state

   :returns: normally distributed random numbers
   :rtype: ndarray[float32]

   .. rubric:: References

   https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

   SeeAlso:
       * standard_normal
       * standard_normal64

   .. rubric:: Example

   >>> import scipy
   >>> import scipy.stats
   >>> pts = 1000
   >>> # Our numbers are normally distributed with high probability
   >>> rng = np.random.RandomState(28041990)
   >>> ours_a = standard_normal32(pts, rng=rng)
   >>> ours_b = standard_normal32(pts, rng=rng) + 2
   >>> ours = np.concatenate((ours_a, ours_b))  # numerical stability?
   >>> p = scipy.stats.normaltest(ours)[1]
   >>> print('Probability our data is non-normal is: {:.4g}'.format(p))
   Probability our data is non-normal is: 1.573e-14
   >>> rng = np.random.RandomState(28041990)
   >>> theirs_a = rng.standard_normal(pts)
   >>> theirs_b = rng.standard_normal(pts) + 2
   >>> theirs = np.concatenate((theirs_a, theirs_b))
   >>> p = scipy.stats.normaltest(theirs)[1]
   >>> print('Probability their data is non-normal is: {:.4g}'.format(p))
   Probability their data is non-normal is: 3.272e-11

   .. rubric:: Example

   >>> pts = 1000
   >>> rng = np.random.RandomState(28041990)
   >>> ours = standard_normal32(pts, mean=10, std=3, rng=rng)
   >>> assert np.abs(ours.std() - 3.0) < 0.1
   >>> assert np.abs(ours.mean() - 10.0) < 0.1

   .. rubric:: Example

   >>> # Test an even and odd numbers of points
   >>> assert standard_normal32(3).shape == (3,)
   >>> assert standard_normal32(2).shape == (2,)
   >>> assert standard_normal32(1).shape == (1,)
   >>> assert standard_normal32(0).shape == (0,)
   >>> assert standard_normal32((3, 1)).shape == (3, 1)
   >>> assert standard_normal32((3, 0)).shape == (3, 0)


.. py:function:: standard_normal64(size, mean=0, std=1, rng=np.random)

   Simple wrapper around rng.standard_normal to make an API compatible with
   :func:`standard_normal32`.

   :Parameters: * **size** (*int | Tuple[int, *int]*) -- shape of the returned ndarray
                * **mean** (*float, default=0*) -- mean of the normal distribution
                * **std** (*float, default=1*) -- standard deviation of the normal distribution
                * **rng** (*numpy.random.RandomState*) -- underlying random state

   :returns: normally distributed random numbers
   :rtype: ndarray[float64]

   SeeAlso:
       * standard_normal
       * standard_normal32

   .. rubric:: Example

   >>> pts = 1000
   >>> rng = np.random.RandomState(28041994)
   >>> out = standard_normal64(pts, mean=10, std=3, rng=rng)
   >>> assert np.abs(out.std() - 3.0) < 0.1
   >>> assert np.abs(out.mean() - 10.0) < 0.1


.. py:function:: uniform(low=0.0, high=1.0, size=None, dtype=np.float32, rng=np.random)

   Draws float32 samples from a uniform distribution.

   Samples are uniformly distributed over the half-open interval
   ``[low, high)`` (includes low, but excludes high).

   :Parameters: * **low** (*float, default=0.0*) -- Lower boundary of the output interval.  All values generated will
                  be greater than or equal to low.
                * **high** (*float, default=1.0*) -- Upper boundary of the output interval.  All values generated will
                  be less than high.
                * **size** (*int | Tuple[int], default=None*) -- Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
                  ``m * n * k`` samples are drawn.  If size is ``None`` (default),
                  a single value is returned if ``low`` and ``high`` are both scalars.
                  Otherwise, ``np.broadcast(low, high).size`` samples are drawn.
                * **dtype** (*type*) -- either np.float32 or np.float64
                * **rng** (*numpy.random.RandomState*) -- underlying random state

   :returns: normally distributed random numbers with chosen dtype
   :rtype: ndarray[dtype]

   Benchmark:
       >>> from timerit import Timerit
       >>> import kwarray
       >>> size = (300, 300, 3)
       >>> for timer in Timerit(100, bestof=10, label='dtype=np.float32'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         ours = standard_normal(size, rng=rng, dtype=np.float32)
       >>> # Timed best=4.705 ms, mean=4.75 ± 0.085 ms for dtype=np.float32
       >>> for timer in Timerit(100, bestof=10, label='dtype=np.float64'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         theirs = standard_normal(size, rng=rng, dtype=np.float64)
       >>> # Timed best=9.327 ms, mean=9.794 ± 0.4 ms for rng.np.float64


.. py:function:: uniform32(low=0.0, high=1.0, size=None, rng=np.random)

   Draws float32 samples from a uniform distribution.

   Samples are uniformly distributed over the half-open interval
   ``[low, high)`` (includes low, but excludes high).

   :Parameters: * **low** (*float, default=0.0*) -- Lower boundary of the output interval.  All values generated will
                  be greater than or equal to low.
                * **high** (*float, default=1.0*) -- Upper boundary of the output interval.  All values generated will
                  be less than high.
                * **size** (*int | Tuple[int], default=None*) -- Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
                  ``m * n * k`` samples are drawn.  If size is ``None`` (default),
                  a single value is returned if ``low`` and ``high`` are both scalars.
                  Otherwise, ``np.broadcast(low, high).size`` samples are drawn.

   .. rubric:: Example

   >>> rng = np.random.RandomState(0)
   >>> uniform32(low=0.0, high=1.0, size=None, rng=rng)
   0.5488...
   >>> uniform32(low=0.0, high=1.0, size=2000, rng=rng).sum()
   1004.94...
   >>> uniform32(low=-10, high=10.0, size=2000, rng=rng).sum()
   202.44...

   Benchmark:
       >>> from timerit import Timerit
       >>> import kwarray
       >>> size = 512 * 512
       >>> for timer in Timerit(100, bestof=10, label='theirs: dtype=np.float64'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         theirs = rng.uniform(size=size)
       >>> for timer in Timerit(100, bestof=10, label='theirs: dtype=np.float32'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         theirs = rng.rand(size).astype(np.float32)
       >>> for timer in Timerit(100, bestof=10, label='ours: dtype=np.float32'):
       >>>     rng = kwarray.ensure_rng(0)
       >>>     with timer:
       >>>         ours = uniform32(size=size)


.. py:class:: RunningStats(run)

   Bases: :py:obj:`ubelt.NiceRepr`

   Dynamically records per-element array statistics and can summarized them
   per-element, across channels, or globally.

   .. todo:: - [ ] This may need a few API tweaks and good documentation

   .. rubric:: Example

   >>> import kwarray
   >>> run = kwarray.RunningStats()
   >>> ch1 = np.array([[0, 1], [3, 4]])
   >>> ch2 = np.zeros((2, 2))
   >>> img = np.dstack([ch1, ch2])
   >>> run.update(np.dstack([ch1, ch2]))
   >>> run.update(np.dstack([ch1 + 1, ch2]))
   >>> run.update(np.dstack([ch1 + 2, ch2]))
   >>> # No marginalization
   >>> print('current-ave = ' + ub.repr2(run.summarize(axis=ub.NoParam), nl=2, precision=3))
   >>> # Average over channels (keeps spatial dims separate)
   >>> print('chann-ave(k=1) = ' + ub.repr2(run.summarize(axis=0), nl=2, precision=3))
   >>> print('chann-ave(k=0) = ' + ub.repr2(run.summarize(axis=0, keepdims=0), nl=2, precision=3))
   >>> # Average over spatial dims (keeps channels separate)
   >>> print('spatial-ave(k=1) = ' + ub.repr2(run.summarize(axis=(1, 2)), nl=2, precision=3))
   >>> print('spatial-ave(k=0) = ' + ub.repr2(run.summarize(axis=(1, 2), keepdims=0), nl=2, precision=3))
   >>> # Average over all dims
   >>> print('alldim-ave(k=1) = ' + ub.repr2(run.summarize(axis=None), nl=2, precision=3))
   >>> print('alldim-ave(k=0) = ' + ub.repr2(run.summarize(axis=None, keepdims=0), nl=2, precision=3))

   .. py:method:: __nice__(self)


   .. py:method:: shape(run)
      :property:


   .. py:method:: update(run, data, weights=1)

      Updates statistics across all data dimensions on a per-element basis

      .. rubric:: Example

      >>> import kwarray
      >>> data = np.full((7, 5), fill_value=1.3)
      >>> weights = np.ones((7, 5), dtype=np.float32)
      >>> run = kwarray.RunningStats()
      >>> run.update(data, weights=1)
      >>> run.update(data, weights=weights)
      >>> rng = np.random
      >>> weights[rng.rand(*weights.shape) > 0.5] = 0
      >>> run.update(data, weights=weights)


   .. py:method:: _sumsq_std(run, total, squares, n)

      Sum of squares method to compute standard deviation


   .. py:method:: summarize(run, axis=None, keepdims=True)

      Compute summary statistics across a one or more dimension

      :Parameters: * **axis** (*int | List[int] | None | ub.NoParam*) -- axis or axes to summarize over,
                     if None, all axes are summarized.
                     if ub.NoParam, no axes are summarized the current result is
                         returned.
                   * **keepdims** (*bool, default=True*) -- if False removes the dimensions that are summarized over

      :returns: containing minimum, maximum, mean, std, etc..
      :rtype: Dict


   .. py:method:: current(run)

      Returns current staticis on a per-element basis
      (not summarized over any axis)

      .. todo::

         - [X] I want this method and summarize to be unified somehow.
             I don't know how to paramatarize it because axis=None usually
             means summarize over everything, and I need to way to encode,
             summarize over nothing but the "sequence" dimension (which was
             given incrementally by the update function), which is what
             this function does.


.. py:function:: stats_dict(inputs, axis=None, nan=False, sum=False, extreme=True, n_extreme=False, median=False, shape=True, size=False)

   Describe statistics about an input array

   :Parameters: * **inputs** (*ArrayLike*) -- set of values to get statistics of
                * **axis** (*int*) -- if ``inputs`` is ndarray then this specifies the axis
                * **nan** (*bool*) -- report number of nan items
                * **sum** (*bool*) -- report sum of values
                * **extreme** (*bool*) -- report min and max values
                * **n_extreme** (*bool*) -- report extreme value frequencies
                * **median** (*bool*) -- report median
                * **size** (*bool*) -- report array size
                * **shape** (*bool*) -- report array shape

   :returns:

             stats: dictionary of common numpy statistics
                 (min, max, mean, std, nMin, nMax, shape)
   :rtype: collections.OrderedDict

   SeeAlso:
       scipy.stats.describe

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> from kwarray.util_averages import *  # NOQA
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.rand(10, 2).astype(np.float32)
   >>> stats = stats_dict(inputs, axis=axis, nan=False, median=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=1, precision=4, with_dtype=True))
   >>> print(result)
   {
       'mean': np.array([ 0.5206,  0.6425], dtype=np.float32),
       'std': np.array([ 0.2854,  0.2517], dtype=np.float32),
       'min': np.array([ 0.0202,  0.0871], dtype=np.float32),
       'max': np.array([ 0.9637,  0.9256], dtype=np.float32),
       'med': np.array([0.5584, 0.6805], dtype=np.float32),
       'shape': (10, 2),
   }

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> axis = 0
   >>> rng = np.random.RandomState(0)
   >>> inputs = rng.randint(0, 42, size=100).astype(np.float32)
   >>> inputs[4] = np.nan
   >>> stats = stats_dict(inputs, axis=axis, nan=True)
   >>> import ubelt as ub  # NOQA
   >>> result = str(ub.repr2(stats, nl=0, precision=1, strkeys=True))
   >>> print(result)
   {mean: 20.0, std: 13.2, min: 0.0, max: 41.0, num_nan: 1, shape: (100,)}


.. py:function:: apply_grouping(items, groupxs, axis=0)

   Applies grouping from group_indicies.

   Typically used in conjunction with :func:`group_indices`.

   :Parameters: * **items** (*ndarray*) -- items to group
                * **groupxs** (*List[ndarrays[int]]*) -- groups of indices
                * **axis** (*None|int, default=0*)

   :returns: grouped items
   :rtype: List[ndarray]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
   >>> items          = np.array([1, 8, 5, 5, 8, 6, 7, 5, 3, 0, 9])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> grouped_items = apply_grouping(items, groupxs)
   >>> result = str(grouped_items)
   >>> print(result)
   [array([8, 5, 6]), array([1, 5, 8, 7]), array([5, 3, 0, 9])]


.. py:function:: group_consecutive(arr, offset=1)

   Returns lists of consecutive values. Implementation inspired by [3]_.

   :Parameters: * **arr** (*ndarray*) -- array of ordered values
                * **offset** (*float, default=1*) -- any two values separated by this offset are grouped.  In the
                  default case, when offset=1, this groups increasing values like: 0,
                  1, 2. When offset is 0 it groups consecutive values thta are the
                  same, e.g.: 4, 4, 4.

   :returns: a list of arrays that are the groups from the input
   :rtype: List[ndarray]

   .. rubric:: Notes

   This is equivalent (and faster) to using:
   apply_grouping(data, group_consecutive_indices(data))

   .. rubric:: References

   .. [3] http://stackoverflow.com/questions/7352684/groups-consecutive-elements

   .. rubric:: Example

   >>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
   >>> groups = group_consecutive(arr)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[1, 2, 3], [5, 6, 7, 8, 9, 10], [15], [99, 100, 101]]
   >>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
   >>> groups = group_consecutive(arr, offset=1)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[0], [0], [3], [0], [0], [7], [2, 3, 4], [4], [4], [1], [1]]
   >>> groups = group_consecutive(arr, offset=0)
   >>> print('groups = {}'.format(list(map(list, groups))))
   groups = [[0, 0], [3], [0, 0], [7], [2], [3], [4, 4, 4], [1, 1]]


.. py:function:: group_consecutive_indices(arr, offset=1)

   Returns lists of indices pointing to consecutive values

   :Parameters: * **arr** (*ndarray*) -- array of ordered values
                * **offset** (*float, default=1*) -- any two values separated by this offset are grouped.

   :returns: groupxs: a list of indices
   :rtype: List[ndarray]

   SeeAlso:

       :func:`group_consecutive`

       :func:`apply_grouping`

   .. rubric:: Example

   >>> arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 15, 99, 100, 101])
   >>> groupxs = group_consecutive_indices(arr)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0, 1, 2], [3, 4, 5, 6, 7, 8], [9], [10, 11, 12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
   >>> arr = np.array([0, 0, 3, 0, 0, 7, 2, 3, 4, 4, 4, 1, 1])
   >>> groupxs = group_consecutive_indices(arr, offset=1)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0], [1], [2], [3], [4], [5], [6, 7, 8], [9], [10], [11], [12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 1), apply_grouping(arr, groupxs)))
   >>> groupxs = group_consecutive_indices(arr, offset=0)
   >>> print('groupxs = {}'.format(list(map(list, groupxs))))
   groupxs = [[0, 1], [2], [3, 4], [5], [6], [7], [8, 9, 10], [11, 12]]
   >>> assert all(np.array_equal(a, b) for a, b in zip(group_consecutive(arr, 0), apply_grouping(arr, groupxs)))


.. py:function:: group_indices(idx_to_groupid, assume_sorted=False)

   Find unique items and the indices at which they appear in an array.

   A common use case of this function is when you have a list of objects
   (often numeric but sometimes not) and an array of "group-ids" corresponding
   to that list of objects.

   Using this function will return a list of indices that can be used in
   conjunction with :func:`apply_grouping` to group the elements.  This is
   most useful when you have many lists (think column-major data)
   corresponding to the group-ids.

   In cases where there is only one list of objects or knowing the indices
   doesn't matter, then consider using func:`group_items` instead.

   :Parameters: * **idx_to_groupid** (*ndarray*) -- The input array, where each item is interpreted as a group id.
                  For the fastest runtime, the input array must be numeric (ideally
                  with integer types).  If the type is non-numeric then the less
                  efficient :func:`ubelt.group_items` is used.
                * **assume_sorted** (*bool, default=False*) -- If the input array is sorted, then setting this to True will avoid
                  an unnecessary sorting operation and improve efficiency.

   :returns:

             (keys, groupxs) -
                 keys (ndarray):
                     The unique elements of the input array in order
                 groupxs (List[ndarray]):
                     Corresponding list of indexes.  The i-th item is an array
                     indicating the indices where the item ``key[i]`` appeared in
                     the input array.
   :rtype: Tuple[ndarray, List[ndarrays]]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import ubelt as ub
   >>> idx_to_groupid = np.array([2, 1, 2, 1, 2, 1, 2, 3, 3, 3, 3])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([1, 2, 3])
   [
       np.array([1, 3, 5]),
       np.array([0, 2, 4, 6]),
       np.array([ 7,  8,  9, 10]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import ubelt as ub
   >>> idx_to_groupid = np.array([[  24], [ 129], [ 659], [ 659], [ 24],
   ...       [659], [ 659], [ 822], [ 659], [ 659], [24]])
   >>> # 2d arrays must be flattened before coming into this function so
   >>> # information is on the last axis
   >>> (keys, groupxs) = group_indices(idx_to_groupid.T[0])
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([ 24, 129, 659, 822])
   [
       np.array([ 0,  4, 10]),
       np.array([1]),
       np.array([2, 3, 5, 6, 8, 9]),
       np.array([7]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import ubelt as ub
   >>> idx_to_groupid = np.array([True, True, False, True, False, False, True])
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   np.array([False,  True])
   [
       np.array([2, 4, 5]),
       np.array([0, 1, 3, 6]),
   ]

   .. rubric:: Example

   >>> # xdoctest: +IGNORE_WHITESPACE
   >>> import ubelt as ub
   >>> idx_to_groupid = [('a', 'b'),  ('d', 'b'), ('a', 'b'), ('a', 'b')]
   >>> (keys, groupxs) = group_indices(idx_to_groupid)
   >>> print(ub.repr2(keys, with_dtype=False))
   >>> print(ub.repr2(groupxs, with_dtype=False))
   [
       ('a', 'b'),
       ('d', 'b'),
   ]
   [
       np.array([0, 2, 3]),
       np.array([1]),
   ]


.. py:function:: group_items(item_list, groupid_list, assume_sorted=False, axis=None)

   Groups a list of items by group id.

   Works like :func:`ubelt.group_items`, but with numpy optimizations.
   This can be quite a bit faster than using :func:`itertools.groupby` [1]_
   [2]_.

   In cases where there are many lists of items to group (think column-major
   data), consider using :func:`group_indices` and :func:`apply_grouping`
   instead.

   :Parameters: * **item_list** (*ndarray[T1]*) -- The input array of items to group.
                * **groupid_list** (*ndarray[T2]*) -- Each item is an id corresponding to the item at the same position
                  in ``item_list``.  For the fastest runtime, the input array must be
                  numeric (ideally with integer types). This list must be
                  1-dimensional.
                * **assume_sorted** (*bool, default=False*) -- If the input array is sorted, then setting this to True will avoid
                  an unnecessary sorting operation and improve efficiency.
                * **axis** (*int | None*) -- group along a particular axis in ``items`` if it is n-dimensional

   :returns: mapping from groupids to corresponding items
   :rtype: Dict[T2, ndarray[T1]]

   .. rubric:: References

   .. [1] http://stackoverflow.com/questions/4651683/
   .. [2] numpy-grouping-using-itertools-groupby-performance

   .. rubric:: Example

   >>> from kwarray.util_groups import *  # NOQA
   >>> items = np.array([0, 1, 2, 3, 4, 5, 6, 7])
   >>> keys = np.array( [2, 2, 1, 1, 0, 1, 0, 1])
   >>> grouped = group_items(items, keys)
   >>> print(ub.repr2(grouped, nl=1, with_dtype=False))
   {
       0: np.array([4, 6]),
       1: np.array([2, 3, 5, 7]),
       2: np.array([0, 1]),
   }


.. py:class:: FlatIndexer(lens)

   Bases: :py:obj:`ubelt.NiceRepr`

   Creates a flat "view" of a jagged nested indexable object.
   Only supports one offset level.

   :Parameters: **lens** (*list*) -- a list of the lengths of the nested objects.

   Doctest:
       >>> self = FlatIndexer([1, 2, 3])
       >>> len(self)
       >>> self.unravel(4)
       >>> self.ravel(2, 1)

   .. py:method:: fromlist(cls, items)
      :classmethod:

      Convenience method to create a :class:`FlatIndexer` from the list of
      items itself instead of the array of lengths.

      :Parameters: **items** (*List[list]*) -- a list of the lists you want to flat index over

      :returns: FlatIndexer


   .. py:method:: __len__(self)


   .. py:method:: unravel(self, index)

      :Parameters: **index** -- raveled index

      :returns: outer and inner indices
      :rtype: Tuple[int, int]

      .. rubric:: Example

      >>> import kwarray
      >>> rng = kwarray.ensure_rng(0)
      >>> items = [rng.rand(rng.randint(0, 10)) for _ in range(10)]
      >>> self = kwarray.FlatIndexer.fromlist(items)
      >>> index = np.arange(0, len(self))
      >>> outer, inner = self.unravel(index)
      >>> recon = self.ravel(outer, inner)
      >>> # This check is only possible because index is an arange
      >>> check1 = np.hstack(list(map(sorted, kwarray.group_indices(outer)[1])))
      >>> check2 = np.hstack(kwarray.group_consecutive_indices(inner))
      >>> assert np.all(check1 == index)
      >>> assert np.all(check2 == index)
      >>> assert np.all(index == recon)


   .. py:method:: ravel(self, outer, inner)

      :Parameters: * **outer** -- index into outer list
                   * **inner** -- index into the list referenced by outer

      :returns: the raveled index
      :rtype: index


.. py:function:: arglexmax(keys, multi=False)

   Find the index of the maximum element in a sequence of keys.

   :Parameters: * **keys** (*tuple*) -- a k-tuple of k N-dimensional arrays.
                  Like np.lexsort the last key in the sequence is used for the
                  primary sort order, the second-to-last key for the secondary sort
                  order, and so on.
                * **multi** (*bool*) -- if True, returns all indices that share the max value

   :returns: either the index or list of indices
   :rtype: int | ndarray[int]

   .. rubric:: Example

   >>> k, N = 100, 100
   >>> rng = np.random.RandomState(0)
   >>> keys = [(rng.rand(N) * N).astype(int) for _ in range(k)]
   >>> multi_idx = arglexmax(keys, multi=True)
   >>> idxs = np.lexsort(keys)
   >>> assert sorted(idxs[::-1][:len(multi_idx)]) == sorted(multi_idx)

   Benchark:
       >>> import ubelt as ub
       >>> k, N = 100, 100
       >>> rng = np.random
       >>> keys = [(rng.rand(N) * N).astype(int) for _ in range(k)]
       >>> for timer in ub.Timerit(100, bestof=10, label='arglexmax'):
       >>>     with timer:
       >>>         arglexmax(keys)
       >>> for timer in ub.Timerit(100, bestof=10, label='lexsort'):
       >>>     with timer:
       >>>         np.lexsort(keys)[-1]


.. py:function:: argmaxima(arr, num, axis=None, ordered=True)

   Returns the top ``num`` maximum indicies.

   This can be significantly faster than using argsort.

   :Parameters: * **arr** (*ndarray*) -- input array
                * **num** (*int*) -- number of maximum indices to return
                * **axis** (*int|None*) -- axis to find maxima over. If None this is equivalent
                  to using arr.ravel().
                * **ordered** (*bool*) -- if False, returns the maximum elements in an arbitrary
                  order, otherwise they are in decending order. (Setting this to
                  false is a bit faster).

   .. todo:: - [ ] if num is None, return arg for all values equal to the maximum

   :returns: ndarray

   .. rubric:: Example

   >>> # Test cases with axis=None
   >>> arr = (np.random.rand(100) * 100).astype(int)
   >>> for num in range(0, len(arr) + 1):
   >>>     idxs = argmaxima(arr, num)
   >>>     idxs2 = argmaxima(arr, num, ordered=False)
   >>>     assert np.all(arr[idxs] == np.array(sorted(arr)[::-1][:len(idxs)])), 'ordered=True must return in order'
   >>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'

   .. rubric:: Example

   >>> # Test cases with axis
   >>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
   >>> for axis in range(len(arr.shape)):
   >>>     for num in range(0, len(arr) + 1):
   >>>         idxs = argmaxima(arr, num, axis=axis)
   >>>         idxs2 = argmaxima(arr, num, ordered=False, axis=axis)
   >>>         assert idxs.shape[axis] == num
   >>>         assert idxs2.shape[axis] == num


.. py:function:: argminima(arr, num, axis=None, ordered=True)

   Returns the top ``num`` minimum indicies.

   This can be significantly faster than using argsort.

   :Parameters: * **arr** (*ndarray*) -- input array
                * **num** (*int*) -- number of minimum indices to return
                * **axis** (*int|None*) -- axis to find minima over.
                  If None this is equivalent to using arr.ravel().
                * **ordered** (*bool*) -- if False, returns the minimum elements in an arbitrary
                  order, otherwise they are in ascending order. (Setting this to
                  false is a bit faster).

   .. rubric:: Example

   >>> arr = (np.random.rand(100) * 100).astype(int)
   >>> for num in range(0, len(arr) + 1):
   >>>     idxs = argminima(arr, num)
   >>>     assert np.all(arr[idxs] == np.array(sorted(arr)[:len(idxs)])), 'ordered=True must return in order'
   >>>     idxs2 = argminima(arr, num, ordered=False)
   >>>     assert sorted(idxs2) == sorted(idxs), 'ordered=False must return the right idxs, but in any order'

   .. rubric:: Example

   >>> # Test cases with axis
   >>> from kwarray.util_numpy import *  # NOQA
   >>> arr = (np.random.rand(3, 5, 7) * 100).astype(int)
   >>> # make a unique array so we can check argmax consistency
   >>> arr = np.arange(3 * 5 * 7)
   >>> np.random.shuffle(arr)
   >>> arr = arr.reshape(3, 5, 7)
   >>> for axis in range(len(arr.shape)):
   >>>     for num in range(0, len(arr) + 1):
   >>>         idxs = argminima(arr, num, axis=axis)
   >>>         idxs2 = argminima(arr, num, ordered=False, axis=axis)
   >>>         print('idxs = {!r}'.format(idxs))
   >>>         print('idxs2 = {!r}'.format(idxs2))
   >>>         assert idxs.shape[axis] == num
   >>>         assert idxs2.shape[axis] == num
   >>>         # Check if argmin argrees with -argmax
   >>>         idxs3 = argmaxima(-arr, num, axis=axis)
   >>>         assert np.all(idxs3 == idxs)

   .. rubric:: Example

   >>> arr = np.arange(20).reshape(4, 5) % 6
   >>> argminima(arr, axis=1, num=2, ordered=False)
   >>> argminima(arr, axis=1, num=2, ordered=True)
   >>> argmaxima(-arr, axis=1, num=2, ordered=True)
   >>> argmaxima(-arr, axis=1, num=2, ordered=False)


.. py:function:: atleast_nd(arr, n, front=False)

   View inputs as arrays with at least n dimensions.

   :Parameters: * **arr** (*array_like*) -- An array-like object.  Non-array inputs are converted to arrays.
                  Arrays that already have n or more dimensions are preserved.
                * **n** (*int*) -- number of dimensions to ensure
                * **front** (*bool, default=False*) -- if True new dimensions are added to the front of the array.
                  otherwise they are added to the back.

   :returns:     An array with ``a.ndim >= n``.  Copies are avoided where possible,
                 and views with three or more dimensions are returned.  For example,
                 a 1-D array of shape ``(N,)`` becomes a view of shape
                 ``(1, N, 1)``, and a 2-D array of shape ``(M, N)`` becomes a view
                 of shape ``(M, N, 1)``.
   :rtype: ndarray

   .. seealso:: numpy.atleast_1d, numpy.atleast_2d, numpy.atleast_3d

   .. rubric:: Example

   >>> n = 2
   >>> arr = np.array([1, 1, 1])
   >>> arr_ = atleast_nd(arr, n)
   >>> import ubelt as ub  # NOQA
   >>> result = ub.repr2(arr_.tolist(), nl=0)
   >>> print(result)
   [[1], [1], [1]]

   .. rubric:: Example

   >>> n = 4
   >>> arr1 = [1, 1, 1]
   >>> arr2 = np.array(0)
   >>> arr3 = np.array([[[[[1]]]]])
   >>> arr1_ = atleast_nd(arr1, n)
   >>> arr2_ = atleast_nd(arr2, n)
   >>> arr3_ = atleast_nd(arr3, n)
   >>> import ubelt as ub  # NOQA
   >>> result1 = ub.repr2(arr1_.tolist(), nl=0)
   >>> result2 = ub.repr2(arr2_.tolist(), nl=0)
   >>> result3 = ub.repr2(arr3_.tolist(), nl=0)
   >>> result = '\n'.join([result1, result2, result3])
   >>> print(result)
   [[[[1]]], [[[1]]], [[[1]]]]
   [[[[0]]]]
   [[[[[1]]]]]

   .. rubric:: Notes

   Extensive benchmarks are in
   kwarray/dev/bench_atleast_nd.py

   These demonstrate that this function is statistically faster than the
   numpy variants, although the difference is small.  On average this
   function takes 480ns versus numpy which takes 790ns.


.. py:function:: boolmask(indices, shape=None)

   Constructs an array of booleans where an item is True if its position is in
   ``indices`` otherwise it is False. This can be viewed as the inverse of
   :func:`numpy.where`.

   :Parameters: * **indices** (*ndarray*) -- list of integer indices
                * **shape** (*int | tuple*) -- length of the returned list. If not specified
                  the minimal possible shape to incoporate all the indices is used.
                  In general, it is best practice to always specify this argument.

   :returns: mask: mask[idx] is True if idx in indices
   :rtype: ndarray[int]

   .. rubric:: Example

   >>> indices = [0, 1, 4]
   >>> mask = boolmask(indices, shape=6)
   >>> assert np.all(mask == [True, True, False, False, True, False])
   >>> mask = boolmask(indices)
   >>> assert np.all(mask == [True, True, False, False, True])

   .. rubric:: Example

   >>> indices = np.array([(0, 0), (1, 1), (2, 1)])
   >>> shape = (3, 3)
   >>> mask = boolmask(indices, shape)
   >>> import ubelt as ub  # NOQA
   >>> result = ub.repr2(mask)
   >>> print(result)
   np.array([[ True, False, False],
             [False,  True, False],
             [False,  True, False]], dtype=np.bool)


.. py:function:: isect_flags(arr, other)

   Check which items in an array intersect with another set of items

   :Parameters: * **arr** (*ndarray*) -- items to check
                * **other** (*Iterable*) -- items to check if they exist in arr

   :returns:

             booleans corresponding to arr indicating if that item is
                 also contained in other.
   :rtype: ndarray

   .. rubric:: Example

   >>> arr = np.array([
   >>>     [1, 2, 3, 4],
   >>>     [5, 6, 3, 4],
   >>>     [1, 1, 3, 4],
   >>> ])
   >>> other = np.array([1, 4, 6])
   >>> mask = isect_flags(arr, other)
   >>> print(mask)
   [[ True False False  True]
    [False  True False  True]
    [ True  True False  True]]


.. py:function:: iter_reduce_ufunc(ufunc, arrs, out=None, default=None)

   constant memory iteration and reduction

   applys ufunc from left to right over the input arrays

   :Parameters: * **ufunc** (*Callable*) -- called on each pair of consecutive ndarrays
                * **arrs** (*Iterator[ndarray]*) -- iterator of ndarrays
                * **default** (*object*) -- return value when iterator is empty

   :returns:     if len(arrs) == 0, returns ``default``
                 if len(arrs) == 1, returns arrs[0],
                 if len(arrs) >= 2, returns
                     ufunc(...ufunc(ufunc(arrs[0], arrs[1]), arrs[2]),...arrs[n-1])
   :rtype: ndarray

   .. rubric:: Example

   >>> arr_list = [
   ...     np.array([0, 1, 2, 3, 8, 9]),
   ...     np.array([4, 1, 2, 3, 4, 5]),
   ...     np.array([0, 5, 2, 3, 4, 5]),
   ...     np.array([1, 1, 6, 3, 4, 5]),
   ...     np.array([0, 1, 2, 7, 4, 5])
   ... ]
   >>> memory = np.array([9, 9, 9, 9, 9, 9])
   >>> gen_memory = memory.copy()
   >>> def arr_gen(arr_list, gen_memory):
   ...     for arr in arr_list:
   ...         gen_memory[:] = arr
   ...         yield gen_memory
   >>> print('memory = %r' % (memory,))
   >>> print('gen_memory = %r' % (gen_memory,))
   >>> ufunc = np.maximum
   >>> res1 = iter_reduce_ufunc(ufunc, iter(arr_list), out=None)
   >>> res2 = iter_reduce_ufunc(ufunc, iter(arr_list), out=memory)
   >>> res3 = iter_reduce_ufunc(ufunc, arr_gen(arr_list, gen_memory), out=memory)
   >>> print('res1       = %r' % (res1,))
   >>> print('res2       = %r' % (res2,))
   >>> print('res3       = %r' % (res3,))
   >>> print('memory     = %r' % (memory,))
   >>> print('gen_memory = %r' % (gen_memory,))
   >>> assert np.all(res1 == res2)
   >>> assert np.all(res2 == res3)


.. py:function:: normalize(arr, mode='linear', alpha=None, beta=None, out=None)

   Rebalance signal values via contrast stretching.

   By default linearly stretches array values to minimum and maximum values.

   :Parameters: * **arr** (*ndarray*) -- array to normalize, usually an image
                * **out** (*ndarray | None*) -- output array. Note, that we will create an
                  internal floating point copy for integer computations.
                * **mode** (*str*) -- either linear or sigmoid.
                * **alpha** (*float*) -- Only used if mode=sigmoid.  Division factor
                  (pre-sigmoid). If unspecified computed as:
                  ``max(abs(old_min - beta), abs(old_max - beta)) / 6.212606``.
                  Note this parameter is sensitive to if the input is a float or
                  uint8 image.
                * **beta** (*float*) -- subtractive factor (pre-sigmoid). This should be the
                  intensity of the most interesting bits of the image, i.e. bring
                  them to the center (0) of the distribution.
                  Defaults to ``(max - min) / 2``.  Note this parameter is sensitive
                  to if the input is a float or uint8 image.

   .. rubric:: References

   https://en.wikipedia.org/wiki/Normalization_(image_processing)

   .. rubric:: Example

   >>> raw_f = np.random.rand(8, 8)
   >>> norm_f = normalize(raw_f)

   >>> raw_f = np.random.rand(8, 8) * 100
   >>> norm_f = normalize(raw_f)
   >>> assert isclose(norm_f.min(), 0)
   >>> assert isclose(norm_f.max(), 1)

   >>> raw_u = (np.random.rand(8, 8) * 255).astype(np.uint8)
   >>> norm_u = normalize(raw_u)

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:kwimage)
   >>> import kwimage
   >>> arr = kwimage.grab_test_image('lowcontrast')
   >>> arr = kwimage.ensure_float01(arr)
   >>> norms = {}
   >>> norms['arr'] = arr.copy()
   >>> norms['linear'] = normalize(arr, mode='linear')
   >>> norms['sigmoid'] = normalize(arr, mode='sigmoid')
   >>> # xdoctest: +REQUIRES(--show)
   >>> import kwplot
   >>> kwplot.autompl()
   >>> kwplot.figure(fnum=1, doclf=True)
   >>> pnum_ = kwplot.PlotNums(nSubplots=len(norms))
   >>> for key, img in norms.items():
   >>>     kwplot.imshow(img, pnum=pnum_(), title=key)

   Benchmark:
       # Our method is faster than standard in-line implementations.

       import timerit
       ti = timerit.Timerit(100, bestof=10, verbose=2, unit='ms')
       arr = kwimage.grab_test_image('lowcontrast', dsize=(512, 512))

       print('--- uint8 ---')
       arr = ensure_float01(arr)
       out = arr.copy()
       for timer in ti.reset('naive1-float'):
           with timer:
               (arr - arr.min()) / (arr.max() - arr.min())

       import timerit
       for timer in ti.reset('simple-float'):
           with timer:
               max_ = arr.max()
               min_ = arr.min()
               result = (arr - min_) / (max_ - min_)

       for timer in ti.reset('normalize-float'):
           with timer:
               normalize(arr)

       for timer in ti.reset('normalize-float-inplace'):
           with timer:
               normalize(arr, out=out)

       print('--- float ---')
       arr = ensure_uint255(arr)
       out = arr.copy()
       for timer in ti.reset('naive1-uint8'):
           with timer:
               (arr - arr.min()) / (arr.max() - arr.min())

       import timerit
       for timer in ti.reset('simple-uint8'):
           with timer:
               max_ = arr.max()
               min_ = arr.min()
               result = (arr - min_) / (max_ - min_)

       for timer in ti.reset('normalize-uint8'):
           with timer:
               normalize(arr)

       for timer in ti.reset('normalize-uint8-inplace'):
           with timer:
               normalize(arr, out=out)

   Ignore:
       globals().update(xdev.get_func_kwargs(normalize))


.. py:function:: ensure_rng(rng, api='numpy')

   Coerces input into a random number generator.

   This function is useful for ensuring that your code uses a controlled
   internal random state that is independent of other modules.

   If the input is None, then a global random state is returned.

   If the input is a numeric value, then that is used as a seed to construct a
   random state.

   If the input is a random number generator, then another random number
   generator with the same state is returned. Depending on the api, this
   random state is either return as-is, or used to construct an equivalent
   random state with the requested api.

   :Parameters: * **rng** (*int | float | numpy.random.RandomState | random.Random | None*) -- if None, then defaults to the global rng. Otherwise this can
                  be an integer or a RandomState class
                * **api** (*str, default='numpy'*) -- specify the type of random number
                  generator to use. This can either be 'numpy' for a
                  :class:`numpy.random.RandomState` object or 'python' for a
                  :class:`random.Random` object.

   :returns:

             rng -
                 either a numpy or python random number generator, depending on the
                 setting of ``api``.
   :rtype: (numpy.random.RandomState | random.Random)

   .. rubric:: Example

   >>> rng = ensure_rng(None)
   >>> ensure_rng(0).randint(0, 1000)
   684
   >>> ensure_rng(np.random.RandomState(1)).randint(0, 1000)
   37

   .. rubric:: Example

   >>> num = 4
   >>> print('--- Python as PYTHON ---')
   >>> py_rng = random.Random(0)
   >>> pp_nums = [py_rng.random() for _ in range(num)]
   >>> print(pp_nums)
   >>> print('--- Numpy as PYTHON ---')
   >>> np_rng = ensure_rng(random.Random(0), api='numpy')
   >>> np_nums = [np_rng.rand() for _ in range(num)]
   >>> print(np_nums)
   >>> print('--- Numpy as NUMPY---')
   >>> np_rng = np.random.RandomState(seed=0)
   >>> nn_nums = [np_rng.rand() for _ in range(num)]
   >>> print(nn_nums)
   >>> print('--- Python as NUMPY---')
   >>> py_rng = ensure_rng(np.random.RandomState(seed=0), api='python')
   >>> pn_nums = [py_rng.random() for _ in range(num)]
   >>> print(pn_nums)
   >>> assert np_nums == pp_nums
   >>> assert pn_nums == nn_nums

   .. rubric:: Example

   >>> # Test that random modules can be coerced
   >>> import random
   >>> import numpy as np
   >>> ensure_rng(random, api='python')
   >>> ensure_rng(random, api='numpy')
   >>> ensure_rng(np.random, api='python')
   >>> ensure_rng(np.random, api='numpy')

   Ignore:
       >>> np.random.seed(0)
       >>> np.random.randint(0, 10000)
       2732
       >>> np.random.seed(0)
       >>> np.random.mtrand._rand.randint(0, 10000)
       2732
       >>> np.random.seed(0)
       >>> ensure_rng(None).randint(0, 10000)
       2732
       >>> np.random.randint(0, 10000)
       9845
       >>> ensure_rng(None).randint(0, 10000)
       3264


.. py:function:: random_combinations(items, size, num=None, rng=None)

   Yields ``num`` combinations of length ``size`` from items in random order

   :Parameters: * **items** (*List*) -- pool of items to choose from
                * **size** (*int*) -- number of items in each combination
                * **num** (*None, default=None*) -- number of combinations to generate
                * **rng** (*int | RandomState, default=None*) -- seed or random number generator

   :Yields: *Tuple* -- a random combination of ``items`` of length ``size``.

   .. rubric:: Example

   >>> import ubelt as ub
   >>> items = list(range(10))
   >>> size = 3
   >>> num = 5
   >>> rng = 0
   >>> # xdoctest: +IGNORE_WANT
   >>> combos = list(random_combinations(items, size, num, rng))
   >>> print('combos = {}'.format(ub.repr2(combos, nl=1)))
   combos = [
       (0, 6, 9),
       (4, 7, 8),
       (4, 6, 7),
       (2, 3, 5),
       (1, 2, 4),
   ]

   .. rubric:: Example

   >>> import ubelt as ub
   >>> items = list(zip(range(10), range(10)))
   >>> # xdoctest: +IGNORE_WANT
   >>> combos = list(random_combinations(items, 3, num=5, rng=0))
   >>> print('combos = {}'.format(ub.repr2(combos, nl=1)))
   combos = [
       ((0, 0), (6, 6), (9, 9)),
       ((4, 4), (7, 7), (8, 8)),
       ((4, 4), (6, 6), (7, 7)),
       ((2, 2), (3, 3), (5, 5)),
       ((1, 1), (2, 2), (4, 4)),
   ]


.. py:function:: random_product(items, num=None, rng=None)

   Yields ``num`` items from the cartesian product of items in a random order.

   :Parameters: * **items** (*List[Sequence]*) -- items to get caresian product of packed in a list or tuple.
                  (note this deviates from api of :func:`itertools.product`)
                * **num** (*int, default=None*) -- maximum number of items to generate. If None, all
                * **rng** (*random.Random | np.random.RandomState | int*) -- random number generator

   :Yields: *Tuple* -- a random item in the cartesian product

   .. rubric:: Example

   >>> import ubelt as ub
   >>> items = [(1, 2, 3), (4, 5, 6, 7)]
   >>> rng = 0
   >>> # xdoctest: +IGNORE_WANT
   >>> products = list(random_product(items, rng=0))
   >>> print(ub.repr2(products, nl=0))
   [(3, 4), (1, 7), (3, 6), (2, 7),... (1, 6), (2, 5), (2, 4)]
   >>> products = list(random_product(items, num=3, rng=0))
   >>> print(ub.repr2(products, nl=0))
   [(3, 4), (1, 7), (3, 6)]

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(--profile)
   >>> rng = ensure_rng(0)
   >>> items = [np.array([15, 14]), np.array([27, 26]),
   >>>          np.array([21, 22]), np.array([32, 31])]
   >>> num = 2
   >>> for _ in range(100):
   >>>     list(random_product(items, num=num, rng=rng))


.. py:function:: seed_global(seed, offset=0)

   Seeds the python, numpy, and torch global random states

   :Parameters: * **seed** (*int*) -- seed to use
                * **offset** (*int, optional*) -- if specified, uses a different seed for each
                  global random state separated by this offset.


.. py:function:: shuffle(items, rng=None)

   Shuffles a list inplace and then returns it for convinience

   :Parameters: * **items** (*list or ndarray*) -- list to shuffle
                * **rng** (*RandomState or int*) -- seed or random number gen

   :returns: this is the input, but returned for convinience
   :rtype: list

   .. rubric:: Example

   >>> list1 = [1, 2, 3, 4, 5, 6]
   >>> list2 = shuffle(list(list1), rng=1)
   >>> assert list1 != list2
   >>> result = str(list2)
   >>> print(result)
   [3, 2, 5, 1, 4, 6]


.. py:function:: embed_slice(slices, data_dims, pad=None)

   Embeds a "padded-slice" inside known data dimension.

   Returns the valid data portion of the slice with extra padding for regions
   outside of the available dimension.

   Given a slices for each dimension, image dimensions, and a padding get the
   corresponding slice from the image and any extra padding needed to achieve
   the requested window size.

   .. todo:: - [ ] Add the option to return the inverse slice

   :Parameters: * **slices** (*Tuple[slice, ...]*) -- a tuple of slices for to apply to data data dimension.
                * **data_dims** (*Tuple[int, ...]*) -- n-dimension data sizes (e.g. 2d height, width)
                * **pad** (*List[int|Tuple]*) -- extra pad applied to (left and right) / (both) sides of each slice
                  dim

   :returns:

                 data_slice - Tuple[slice] a slice that can be applied to an array
                     with with shape `data_dims`. This slice will not correspond to
                     the full window size if the requested slice is out of bounds.
                 extra_padding - extra padding needed after slicing to achieve
                     the requested window size.
   :rtype: Tuple

   .. rubric:: Example

   >>> # Case where slice is inside the data dims on left edge
   >>> from kwarray.util_slices import *  # NOQA
   >>> slices = (slice(0, 10), slice(0, 10))
   >>> data_dims  = [300, 300]
   >>> pad        = [10, 5]
   >>> a, b = embed_slice(slices, data_dims, pad)
   >>> print('data_slice = {!r}'.format(a))
   >>> print('extra_padding = {!r}'.format(b))
   data_slice = (slice(0, 20, None), slice(0, 15, None))
   extra_padding = [(10, 0), (5, 0)]

   .. rubric:: Example

   >>> # Case where slice is bigger than the image
   >>> slices = (slice(-10, 400), slice(-10, 400))
   >>> data_dims  = [300, 300]
   >>> pad        = [10, 5]
   >>> a, b = embed_slice(slices, data_dims, pad)
   >>> print('data_slice = {!r}'.format(a))
   >>> print('extra_padding = {!r}'.format(b))
   data_slice = (slice(0, 300, None), slice(0, 300, None))
   extra_padding = [(20, 110), (15, 105)]

   .. rubric:: Example

   >>> # Case where slice is inside than the image
   >>> slices = (slice(10, 40), slice(10, 40))
   >>> data_dims  = [300, 300]
   >>> pad        = None
   >>> a, b = embed_slice(slices, data_dims, pad)
   >>> print('data_slice = {!r}'.format(a))
   >>> print('extra_padding = {!r}'.format(b))
   data_slice = (slice(10, 40, None), slice(10, 40, None))
   extra_padding = [(0, 0), (0, 0)]


.. py:function:: padded_slice(data, slices, pad=None, padkw=None, return_info=False)

   Allows slices with out-of-bound coordinates. Any out of bounds coordinate
   will be sampled via padding.

   :Parameters: * **data** (*Sliceable[T]*) -- data to slice into. Any channels must be the last dimension.
                * **slices** (*slice | Tuple[slice, ...]*) -- slice for each dimensions
                * **ndim** (*int*) -- number of spatial dimensions
                * **pad** (*List[int|Tuple]*) -- additional padding of the slice
                * **padkw** (*Dict*) -- if unspecified defaults to ``{'mode': 'constant'}``
                * **return_info** (*bool, default=False*) -- if True, return extra information
                  about the transform.

   .. note::

      Negative slices have a different meaning here then they usually do.
      Normally, they indicate a wrap-around or a reversed stride, but here
      they index into out-of-bounds space (which depends on the pad mode).
      For example a slice of -2:1 literally samples two pixels to the left of
      the data and one pixel from the data, so you get two padded values and
      one data value.

   SeeAlso:
       embed_slice - finds the embedded slice and padding

   :returns:

                 data_sliced: subregion of the input data (possibly with padding,
                     depending on if the original slice went out of bounds)


             Tuple[Sliceable, Dict] :
                 data_sliced : as above

                 transform : information on how to return to the original coordinates

                     Currently a dict containing:
                         st_dims: a list indicating the low and high space-time
                             coordinate values of the returned data slice.

                     The structure of this dictionary mach change in the future
   :rtype: Sliceable

   .. rubric:: Example

   >>> data = np.arange(5)
   >>> slices = [slice(-2, 7)]

   >>> data_sliced = padded_slice(data, slices)
   >>> print(ub.repr2(data_sliced, with_dtype=False))
   np.array([0, 0, 0, 1, 2, 3, 4, 0, 0])

   >>> data_sliced = padded_slice(data, slices, pad=(3, 3))
   >>> print(ub.repr2(data_sliced, with_dtype=False))
   np.array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0])

   >>> data_sliced = padded_slice(data, slice(3, 4), pad=[(1, 0)])
   >>> print(ub.repr2(data_sliced, with_dtype=False))
   np.array([2, 3])


.. py:class:: SlidingWindow(shape, window, overlap=None, stride=None, keepbound=False, allow_overshoot=False)

   Bases: :py:obj:`ubelt.NiceRepr`

   Slide a window of a certain shape over an array with a larger shape.

   This can be used for iterating over a grid of sub-regions of 2d-images,
   3d-volumes, or any n-dimensional array.

   Yields slices of shape `window` that can be used to index into an array
   with shape `shape` via numpy / torch fancy indexing. This allows for fast
   fast iteration over subregions of a larger image. Because we generate a
   grid-basis using only shapes, the larger image does not need to be in
   memory as long as its width/height/depth/etc...

   :Parameters: * **shape** (*Tuple[int, ...]*) -- shape of source array to slide across.
                * **window** (*Tuple[int, ...]*) -- shape of window that will be slid over the
                  larger image.
                * **overlap** (*float, default=0*) -- a number between 0 and 1 indicating the
                  fraction of overlap that parts will have. Specifying this is
                  mutually exclusive with `stride`.  Must be `0 <= overlap < 1`.
                * **stride** (*int, default=None*) -- the number of cells (pixels) moved on each
                  step of the window. Mutually exclusive with overlap.
                * **keepbound** (*bool, default=False*) -- if True, a non-uniform stride will be
                  taken to ensure that the right / bottom of the image is returned as
                  a slice if needed. Such a slice will not obey the overlap
                  constraints.  (Defaults to False)
                * **allow_overshoot** (*bool, default=False*) -- if False, we will raise an
                  error if the window doesn't slide perfectly over the input shape.

   :ivar basis_shape - shape of the grid corresponding to the number of strides:
                                                                                 the sliding window will take.
   :ivar basis_slices - slices that will be taken in every dimension:


   :Yields: *Tuple[slice, ...]* --

            slices used for numpy indexing, the number of slices
                in the tuple

   .. rubric:: Notes

   For each dimension, we generate a basis (which defines a grid), and we
   slide over that basis.

   .. rubric:: Example

   >>> from kwarray.util_slider import *  # NOQA
   >>> shape = (10, 10)
   >>> window = (5, 5)
   >>> self = SlidingWindow(shape, window)
   >>> for i, index in enumerate(self):
   >>>     print('i={}, index={}'.format(i, index))
   i=0, index=(slice(0, 5, None), slice(0, 5, None))
   i=1, index=(slice(0, 5, None), slice(5, 10, None))
   i=2, index=(slice(5, 10, None), slice(0, 5, None))
   i=3, index=(slice(5, 10, None), slice(5, 10, None))

   .. rubric:: Example

   >>> from kwarray.util_slider import *  # NOQA
   >>> shape = (16, 16)
   >>> window = (4, 4)
   >>> self = SlidingWindow(shape, window, overlap=(.5, .25))
   >>> print('self.stride = {!r}'.format(self.stride))
   self.stride = [2, 3]
   >>> list(ub.chunks(self.grid, 5))
   [[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)],
    [(1, 0), (1, 1), (1, 2), (1, 3), (1, 4)],
    [(2, 0), (2, 1), (2, 2), (2, 3), (2, 4)],
    [(3, 0), (3, 1), (3, 2), (3, 3), (3, 4)],
    [(4, 0), (4, 1), (4, 2), (4, 3), (4, 4)],
    [(5, 0), (5, 1), (5, 2), (5, 3), (5, 4)],
    [(6, 0), (6, 1), (6, 2), (6, 3), (6, 4)]]

   .. rubric:: Example

   >>> # Test shapes that dont fit
   >>> # When the window is bigger than the shape, the left-aligned slices
   >>> # are returend.
   >>> self = SlidingWindow((3, 3), (12, 12), allow_overshoot=True, keepbound=True)
   >>> print(list(self))
   [(slice(0, 12, None), slice(0, 12, None))]
   >>> print(list(SlidingWindow((3, 3), None, allow_overshoot=True, keepbound=True)))
   [(slice(0, 3, None), slice(0, 3, None))]
   >>> print(list(SlidingWindow((3, 3), (None, 2), allow_overshoot=True, keepbound=True)))
   [(slice(0, 3, None), slice(0, 2, None)), (slice(0, 3, None), slice(1, 3, None))]

   .. py:method:: __nice__(self)


   .. py:method:: _compute_stride(self, overlap, stride, shape, window)

      Ensures that stride hasoverlap the correct shape.  If stride is not
      provided, compute stride from desired overlap.


   .. py:method:: __len__(self)


   .. py:method:: _iter_basis_frac(self)


   .. py:method:: __iter__(self)


   .. py:method:: __getitem__(self, index)

      Get a specific item by its flat (raveled) index

      .. rubric:: Example

      >>> from kwarray.util_slider import *  # NOQA
      >>> window = (10, 10)
      >>> shape = (20, 20)
      >>> self = SlidingWindow(shape, window, stride=5)
      >>> itered_items = list(self)
      >>> assert len(itered_items) == len(self)
      >>> indexed_items = [self[i] for i in range(len(self))]
      >>> assert itered_items[0] == self[0]
      >>> assert itered_items[-1] == self[-1]
      >>> assert itered_items == indexed_items


   .. py:method:: grid(self)
      :property:

      Generate indices into the "basis" slice for each dimension.
      This enumerates the nd indices of the grid.

      :Yields: Tuple[int, ...]


   .. py:method:: slices(self)
      :property:

      Generate slices for each window (equivalent to iter(self))

      .. rubric:: Example

      >>> shape = (220, 220)
      >>> window = (10, 10)
      >>> self = SlidingWindow(shape, window, stride=5)
      >>> list(self)[41:45]
      [(slice(0, 10, None), slice(205, 215, None)),
       (slice(0, 10, None), slice(210, 220, None)),
       (slice(5, 15, None), slice(0, 10, None)),
       (slice(5, 15, None), slice(5, 15, None))]
      >>> print('self.overlap = {!r}'.format(self.overlap))
      self.overlap = [0.5, 0.5]


   .. py:method:: centers(self)
      :property:

      Generate centers of each window

      :Yields: *Tuple[float, ...]* -- the center coordinate of the slice

      .. rubric:: Example

      >>> shape = (4, 4)
      >>> window = (3, 3)
      >>> self = SlidingWindow(shape, window, stride=1)
      >>> list(zip(self.centers, self.slices))
      [((1.0, 1.0), (slice(0, 3, None), slice(0, 3, None))),
       ((1.0, 2.0), (slice(0, 3, None), slice(1, 4, None))),
       ((2.0, 1.0), (slice(1, 4, None), slice(0, 3, None))),
       ((2.0, 2.0), (slice(1, 4, None), slice(1, 4, None)))]
      >>> shape = (3, 3)
      >>> window = (2, 2)
      >>> self = SlidingWindow(shape, window, stride=1)
      >>> list(zip(self.centers, self.slices))
      [((0.5, 0.5), (slice(0, 2, None), slice(0, 2, None))),
       ((0.5, 1.5), (slice(0, 2, None), slice(1, 3, None))),
       ((1.5, 0.5), (slice(1, 3, None), slice(0, 2, None))),
       ((1.5, 1.5), (slice(1, 3, None), slice(1, 3, None)))]


.. py:class:: Stitcher(stitcher, shape, device='numpy')

   Bases: :py:obj:`ubelt.NiceRepr`

   Stitches multiple possibly overlapping slices into a larger array.

   This is used to invert the SlidingWindow.  For semenatic segmentation the
   patches are probability chips. Overlapping chips are averaged together.

   :Parameters: **shape** (*tuple*) -- dimensions of the large image that will be created from
                the smaller pixels or patches.

   .. todo::

      - [ ] Look at the old "add_fast" code in the netharn version and see if
            it is worth porting. This code is kept in the dev folder in
            ../dev/_dev_slider.py

   .. rubric:: Example

   >>> from kwarray.util_slider import *  # NOQA
   >>> import sys
   >>> # Build a high resolution image and slice it into chips
   >>> highres = np.random.rand(5, 200, 200).astype(np.float32)
   >>> target_shape = (1, 50, 50)
   >>> slider = SlidingWindow(highres.shape, target_shape, overlap=(0, .5, .5))
   >>> # Show how Sticher can be used to reconstruct the original image
   >>> stitcher = Stitcher(slider.input_shape)
   >>> for sl in list(slider):
   ...     chip = highres[sl]
   ...     stitcher.add(sl, chip)
   >>> assert stitcher.weights.max() == 4, 'some parts should be processed 4 times'
   >>> recon = stitcher.finalize()

   .. py:method:: __nice__(stitcher)


   .. py:method:: add(stitcher, indices, patch, weight=None)

      Incorporate a new (possibly overlapping) patch or pixel using a
      weighted sum.

      :Parameters: * **indices** (*slice or tuple*) -- typically a Tuple[slice] of pixels or a
                     single pixel, but this can be any numpy fancy index.
                   * **patch** (*ndarray*) -- data to patch into the bigger image.
                   * **weight** (*float or ndarray*) -- weight of this patch (default to 1.0)


   .. py:method:: average(stitcher)

      Averages out contributions from overlapping adds using weighted average

      :returns: ndarray: the stitched image
      :rtype: out


   .. py:method:: finalize(stitcher, indices=None)

      Averages out contributions from overlapping adds

      :Parameters: **indices** (*None | slice | tuple*) -- if None, finalize the entire
                   block, otherwise only finalize a subregion.

      :returns: ndarray: the stitched image
      :rtype: final


.. py:function:: one_hot_embedding(labels, num_classes, dim=1)

   Embedding labels to one-hot form.

   :Parameters: * **labels** -- (LongTensor) class labels, sized [N,].
                * **num_classes** -- (int) number of classes.
                * **dim** (*int*) -- dimension which will be created, if negative

   :returns: encoded labels, sized [N,#classes].
   :rtype: Tensor

   .. rubric:: References

   https://discuss.pytorch.org/t/convert-int-into-one-hot-format/507/4

   .. rubric:: Example

   >>> # each element in target has to have 0 <= value < C
   >>> # xdoctest: +REQUIRES(module:torch)
   >>> labels = torch.LongTensor([0, 0, 1, 4, 2, 3])
   >>> num_classes = max(labels) + 1
   >>> t = one_hot_embedding(labels, num_classes)
   >>> assert all(row[y] == 1 for row, y in zip(t.numpy(), labels.numpy()))
   >>> import ubelt as ub
   >>> print(ub.repr2(t.numpy().tolist()))
   [
       [1.0, 0.0, 0.0, 0.0, 0.0],
       [1.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 0.0],
   ]
   >>> t2 = one_hot_embedding(labels.numpy(), num_classes)
   >>> assert np.all(t2 == t.numpy())
   >>> if torch.cuda.is_available():
   >>>     t3 = one_hot_embedding(labels.to(0), num_classes)
   >>>     assert np.all(t3.cpu().numpy() == t.numpy())

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:torch)
   >>> nC = num_classes = 3
   >>> labels = (torch.rand(10, 11, 12) * nC).long()
   >>> assert one_hot_embedding(labels, nC, dim=0).shape == (3, 10, 11, 12)
   >>> assert one_hot_embedding(labels, nC, dim=1).shape == (10, 3, 11, 12)
   >>> assert one_hot_embedding(labels, nC, dim=2).shape == (10, 11, 3, 12)
   >>> assert one_hot_embedding(labels, nC, dim=3).shape == (10, 11, 12, 3)
   >>> labels = (torch.rand(10, 11) * nC).long()
   >>> assert one_hot_embedding(labels, nC, dim=0).shape == (3, 10, 11)
   >>> assert one_hot_embedding(labels, nC, dim=1).shape == (10, 3, 11)
   >>> labels = (torch.rand(10) * nC).long()
   >>> assert one_hot_embedding(labels, nC, dim=0).shape == (3, 10)
   >>> assert one_hot_embedding(labels, nC, dim=1).shape == (10, 3)


.. py:function:: one_hot_lookup(data, indices)

   Return value of a particular column for each row in data.

   Each item in labels corresonds to a row in ``data``. Returns the index
   specified at each row.

   :Parameters: * **data** (*ArrayLike*) -- N x C float array of values
                * **indices** (*ArrayLike*) -- N integer array between 0 and C.
                  This is an column index for each row in ``data``.

   :returns: the selected probability for each row
   :rtype: ArrayLike

   .. rubric:: Notes

   This is functionally equivalent to
   ``[row[c] for row, c in zip(data, indices)]`` except that it is
   works with pure matrix operations.

   .. todo::

      - [ ] Allow the user to specify which dimension indices should be
            zipped over. By default it should be dim=0
      
      - [ ] Allow the user to specify which dimension indices should select
            from. By default it should be dim=1.

   .. rubric:: Example

   >>> from kwarray.util_torch import *  # NOQA
   >>> data = np.array([
   >>>     [0, 1, 2],
   >>>     [3, 4, 5],
   >>>     [6, 7, 8],
   >>>     [9, 10, 11],
   >>> ])
   >>> indices = np.array([0, 1, 2, 1])
   >>> res = one_hot_lookup(data, indices)
   >>> print('res = {!r}'.format(res))
   res = array([ 0,  4,  8, 10])
   >>> alt = np.array([row[c] for row, c in zip(data, indices)])
   >>> assert np.all(alt == res)

   .. rubric:: Example

   >>> # xdoctest: +REQUIRES(module:torch)
   >>> import torch
   >>> data = torch.from_numpy(np.array([
   >>>     [0, 1, 2],
   >>>     [3, 4, 5],
   >>>     [6, 7, 8],
   >>>     [9, 10, 11],
   >>> ]))
   >>> indices = torch.from_numpy(np.array([0, 1, 2, 1])).long()
   >>> res = one_hot_lookup(data, indices)
   >>> print('res = {!r}'.format(res))
   res = tensor([ 0,  4,  8, 10]...)
   >>> alt = torch.LongTensor([row[c] for row, c in zip(data, indices)])
   >>> assert torch.all(alt == res)

   Ignore:
       >>> # xdoctest: +REQUIRES(module:torch, module:onnx, module:onnx_tf)
       >>> # Test if this converts to ONNX
       >>> from kwarray.util_torch import *  # NOQA
       >>> import torch.onnx
       >>> import io
       >>> import onnx
       >>> import onnx_tf.backend
       >>> import numpy as np
       >>> data = torch.from_numpy(np.array([
       >>>     [0, 1, 2],
       >>>     [3, 4, 5],
       >>>     [6, 7, 8],
       >>>     [9, 10, 11],
       >>> ]))
       >>> indices = torch.from_numpy(np.array([0, 1, 2, 1])).long()
       >>> class TFConvertWrapper(torch.nn.Module):
       >>>     def forward(self, data, indices):
       >>>         return one_hot_lookup(data, indices)
       >>> ###
       >>> # Test the ONNX export
       >>> wrapped = TFConvertWrapper()
       >>> onnx_file = io.BytesIO()
       >>> torch.onnx.export(
       >>>     wrapped, tuple([data, indices]),
       >>>     input_names=['data', 'indices'],
       >>>     output_names=['out'],
       >>>     f=onnx_file,
       >>>     opset_version=11,
       >>>     verbose=1,
       >>> )
       >>> onnx_file.seek(0)
       >>> onnx_model = onnx.load(onnx_file)
       >>> onnx_tf_model = onnx_tf.backend.prepare(onnx_model)
       >>> # Test that the resulting graph tensors are concretely sized.
       >>> import tensorflow as tf
       >>> onnx_gd = onnx_tf_model.graph.as_graph_def()
       >>> output_tensors = tf.import_graph_def(
       >>>     onnx_gd,
       >>>     input_map={},
       >>>     return_elements=[onnx_tf_model.tensor_dict[ol].name for ol in onnx_tf_model.outputs]
       >>> )
       >>> assert all(isinstance(d.value, int) for t in output_tensors for d in t.shape)
       >>> tf_outputs = onnx_tf_model.run([data, indices])
       >>> pt_outputs = wrapped(data, indices)
       >>> print('tf_outputs = {!r}'.format(tf_outputs))
       >>> print('pt_outputs = {!r}'.format(pt_outputs))
       >>> ###
       >>> # Test if data is more than 2D
       >>> shape = (4, 3, 8)
       >>> data = torch.arange(int(np.prod(shape))).view(*shape).float()
       >>> indices = torch.from_numpy(np.array([0, 1, 2, 1])).long()
       >>> onnx_file = io.BytesIO()
       >>> torch.onnx.export(
       >>>     wrapped, tuple([data, indices]),
       >>>     input_names=['data', 'indices'],
       >>>     output_names=['out'],
       >>>     f=onnx_file,
       >>>     opset_version=11,
       >>>     verbose=1,
       >>> )
       >>> onnx_file.seek(0)
       >>> onnx_model = onnx.load(onnx_file)
       >>> onnx_tf_model = onnx_tf.backend.prepare(onnx_model)
       >>> # Test that the resulting graph tensors are concretely sized.
       >>> import tensorflow as tf
       >>> onnx_gd = onnx_tf_model.graph.as_graph_def()
       >>> output_tensors = tf.import_graph_def(
       >>>     onnx_gd,
       >>>     input_map={},
       >>>     return_elements=[onnx_tf_model.tensor_dict[ol].name for ol in onnx_tf_model.outputs]
       >>> )
       >>> assert all(isinstance(d.value, int) for t in output_tensors for d in t.shape)
       >>> tf_outputs = onnx_tf_model.run([data, indices])
       >>> pt_outputs = wrapped(data, indices)
       >>> print('tf_outputs = {!r}'.format(tf_outputs))
       >>> print('pt_outputs = {!r}'.format(pt_outputs))