kwarray.distributions module

Defines data structures for efficient repeated sampling of specific distributions (e.g. Normal, Uniform, Binomial) with specific parameters.

Inspired by ~/code/imgaug/imgaug/parameters.py

Similar Libraries:

Todo

  • [ ] Change the sample shape to just a single number.

  • [ ] Some Distributions will output vectors. Maybe we could just append those extra dimensions to the end of the sample shape?

  • [ ] Expose as kwstats?

  • [ ] Improve coerce syntax for concise distribution specification

References

https://stackoverflow.com/questions/21100716/fast-arbitrary-distribution-random-sampling
https://stackoverflow.com/questions/4265988/generate-random-numbers-with-a-given-numerical-distribution
https://codereview.stackexchange.com/questions/196286/inverse-transform-sampling

class kwarray.distributions.Value(default=None, min=None, max=None, help=None, constraints=None, type=None, name=None)[source]

Bases: NiceRepr

Container for class __params__ values.

Used to store metadata about distribution arguments, including default values, numeric constraints, typing, and help text.

Example

>>> from kwarray.distributions import *  # NOQA
>>> self = Value(43.5)
>>> print(Value(name='lucy'))
>>> print(Value(name='jeff', default=1))
>>> self = Value(name='fred', default=1.0)
>>> print('self = {}'.format(ub.urepr(self, nl=1)))
>>> print(Value(name='bob', default=1.0, min=-5, max=5))
>>> print(Value(name='alice', default=1.0, min=-5))
sample(rng)[source]

Get a random value for this parameter.

kwarray.distributions._issubclass2(child, parent)[source]

Uses string comparisons instead of the builtin issubclass to avoid IPython reload errors. This is much less robust, though.

kwarray.distributions._isinstance2(obj, cls)[source]

Internal hacked version of isinstance, used for debugging.

obj = self
cls = distributions.Distribution
child = obj.__class__
parent = cls

class kwarray.distributions.Parameterized[source]

Bases: NiceRepr

Keeps track of all registered parameters, and of child objects that have their own registered parameters.

_setparam(key, value)[source]
_setchild(key, value)[source]
children()[source]
seed(rng=None)[source]
parameters()[source]

Returns the parameters of this object and its children.

_body_str()[source]
idstr(nl=None, thresh=80)[source]

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> import ubelt as ub
>>> self = TruncNormal()
>>> self.idstr()
>>> #
>>> #
>>> class Dummy(Distribution):
>>>     def __init__(self):
>>>         super(Dummy, self).__init__()
>>>         self._setparam('a', 3)
>>>         self.b = Normal()
>>>         self.c = Uniform()
>>> self = Dummy()
>>> print(self.idstr())
>>> #
>>> class Tail5(Distribution):
>>>     def __init__(self):
>>>         super(Tail5, self).__init__()
>>>         self._setparam('a_parameter', 3)
>>>         for i in range(5):
>>>             self._setparam(chr(i + 97), i)
>>> #
>>> class Tail6(Distribution):
>>>     def __init__(self):
>>>         super(Tail6, self).__init__()
>>>         for i in range(9):
>>>             self._setparam(chr(i + 97) + '_parameter', i)
>>> #
>>> class Dummy2(Distribution):
>>>     def __init__(self):
>>>         super(Dummy2, self).__init__()
>>>         self._setparam('x', 3)
>>>         self._setparam('y', 3)
>>>         self.d = Dummy()
>>>         self.f = Tail6()
>>>         self.y = Tail5()
>>> self = Dummy2()
>>> print(self.idstr())
>>> print(ub.urepr(self.json_id()))
json_id()[source]
_make_body(self_part, child_part, nl=None, thresh=80)[source]
class kwarray.distributions.ParameterizedList(items)[source]

Bases: Parameterized

Example

>>> from kwarray import distributions as stoch
>>> self1 = stoch.ParameterizedList([
>>>     stoch.Normal(),
>>>     stoch.Uniform(),
>>> ])
>>> print(self1.idstr())
>>> self = stoch.ParameterizedList([stoch.ParameterizedList([
>>>     stoch.Normal(),
>>>     stoch.Uniform(),
>>>     self1,
>>> ])])
>>> print(self.idstr())
>>> print(self.idstr(0))
_setparam(key, value)[source]
append(item)[source]
idstr(nl=None, thresh=80)[source]
class kwarray.distributions._BinOpMixin[source]

Bases: object

Allows binary operations to be performed on distributions to create composed distributions.
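Here is a minimal sketch of how operator overloading can build composed distributions. `ToyDist` is a hypothetical stand-in, not kwarray's `_BinOpMixin`.

- Each operator returns a new object whose sampler draws from every operand and then applies the operation.
- Plain numbers act as constants.
- Reflected methods such as `__rmul__` are what allow a number to appear on the left-hand side.

```python
import operator

import numpy as np


class ToyDist:
    """Hypothetical stand-in for a distribution (not the kwarray class)."""

    def __init__(self, sampler):
        self.sampler = sampler  # callable: (rng, n) -> ndarray of n samples

    def sample(self, rng, n):
        return self.sampler(rng, n)

    def _compose(self, other, op, reflected=False):
        def sampler(rng, n):
            a = self.sample(rng, n)
            # Plain numbers behave like constant distributions
            b = other.sample(rng, n) if isinstance(other, ToyDist) else other
            return op(b, a) if reflected else op(a, b)
        return ToyDist(sampler)

    def __add__(self, other):
        return self._compose(other, operator.add)

    def __radd__(self, other):
        return self._compose(other, operator.add, reflected=True)

    def __mul__(self, other):
        return self._compose(other, operator.mul)

    def __rmul__(self, other):
        return self._compose(other, operator.mul, reflected=True)


normal = ToyDist(lambda rng, n: rng.normal(0.0, 1.0, n))
shifted = 2 * normal + 10  # int * ToyDist falls back to __rmul__, then __add__
samples = shifted.sample(np.random.RandomState(0), 100000)
print(samples.mean(), samples.std())  # close to 10 and 2
```

Each composed object samples its operands independently. So in this sketch `normal + normal` behaves like the sum of two independent normals, not like `2 * normal`.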

int()[source]
round(ndigits=None)[source]
clip(a_min=None, a_max=None)[source]
log()[source]
log10()[source]
exp()[source]
sqrt()[source]
abs()[source]
class kwarray.distributions._RBinOpMixin[source]

Bases: _BinOpMixin

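Adds the reflected (right-hand) binary operators, such as __radd__ and __rmul__. With these, expressions that have a plain number on the left (e.g. 2 * dist) also produce composed distributions. See https://docs.python.org/3/reference/datamodel.html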

class kwarray.distributions.Distribution(*args, **kwargs)[source]

Bases: Parameterized, _RBinOpMixin

Base class for all distributions.

There are three main subtypes:

  • ContinuousDistribution

  • DiscreteDistribution

  • MixedDistribution

Note

[DiscVsCont] notes that there are only three types of random variables: discrete, continuous, and mixed, and that these types are mutually exclusive.

Note

When inheriting from this class, you typically do not need to define an __init__ method. Instead, override the __params__ class attribute with an OrderedDict[str, Value] that specifies the signature of the __init__ method. This allows (1) new distributions to be expressed concisely and (2) new distributions to inherit a random classmethod that respects the constraints specified in each parameter Value.

If you do override __init__, be sure to call super().__init__().
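The `__params__` pattern described above can be sketched as follows. Everything here (`Param`, `ToyDistribution`, `ToyNormal`) is hypothetical and much simpler than kwarray's `Value` and `Distribution`. The sketch only assumes two things:

- Positional and keyword arguments are matched against the ordered `__params__` entries.
- `random` draws each parameter within its declared bounds.

```python
from collections import OrderedDict

import numpy as np


class Param:
    """Hypothetical parameter spec: a default plus optional bounds."""

    def __init__(self, default, min=None, max=None):
        self.default = default
        self.min = min
        self.max = max

    def sample(self, rng):
        # Draw uniformly within the bounds when both are given
        if self.min is not None and self.max is not None:
            return rng.uniform(self.min, self.max)
        return self.default


class ToyDistribution:
    """Hypothetical base class whose __init__ is driven by __params__."""

    __params__ = OrderedDict()

    def __init__(self, *args, **kwargs):
        values = OrderedDict((k, p.default) for k, p in self.__params__.items())
        values.update(zip(self.__params__.keys(), args))  # positional, in order
        values.update(kwargs)                             # keywords override
        for key, value in values.items():
            setattr(self, key, value)

    @classmethod
    def random(cls, rng=None):
        rng = np.random.RandomState(rng)
        return cls(**{k: p.sample(rng) for k, p in cls.__params__.items()})


class ToyNormal(ToyDistribution):
    # No __init__: the signature comes entirely from __params__
    __params__ = OrderedDict([
        ('mean', Param(0.0, min=-10, max=10)),
        ('std', Param(1.0, min=0.1, max=5)),
    ])

    def sample(self, n, rng):
        return rng.normal(self.mean, self.std, n)


d1 = ToyNormal(3.0)           # mean=3.0, std falls back to its default
d2 = ToyNormal.random(rng=0)  # both parameters drawn within their bounds
print(d1.mean, d1.std, d2.mean, d2.std)
```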

References

seed(rng=None)[source]
sample(*shape)[source]
classmethod random(rng=None)[source]

Returns a random distribution

Parameters:

rng (int | float | None | numpy.random.RandomState | random.Random) – random coercible, i.e. a seed or generator that can be coerced into a random number generator

CommandLine

xdoctest -m kwarray.distributions Distribution.random --show

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> self = Distribution.random()
>>> print('self = {!r}'.format(self))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot('0.001s', bins=256)
>>> kwplot.show_if_requested()
[Figure: plot of samples drawn from the random distribution]
classmethod coerce(arg, rng=None)[source]
classmethod cast(arg)[source]
classmethod seeded(rng=0)[source]
plot(n='0.01s', bins='auto', stat='count', color=None, kde=True, ax=None, **kwargs)[source]

Plots n samples from the distribution.

Parameters:
  • n (int | str) – number of samples to draw, or a time budget such as '0.01s'.

  • bins (int | List[Number] | str) – number of bins, bin edges, or the name of a numpy binning strategy such as 'auto'.

  • stat (str) – one of density, count, probability, or frequency.

  • **kwargs – other args passed to seaborn.histplot()

Example

>>> from kwarray.distributions import Normal  # NOQA
>>> self = Normal()
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> self.plot(n=1000)
_show(n, bins=None, ax=None, color=None, label=None)[source]

Plot samples Monte Carlo style.

kwarray.distributions._coerce_timedelta(data)[source]
kwarray.distributions._generate_on_a_time_budget(func, maxiters, budget)[source]

budget = 60
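The idea is to keep drawing samples until either an iteration cap or a wall-clock budget is reached, as with the '0.01s' default of plot. It can be sketched like this. Both functions are hypothetical re-implementations:

- They assume budgets are numbers of seconds or strings such as '0.5s'.
- They may differ from kwarray's `_coerce_timedelta` and `_generate_on_a_time_budget`, which could accept other formats.

```python
import datetime
import random
import re
import time


def coerce_timedelta(data):
    """Hypothetical parser for budgets like 60 or '0.5s' (seconds only)."""
    if isinstance(data, datetime.timedelta):
        return data
    if isinstance(data, (int, float)):
        return datetime.timedelta(seconds=data)
    match = re.fullmatch(r'\s*([0-9]*\.?[0-9]+)\s*s\s*', data)
    if match is None:
        raise ValueError('unsupported budget: {!r}'.format(data))
    return datetime.timedelta(seconds=float(match.group(1)))


def generate_on_a_time_budget(func, maxiters, budget):
    """Call func until maxiters results exist or the budget is spent."""
    seconds = coerce_timedelta(budget).total_seconds()
    start = time.perf_counter()
    results = []
    for _ in range(maxiters):
        if time.perf_counter() - start > seconds:
            break
        results.append(func())
    return results


rng = random.Random(0)
fast = generate_on_a_time_budget(rng.random, maxiters=1000, budget='0.5s')
slow = generate_on_a_time_budget(lambda: time.sleep(0.01), maxiters=1000, budget='0.05s')
print(len(fast), len(slow))  # the cap is hit first for fast, the budget for slow
```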

class kwarray.distributions.DiscreteDistribution(*args, **kwargs)[source]

Bases: Distribution

class kwarray.distributions.ContinuousDistribution(*args, **kwargs)[source]

Bases: Distribution

class kwarray.distributions.MixedDistribution(*args, **kwargs)[source]

Bases: Distribution

class kwarray.distributions.Mixture(pdfs, weights=None, rng=None)[source]

Bases: MixedDistribution

Creates a mixture model of multiple distributions

Contains a set of distributions with associated weights. Sampling is done by first choosing a distribution with probability proportional to its weighting, and then sampling from the chosen distribution.

In general, a mixture model generates data by first sampling a latent variable z, and then sampling the observables x from a distribution that depends on z, i.e. p(z, x) = p(z) p(x | z) [GrosseMixture] [StephensMixture].
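Marginalizing out the latent z gives the familiar mixture density over K components. Here w_k = p(z = k) is the normalized weight and p_k(x) = p(x | z = k) is the density of the k-th component:

```latex
p(x) = \sum_{k=1}^{K} p(z = k)\, p(x \mid z = k) = \sum_{k=1}^{K} w_k \, p_k(x),
\qquad \sum_{k=1}^{K} w_k = 1
```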

Parameters:
  • pdfs (List[Distribution]) – list of distributions

  • weights (List[float]) – corresponding weights of each distribution

  • rng (np.random.RandomState) – seed random number generator

References

CommandLine

xdoctest -m kwarray.distributions Mixture:0 --show

Example

>>> # In this example we create a bimodal mixture of normals
>>> from kwarray.distributions import *  # NOQA
>>> pdfs = [Normal(mean=10, std=2), Normal(18, 2)]
>>> self = Mixture(pdfs)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
>>> kwplot.show_if_requested()
[Figure: histogram of samples from the bimodal mixture of normals]

Example

>>> # Compare Composed versus Mixture Distributions
>>> # Given two normal distributions,
>>> from kwarray.distributions import Normal  # NOQA
>>> from kwarray.distributions import *  # NOQA
>>> n1 = Normal(mean=11, std=3)
>>> n2 = Normal(mean=53, std=5)
>>> composed = (n1 * 0.3) + (n2 * 0.7)
>>> mixture = Mixture([n1, n2], [0.3, 0.7])
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, pnum=(2, 2, 1))
>>> ax = kwplot.figure(pnum=(2, 1, 1), title='n1 & n2').gca()
>>> n = 10000
>>> plotkw = dict(stat='count', kde=1, bins=1000)
>>> n1.plot(n, ax=ax, **plotkw)
>>> n2.plot(n, ax=ax, **plotkw)
>>> ax=kwplot.figure(pnum=(2, 2, 3), title='composed').gca()
>>> composed.plot(n, ax=ax, **plotkw)
>>> ax=kwplot.figure(pnum=(2, 2, 4), title='mixture').gca()
>>> mixture.plot(n, ax=ax, **plotkw)
>>> kwplot.show_if_requested()
[Figure: panels 'n1 & n2', 'composed', and 'mixture']
sample(*shape)[source]

Sampling from a mixture of k distributions with weights w_k is equivalent to picking a distribution with probability w_k and then sampling from the picked distribution. See SOuser6655984 <https://stackoverflow.com/a/47762586/887074>.
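The two-step procedure can be sketched in NumPy. `sample_mixture` is a hypothetical helper, not kwarray's `Mixture.sample`. It works in three steps:

- Normalize the weights.
- Pick a component for every sample with `RandomState.choice`.
- Fill in the samples component by component, so each component is sampled in one vectorized call.

```python
import numpy as np


def sample_mixture(samplers, weights, n, rng):
    """Draw n samples from a weighted mixture (a sketch, not kwarray's code)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to probabilities
    # Step 1: pick a component index for every sample
    idxs = rng.choice(len(samplers), size=n, p=weights)
    out = np.empty(n)
    # Step 2: sample each chosen component in a single vectorized call
    for k, sampler in enumerate(samplers):
        mask = idxs == k
        out[mask] = sampler(rng, mask.sum())
    return out, idxs


rng = np.random.RandomState(0)
samplers = [
    lambda rng, m: rng.normal(10, 2, m),
    lambda rng, m: rng.normal(18, 2, m),
]
data, idxs = sample_mixture(samplers, [1, 3], 100000, rng)
print(data.mean())  # close to 0.25 * 10 + 0.75 * 18 = 16
```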

classmethod random(rng=None, n=3)[source]
Parameters:
  • rng (int | float | None | numpy.random.RandomState | random.Random) – random coercible

  • n (int) – number of random distributions in the mixture

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> print('Mixture = {!r}'.format(Mixture))
>>> print('Mixture = {!r}'.format(dir(Mixture)))
>>> self = Mixture.random(rng=3)
>>> print('self = {!r}'.format(self))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot('0.1s', bins=256)
>>> kwplot.show_if_requested()
[Figure: plot of samples drawn from a random Mixture]
class kwarray.distributions.Composed(*args, **kwargs)[source]

Bases: MixedDistribution

A distribution generated by composing different base distributions or numbers (which are considered as constant distributions).

Given the operation and its arguments, sampling from a “Composed” distribution first samples from each of the operands and then applies the operation to the sampled points. For instance, if we add two Normal distributions, this will first sample from each distribution and then add the results.

Note

This is not the same as mixing distributions!
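A quick numeric check makes the distinction concrete. The sketch below uses plain NumPy instead of kwarray. It reuses the parameters of the Mixture comparison example: N(11, 3) and N(53, 5) with weights 0.3 and 0.7. Both variables have mean 0.3 * 11 + 0.7 * 53 = 40.4. The composed variable is a single narrow normal, while the mixture is bimodal and far wider.

```python
import numpy as np

rng = np.random.RandomState(0)
n = 200000
a = rng.normal(11, 3, n)
b = rng.normal(53, 5, n)

# Composed: sample both operands, then apply the operation pointwise
composed = 0.3 * a + 0.7 * b

# Mixture: pick one operand per sample with probability 0.3 / 0.7
pick_b = rng.rand(n) < 0.7
mixture = np.where(pick_b, b, a)

print(composed.mean(), composed.std())  # about 40.4 and 3.61 (a single normal)
print(mixture.mean(), mixture.std())    # about 40.4 and 19.8 (bimodal)
```

The composed standard deviation is sqrt((0.3 * 3)^2 + (0.7 * 5)^2) ≈ 3.61. The mixture's also includes the spread between the two component means.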

Variables:
  • self.operation (Function) – operation (add / sub / mult / div) to perform on operands

  • self.operands (Sequence[Distribution | Number]) – arguments passed to operation

Example

>>> # In this example you can see that the sum of two Normal random
>>> # variables is also normal
>>> from kwarray.distributions import *  # NOQA
>>> operands = [Normal(mean=10, std=2), Normal(15, 2)]
>>> operation = np.add
>>> self = Composed(operation, operands)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(1000, bins=100)
[Figure: histogram of samples from the sum of two Normals]

Example

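>>> # Binary operations result in composed distributions.
>>> # Here the exponential density formula lam * e ** (-lam * x) is applied
>>> # to samples from a (bounded) uniform; note that the result is not
>>> # itself exponentially distributed.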
>>> from kwarray.distributions import *  # NOQA
>>> X = Uniform(.001, 7)
>>> lam = .7
>>> e = np.exp(1)
>>> self = lam * e ** (-lam * X)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(5000, bins=100)
[Figure: histogram of samples from the composed distribution]
Parameters:
  • operation (Any) – no help given. Defaults to None.

  • operands (Any) – no help given. Defaults to None.

sample(*shape)[source]
kwarray.distributions._trysample(arg, shape)[source]

Samples from arg if it is a distribution; otherwise returns arg unchanged.

exception kwarray.distributions.CoerceError[source]

Bases: ValueError

kwarray.distributions.CastError

alias of CoerceError

class kwarray.distributions.Uniform(*args, **kwargs)[source]

Bases: ContinuousDistribution

Defaults to a uniform distribution over floats between 0 and 1

Example

>>> from kwarray.distributions import *  # NOQA
>>> self = Uniform(rng=0)
>>> self.sample()
0.548813...
>>> float(self.sample(1))
0.7151...

Benchmark

>>> import ubelt as ub
>>> self = Uniform()
>>> for timer in ub.Timerit(100, bestof=10):
>>>     with timer:
>>>         [self() for _ in range(100)]
>>> for timer in ub.Timerit(100, bestof=10):
>>>     with timer:
>>>         self(100)
Parameters:
  • high (int) – no help given. Defaults to 1.

  • low (int) – no help given. Defaults to 0.

sample(*shape)[source]
classmethod coerce(arg)[source]
class kwarray.distributions.Exponential(*args, **kwargs)[source]

Bases: ContinuousDistribution

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate [1].
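A common way to sample this distribution from uniform draws is inverse transform sampling, the topic of one of the module-level references. The sketch assumes `scale` means the same as in NumPy's `RandomState.exponential`, i.e. the mean 1 / rate. It is an illustration, not necessarily how `Exponential.sample` works.

```python
import numpy as np


def sample_exponential(scale, n, rng):
    """Inverse transform sampling: if U ~ Uniform(0, 1), then
    -scale * log(1 - U) ~ Exponential(scale)."""
    u = rng.random_sample(n)      # values in [0, 1)
    return -scale * np.log1p(-u)  # log1p(-u) == log(1 - u), safe at u = 0


rng = np.random.RandomState(0)
data = sample_exponential(scale=2.0, n=200000, rng=rng)
print(data.mean())  # close to the scale, which is the mean of the distribution
```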

References

Example

>>> from kwarray.distributions import *  # NOQA
>>> self = Exponential(rng=0)
>>> self.sample()
>>> self.sample(2, 3)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
[Figure: histogram of samples from the Exponential distribution]
Parameters:

scale (int) – no help given. Defaults to 1.

sample(*shape)[source]
class kwarray.distributions.Constant(*args, **kwargs)[source]

Bases: DiscreteDistribution

Example

>>> self = Constant(42, rng=0)
>>> self.sample()
42
>>> self.sample(3)
array([42, 42, 42])
Parameters:

value (int) – constant value. Defaults to 1.

sample(*shape)[source]
class kwarray.distributions.DiscreteUniform(*args, **kwargs)[source]

Bases: DiscreteDistribution

Uniform distribution over integers.

Parameters:
  • min (int) – inclusive minimum

  • max (int) – exclusive maximum
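NumPy's `RandomState.randint` uses the same half-open convention (inclusive low, exclusive high), which gives a quick mental model. Whether `DiscreteUniform` is built on it is not stated here.

```python
import numpy as np

rng = np.random.RandomState(0)
# low=0 is inclusive and high=4 is exclusive, like min and max above
data = rng.randint(0, 4, size=1000)
print(sorted(set(data.tolist())))  # [0, 1, 2, 3]: 4 is never produced
```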

Example

>>> self = DiscreteUniform.coerce(4)
>>> self.sample(100)
sample(*shape)[source]
classmethod coerce(arg, rng=None)[source]
class kwarray.distributions.Normal(*args, **kwargs)[source]

Bases: ContinuousDistribution

A normal distribution. See [WikiNormal] [WikiCLT].

References

Example

>>> from kwarray.distributions import *  # NOQA
>>> self = Normal(mean=100, rng=0)
>>> self.sample()
>>> self.sample(100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
[Figure: histogram of samples from Normal(mean=100)]
Parameters:
  • mean (float) – no help given. Defaults to 0.0.

  • std (float) – no help given. Defaults to 1.0.

sample(*shape)[source]
classmethod random(rng=None)[source]
class kwarray.distributions.TruncNormal(*args, **kwargs)[source]

Bases: ContinuousDistribution

A truncated normal distribution.

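A normal distribution, but bounded by low and high values. Note that this is very different from clipping a normal. Clipping moves all out-of-range probability mass onto the bounds, whereas truncation discards it and renormalizes the density over [low, high].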
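The difference can be seen numerically. This sketch uses plain NumPy and rejection sampling only because it is short; it becomes slow when [low, high] covers little probability mass. The references for this class point to `scipy.stats.truncnorm` for a practical implementation. With a standard normal and bounds of ±0.5, clipping puts about 31% of the samples exactly on the upper bound, while the truncated samples never land there.

```python
import numpy as np

rng = np.random.RandomState(0)
mean, std, low, high = 0.0, 1.0, -0.5, 0.5
n = 10000

# Clipped normal: out-of-range samples are moved onto the bounds
clipped = np.clip(rng.normal(mean, std, n), low, high)

# Truncated normal via rejection sampling: out-of-range samples are redrawn,
# so the density inside [low, high] is simply renormalized
truncated = np.empty(0)
while truncated.size < n:
    draws = rng.normal(mean, std, n)
    keep = draws[(draws >= low) & (draws <= high)]
    truncated = np.concatenate([truncated, keep])
truncated = truncated[:n]

print((clipped == high).mean())    # about 0.31: a point mass at the bound
print((truncated == high).mean())  # 0.0: no point mass
```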

Parameters:
  • mean (float) – mean of the distribution

  • std (float) – standard deviation of the distribution

  • low (float) – lower bound

  • high (float) – upper bound

  • rng (np.random.RandomState) – random number generator or integer seed

References

https://en.wikipedia.org/wiki/Truncated_normal_distribution
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html

CommandLine

xdoctest -m kwarray.distributions TruncNormal

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> self = TruncNormal(rng=0)
>>> self()  # output of this changes before/after scipy version 1.5
...0.1226...

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> low = -np.pi / 16
>>> high = np.pi / 16
>>> std = np.pi / 8
>>> self = TruncNormal(low=low, high=high, std=std, rng=0)
>>> shape = (3, 3)
>>> data = self(*shape)
>>> print(ub.urepr(data, precision=5))
np.array([[ 0.01841,  0.0817 ,  0.0388 ],
          [ 0.01692, -0.0288 ,  0.05517],
          [-0.02354,  0.15134,  0.18098]], dtype=np.float64)
_update_internals()[source]
classmethod random(rng=None)[source]
sample(*shape)[source]
class kwarray.distributions.Bernoulli(*args, **kwargs)[source]

Bases: DiscreteDistribution

The Bernoulli distribution is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p.
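Equivalently, a Bernoulli draw can be produced by comparing a Uniform(0, 1) sample against p. `sample_bernoulli` is a hypothetical helper used only for illustration.

```python
import numpy as np


def sample_bernoulli(p, n, rng):
    """Return 1 with probability p and 0 with probability 1 - p."""
    return (rng.random_sample(n) < p).astype(int)


rng = np.random.RandomState(0)
flips = sample_bernoulli(0.3, 100000, rng)
print(flips.mean())  # close to p = 0.3
```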

References

https://en.wikipedia.org/wiki/Bernoulli_distribution

Parameters:

p (float) – probability of success. Defaults to 0.5.

sample(*shape)[source]
classmethod coerce(arg)[source]
class kwarray.distributions.Binomial(*args, **kwargs)[source]

Bases: DiscreteDistribution

The Binomial distribution represents the discrete probabilities of obtaining some number of successes in n “binary-experiments” each with a probability of success p and a probability of failure 1 - p.
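A Binomial(n, p) variable counts the successes in n independent Bernoulli(p) trials, so it can be sampled by summing Bernoulli draws. The sketch compares that construction with NumPy's `RandomState.binomial`. It does not show how kwarray's `Binomial.sample` is implemented.

```python
import numpy as np

rng = np.random.RandomState(0)
n_trials, p, n_samples = 10, 0.25, 50000

# Each row holds n_trials Bernoulli(p) trials; summing counts the successes
trials = rng.random_sample((n_samples, n_trials)) < p
from_bernoulli = trials.sum(axis=1)

# NumPy samples the same distribution directly
direct = rng.binomial(n_trials, p, n_samples)

print(from_bernoulli.mean(), direct.mean())  # both close to n * p = 2.5
```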

References

https://en.wikipedia.org/wiki/Binomial_distribution

Parameters:
  • p (float) – probability of success. Defaults to 0.5.

  • n (int) – number of trials. Defaults to 1.

sample(*shape)[source]
class kwarray.distributions.Categorical(categories, weights=None, rng=None)[source]

Bases: DiscreteDistribution

Example

>>> categories = [3, 5, 1]
>>> weights = [.05, .5, .45]
>>> self = Categorical(categories, weights, rng=0)
>>> self.sample()
5
>>> list(self.sample(2))
[1, 1]
>>> self.sample(2, 3)
array([[5, 5, 1],
       [5, 1, 1]])
Parameters:
  • categories (Any) – no help given. Defaults to None.

  • weights (Any) – no help given. Defaults to None.
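One standard way to sample a categorical distribution is to invert its cumulative weights. The hypothetical `sample_categorical` below does this with `np.searchsorted`. It is not expected to reproduce the exact values in the example above, since kwarray may consume random numbers differently.

```python
import numpy as np


def sample_categorical(categories, weights, n, rng):
    """Inverse-CDF sampling over a finite set of categories (a sketch)."""
    weights = np.asarray(weights, dtype=float)
    cdf = np.cumsum(weights / weights.sum())
    u = rng.random_sample(n)
    # Index of the first cumulative weight strictly greater than u
    idxs = np.searchsorted(cdf, u, side='right')
    idxs = np.minimum(idxs, len(categories) - 1)  # guard against round-off
    return np.asarray(categories)[idxs]


rng = np.random.RandomState(0)
data = sample_categorical([3, 5, 1], [.05, .5, .45], 100000, rng)
print((data == 5).mean())  # close to the weight 0.5
```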

sample(*shape)[source]
class kwarray.distributions.NonlinearUniform(min, max, nonlinearity=None, reverse=False, rng=None)[source]

Bases: ContinuousDistribution

Samples values between two points (min and max) using a non-uniform weighting determined by a nonlinearity.

Todo

could refactor part of this into a PowerLaw distribution

Parameters:

nonlinearity (func or str) – must be a function that maps the range [0, 1] onto the range [0, 1]

Example

>>> self = NonlinearUniform(0, 100, np.sqrt, rng=0)
>>> print(ub.urepr(list(self.sample(2)), precision=2, nl=0))
[74.08, 84.57]
>>> print(ub.urepr(self.sample(2, 3), precision=2, nl=1))
np.array([[77.64, 73.82, 65.09],
          [80.37, 66.15, 94.43]], dtype=np.float64)
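Here is a sketch under one assumption: a sample is min + nonlinearity(u) * (max - min) for u ~ Uniform(0, 1), ignoring the reverse option. With RandomState(0) it happens to reproduce the first two values of the example above. That is consistent with the assumption but does not prove it is how `NonlinearUniform.sample` works. `sample_nonlinear_uniform` is a hypothetical helper.

```python
import numpy as np


def sample_nonlinear_uniform(low, high, nonlinearity, n, rng):
    """Warp uniform draws in [0, 1) through nonlinearity, then rescale to [low, high]."""
    u = rng.random_sample(n)
    return low + nonlinearity(u) * (high - low)


rng = np.random.RandomState(0)
data = sample_nonlinear_uniform(0, 100, np.sqrt, 2, rng)
print(np.round(data, 2))  # [74.08 84.57]
```

Because sqrt(u) >= u on [0, 1], this choice of nonlinearity pushes samples toward max.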
sample(*shape)[source]
class kwarray.distributions.CategoryUniform(categories=[None], rng=None)[source]

Bases: DiscreteUniform

Discrete uniform distribution over a list of categories.

Parameters:
  • min (int) – no help given. Defaults to 0.

  • max (int) – no help given. Defaults to 1.

sample(*shape)[source]
class kwarray.distributions.PDF(x, p, rng=None)[source]

Bases: Distribution

Warning: this class may be broken.

Similar to Categorical, but interpolates to approximate a continuous random variable.

Returns a value x with probability p.

References

http://www.nehalemlabs.net/prototype/blog/2013/12/16/how-to-do-inverse-transformation-sampling-in-scipy-and-numpy/

Parameters:
  • x (list or tuple) – domain in which this PDF is defined

  • p (list) – probability (or unnormalized density) for each value in x
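The interpolation idea from the linked article can be sketched as inverse transform sampling on a tabulated density:

- Accumulate p into a CDF over the grid x.
- Rescale the CDF to [0, 1].
- Map uniform draws back through the CDF with linear interpolation.

`sample_from_tabulated_pdf` is a hypothetical helper. Since this class is marked as possibly broken, the sketch should not be read as a description of what `PDF.sample` currently does.

```python
import numpy as np


def sample_from_tabulated_pdf(x, p, n, rng):
    """Inverse transform sampling from a density tabulated on a grid."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    cdf = np.cumsum(p)                         # unnormalized CDF on the grid
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])  # rescale to [0, 1]
    u = rng.random_sample(n)
    # Linear interpolation of the inverse CDF gives continuous-valued samples
    return np.interp(u, cdf, x)


x = np.linspace(0, 1, 200)
p = x ** 2  # density proportional to x^2
rng = np.random.RandomState(0)
data = sample_from_tabulated_pdf(x, p, 100000, rng)
print(data.mean())  # near 3/4, the mean of the density 3 x^2 on [0, 1]
```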

Example

>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import PDF # NOQA
>>> x = np.linspace(800, 4500)
>>> p = np.log10(x)
>>> p = x ** 2
>>> self = PDF(x, p)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(5000, bins=50)
[Figure: histogram of samples from the PDF example]
sample(*shape)[source]
class kwarray.distributions.Seeded(rng=None, cls=None)[source]

Bases: object

Helper for grabbing pre-seeded distributions

kwarray.distributions._test_distributions()[source]
kwarray.distributions._process_docstrings()[source]

Iterate over the definitions with __params__ defined and dynamically add relevant information to their docstrings. We should modify this so it can rewrite the docstrings statically. I don’t like dynamic docstrings at runtime.

CommandLine

xdoctest -m kwarray.distributions _process_docstrings

Example

>>> # Show the results of the docstring formatting
>>> from kwarray import distributions
>>> candidates = []
>>> for val in distributions.__dict__.values():
>>>     if hasattr(val, '__params__') and val.__params__ is not NotImplemented:
>>>         candidates.append(val)
>>> for val in candidates:
>>>     print('======')
>>>     print(val)
>>>     print('-----')
>>>     print(val.__doc__)
>>>     print('======')