kwarray.distributions module
Defines data structures for efficient repeated sampling of specific distributions (e.g. Normal, Uniform, Binomial) with specific parameters.
Inspired by ~/code/imgaug/imgaug/parameters.py
- Similar Libraries:
Todo
[ ] change sample shape to just a single num.
[ ] Some distributions will output vectors. Maybe we could just append the extra dimensions to the sample shape?
[ ] Expose as kwstats?
[ ] Improve coerce syntax for concise distribution specification
References
https://stackoverflow.com/questions/21100716/fast-arbitrary-distribution-random-sampling
https://stackoverflow.com/questions/4265988/generate-random-numbers-with-a-given-numerical-distribution
https://codereview.stackexchange.com/questions/196286/inverse-transform-sampling
- class kwarray.distributions.Value(default=None, min=None, max=None, help=None, constraints=None, type=None, name=None)
Bases: NiceRepr
Container for class __params__ values.
Used to store metadata about distribution arguments, including default values, numeric constraints, typing, and help text.
Example
>>> from kwarray.distributions import *  # NOQA
>>> self = Value(43.5)
>>> print(Value(name='lucy'))
>>> print(Value(name='jeff', default=1))
>>> self = Value(name='fred', default=1.0)
>>> print('self = {}'.format(ub.urepr(self, nl=1)))
>>> print(Value(name='bob', default=1.0, min=-5, max=5))
>>> print(Value(name='alice', default=1.0, min=-5))
- kwarray.distributions._issubclass2(child, parent)
Uses string comparisons to avoid ipython reload errors. Much less robust though.
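The idea of a string-based subclass check can be sketched as follows. This is a minimal illustration, assuming the comparison is made on class names along the method resolution order; `issubclass_by_name` is a hypothetical name and not the library's implementation.

```python
def issubclass_by_name(child, parent):
    """Return True if any class in child's MRO has the same name as parent.

    Hypothetical helper: compares class names rather than class identity.
    """
    return any(base.__name__ == parent.__name__ for base in child.__mro__)


class Base:
    pass


class Derived(Base):
    pass


# Simulate a module reload: a new class object with the same name.
ReloadedBase = type('Base', (), {})

print(issubclass(Derived, ReloadedBase))          # identity-based check
print(issubclass_by_name(Derived, ReloadedBase))  # name-based check
```

The loss of robustness comes from the same property that makes the check survive reloads: two unrelated classes that happen to share a name compare as equal.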
- kwarray.distributions._isinstance2(obj, cls)
Internal hacked version of isinstance for debugging.
obj = self
cls = distributions.Distribution
child = obj.__class__
parent = cls
- class kwarray.distributions.Parameterized
Bases: NiceRepr
Keeps track of all registered params and classes with registered params
- idstr(nl=None, thresh=80)
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> self = TruncNormal()
>>> self.idstr()
>>> #
>>> #
>>> class Dummy(Distribution):
>>>     def __init__(self):
>>>         super(Dummy, self).__init__()
>>>         self._setparam('a', 3)
>>>         self.b = Normal()
>>>         self.c = Uniform()
>>> self = Dummy()
>>> print(self.idstr())
>>> #
>>> class Tail5(Distribution):
>>>     def __init__(self):
>>>         super(Tail5, self).__init__()
>>>         self._setparam('a_parameter', 3)
>>>         for i in range(5):
>>>             self._setparam(chr(i + 97), i)
>>> #
>>> class Tail6(Distribution):
>>>     def __init__(self):
>>>         super(Tail6, self).__init__()
>>>         for i in range(9):
>>>             self._setparam(chr(i + 97) + '_parameter', i)
>>> #
>>> class Dummy2(Distribution):
>>>     def __init__(self):
>>>         super(Dummy2, self).__init__()
>>>         self._setparam('x', 3)
>>>         self._setparam('y', 3)
>>>         self.d = Dummy()
>>>         self.f = Tail6()
>>>         self.y = Tail5()
>>> self = Dummy2()
>>> print(self.idstr())
>>> print(ub.urepr(self.json_id()))
- class kwarray.distributions.ParameterizedList(items)
Bases: Parameterized
Example
>>> from kwarray import distributions as stoch
>>> self1 = stoch.ParameterizedList([
>>>     stoch.Normal(),
>>>     stoch.Uniform(),
>>> ])
>>> print(self1.idstr())
>>> self = stoch.ParameterizedList([stoch.ParameterizedList([
>>>     stoch.Normal(),
>>>     stoch.Uniform(),
>>>     self1,
>>> ])])
>>> print(self.idstr())
>>> print(self.idstr(0))
- class kwarray.distributions._BinOpMixin
Bases: object
Allows binary operations to be performed on distributions to create composed distributions.
- class kwarray.distributions._RBinOpMixin
Bases: _BinOpMixin
- class kwarray.distributions.Distribution(*args, **kwargs)
Bases: Parameterized, _RBinOpMixin
Base class for all distributions.
- There are 3 main subtypes:
ContinuousDistribution
DiscreteDistribution
MixedDistribution
Note
[DiscVsCont] notes that there are only three types of random variable (discrete, continuous, and mixed) and that these types are mutually exclusive.
Note
When inheriting from this class, you typically do not need to define an __init__ method. Instead, override the __params__ class attribute with an OrderedDict[str, Value] that specifies the signature of the __init__ method. This allows (1) new distributions to be expressed concisely and (2) new distributions to inherit a random classmethod that respects the constraints specified in each parameter Value.
If you do override __init__, be sure to call super().__init__().
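The pattern described in this note can be illustrated with a standalone sketch. It is not kwarray's implementation: `Param` and `ParamsBase` are hypothetical stand-ins for `Value` and `Distribution`, and details such as `rng` handling and constraint checking are omitted.

```python
from collections import OrderedDict


class Param:
    """Hypothetical stand-in for Value: holds only a default."""
    def __init__(self, default=None):
        self.default = default


class ParamsBase:
    """Hypothetical base class whose __init__ is driven by __params__."""
    __params__ = OrderedDict()

    def __init__(self, *args, **kwargs):
        names = list(self.__params__)
        if len(args) > len(names):
            raise TypeError('too many positional arguments')
        # Positional arguments fill parameters in declaration order.
        given = dict(zip(names, args))
        for key, val in kwargs.items():
            if key not in self.__params__:
                raise TypeError('unknown parameter: {}'.format(key))
            if key in given:
                raise TypeError('duplicate parameter: {}'.format(key))
            given[key] = val
        # Anything not given falls back to the declared default.
        for name, param in self.__params__.items():
            setattr(self, name, given.get(name, param.default))


class MyNormal(ParamsBase):
    # No __init__ needed: the signature comes from __params__.
    __params__ = OrderedDict([
        ('mean', Param(default=0.0)),
        ('std', Param(default=1.0)),
    ])


dist = MyNormal(10, std=2)
print(dist.mean, dist.std)
```

Because the parameter declarations live in one ordered mapping, a base class can reuse them for other purposes as well, such as generating a random instance within each parameter's declared bounds.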
References
- classmethod random(rng=None)
Returns a random distribution
- Parameters:
rng (int | float | None | numpy.random.RandomState | random.Random) – a seed or any other value coercible to a random number generator
CommandLine
xdoctest -m /home/joncrall/code/kwarray/kwarray/distributions.py Distribution.random --show
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> self = Distribution.random()
>>> print('self = {!r}'.format(self))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot('0.001s', bins=256)
>>> kwplot.show_if_requested()
- plot(n='0.01s', bins='auto', stat='count', color=None, kde=True, ax=None, **kwargs)
Plots n samples from the distribution.
- Parameters:
bins (int | List[Number] | str) – number of bins, bin edges, or special numpy method for finding the number of bins.
stat (str) – density, count, probability, frequency
**kwargs – other args passed to seaborn.histplot()
Example
>>> from kwarray.distributions import Normal  # NOQA
>>> self = Normal()
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> self.plot(n=1000)
- class kwarray.distributions.DiscreteDistribution(*args, **kwargs)
Bases: Distribution
- class kwarray.distributions.ContinuousDistribution(*args, **kwargs)
Bases: Distribution
- class kwarray.distributions.MixedDistribution(*args, **kwargs)
Bases: Distribution
- class kwarray.distributions.Mixture(pdfs, weights=None, rng=None)
Bases: MixedDistribution
Creates a mixture model of multiple distributions
Contains a set of distributions with associated weights. Sampling is done by first choosing a distribution with probability proportional to its weight, and then sampling from the chosen distribution.
In general, a mixture model generates data in two steps: first we sample a latent variable z, and then we sample the observable x from a distribution that depends on z, i.e. p(z, x) = p(z) p(x | z) [GrosseMixture] [StephensMixture].
- Parameters:
pdfs (List[Distribution]) – list of distributions
weights (List[float]) – corresponding weights of each distribution
rng (np.random.RandomState) – seed random number generator
References
CommandLine
xdoctest -m kwarray.distributions Mixture:0 --show
Example
>>> # In this example we create a bimodal mixture of normals
>>> from kwarray.distributions import *  # NOQA
>>> pdfs = [Normal(mean=10, std=2), Normal(18, 2)]
>>> self = Mixture(pdfs)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
>>> kwplot.show_if_requested()
Example
>>> # Compare Composed versus Mixture Distributions
>>> # Given two normal distributions,
>>> from kwarray.distributions import Normal  # NOQA
>>> from kwarray.distributions import *  # NOQA
>>> n1 = Normal(mean=11, std=3)
>>> n2 = Normal(mean=53, std=5)
>>> composed = (n1 * 0.3) + (n2 * 0.7)
>>> mixture = Mixture([n1, n2], [0.3, 0.7])
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, pnum=(2, 2, 1))
>>> ax = kwplot.figure(pnum=(2, 1, 1), title='n1 & n2').gca()
>>> n = 10000
>>> plotkw = dict(stat='density', kde=1, bins=1000)
>>> plotkw = dict(stat='count', kde=1, bins=1000)
>>> #plotkw = dict(stat='frequency', kde=1, bins='auto')
>>> n1.plot(n, ax=ax, **plotkw)
>>> n2.plot(n, ax=ax, **plotkw)
>>> ax=kwplot.figure(pnum=(2, 2, 3), title='composed').gca()
>>> composed.plot(n, ax=ax, **plotkw)
>>> ax=kwplot.figure(pnum=(2, 2, 4), title='mixture').gca()
>>> mixture.plot(n, ax=ax, **plotkw)
>>> kwplot.show_if_requested()
- sample(*shape)
Sampling from a mixture of k distributions with weights w_k is equivalent to picking a distribution with probability w_k, and then sampling from the picked distribution. SOuser6655984 <https://stackoverflow.com/a/47762586/887074>
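The two-step procedure described here can be sketched with the standard library alone. This is an illustration of the technique rather than the code behind `Mixture.sample`; `sample_mixture` and its arguments are hypothetical names.

```python
import random


def sample_mixture(samplers, weights, n, rng):
    """Draw n samples from a mixture.

    samplers: list of zero-argument callables, one per component.
    weights: relative weight of each component.
    """
    # Step 1: choose a component index for every sample.
    indices = rng.choices(range(len(samplers)), weights=weights, k=n)
    # Step 2: draw from the chosen component.
    return [samplers[i]() for i in indices]


rng = random.Random(0)
components = [
    lambda: rng.gauss(11, 3),
    lambda: rng.gauss(53, 5),
]
samples = sample_mixture(components, [0.3, 0.7], 10000, rng)

# Roughly 30% of samples should come from the component near 11.
frac_low = sum(s < 32 for s in samples) / len(samples)
print(round(frac_low, 2))
```

The two components are far enough apart that a threshold between them recovers the mixing weights, which is what makes the resulting histogram bimodal.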
- classmethod random(rng=None, n=3)
- Parameters:
rng (int | float | None | numpy.random.RandomState | random.Random) – a seed or any other value coercible to a random number generator
n (int) – number of random distributions in the mixture
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> print('Mixture = {!r}'.format(Mixture))
>>> print('Mixture = {!r}'.format(dir(Mixture)))
>>> self = Mixture.random(3)
>>> print('self = {!r}'.format(self))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot('0.1s', bins=256)
>>> kwplot.show_if_requested()
- class kwarray.distributions.Composed(*args, **kwargs)
Bases: MixedDistribution
A distribution generated by composing different base distributions or numbers (which are considered as constant distributions).
Given the operation and its arguments, the sampling process of a “Composed” distribution will sample from each of the operands, and then apply the operation to the sampled points. For instance if we add two Normal distributions, this will first sample from each distribution and then add the results.
Note
This is not the same as mixing distributions!
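To make the contrast with mixing concrete, here is a minimal standard-library sketch of the "sample each operand, then apply the operation" process. `sample_composed` is a hypothetical helper, not the library's code, and operands are modeled as zero-argument callables or plain numbers.

```python
import operator
import random


def sample_composed(operation, operands):
    """Sample each operand, then apply the operation to the results.

    Operands are either zero-argument samplers or plain numbers, which are
    treated as constants.
    """
    values = [op() if callable(op) else op for op in operands]
    return operation(*values)


rng = random.Random(0)
n1 = lambda: rng.gauss(10, 2)
n2 = lambda: rng.gauss(15, 2)

data = [sample_composed(operator.add, [n1, n2]) for _ in range(10000)]
mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / len(data)
print(round(mean, 1), round(var, 1))
```

Every composed sample uses a draw from both operands, so the sum of two independent normals is a single normal with mean 10 + 15 = 25 and variance 4 + 4 = 8. A mixture of the same two normals would instead use one operand per sample and keep two separate modes.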
- Variables:
self.operation (Function) – operation (add / sub / mult / div) to perform on operands
self.operands (Sequence[Distribution | Number]) – arguments passed to operation
Example
>>> # In this example you can see that the sum of two Normal random
>>> # variables is also normal
>>> from kwarray.distributions import *  # NOQA
>>> operands = [Normal(mean=10, std=2), Normal(15, 2)]
>>> operation = np.add
>>> self = Composed(operation, operands)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(1000, bins=100)
Example
>>> # Binary operations result in composed distributions
>>> # We can make a (bounded) exponential distribution using a uniform
>>> from kwarray.distributions import *  # NOQA
>>> X = Uniform(.001, 7)
>>> lam = .7
>>> e = np.exp(1)
>>> self = lam * e ** (-lam * X)
>>> data = self.sample(5)
>>> print(ub.urepr(list(data), nl=0, precision=5))
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(5000, bins=100)
- Parameters:
operation (Any) – no help given. Defaults to None.
operands (Any) – no help given. Defaults to None.
- kwarray.distributions._trysample(arg, shape)
samples if arg is a distribution, otherwise returns arg
- exception kwarray.distributions.CoerceError
Bases: ValueError
- kwarray.distributions.CastError
alias of CoerceError
- class kwarray.distributions.Uniform(*args, **kwargs)
Bases: ContinuousDistribution
Defaults to a uniform distribution over floats between 0 and 1
Example
>>> from kwarray.distributions import *  # NOQA
>>> self = Uniform(rng=0)
>>> self.sample()
0.548813...
>>> float(self.sample(1))
0.7151...
Benchmark
>>> import ubelt as ub
>>> self = Uniform()
>>> for timer in ub.Timerit(100, bestof=10):
>>>     with timer:
>>>         [self() for _ in range(100)]
>>> for timer in ub.Timerit(100, bestof=10):
>>>     with timer:
>>>         self(100)
- Parameters:
high (int) – no help given. Defaults to 1.
low (int) – no help given. Defaults to 0.
- class kwarray.distributions.Exponential(*args, **kwargs)
Bases: ContinuousDistribution
The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate [1].
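One standard way to draw from an exponential distribution is inverse-transform sampling, the technique discussed in the module-level references. The sketch below uses only the standard library and is not claimed to be how this class samples internally; `sample_exponential` is a hypothetical helper, and `scale` is taken to be the mean time between events.

```python
import math
import random


def sample_exponential(scale, rng):
    """Inverse-transform sample from an exponential with the given scale."""
    u = rng.random()  # uniform in [0, 1)
    # Invert the CDF F(x) = 1 - exp(-x / scale).
    return -scale * math.log(1.0 - u)


rng = random.Random(0)
draws = [sample_exponential(2.0, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
print(round(mean, 2))  # close to the scale parameter
```

Using `1.0 - u` rather than `u` keeps the argument of the logarithm strictly positive, because `random()` can return 0.0 but never 1.0.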
- References:
Example
>>> from kwarray.distributions import *  # NOQA
>>> self = Exponential(rng=0)
>>> self.sample()
>>> self.sample(2, 3)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
- Parameters:
scale (int) – no help given. Defaults to 1.
- class kwarray.distributions.Constant(*args, **kwargs)
Bases: DiscreteDistribution
Example
>>> self = Constant(42, rng=0)
>>> self.sample()
42
>>> self.sample(3)
array([42, 42, 42])
- Parameters:
value (int) – constant value. Defaults to 1.
- class kwarray.distributions.DiscreteUniform(*args, **kwargs)
Bases: DiscreteDistribution
Uniform distribution over integers.
- Parameters:
min (int) – inclusive minimum
max (int) – exclusive maximum
Example
>>> self = DiscreteUniform.coerce(4)
>>> self.sample(100)
- class kwarray.distributions.Normal(*args, **kwargs)
Bases: ContinuousDistribution
A normal distribution. See [WikiNormal] [WikiCLT].
References
Example
>>> from kwarray.distributions import *  # NOQA
>>> self = Normal(mean=100, rng=0)
>>> self.sample()
>>> self.sample(100)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(500, bins=25)
- Parameters:
mean (float) – no help given. Defaults to 0.0.
std (float) – no help given. Defaults to 1.0.
- class kwarray.distributions.TruncNormal(*args, **kwargs)
Bases: ContinuousDistribution
A truncated normal distribution.
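A normal distribution, but bounded by low and high values. Note that this is not the same as clipping a normal distribution: clipping moves out-of-range samples onto the bounds, whereas truncation renormalizes the density within the bounds.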
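The difference between truncating and clipping shows up in how much probability mass lands exactly on the bounds. The sketch below uses rejection sampling as a simple stand-in for a truncated normal; this choice is an assumption made for brevity and is not necessarily how this class or scipy samples. Both helper names are hypothetical.

```python
import random


def sample_truncnormal(mean, std, low, high, rng):
    """Rejection-sample a normal restricted to [low, high]."""
    while True:
        x = rng.gauss(mean, std)
        if low <= x <= high:
            return x


def sample_clipped(mean, std, low, high, rng):
    """Sample a normal and clamp out-of-range values to the bounds."""
    return min(max(rng.gauss(mean, std), low), high)


rng = random.Random(0)
n = 5000
trunc = [sample_truncnormal(0, 1, -1, 1, rng) for _ in range(n)]
clipped = [sample_clipped(0, 1, -1, 1, rng) for _ in range(n)]

# Fraction of samples sitting exactly on a bound.
on_bound_trunc = sum(x in (-1, 1) for x in trunc) / n
on_bound_clipped = sum(x in (-1, 1) for x in clipped) / n
print(on_bound_trunc, on_bound_clipped)
```

With bounds at one standard deviation, roughly a third of the clipped samples pile up on the bounds, while the truncated samples stay spread across the interior.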
- Parameters:
mean (float) – mean of the distribution
std (float) – standard deviation of the distribution
low (float) – lower bound
high (float) – upper bound
rng (np.random.RandomState)
References
https://en.wikipedia.org/wiki/Truncated_normal_distribution
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html
CommandLine
xdoctest -m /home/joncrall/code/kwarray/kwarray/distributions.py TruncNormal
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> self = TruncNormal(rng=0)
>>> self()  # output of this changes before/after scipy version 1.5
...0.1226...
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import *  # NOQA
>>> low = -np.pi / 16
>>> high = np.pi / 16
>>> std = np.pi / 8
>>> self = TruncNormal(low=low, high=high, std=std, rng=0)
>>> shape = (3, 3)
>>> data = self(*shape)
>>> print(ub.urepr(data, precision=5))
np.array([[ 0.01841,  0.0817 ,  0.0388 ],
          [ 0.01692, -0.0288 ,  0.05517],
          [-0.02354,  0.15134,  0.18098]], dtype=np.float64)
- class kwarray.distributions.Bernoulli(*args, **kwargs)
Bases: DiscreteDistribution
The Bernoulli distribution is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 - p.
References
https://en.wikipedia.org/wiki/Bernoulli_distribution
- Parameters:
p (float) – probability of success. Defaults to 0.5.
- class kwarray.distributions.Binomial(*args, **kwargs)
Bases: DiscreteDistribution
The Binomial distribution represents the discrete probabilities of obtaining some number of successes in n “binary-experiments” each with a probability of success p and a probability of failure 1 - p.
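The relationship between the two definitions can be sketched directly: a binomial draw is the number of successes in n independent Bernoulli trials. The helpers below are hypothetical and use only the standard library; they illustrate the definition and are not the library's sampling code.

```python
import random


def sample_bernoulli(p, rng):
    """Return 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


def sample_binomial(n, p, rng):
    """Count successes across n independent Bernoulli trials."""
    return sum(sample_bernoulli(p, rng) for _ in range(n))


rng = random.Random(0)
draws = [sample_binomial(10, 0.3, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)
print(round(mean, 2))  # close to n * p = 3
```

With n = 1 the binomial reduces to a single Bernoulli trial.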
References
https://en.wikipedia.org/wiki/Binomial_distribution
- Parameters:
p (float) – probability of success. Defaults to 0.5.
n (int) – number of trials. Defaults to 1.
- class kwarray.distributions.Categorical(categories, weights=None, rng=None)
Bases: DiscreteDistribution
Example
>>> categories = [3, 5, 1]
>>> weights = [.05, .5, .45]
>>> self = Categorical(categories, weights, rng=0)
>>> self.sample()
5
>>> list(self.sample(2))
[1, 1]
>>> self.sample(2, 3)
array([[5, 5, 1],
       [5, 1, 1]])
- Parameters:
categories (Any) – no help given. Defaults to None.
weights (Any) – no help given. Defaults to None.
- class kwarray.distributions.NonlinearUniform(min, max, nonlinearity=None, reverse=False, rng=None)
Bases: ContinuousDistribution
Weighted sample between two points depending on some nonlinearity
Todo
could refactor part of this into a PowerLaw distribution
- Parameters:
nonlinearity (func or str) – needs to be a function that maps the range 0-1 to the range 0-1
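One plausible reading of "weighted sample between two points" is to draw a uniform value in [0, 1), pass it through the nonlinearity, and rescale the result to [min, max]. The sketch below implements that reading with the standard library; it is an assumption about the behavior rather than the library's code, and it ignores the `reverse` option. `sample_nonlinear_uniform` is a hypothetical name.

```python
import math
import random


def sample_nonlinear_uniform(low, high, nonlinearity, rng):
    """Warp a uniform draw with a [0, 1] -> [0, 1] function, then rescale."""
    u = rng.random()
    return low + (high - low) * nonlinearity(u)


rng = random.Random(0)
draws = [sample_nonlinear_uniform(0, 100, math.sqrt, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
print(round(mean, 1))
```

Under this reading a square-root nonlinearity pushes samples toward the upper bound: the expected value of sqrt(u) is 2/3, so the mean sits near 66.7 rather than 50.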
Example
>>> self = NonlinearUniform(0, 100, np.sqrt, rng=0)
>>> print(ub.urepr(list(self.sample(2)), precision=2, nl=0))
[74.08, 84.57]
>>> print(ub.urepr(self.sample(2, 3), precision=2, nl=1))
np.array([[77.64, 73.82, 65.09],
          [80.37, 66.15, 94.43]], dtype=np.float64)
- class kwarray.distributions.CategoryUniform(categories=[None], rng=None)
Bases: DiscreteUniform
Discrete Uniform over a list of categories
- Parameters:
min (int) – no help given. Defaults to 0.
max (int) – no help given. Defaults to 1.
- class kwarray.distributions.PDF(x, p, rng=None)
Bases: Distribution
BROKEN?
Similar to Categorical, but interpolates to approximate a continuous random variable.
Returns a value x with probability p.
References
- Parameters:
x (list or tuple) – domain in which this PDF is defined
p (list) – probability sample for each domain sample
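Sampling from a tabulated density is commonly done by inverse-transform sampling: integrate the density into a CDF, draw a uniform value, and interpolate the inverse CDF at that value. The sketch below shows that technique with the standard library. It is not the implementation of this class (which the docstring itself flags as possibly broken), and `build_cdf` and `sample_pdf` are hypothetical helpers.

```python
import bisect
import random


def build_cdf(x, p):
    """Trapezoid-integrate tabulated densities into a normalized CDF."""
    cdf = [0.0]
    for i in range(1, len(x)):
        area = 0.5 * (p[i] + p[i - 1]) * (x[i] - x[i - 1])
        cdf.append(cdf[-1] + area)
    total = cdf[-1]
    return [c / total for c in cdf]


def sample_pdf(x, cdf, rng):
    """Draw u ~ U(0, 1) and linearly interpolate the inverse CDF at u."""
    u = rng.random()
    i = bisect.bisect_right(cdf, u)
    i = min(max(i, 1), len(x) - 1)
    span = cdf[i] - cdf[i - 1]
    frac = (u - cdf[i - 1]) / span if span > 0 else 0.0
    return x[i - 1] + frac * (x[i] - x[i - 1])


# Tabulate an unnormalized density p(x) = x ** 2 on [800, 4500].
x = [800 + k * (4500 - 800) / 49 for k in range(50)]
p = [v ** 2 for v in x]
cdf = build_cdf(x, p)

rng = random.Random(0)
draws = [sample_pdf(x, cdf, rng) for _ in range(5000)]
print(min(draws) >= 800, max(draws) <= 4500)
```

The density does not need to be normalized beforehand, since `build_cdf` divides by the total area. Samples concentrate toward the upper end of the domain because the density grows with x.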
Example
>>> # xdoctest: +REQUIRES(module:scipy)
>>> from kwarray.distributions import PDF  # NOQA
>>> x = np.linspace(800, 4500)
>>> p = np.log10(x)
>>> p = x ** 2
>>> self = PDF(x, p)
>>> # xdoctest: +REQUIRES(--show)
>>> import kwplot
>>> kwplot.autompl()
>>> kwplot.figure(fnum=1, doclf=True)
>>> self.plot(5000, bins=50)
- class kwarray.distributions.Seeded(rng=None, cls=None)
Bases: object
Helper for grabbing pre-seeded distributions
- kwarray.distributions._process_docstrings()
Iterate over the definitions with __params__ defined and dynamically add relevant information to their docstrings. We should modify this so it can rewrite the docstrings statically. I don’t like dynamic docstrings at runtime.
CommandLine
xdoctest -m kwarray.distributions _process_docstrings
Example
>>> # Show the results of the docstring formatting
>>> from kwarray import distributions
>>> candidates = []
>>> for val in distributions.__dict__.values():
>>>     if hasattr(val, '__params__') and val.__params__ is not NotImplemented:
>>>         candidates.append(val)
>>> for val in candidates:
>>>     print('======')
>>>     print(val)
>>>     print('-----')
>>>     print(val.__doc__)
>>>     print('======')