Common distributions
#RefreshingStatistics Day 9
Introduction
A probability distribution can be one of two types:
- discrete — defined on discrete state spaces where the data can take only certain values;
- continuous — defined on continuous state spaces where the data can take any value in a predefined range.
1. Binomial and Bernoulli distributions
The binomial distribution, with parameters n and p, describes the number of successes in a sequence of n independent trials (experiments), each with a Boolean outcome (yes-no, success-failure); p is the probability of success in a single trial, and q = 1 - p is the probability of failure. Notation: B(n, p), with p ∈ [0, 1].
The Bernoulli distribution is the special case of the binomial distribution with a single trial (n = 1).
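To make the definition concrete, here is a minimal sketch of the binomial probability mass function, P(X = k) = C(n, k) · p^k · q^(n - k). The function name binomial_pmf and the values n = 10, p = 0.3 are assumptions chosen for illustration only.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p  # probability of failure in a single trial
    return comb(n, k) * p**k * q ** (n - k)


# Hypothetical example: 10 trials, each succeeding with probability 0.3.
n, p = 10, 0.3
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all possible k add up to 1 (up to rounding error).
print(sum(probabilities))

# Bernoulli is the n = 1 case: P(X = 0) = q and P(X = 1) = p.
print(binomial_pmf(0, 1, p), binomial_pmf(1, 1, p))
```

With n = 1 the formula collapses to just q and p, which is exactly the Bernoulli distribution.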
2. Uniform distribution
The continuous uniform distribution (rectangular distribution) is a family of symmetric probability distributions defined by two parameters, a and b: the minimum and maximum values (lower and upper bounds) of the distribution. Within these bounds, all intervals of the same length are equally probable.
The interval of values of a uniform distribution (denoted U) can be:
- open: U(a, b)
- closed: U[a, b]
The interval length is the difference between the interval bounds: b - a.
Real-life application: lead time investigation. Lead time is the latency between the start and the end of a process. For example, the lead time between starting work on a development task and completing it can range from 2 to 14 days, depending on the complexity of the task, the reviewing and testing process, the definition-of-done criteria and many other circumstances.
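As a rough sketch of the lead time example, the snippet below assumes (purely for illustration) that lead time is uniformly distributed on the closed interval [2, 14] days. Real lead times are rarely exactly uniform. The helper names uniform_pdf and uniform_cdf are hypothetical.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of U[a, b]: constant 1 / (b - a) inside the bounds, 0 outside."""
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """Probability that a value drawn from U[a, b] is less than or equal to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Assumption: lead time in days is uniform between 2 and 14.
a, b = 2.0, 14.0
interval_length = b - a          # 12 days
mean_lead_time = (a + b) / 2     # 8 days, the midpoint of the interval

# Probability that a task is finished within 7 days: (7 - 2) / 12 = 5/12.
p_within_week = uniform_cdf(7.0, a, b)
print(interval_length, mean_lead_time, p_within_week)
```

Because the density is flat, the probability of any sub-interval depends only on its length relative to b - a.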
The discrete uniform distribution is a symmetric probability distribution over a finite number n of outcomes, each with the same probability of being observed (1 / n). Example: rolling a die. The probability of getting any particular value from the set of possible values S = {1, 2, 3, 4, 5, 6} is 1 / n = 1 / 6.
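The die example can be checked with exact fractions. This is a small sketch; the variable names (outcomes, pmf, p_even, expected_value) are chosen here for illustration.

```python
from fractions import Fraction

# Possible outcomes of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
n = len(outcomes)

# Every outcome has the same probability 1 / n.
pmf = {value: Fraction(1, n) for value in outcomes}

# Probability of an even number: three outcomes, each with probability 1/6.
p_even = sum(pmf[value] for value in outcomes if value % 2 == 0)

# Expected value: each outcome weighted by its probability.
expected_value = sum(value * pmf[value] for value in outcomes)

print(pmf[3], p_even, expected_value)
```

Using Fraction instead of floats keeps the results exact, so 1/6 stays 1/6 rather than 0.1666...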
3. Normal distribution
The normal (Gaussian) distribution is a continuous probability distribution that is unimodal and symmetric about its mean, with a bell-shaped curve.
The standard normal distribution (z-distribution) is the simplest case of the normal distribution, with mean μ = 0 and standard deviation σ = 1.
A z-score is a value that represents how far the data point x lies from the mean μ, measured in standard deviations σ:
Z = (x - μ) / σ.
Z-score interpretation
Z > 0: data point x is greater than the mean
Z < 0: data point x is less than the mean
Z = 0: data point x is equal to the mean
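The interpretation above can be tried out with a short sketch. The mean of 100 and standard deviation of 15 are hypothetical values picked for illustration, and z_score is a helper name introduced here. NormalDist from the Python standard library (3.8+) defaults to the standard normal distribution, so its cdf gives the share of values below a given z-score.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma


# Hypothetical population: mean 100, standard deviation 15.
mu, sigma = 100.0, 15.0
standard_normal = NormalDist()  # defaults to mu = 0, sigma = 1

for x in (130.0, 85.0, 100.0):
    z = z_score(x, mu, sigma)
    if z > 0:
        position = "greater than the mean"
    elif z < 0:
        position = "less than the mean"
    else:
        position = "equal to the mean"
    # Share of the standard normal distribution lying below this z-score.
    share_below = standard_normal.cdf(z)
    print(f"x = {x}: Z = {z:+.2f}, {position}, share below = {share_below:.3f}")
```

Converting to z-scores puts any normally distributed value on the scale of the standard normal distribution, which is why a single cdf is enough for all three data points.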