
ch4: Discrete random variables

Random variable

A random variable is called discrete if it can take only a finite or countably infinite number of values. The range $$S_X$$ of a random variable is the set of all possible realizations (values) $$X(s)$$; a random variable is thus discrete if its range is denumerable.

Range

The range of a variable (not necessarily random) can be:

  • Finite
  • Countably infinite
  • Not countably infinite

The probability function of a discrete random variable

If X is a discrete random variable, we call the function that assigns a probability $$P(X=x)$$ to each $$x \in S_X$$ the probability function of X. The sum of all probabilities given by this function equals 1.

Properties of the probability function

  • $$P(X=x) \geq 0$$ for $$x \in S_X$$
  • $$\sum_{x \in S_X} P(X=x) = 1$$

The above also means that any function which satisfies both properties is a probability function.

The probabilities $$P(X \in B)$$ for each $$B \subset S_{X}$$ are, all together, called the (probability) distribution of the random variable X. If all probabilities are equal, we say that X has a homogeneous distribution.
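For illustration, here is a minimal sketch (not from the text) of how both properties can be checked numerically, assuming a fair die as the discrete random variable:

```python
# Minimal sketch: a probability function stored as a dict {value: probability}.
# The fair-die pmf below is only an illustrative assumption.
pmf = {x: 1 / 6 for x in range(1, 7)}  # S_X = {1, ..., 6}

# Property 1: every probability is non-negative.
assert all(p >= 0 for p in pmf.values())

# Property 2: the probabilities sum to 1 (up to floating-point error).
assert abs(sum(pmf.values()) - 1) < 1e-12

# P(X in B) for an event B, e.g. "the outcome is even":
B = {2, 4, 6}
p_B = sum(p for x, p in pmf.items() if x in B)
print(p_B)  # 0.5
```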

Geometric series

$$\sum_{k=0}^{\infty} x^k = \frac{1}{1-x}, \quad \text{for } |x| < 1$$

The expectation of a discrete random variable

The expectation or expected value E(X) of a discrete random variable X is given by:

$$E(X) = \sum_{x \in S_X} x \cdot P(X=x)$$

provided that this summation is absolutely convergent (that is: $$\sum_{x \in S_{X}}|x| \cdot P(X=x)<\infty$$).

TIP

Expectation E(X) is the average or mean value of X

If the summation converges absolutely, then the expected value exists; if the summation does not converge absolutely, then the expected value does not exist.

| Letter | Description |
| --- | --- |
| $$\mu$$ | (Greek letter mu, for "mean") is sometimes used instead of E(X); E(X) is often referred to as the mean or population mean |
| $$\bar{x}$$ | stands for the sample mean, as opposed to $$\mu$$, the population mean |
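As a quick worked example (assuming a fair die, so every outcome has probability 1/6):

$$E(X) = \sum_{x=1}^{6} x \cdot \frac{1}{6} = \frac{1+2+3+4+5+6}{6} = 3.5$$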

Functions of a discrete random variable; variance

Building further on the expectation, we can define several important properties:

Functions

If X is a discrete random variable and g a (real) function, then:

$$E(g(X)) = \sum_{x \in S_X} g(x) \cdot P(X=x)$$

So if Y is a linear function of X, that is $$Y = aX + b$$ for any real constants $$a, b \in \mathbb{R}$$, then we have:

$$E(aX+b) = \sum_{x \in S_X} (ax+b) \cdot P(X=x) = \sum_{x \in S_X} ax \cdot P(X=x) + \sum_{x \in S_X} b \cdot P(X=x) = a \sum_{x \in S_X} x \cdot P(X=x) + b \sum_{x \in S_X} P(X=x) = a \cdot E(X) + b \cdot 1$$

The average can be considered a measure for the center of the distribution of X, while the median is the value M such that $$P(X \leq M) \geq 50\% \text{ and } P(X \geq M) \geq 50\%$$. This, however, tells us nothing about the magnitude of the differences in the values of X. These differences are measured in a different way:

Variance and standard deviation

| Notation | Name | Definition |
| --- | --- | --- |
| $$var(X)$$ | The variance of X | $$var(X) = E(X - \mu_X)^2$$ |
| $$\sigma_X$$ | The standard deviation of X | the square root of the variance: $$\sigma_X = \sqrt{var(X)}$$ |

Properties of variance and standard deviation

  • $$var(X) \geq 0$$ and $$\sigma_X \geq 0$$
  • $$var(X) = E(X^2) - \mu_X^2$$ (the computational formula)
  • if $$var(X) > 0$$, so if X is not degenerate, we have $$E(X^2) > (EX)^2$$
  • $$var(aX+b) = a^2 \, var(X)$$ and $$\sigma_{aX+b} = |a| \, \sigma_X$$
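The following sketch (continuing the illustrative fair-die pmf; an assumption, not an example from the text) computes E(X), applies the computational formula for var(X), and checks that $$var(aX+b) = a^2 \, var(X)$$:

```python
# Sketch: expectation and variance of a discrete random variable,
# using the illustrative fair-die pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    # E(g(X)) = sum of g(x) * P(X = x) over the range
    return sum(g(x) * p for x, p in pmf.items())

mu = expectation(pmf)                               # E(X) = 3.5
var = expectation(pmf, lambda x: x ** 2) - mu ** 2  # computational formula E(X^2) - mu^2

# var(aX + b) = a^2 * var(X): transform the pmf and compare.
a, b = 3, -2
pmf_lin = {a * x + b: p for x, p in pmf.items()}
mu_lin = expectation(pmf_lin)
var_lin = expectation(pmf_lin, lambda y: y ** 2) - mu_lin ** 2

print(mu, var)                # 3.5 and ~2.9167
print(var_lin, a ** 2 * var)  # both ~26.25
```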

Chebyshev's inequality and the Empirical rule

Formula

For any real number $$c > 0$$, we have: $$P(|X - \mu_X| \geq c) \leq \frac{var(X)}{c^2}$$

This inequality is valid for any random variable X and gives us an upper bound for the probability of values outside the interval $$(\mu_X - c, \mu_X + c)$$.

In essence, Chebyshev's inequality guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Using the inequality and the standard deviation, a standard interval can be built: $$(\mu_X - k\sigma_X, \mu_X + k\sigma_X)$$, and the upper bound on the probability of observing values outside this interval is $$\frac{var(X)}{c^2} = \frac{var(X)}{k^2\sigma_X^2} = \frac{1}{k^2}$$.
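A small numerical check of the bound, again with the illustrative fair-die distribution (the bound is usually far from tight):

```python
# Sketch: compare P(|X - mu| >= k*sigma) with the Chebyshev bound 1/k^2
# for the illustrative fair-die pmf.
pmf = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
var = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2
sigma = var ** 0.5

k = 1.2  # illustrative choice of k
p_outside = sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma)
print(p_outside, 1 / k ** 2)  # actual ~0.333, Chebyshev upper bound ~0.694
```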

Empirical rule

If the graph of the distribution of X shows a bell shape, then the approximate probabilities of X having a value within the interval

  • $$(\mu - \sigma, \mu + \sigma)$$ is 68%
  • $$(\mu - 2\sigma, \mu + 2\sigma)$$ is 95%
  • $$(\mu - 3\sigma, \mu + 3\sigma)$$ is 99.7%

Chebyshev's rule is valid for any distribution, but the so-called Empirical rule is only valid for distributions that are (approximately) symmetric and bell (hill) shaped.
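The Empirical rule can be illustrated with a bell-shaped distribution; the sketch below uses the standard normal from scipy.stats (an assumption for illustration, since this chapter only treats discrete variables):

```python
from scipy.stats import norm  # assumes scipy is available

# Probability of a value within k standard deviations of the mean, k = 1, 2, 3.
for k in (1, 2, 3):
    p = norm.cdf(k) - norm.cdf(-k)
    print(k, round(p, 4))  # ~0.6827, ~0.9545, ~0.9973
```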

The binomial, hypergeometric, geometric and Poisson distribution

The Binomial distribution

Definition

X is binomially distributed with parameters n and p, for $$n = 1, 2, \ldots$$ and $$p \in [0,1]$$, if the probability function of X is given by: $$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}$$, where $$k = 0, 1, 2, \ldots, n$$

Short notations: X is B(n,p)-distributed, or: $$X \sim B(n,p)$$

One can apply the binomial distribution as a probability model of real-life situations whenever there is a series of n similar experiments for which the conditions of Bernoulli trials hold, i.e.:

  • A phenomenon occurs (or does not occur) with a fixed success probability p (or failure probability $$1-p$$)
  • Independence of the trials.

If X is B(n,p)-distributed, then expected value and variance are given by: $$E(X) = np$$ and $$var(X) = np(1-p)$$
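A short sketch with scipy.stats.binom (the parameter values n = 10, p = 0.3 are just an illustrative assumption):

```python
from scipy.stats import binom  # assumes scipy is available

n, p = 10, 0.3  # illustrative parameter choice
X = binom(n, p)

print(X.pmf(4))           # P(X = 4) = C(10,4) * 0.3^4 * 0.7^6 ≈ 0.2001
print(X.cdf(4))           # P(X <= 4)
print(X.mean(), X.var())  # n*p = 3.0 and n*p*(1-p) = 2.1
```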

Special values of n and p, the parameters of the B(n,p)-distribution

  • If $$p=1$$ ("success guaranteed"), then $$P(X=n)=1$$ and $$E(X)=n$$: X has a degenerate distribution in n. Similarly, if $$p=0$$, then $$P(X=0)=1$$ and $$E(X)=0$$.
  • If $$n=1$$, that is, if only one trial is conducted (one shot on the basket, the quality of one product is assessed, etc.), X is said to have an alternative distribution with success probability p, which is a B(1,p)-distribution.

It follows that: $$P(X=1) = p$$ and $$P(X=0) = 1-p$$

so:

$$E(X) = \sum_x x \cdot P(X=x) = 1 \cdot p + 0 \cdot (1-p) = p$$

And:

$$E(X^2) = \sum_x x^2 \cdot P(X=x) = 1^2 \cdot p + 0^2 \cdot (1-p) = p$$

We find:

$$var(X) = E(X^2) - (EX)^2 = p - p^2 = p(1-p),$$

the variance of a B(1,p) -distribution.

The Hypergeometric distribution

Definition

X is hypergeometrically distributed (with parameters N, R and n) if:

$$P(X=k) = \frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}$$

If the probability function of the random variable X can be given by the hypergeometric formula, X is said to have a hypergeometric distribution. We can apply this distribution whenever we consider a number of random draws without replacement from a so-called dichotomous population: one consisting of elements which either do or do not have a specific property.

Random draws from a dichotomous population lead to the hypergeometric distribution of the number of "successes" if we draw without replacement; if, on the other hand, the draws are with replacement, we can use the binomial distribution, since in that case the draws are independent.

Other properties

For relatively large R and $$N-R$$ and relatively small n, the hypergeometric distribution with parameters N, R and n can be approximated by a $$B\left(n, \frac{R}{N}\right)$$-distribution.

Note that the variances of the hypergeometric and binomial distributions under these conditions are almost equal: $$np(1-p) \cdot \frac{N-n}{N-1} \approx np(1-p)$$ (with $$p = \frac{R}{N}$$).

A (quite strict) rule of thumb for approximating by the binomial distribution is $$N > 5n^2$$.
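A sketch comparing hypergeometric probabilities with their binomial approximation via scipy.stats (the values N = 1000, R = 300, n = 10 are illustrative and satisfy N > 5n²):

```python
from scipy.stats import hypergeom, binom  # assumes scipy is available

N, R, n = 1000, 300, 10  # illustrative: N > 5*n^2 = 500 holds
# scipy's hypergeom takes (M, n, N) = (population size, successes, draws),
# i.e. this text's (N, R, n).
hyp = hypergeom(N, R, n)
bino = binom(n, R / N)

for k in range(4):
    print(k, round(hyp.pmf(k), 4), round(bino.pmf(k), 4))  # nearly equal
```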

The Geometric distribution

Definition

X has a geometric distribution with parameter $$p \in (0,1]$$, if:

$$P(X=k) = (1-p)^{k-1} \cdot p, \quad \text{where } k = 1, 2, \ldots$$

If p=1 the distribution is degenerate: P(X=1)=1. Using the properties of the geometric series (which you can find in the appendix “Mathematical Techniques” of Probability Theory for Engineers, see canvas) the following can be proven:

$$E(X) = \frac{1}{p} \quad \text{and} \quad var(X) = \frac{1-p}{p^2}$$

The following formula is convenient whenever we have to compute a summation of geometric probabilities:

$$P(X > k) = (1-p)^k$$

The reasoning is as follows: the probability that we need more than k trials to score a success equals the probability that we are not successful in the first k trials.
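A quick check of this formula with scipy.stats.geom, whose support also starts at k = 1 as in the definition above (the values of p and k are illustrative assumptions):

```python
from scipy.stats import geom  # assumes scipy is available

p, k = 0.2, 5                     # illustrative values
print(geom.sf(k, p))              # P(X > k) via the survival function, ~0.3277
print((1 - p) ** k)               # (1-p)^k gives the same value
print(geom.mean(p), geom.var(p))  # 1/p = 5.0 and (1-p)/p^2 = 20.0
```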

The Poisson distribution

Definition

X has a Poisson distribution with parameter $$\mu > 0$$ if

$$P(X=k) = \frac{\mu^k e^{-\mu}}{k!}, \quad \text{for } k = 0, 1, 2, \ldots$$

This is a probability function: all probabilities are at least 0, and the sum of all probabilities is 1, since $$\sum_{k=0}^{\infty} \frac{\mu^k e^{-\mu}}{k!} = e^{-\mu} \cdot e^{\mu} = 1$$.

Poisson probabilities are given in (cumulative) probability tables for $$P(X \leq c)$$.

Other properties

If X has a B(n,p)-distribution with "large n and small p", then X has approximately a Poisson distribution with parameter $$\mu = np$$.

A rule of thumb for applying this approximation is:

$$n > 25$$, and $$np < 10$$ or $$n(1-p) < 10$$

These approximations are also applicable in the case of "large n and large p" (p close to 1), because, as noted before, if the number of successes X is B(n,p) with p close to 1, then the number of failures, $$n - X$$, is $$B(n, 1-p)$$, with $$1-p$$ close to 0.
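A sketch of the Poisson approximation to the binomial (the values n = 100, p = 0.03 are illustrative and satisfy the rule of thumb n > 25, np < 10):

```python
from scipy.stats import binom, poisson  # assumes scipy is available

n, p = 100, 0.03  # illustrative: n > 25 and np = 3 < 10
mu = n * p

for k in range(5):
    # binomial probability vs. its Poisson approximation with mu = n*p
    print(k, round(binom.pmf(k, n, p), 4), round(poisson.pmf(k, mu), 4))
```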