
Chapter 1 ​

descriptive statistics = summarize observed data and present it graphically.

Types of data ​

| Data | Description |
| --- | --- |
| Nominal Data | Categories, no ordering or direction |
| Ordinal Data | Ordered categories (rankings, order) |
| Interval Data | Differences between measurements, but no true zero |
| Ratio Data | Differences between measurements, true zero exists |

Graphs ​

Dot Diagram ​

A number line on which the observations are presented as dots; equal observations are stacked.
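A minimal text sketch of a dot diagram, assuming small integer observations (single digits, so the columns line up). The function name `dot_diagram` and the sample data are made up for illustration.

```python
from collections import Counter

def dot_diagram(values):
    """Return a text dot diagram for small integer observations."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build the rows from the highest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("o" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    # The last row is the number line itself.
    rows.append(" ".join(str(v) for v in axis))
    return "\n".join(rows)

diagram = dot_diagram([1, 2, 2, 3, 3, 3, 5])
print(diagram)
```

Each `o` is one observation, so the height of a stack is the number of times that value occurs.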

Histogram ​

frequency = the count.

  • choose a division into intervals: neither too many nor too few observations per interval.
  • count the number of observations in each interval (the frequency), or determine the relative frequency: $\text{relative frequency} = \frac{\text{frequency}}{n}$
  • Build a rectangle above each interval and choose as height either the frequency or the relative frequency.
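The steps above can be sketched in a few lines. This is only an illustration: the helper `histogram_counts`, the equal-width intervals, and the data are assumptions, and the choice of interval width is left to the reader.

```python
def histogram_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width intervals starting at `start`.

    Intervals are [start + k*width, start + (k+1)*width); the overall right
    endpoint is counted in the last interval.
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)
        if x == start + n_bins * width:
            k = n_bins - 1  # put the right endpoint in the last interval
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

data = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]
freq = histogram_counts(data, start=0, width=2.5, n_bins=4)
rel_freq = [f / len(data) for f in freq]  # relative frequency = frequency / n

print(freq)      # heights when plotting frequencies
print(rel_freq)  # heights when plotting relative frequencies
```

The relative frequencies always sum to 1, which makes histograms of samples with different sizes comparable.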

Bar Graph ​

  • for bar graphs the variable has to be quantitative and discrete.

Measures of Center ​

  • Mean: the arithmetic average $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Median: the middle observation when the observations are arranged from small to large. If $n$ is even, take the mean of the two middle observations.
  • Mode: the most frequently occurring observation.
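As a quick check of the three definitions, here is a small sketch using Python's standard `statistics` module on made-up data with an even number of observations.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of the observations divided by n
median = statistics.median(data)  # n is even: mean of the two middle values 3 and 5
mode = statistics.mode(data)      # the value that occurs most often

print(mean, median, mode)
```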

Percentiles and quartiles ​

  • The median $m$ is also the 50th percentile: about 50% of the observations are smaller than $m$ and about 50% are greater than $m$.
  • The quartiles $Q_1$, $m$ and $Q_3$ are the 25th, 50th and 75th percentiles; they split the observations into 4 roughly equal quarters.

Box-Plot ​

The box plot graphs the 5-number summary of the observations:

  • quartiles ($Q_1$, $m$, $Q_3$)
  • smallest observation
  • largest observation
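A hedged sketch of the 5-number summary using `statistics.quantiles`. Quartile conventions differ between textbooks and software; the default `"exclusive"` method used here is one of them and may not match the rule used in the course. The function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, m, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, m, q3, max(values)

summary = five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15])
print(summary)
```

In a box plot the box runs from Q1 to Q3 with a line at the median, and the whiskers extend towards the smallest and largest observations.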

Measures of Variability ​

  • Range: $r$ = largest observation − smallest observation
  • Inter-quartile range: $IQR = Q_3 - Q_1$
  • Variance: the sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

sample variance ≠ population variance: $s^2$ is computed from the observations and only estimates the population variance $\sigma^2$.

  • resistant to outliers: median, IQR
  • not resistant to outliers: $\bar{x}$, $s$, $s^2$
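To see the difference between resistant and non-resistant measures, the sketch below computes the same summaries for a small made-up sample and for a copy in which the largest value is replaced by an outlier. The helper `spread_summary` is hypothetical, and the quartiles use the default method of `statistics.quantiles`, which may differ from the course's rule.

```python
import statistics

def spread_summary(values):
    """Collect center and variability measures for one sample."""
    q1, m, q3 = statistics.quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "median": m,
        "iqr": q3 - q1,
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance, divides by n - 1
        "std": statistics.stdev(values),          # square root of the sample variance
    }

clean = [1, 3, 4, 6, 7, 8, 10, 12, 15]
with_outlier = clean[:-1] + [150]  # replace the largest value by an outlier

a = spread_summary(clean)
b = spread_summary(with_outlier)
for key in a:
    print(f"{key:>8}: {a[key]:10.2f} {b[key]:10.2f}")
```

The median and IQR are identical for both samples, while the range, mean, variance and standard deviation change considerably.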

Chebyshev's rule: $P(|X - \mu_X| \ge c) \le \frac{\operatorname{var}(X)}{c^2}$
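Taking $c = k\sigma$ in Chebyshev's rule gives bounds that hold for any distribution. A short derivation, where $\sigma^2$ is written for $\operatorname{var}(X)$ and $k > 0$ is the number of standard deviations (both symbols are introduced here for illustration):

```latex
P(|X - \mu_X| \ge k\sigma) \le \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}
\quad\Longrightarrow\quad
P(|X - \mu_X| < k\sigma) \ge 1 - \frac{1}{k^2}

k = 1:\ 1 - 1 = 0\%, \qquad
k = 2:\ 1 - \tfrac{1}{4} = 75\%, \qquad
k = 3:\ 1 - \tfrac{1}{9} \approx 88.9\%
```

So at least 75% of the probability lies within 2 standard deviations of the mean and at least about 89% within 3, whatever the shape of the distribution.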

The empirical rule ​

Only valid for bell-shaped histograms.

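| Interval | Empirical rule | General (Chebyshev) |
| --- | --- | --- |
| $(\bar{x} - s,\ \bar{x} + s)$ | 68% | ≥ 0% |
| $(\bar{x} - 2s,\ \bar{x} + 2s)$ | 95% | ≥ 75% |
| $(\bar{x} - 3s,\ \bar{x} + 3s)$ | 99.7% | ≥ 89% |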

The z-scores ​

For samples with mean $\bar{x}$ and standard deviation $s$, the z-score of an observation $x$ is $z = \frac{x - \bar{x}}{s}$.

Interpretation: the distance between the value and the mean, measured in standard deviations.

For populations with mean $\mu$ and standard deviation $\sigma$, the z-score of an observation or value $x$ is $z = \frac{x - \mu}{\sigma}$.
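A small sketch of the sample version, using made-up data and a hypothetical helper `z_scores`. It also illustrates a useful property: the z-scores of a sample have mean 0 and standard deviation 1.

```python
import statistics

def z_scores(values):
    """Return the z-score (x - mean) / s of every observation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(x - mean) / s for x in values]

sample = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(sample)

for x, score in zip(sample, z):
    print(f"x = {x}: z = {score:+.2f}")
```

A positive z-score means the observation lies above the mean, a negative one below it.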

Empirical Rule Applied Backwards

  • 68% of observations have a z-score in [-1, +1]
  • 95% of observations have a z-score in [-2, +2]
  • 99.7% of observations have a z-score in [-3, +3]
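The percentages can be checked on simulated bell-shaped data. This is a sketch under assumptions: the sample is drawn with `random.gauss`, the seed, mean 50 and standard deviation 10 are arbitrary, and `share_within` is a hypothetical helper.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(50, 10) for _ in range(10_000)]

mean = statistics.mean(sample)
s = statistics.stdev(sample)

def share_within(values, mean, s, k):
    """Fraction of observations whose z-score lies in [-k, +k]."""
    return sum(abs((x - mean) / s) <= k for x in values) / len(values)

coverage = {k: share_within(sample, mean, s, k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"|z| <= {k}: {share:.1%}")
```

For data that are not bell-shaped the shares can differ noticeably from 68%, 95% and 99.7%.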

Skewness ​

normal distribution skewness = 0

  • Positive Skew (long tail to the right)
  • Symmetrical Distribution (skewness = 0)
  • Negative Skew (long tail to the left)

Kurtosis ​

normal distribution kurtosis = 3

  • Negative Kurtosis
  • Normal Distribution
  • Positive Kurtosis

Sample Estimators ​

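| Measure | Population Distribution | Sample Estimate |
| --- | --- | --- |
| Mean | $\mu = E(X)$ | $\bar{x} = \frac{1}{n}\sum x_i$ |
| Variance | $\sigma^2 = E(X - \mu)^2$ | $S^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$ |
| Standard Deviation | $\sigma$ | $S = \sqrt{S^2}$ |
| Skewness | $\gamma_1 = \frac{E(X - \mu)^3}{\sigma^3}$ | $b_1 = \frac{\frac{1}{n}\sum (x_i - \bar{x})^3}{\left(\frac{1}{n}\sum (x_i - \bar{x})^2\right)^{3/2}}$ |
| Kurtosis | $\gamma_2 = \frac{E(X - \mu)^4}{\sigma^4}$ | $b_2 = \frac{\frac{1}{n}\sum (x_i - \bar{x})^4}{\left(\frac{1}{n}\sum (x_i - \bar{x})^2\right)^{2}}$ |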
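The skewness and kurtosis estimates can be sketched directly from central moments. This assumes the common form in which every moment is divided by $n$; the function names are hypothetical and the data are made up.

```python
def central_moment(values, k):
    """k-th central moment of the sample, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def sample_skewness(values):
    """b1: third central moment divided by the second to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def sample_kurtosis(values):
    """b2: fourth central moment divided by the squared second moment."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(sample_skewness(symmetric), sample_kurtosis(symmetric))
print(sample_skewness(right_skewed), sample_kurtosis(right_skewed))
```

The symmetric sample has skewness 0, while the sample with one large value has clearly positive skewness.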

Normality Check ​

  • Graph: on a histogram the data look approximately normal (bell-shaped).
  • Numerically:
    • Skewness coefficient: (close to 0)
    • Kurtosis coefficient: (close to 3)
  • Q-Q plot: no systematic deviations from the x = y line.
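The points of a normal Q-Q plot can be computed without a plotting library. The sketch below is one possible construction, not the only one: it fits a normal distribution with the sample mean and standard deviation and uses the plotting positions $(i - 0.5)/n$, which is a common convention but an assumption here. The function name `normal_qq_points` and the simulated sample are made up.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair theoretical normal quantiles with the sorted observations."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    ordered = sorted(values)
    # The i-th smallest observation is compared with the quantile at (i - 0.5) / n.
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

sample = NormalDist(10, 2).samples(200, seed=1)  # simulated, roughly normal data
points = normal_qq_points(sample)

largest_gap = max(abs(theoretical - observed) for theoretical, observed in points)
print(f"largest distance from the x = y line: {largest_gap:.2f}")
```

Plotting the pairs in `points` gives the Q-Q plot; for approximately normal data they scatter closely around the x = y line, with the largest gaps usually in the tails.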

Exponential Distribution Check ​

  • Graph: histogram:
    • no negative values
    • peak at 0
    • skew right.
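  • Numerically:
    • skewness coefficient (close to 2)
    • kurtosis coefficient (close to 9, that is, an excess kurtosis of 6 above the normal value 3)
  • Q-Q plot against exponential quantiles: no systematic deviations from the x = y line.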
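A sketch of the same checks on simulated exponential data. Everything here is illustrative: the sample comes from `random.expovariate` with an arbitrary rate and seed, the helper names are hypothetical, the moments are divided by $n$, and the Q-Q points use the fitted quantiles $-\bar{x}\ln(1 - p)$ with plotting positions $p = (i - 0.5)/n$. Kurtosis is reported on the scale where the normal distribution has kurtosis 3.

```python
import math
import random

def moment_ratio(values, k):
    """Standardized k-th central moment: skewness for k=3, kurtosis for k=4."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    mk = sum((x - mean) ** k for x in values) / n
    return mk / m2 ** (k / 2)

def exponential_qq_points(values):
    """Pair fitted exponential quantiles with the sorted observations."""
    n = len(values)
    mean = sum(values) / n  # the exponential mean is 1 / rate
    ordered = sorted(values)
    return [(-mean * math.log(1 - (i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

random.seed(1)  # fixed seed so the run is repeatable
sample = [random.expovariate(0.5) for _ in range(20_000)]

skewness = moment_ratio(sample, 3)
kurtosis = moment_ratio(sample, 4)
points = exponential_qq_points(sample)

print(f"smallest value: {min(sample):.4f}")
print(f"skewness: {skewness:.2f}")
print(f"kurtosis: {kurtosis:.2f} (excess kurtosis {kurtosis - 3:.2f})")
```

Sample kurtosis varies a lot from sample to sample for skewed data, so "close to" should be read loosely unless the sample is large.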