Median
A number that separates ordered data into halves. Half the values are less than or equal to
the median and half the values are greater than or equal to the median.
The median may or may not be part of the data.
Mode
The value that appears most frequently in a set of data.
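As a small illustration of the Median and Mode entries, the sketch below uses Python's standard statistics module. The data set is hypothetical and chosen only so that the median is not one of the data values.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [1, 3, 3, 4, 6, 8]

# With an even number of values, the median is the average of the two
# middle values of the ordered data, so it need not be a data value.
median_value = statistics.median(data)  # (3 + 4) / 2

# The mode is the value that occurs most often (3 occurs twice here).
mode_value = statistics.mode(data)

print(median_value, mode_value)
```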
Mutually Exclusive
An observation cannot fall into more than one class (category). Categories are mutually exclusive
when membership in one category rules out membership in any other.
N Normal Distribution
A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$, where µ is the mean of
the distribution and σ is the standard deviation. Notation: X ∼ N(µ, σ). If µ = 0 and σ = 1, the RV is called the standard normal distribution.
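The density of the normal distribution can be evaluated directly. The following is a minimal sketch, assuming the N(µ, σ) notation of this entry, in which the second parameter is the standard deviation; the function name normal_pdf is illustrative only.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, where sigma is the standard deviation."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Height of the standard normal curve at its mean: 1 / sqrt(2 * pi).
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989
```

The curve is symmetric about µ, so normal_pdf(mu + d, mu, sigma) and normal_pdf(mu - d, mu, sigma) agree for any d.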
O Outcome (observation)
A particular result of an experiment.
Outlier
An observation that does not fit the rest of the data.
P p-value
The probability, computed assuming the null hypothesis is true, of obtaining a result at least as
extreme as the one observed purely by chance.
The smaller the p-value, the stronger the evidence is against the null hypothesis.
Parameter
A numerical characteristic of the population.
Percentile
A number that divides ordered data into hundredths.
Example: Let a data set contain 200 ordered observations starting with {2.3, 2.7, 2.8, 2.9, 2.9, 3.0...}.
Then the first percentile is $\frac{2.7 + 2.8}{2} = 2.75$, because 1% of the data is to the left of this point on
the number line and 99% of the data is on its right. The second percentile is $\frac{2.9 + 2.9}{2} = 2.9$.
Percentiles may or may not be part of the data. In this example, the first percentile is not in the
data, but the second percentile is. The median of the data is the second quartile and the 50th
percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.
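The percentile example can be reproduced with a short sketch. Only the first six observations come from the example; the remaining 194 values are hypothetical filler so that the data set has 200 observations. The rule for the case where k% of n is not a whole number is an assumed convention, since the entry does not state one.

```python
import math

def percentile(ordered, k):
    """k-th percentile (1 <= k <= 99) of data sorted in ascending order.

    Assumed convention: if k% of n is a whole number i, average the i-th
    and (i+1)-th values; otherwise take the next value up.
    """
    n = len(ordered)
    if (n * k) % 100 == 0:
        i = (n * k) // 100
        return (ordered[i - 1] + ordered[i]) / 2
    return ordered[math.ceil(n * k / 100) - 1]

# First six values are from the example; the rest are hypothetical filler.
data = [2.3, 2.7, 2.8, 2.9, 2.9, 3.0] + [3.0 + 0.01 * j for j in range(1, 195)]

first = percentile(data, 1)    # average of the 2nd and 3rd values
second = percentile(data, 2)   # average of the 4th and 5th values
print(first, second)
```

With k = 50 the same function returns the average of the 100th and 101st values, which is the median of the 200 observations.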
Point Estimate
A single number computed from a sample and used to estimate a population parameter.
Population
The collection, or set, of all individuals, objects, or measurements whose properties are being
studied.
Probability
A number between 0 and 1, inclusive, that gives the likelihood that a specific event will occur.
The foundation of statistics is given by the following 3 axioms (by A. N. Kolmogorov, 1930's).
Let S denote the sample space and let A and B be two events in S. Then:
• 0 ≤ P(A) ≤ 1.
• If A and B are any two mutually exclusive events, then P(A or B) = P(A) + P(B).
• P(S) = 1.
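The three axioms can be checked on a small sample space. The sketch below assumes one roll of a fair die with equally likely outcomes; the events A and B and the helper prob are illustrative names, not part of the text.

```python
from fractions import Fraction

# Assumed example: sample space for one roll of a fair die.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) when every outcome in S is equally likely."""
    return Fraction(len(event & S), len(S))

A = {1, 2}  # roll a 1 or a 2
B = {5, 6}  # roll a 5 or a 6; A and B share no outcomes

assert 0 <= prob(A) <= 1                 # first axiom
assert prob(A | B) == prob(A) + prob(B)  # second axiom (mutually exclusive events)
assert prob(S) == 1                      # third axiom
```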
Probability Distribution Function (PDF)
A mathematical description of a discrete random variable (RV), given either in the form of an
equation (formula), or in the form of a table listing all the possible outcomes of an experiment
and the probability associated with each outcome.
Example: A biased coin with probability 0.7 for a head (in one toss of the coin) is tossed 5 times.
We are interested in the number of heads (the RV X = the number of heads). X is Binomial, so
X ∼ B(5, 0.7) and $P(X = x) = \binom{5}{x} (0.7)^x (0.3)^{5-x}$, or in the form of the table:
x    P(X = x)
0    0.0024
1    0.0284
2    0.1323
3    0.3087
4    0.3602
5    0.1681
Table 5.3
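The entries of Table 5.3 can be recomputed from the binomial formula. This is a minimal sketch using only the standard library; binomial_pmf is an illustrative name.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, 0.7  # five tosses, probability 0.7 of a head on each toss
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, probability in table.items():
    print(x, f"{probability:.4f}")
```

The six probabilities sum to 1, as they must for a complete probability distribution.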
Proportion
• As a number: A proportion is the number of successes divided by the total number in the
sample.
• As a probability distribution: Given a binomial random variable (RV), X ∼ B(n, p), consider
the ratio of the number X of successes in n Bernoulli trials to the number n of trials: $P' = \frac{X}{n}$.
This new RV is called a proportion, and if the number of trials, n, is large enough,
$P' \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$, where q = 1 − p and the second parameter is the standard deviation.
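The mean and standard deviation of the proportion P' = X / n can be computed exactly from the binomial probabilities and compared with p and the square root of pq/n. The sketch below assumes hypothetical values n = 100 and p = 0.7, and follows the glossary's N(µ, σ) notation, in which the second parameter is a standard deviation.

```python
from math import comb, sqrt

def proportion_mean_sd(n, p):
    """Exact mean and standard deviation of P' = X / n for X ~ B(n, p)."""
    q = 1 - p
    pmf = [comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * w for x, w in enumerate(pmf))
    variance = sum(((x / n) - mean) ** 2 * w for x, w in enumerate(pmf))
    return mean, sqrt(variance)

n, p = 100, 0.7  # hypothetical values, for illustration only
mean, sd = proportion_mean_sd(n, p)

# Compare with the parameters of the normal approximation.
print(mean, p)
print(sd, sqrt(p * (1 - p) / n))
```

The mean and standard deviation agree with p and the square root of pq/n for any n; a large n is needed only for the shape of the distribution to be close to normal.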
Q Qualitative Data
See Data.
Quantitative Data
See Data.
Quartiles
The numbers that separate the data into quarters. Quartiles may or may not be part of the data.
The second quartile is the median of the data.
R Random Variable (RV)
See Variable.
Relative Frequency
The ratio of the number of times a value of the data occurs in the set of all outcomes to the
number of all outcomes.
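Relative frequencies follow directly from counts. The sketch below uses a hypothetical list of observed outcomes and the standard library's Counter.

```python
from collections import Counter

# Hypothetical observed outcomes, for illustration only.
data = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(data)
total = len(data)

# Relative frequency = (times the value occurs) / (number of all outcomes).
relative_frequency = {value: count / total for value, count in counts.items()}

print(relative_frequency)
```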
S Sample
A portion of the population under study. A sample is representative if it characterizes the
population being studied.
Sample Space
The set of all possible outcomes of an experiment.
Standard Deviation
A number that is equal to the square root of the variance and measures how far data values are
from their mean. Notation: s for sample standard deviation and σ for population standard
deviation.
Standard Error of the Mean
The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$.
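As a quick numeric illustration of the standard error of the mean, the sketch below assumes a hypothetical population standard deviation of 10 and a sample size of 25.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)

# Hypothetical values: sigma = 10, n = 25.
se = standard_error(10.0, 25)
print(se)  # 2.0
```

Quadrupling the sample size halves the standard error, because n enters through its square root.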
Standard Normal Distribution
A continuous random variable (RV) X ~ N(0, 1). When X follows the standard normal
distribution, it is often denoted Z ~ N(0, 1).
Statistic
A numerical characteristic of the sample. A statistic estimates the corresponding population
parameter. For example, the average number of full-time students in a 7:30 a.m. class for this
term (statistic) is an estimate for the average number of full-time students in any class this term
(parameter).
Student’s-t Distribution
Investigated and reported by William S. Gosset in 1908 and published under the pseudonym
Student. The major characteristics of the random variable (RV) are:
• It is continuous and can assume any real value.
• The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at
the apex than the normal distribution.
• It approaches the standard normal distribution as the sample size n gets larger.
• There is a "family" of t distributions: every member of the family is completely
defined by the number of degrees of freedom, which is one less than the number of data values.
Student-t Distribution
See Student's-t Distribution.
T Tree Diagram
A useful visual representation of a sample space and events in the form of a “tree” with
branches labeled with possible outcomes together with their associated probabilities
(frequencies, relative frequencies).
Type 1 Error
The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.
Type 2 Error
The decision is to not reject the null hypothesis when, in fact, the null hypothesis is false.
U Uniform Distribution
A continuous random variable (RV) that has equally likely outcomes over the domain
a < x < b. Often referred to as the rectangular distribution because the graph of the pdf has the
form of a rectangle. Notation: X ~ U(a, b). The mean is $\mu = \frac{a+b}{2}$ and the standard deviation is
$\sigma = \sqrt{\frac{(b-a)^2}{12}}$. The probability density function is $f(x) = \frac{1}{b-a}$ for a < x < b or a ≤ x ≤ b. The
cumulative distribution is $P(X \le x) = \frac{x-a}{b-a}$.
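The formulas for the uniform distribution translate directly into code. The sketch below assumes a hypothetical interval from a = 0 to b = 12; the function names are illustrative only.

```python
import math

def uniform_mean_sd(a, b):
    """Mean and standard deviation of X ~ U(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd

def uniform_pdf(x, a, b):
    """Density: constant 1 / (b - a) on [a, b], zero elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x): 0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0.0, 12.0  # hypothetical endpoints
mean, sd = uniform_mean_sd(a, b)
print(mean, sd, uniform_pdf(3.0, a, b), uniform_cdf(3.0, a, b))
```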
V Variable (Random Variable)
A characteristic of interest in a population being studied. Variables are commonly denoted by
upper case Latin letters X, Y, Z,...; a specific value from the domain (the set of
all possible values of a variable) is denoted by a lower case Latin letter x, y, z,.... For example, if X is the
number of children in a family, then x represents a specific integer 0, 1, 2, 3, .... Variables in
statistics differ from variables in intermediate algebra in the following two ways.
• The domain of the random variable (RV) is not necessarily a numerical set; the domain may
be expressed in words; for example, if X = hair color then the domain is {black, blond, gray,
green, orange}.
• We can tell what specific value x the random variable X takes only after performing the
experiment.
Variance
Mean of the squared deviations from the mean. Square of the standard deviation. For a set of
data, a deviation can be represented as $x - \bar{x}$, where x is a value of the data and $\bar{x}$ is the sample
mean. The sample variance is equal to the sum of the squares of the deviations divided by the
difference of the sample size and 1.
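The sample variance can be computed step by step from the deviations. The sketch below uses a hypothetical data set and cross-checks the result against the standard library's statistics.variance, which also divides by n − 1.

```python
import math
import statistics

# Hypothetical data set, for illustration only.
data = [2, 4, 4, 5, 7, 8]

n = len(data)
mean = sum(data) / n                    # sample mean: 30 / 6 = 5
deviations = [x - mean for x in data]   # x minus the sample mean

# Sum of squared deviations divided by (sample size - 1): 24 / 5.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)
sample_sd = math.sqrt(sample_variance)  # standard deviation

print(sample_variance, statistics.variance(data))
```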
Venn Diagram
A visual representation of a sample space and events in the form of circles or ovals showing
their intersections.
Z z-score
The linear transformation of the form $z = \frac{x - \mu}{\sigma}$. If this transformation is applied to any normal
distribution X ~ N(µ, σ), the result is the standard normal distribution Z ~ N(0, 1). If this
transformation is applied to any specific value x of the RV with mean µ and standard deviation
σ, the result is called the z-score of x. Z-scores allow us to compare data that are normally
distributed but scaled differently.
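To illustrate how z-scores compare values measured on different scales, the sketch below uses two hypothetical exams; the scores, means, and standard deviations are made up for the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mu) / sigma

# Hypothetical exams with different means and standard deviations.
z_a = z_score(85, 75, 5)    # exam A: score 85, mean 75, sd 5
z_b = z_score(90, 80, 10)   # exam B: score 90, mean 80, sd 10

print(z_a, z_b)  # 2.0 1.0
```

Although the raw score on exam B is higher, the score on exam A lies further above its mean in standard deviation units.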
Index of Keywords and Terms
Keywords are listed by the section with that keyword (page numbers are in parentheses). Keywords
do not necessarily appear in the text of the page. They are merely associated with that section.
Ex. apples, § 1.1 (1)
Terms are referenced by the page they appear on. Ex. apples, 1
"
"hypothesis testing.", 367
A A AND B, § 4.2(164)
A OR B, § 4.2(164)
accessibility, § (5)
addition, § 4.4(169)
additional, § (5)
adoption, § (5)
alternate hypothesis, § 12.2(506), § 12.3(506)
ANOVA, § 12.1(505), § 12.2(506), 506,
§ 12.3(506), § 12.4(508), § 13.5.4(562)
answer, § 1.7(20)
appendix, § 13.3(545)
article, § 13.4.3(553)
average, § 1.4(15), § 5.3(211)
B bar, § 2.4(49)
Bernoulli, § 5.5(214), § 5.9(223)
Bernoulli Trial, 215
binomial, § 5.4(214), § 5.5(214), § 5.6(218),
§ 5.9(223)
binomial distribution, 371
binomial probability distribution, 215
bivariate, § 13.4.5(556)
box, § 2.10(72), § 2.12(76)
boxes, § 2.4(49)
C cards, § 5.11(236)
categorical, § 1.4(15)
center, § 2.5(53)
central, § 13.4.2(550)
Central Limit Theorem, § 7.2(280), § 7.3(283),
§ 7.10(309), 377
chance, § 4.2(164), § 4.3(166)
chi, § 11.4(464), § 11.5(471)
chi-square, § 13.5.3(561), § 14(585)
CLT, 284
cluster, § 1.9(25), § 1.13(39)
collaborative, § (1), § (5)
collection, 1
condition, § 4.3(166)
conditional, § 4.2(164), § 4.11(186)
conditional probability, 165
confidence interval, 318, 326
confidence intervals, 329, 367
confidence level, 319, 329
contingency, § 4.5(173), § 4.9(183)
contingency table, 173, 471
continuity correction factor, 289
Continuous, § 1.5(17), 17, § 1.9(25), § 1.11(29),
§ 13.4.2(550)
convenience, § 1.9(25)
Counting, § 1.5(17)
critical value, 257
cumulative, § 1.8(21), § 1.9(25), § 1.11(29),
§ 1.12(37)
Cumulative relative frequency, 21
curve, § 12.4(508)
D Data, § 1.1(13), § 1.2(13), 13, § 1.4(15), 16,
§ 1.5(17), § 1.6(18), § 1.9(25), § 1.10(26),
§ 1.11(29), § 1.12(37), § 2.1(45), § 2.2(45),
§ 2.4(49), § 13.3(545), § 13.4.1(548),
§ 13.4.5(556), § 13.5.2(560)
degrees of freedom, 326, § 12.3(506),
§ 12.4(508)
degrees of freedom (df), 421
descriptive, § 1.2(13), § 2.2(45), § 2.3(46),
§ 2.5(53), § 2.10(72), § 2.12(76)
deviation, § 2.10(72), § 2.12(76)
diagram, § 4.6(176), § 4.7(177)
dice, § 5.12(240)
Discrete, § 1.5(17), 17, § 1.9(25), § 1.11(29),
§ 5.1(209), § 5.2(210), § 5.3(211), § 5.4(214),
§ 5.5(214), § 5.6(218), § 5.7(220), § 5.9(223),
§ 5.10(233), § 5.11(236), § 5.12(240)
display, § 2.2(45)
distribution, § 5.1(209), § 5.2(210), § 5.5(214),
§ 5.7(220), § 5.10(233), § 5.11(236), § 5.12(240),
§ 11.5(471), § 13.4.2(550), § 13.5.3(561)
distribution is binomial, 329
dot plot, § 1.2(13)
E elementary, § (1), § (5), § (9), § 2.1(45),
§ 2.2(45), § 2.3(46), § 2.5(53), § 2.6(58), § 2.7(61),
§ 2.8(63), § 2.9(71), § 2.10(72), § 2.11(75),
§ 2.13(93), § 3.1(103), § 3.2(103), § 3.3(105),
§ 3.4(106), § 3.5(108), § 3.6(114), § 3.7(116),
§ 3.8(121), § 3.9(121), § 3.10(128), § 3.11(131),
§ 3.12(132), § 3.13(135), § 3.14(150), § 3.15(153),
§ 3.16(155), § 4.4(169), § 4.5(173), § 4.6(176),
§ 4.7(177), § 4.8(182), § 4.9(183), § 4.10(185),
§ 4.11(186), § 4.12(197), § 4.13(199), § 5.1(209),
§ 5.2(210), § 5.3(211), § 5.4(214), § 5.5(214),
§ 5.6(218), § 5.7(220), § 5.9(223), § 5.10(233),
§ 5.11(236), § 5.12(240), § 6.1(251), § 6.2(252),
§ 6.3(253), § 6.4(255), § 6.5(255), § 6.6(259),
§ 6.7(260), § 6.8(262), § 6.9(268), § 6.10(270),
§ 6.11(273), § 7.1(279), § 7.4(284), § 7.5(292),
§ 7.6(293), § 7.7(296), § 7.8(303), § 7.9(305),
§ 8.1(317), § 8.2(319), § 8.3(326), § 8.4(329),
§ 8.5(334), § 8.6(335), § 8.7(337), § 8.8(339),
§ 8.9(341), § 8.10(351), § 8.11(354), § 8.12(357),
§ 8.13(359), § 9.1(367), § 9.2(368), § 9.3(369),
§ 9.4(370), § 9.5(371), § 9.6(371), § 9.7(372),
§ 9.8(373), § 9.9(373), § 9.10(375), § 9.11(375),
§ 9.12(386), § 9.13(387), § 9.14(389), § 9.15(391),
§ 9.16(393), § 9.17(406), § 9.18(409), § 10.1(419),
§ 10.2(420), § 10.3(423), § 10.4(425), § 10.5(427),
§ 10.6(432), § 10.7(433), § 10.8(435), § 10.9(437),
§ 10.10(449), § 10.11(451), § 11.1(461),
§ 11.2(462), § 11.3(462), § 11.4(464), § 11.5(471),
§ 11.6(476), § 11.7(477), § 11.8(479), § 11.9(481),
§ 11.10(489), § 11.11(493), § 11.12(498),
§ 12.5(513), § 12.6(514), § 12.7(516), § 12.8(518),
§ 12.9(522), § 13.1(527), § 13.2(536), § 13.3(545),
§ 13.4.1(548), § 13.4.2(550), § 13.4.3(553),
§ 13.4.4(555), § 13.4.5(556), § 13.5.1(559),
§ 13.5.2(560), § 13.5.3(561), § 13.5.4(562),
§ 13.6(563), § 13.7(564), § 13.8(569)
elementary statistics, § (11), § 2.12(76)
equally likely, § 4.2(164), 164
error bound for a population mean, 319, 327
error bound, 329
event, § 4.2(164), 164, § 4.11(186)
exclusive, § 4.3(166), § 4.10(185), § 4.11(186)
exercise, § 1.13(39), § 2.12(76), § 4.9(183),
§ 4.13(199), § 5.9(223), § 5.10(233), § 5.11(236),
§ 5.12(240)
exercises, § 4.12(197)
expected, § 5.3(211)
expected value, 211
experiment, § 4.2(164), 164, § 5.1(209),
§ 5.5(214), § 5.11(236), § 5.12(240)
exponential distribution, 287
F f, § 14(585)
F Distribution, § 12.1(505), § 12.2(506),
§ 12.3(506), § 12.4(508), § 13.5.4(562)
F Ratio, § 12.3(506)
fit, § 11.4(464)
formula, § 4.8(182), § 5.6(218)
frequency, § 1.8(21), 21, § 1.9(25), § 1.10(26),
§ 1.11(29), § 1.12(37), § 1.13(39), 49, § 2.10(72),
§ 2.12(76), § 4.2(164)
function, § 5.2(210), § 5.4(214), § 5.5(214),
§ 5.6(218), § 5.10(233)
G geometric, § 5.4(214), § 5.6(218), § 5.9(223)
good, § 11.4(464)
graph, § 2.2(45), § 2.3(46), § 13.4.1(548)
guide, § (5)
H histogram, § 2.4(49), § 2.10(72), § 2.12(76)
Homework, § 1.11(29), § 2.10(72), § 2.12(76),
§ 4.9(183), § 4.12(197), § 4.13(199), § 5.9(223),
§ 5.10(233), § 5.11(236), § 5.12(240)
hypergeometric, § 5.4(214), § 5.9(223)
hypergeometrical, § 5.6(218)
hypotheses, 368
hypothesis, § 13.4.3(553), § 13.4.4(555),
§ 13.5.1(559), § 13.5.2(560)
hypothesis test, 371, 373, 375, § 12.2(506),
§ 12.3(506)
I independence, § 11.5(471)
independent, § 4.3(166), 166, 170, § 4.10(185),
§ 4.11(186)
inferential, § 1.2(13)
inferential statistics, 317
interquartile range, 54
Introduction, § 1.1(13), § 4.1(163), § 5.1(209)
IQR, § 2.5(53)
K key terms, § 4.2(164)
L lab, § 1.13(39), § 4.13(199), § 5.11(236),
§ 5.12(240), § 7.10(309), § 13.4.1(548),
§ 13.4.2(550), § 13.4.5(556)
large, § 5.3(211)
law, § 5.3(211)
Law of Large Numbers, 284
leaf, § 2.3(46)
likelihood, § 1.3(15)
limit, § 13.4.2(550)
location, § 2.5(53)
long term, § 4.2(164)
long-term, § 4.13(199), § 5.3(211)
M mean, 16, § 2.5(53), 58, § 2.10(72), § 2.12(76),
§ 5.3(211), 211, 280, 282, 285, § 13.5.1(559)
means, § 13.5.2(560)
means square, § 12.3(506)
measurement, § 1.6(18)
Measuring, § 1.5(17)
median, § 2.1(45), § 2.5(53), 58, § 2.10(72),
§ 2.12(76)
mode, § 2.5(53), 60, § 2.10(72), § 2.12(76)
modules, 1
multiplication, § 4.4(169)
mutually, § 4.3(166), § 4.10(185), § 4.11(186)
mutually exclusive, 167, 170
N Normal, § 14(585)
Normal Approximation to the Binomial, 289
normal distribution, 326, 329, 370
normally distributed, 280, 283, 371
null hypothesis, 371, 372, 373, § 12.2(506),
§ 12.3(506)
numbers, § 5.3(211)
numerical, § 1.4(15)
O One-Way Analysis of Variance, § 12.1(505),
§ 12.2(506), § 12.3(506)
outcome, § 4.2(164), 164, § 4.3(166)
outlier, 46, 54
outliers, 54, 121
P p-value, 372, 373, 373, 373, 375
pair, § 13.5.2(560)
parameter, § 1.4(15), 15, § 1.9(25), 317
PDF, § 5.2(210)
percentile, § 2.5(53), § 2.10(72), § 2.12(76)
percentiles, 53
plot, § 2.10(72), § 2.12(76)
point estimate, 317
Poisson, § 5.4(214), § 5.6(218), § 5.9(223)
population, § 1.4(15), 15, 18, § 1.9(25),
§ 2.12(76), § 12.2(506), § 12.3(506)
practice, § 1.10(26), § 2.10(72), § 4.9(183),
§ 4.10(185), § 4.12(197), § 5.10(233)
probability, § 1.3(15), 15, § 1.9(25), § 4.1(163),
§ 4.2(164), 164, § 4.3(166), § 4.4(169), § 4.5(173),
§ 4.6(176), § 4.7(177), § 4.8(182), § 4.9(183),
§ 4.10(185), § 4.11(186), § 4.12(197), § 4.13(199),
§ 5.2(210), § 5.3(211), § 5.4(214), § 5.5(214),
§ 5.6(218), § 5.7(220), § 5.10(233), § 5.11(236),
§ 5.12(240)
probability distribution function, 210
problem, § 2.12(76), § 5.9(223), § 13.4.4(555)
project, § 13.4.1(548), § 13.4.2(550),
§ 13.4.5(556)
proportion, § 1.4(15), 16, § 13.5.1(559),
§ 13.5.2(560)
Q Qualitative, § 1.5(17), § 1.9(25), § 1.11(29)
Qualitative data, 17
Quantitative, § 1.5(17), § 1.9(25), § 1.11(29)
Quantitative data, 17
quartile, § 2.5(53), § 2.10(72), § 2.12(76)
quartiles, 53
R random, § 1.3(15), § 1.9(25), § 1.11(29),
§ 1.13(39), § 5.1(209), § 5.2(210), § 5.3(211),
§ 5.4(214), § 5.5(214), § 5.6(218), § 5.7(220),
§ 5.9(223), § 5.10(233), § 5.11(236), § 5.12(240)
random variable, 209, 421, 424
randomness, § 1.3(15)
relative, § 1.8(21), § 1.9(25), § 1.11(29),
§ 1.12(37), § 2.12(76), § 4.2(164)
relative frequency, 21, 49
replacement, § 1.9(25), § 4.13(199)
representative, § 1.4(15)
resources, § (5)
review, § 4.12(197), § 5.10(233)
round, § 1.7(20)
rounding, § 1.7(20)
rule, § 4.4(169)
S sample, § 1.4(15), 15, § 1.6(18), § 1.9(25),
§ 1.11(29), § 1.13(39), § 2.12(76), § 12.2(506),
§ 12.3(506), § 13.4.5(556)
Sample Means, § 7.2(280)
sample space, § 4.2(164), 164, 169, 178
samples, 18
Sampling, § 1.1(13), § 1.4(15), 15, § 1.6(18),
§ 1.9(25), § 1.10(26), § 1.11(29), § 1.12(37),
§ 1.13(39)
sampling distribution, 60
sampling variability of a statistic, 65
set, § 13.3(545)
sheet, § 13.5.1(559), § 13.5.2(560), § 13.5.3(561)
simple, § 1.9(25)
single, § 13.5.1(559)
Sir Ronald Fisher, § 12.3(506)
size, § 1.6(18)
skew, § 2.5(53), § 12.4(508)
solution, § 13.5.1(559), § 13.5.2(560),
§ 13.5.3(561), § 13.5.4(562)
spread, § 2.5(53)
square, § 11.4(464), § 11.5(471)
standard, § 2.10(72), § 2.12(76)
standard deviation, 63, 326, 370, 371, 372, 375
standard error, 420
stemplot, § 2.3(46)
standard error of the mean., 281
stratified, § 1.9(25), § 1.13(39)
standard normal distribution, 252
Student’s-t distribution, 326, 371
statistic, § 1.4(15), 15, § 1.9(25), 61
student’s-t distribution., 370
statistics, § (1), § (5), § (9), § 1.1(13), § 1.2(13),
Student-t, § 14(585)
13, § 1.3(15), § 1.5(17), § 1.6(18), § 1.7(20),
sum of squares, § 12.3(506)
§ 1.8(21), § 1.9(25), § 1.10(26), § 1.11(29),
Sums, § 7.3(283)
§ 1.12(37), § 1.13(39), § 2.1(45), § 2.2(45),
supplemental, § (5)
§ 2.3(46), § 2.5(53), § 2.6(58), § 2.7(61),
survey, § 1.11(29), § 13.4.1(548), § 13.4.3(553)
§ 2.8(63), § 2.9(71), § 2.10(72), § 2.11(75),