Collaborative Statistics by Robert Gallagher - HTML preview


Median

A number that separates ordered data into halves. Half the values are the same number or

smaller than the median and half the values are the same number or larger than the median.

The median may or may not be part of the data.

Mode

The value that appears most frequently in a set of data.

Mutually Exclusive

An observation cannot fall into more than one class (category). If an observation can belong to two categories at once, those categories are not mutually exclusive.

N Normal Distribution

A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, where µ is the mean of the distribution and σ is the standard deviation. Notation: X ∼ N(µ, σ). If µ = 0 and σ = 1, the RV is called the standard normal distribution.
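The density in this entry can be evaluated directly. The following is a minimal sketch, not part of the original text; the function name `normal_pdf` is an illustrative assumption, and the second argument is the standard deviation, matching the notation X ∼ N(µ, σ).

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of N(mu, sigma) at x, where sigma is the standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Height of the standard normal curve at its mean: 1 / sqrt(2 * pi)
standard_peak = normal_pdf(0.0)
```

With the default arguments the function describes the standard normal distribution, so the curve is highest at x = 0 and symmetric around it.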

O Outcome (observation)

A particular result of an experiment.

Outlier

An observation that does not fit the rest of the data.

P p-value

The probability, calculated assuming the null hypothesis is true, of obtaining purely by chance a result at least as extreme as the one observed.

The smaller the p-value, the stronger the evidence is against the null hypothesis.
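As an illustration only, here is a sketch of one way a p-value might be computed. It assumes a right-tailed test whose test statistic follows the standard normal distribution under the null hypothesis; the function name and the observed value 1.96 are hypothetical and do not come from the text.

```python
import math

def right_tailed_p_value(z: float) -> float:
    """P(Z >= z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical observed test statistic
p_value = right_tailed_p_value(1.96)  # roughly 0.025
```

A larger observed statistic gives a smaller p-value, which corresponds to stronger evidence against the null hypothesis.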

Parameter

A numerical characteristic of the population.

Percentile

A number that divides ordered data into hundredths.


Example: Let a data set contain 200 ordered observations starting with {2.3, 2.7, 2.8, 2.9, 2.9, 3.0, ...}. Then the first percentile is (2.7 + 2.8)/2 = 2.75, because 1% of the data is to the left of this point on the number line and 99% of the data is on its right. The second percentile is (2.9 + 2.9)/2 = 2.9. Percentiles may or may not be part of the data. In this example, the first percentile is not in the data, but the second percentile is. The median of the data is the second quartile and the 50th percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.
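The arithmetic in this example can be sketched in code. The rule below (average the two neighboring values when the cut falls exactly between them, otherwise take the next value up) is one common textbook convention and is an assumption here; statistical software often uses other interpolation rules. Only the first six data values come from the example. The remaining 194 are hypothetical filler so that the list has 200 entries.

```python
import math

def percentile(ordered, p):
    """p-th percentile (0 < p < 100) of already-sorted data, cut-point rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(ordered)
    position = n * p / 100            # number of values to the left of the cut
    if position == int(position):     # cut falls exactly between two values
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(position) - 1]

# First six values are from the example; the rest are hypothetical filler.
data = [2.3, 2.7, 2.8, 2.9, 2.9, 3.0] + [3.0 + 0.01 * i for i in range(194)]

first = percentile(data, 1)    # averages the 2nd and 3rd values
second = percentile(data, 2)   # averages the 4th and 5th values
```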

Point Estimate

A single number computed from a sample and used to estimate a population parameter.

Population

The collection, or set, of all individuals, objects, or measurements whose properties are being

studied.

Probability

A number between 0 and 1, inclusive, that gives the likelihood that a specific event will occur.

The foundation of statistics is given by the following 3 axioms (by A. N. Kolmogorov, 1930s):

Let S denote the sample space and let A and B be two events in S. Then:

• 0 ≤ P(A) ≤ 1.

• If A and B are any two mutually exclusive events, then P(A or B) = P(A) + P(B).

• P(S) = 1.
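The three axioms can be checked on a small example. This sketch assumes a fair six-sided die, so every outcome is equally likely; the die, the events A and B, and the helper `prob` are all illustrative choices rather than part of the text.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {1, 2}   # roll a 1 or a 2
B = {5, 6}   # roll a 5 or a 6; shares no outcome with A

p_a_or_b = prob(A | B)   # probability of the union of A and B
```

Because A and B have no outcome in common, the probability of "A or B" equals the sum of the two separate probabilities, as the second axiom requires.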

Probability Distribution Function (PDF)

A mathematical description of a discrete random variable (RV), given either in the form of an

equation (formula), or in the form of a table listing all the possible outcomes of an experiment

and the probability associated with each outcome.

Example: A biased coin with probability 0.7 for a head (in one toss of the coin) is tossed 5 times. We are interested in the number of heads (the RV X = the number of heads). X is Binomial, so X ∼ B(5, 0.7) and $P(X = x) = \binom{5}{x} (0.7)^x (0.3)^{5-x}$, or in the form of the table:

x    P(X = x)
0    0.0024
1    0.0284
2    0.1323
3    0.3087
4    0.3602
5    0.1681

Table 5.3
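The table for the biased-coin example can be reproduced from the binomial formula. This is a minimal sketch using Python's standard library; the function name `binomial_pmf` is an illustrative assumption.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p): choose which x trials succeed, then multiply."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of each possible number of heads in 5 tosses with P(head) = 0.7
table = {x: binomial_pmf(x, 5, 0.7) for x in range(6)}

for x, probability in table.items():
    print(f"{x}    {probability:.4f}")
```

The probabilities over all six outcomes add up to 1, which is a quick sanity check for any table of this kind.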

Proportion

• As a number: A proportion is the number of successes divided by the total number in the sample.

• As a probability distribution: Given a binomial random variable (RV), X ∼ B(n, p), consider the ratio of the number X of successes in n Bernoulli trials to the number n of trials, P’ = X/n. This new RV is called a proportion, and if the number of trials, n, is large enough, $P' \sim N\left(p, \sqrt{pq/n}\right)$, where q = 1 − p.
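The normal approximation for a proportion can be sketched numerically. The code below treats the second parameter of N as a standard deviation, following the notation X ∼ N(µ, σ) used in the Normal Distribution entry, and takes q = 1 − p. The values n = 100 and p = 0.7 are hypothetical.

```python
import math

def proportion_normal_approx(n: int, p: float):
    """Mean and standard deviation of the approximate normal distribution of P' = X/n."""
    q = 1 - p
    return p, math.sqrt(p * q / n)

# Hypothetical example: 100 trials, each with success probability 0.7
mean, sd = proportion_normal_approx(n=100, p=0.7)
```

Increasing n shrinks the standard deviation, so the sample proportion clusters more tightly around p.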


Q Qualitative Data

See Data.

Quantitative Data

See Data.

Quartiles

The numbers that separate the data into quarters. Quartiles may or may not be part of the data.

The second quartile is the median of the data.

R Random Variable (RV)

See Variable.

Relative Frequency

The ratio of the number of times a value of the data occurs in the set of all outcomes to the

number of all outcomes.

S Sample

A portion of the population under study. A sample is representative if it characterizes the

population being studied.

Sample Space

The set of all possible outcomes of an experiment.

Standard Deviation

A number that is equal to the square root of the variance and measures how far data values are

from their mean. Notation: s for sample standard deviation and σ for population standard

deviation.

Standard Error of the Mean

The standard deviation of the distribution of the sample means, $\sigma/\sqrt{n}$.

Standard Normal Distribution

A continuous random variable (RV) X~N(0, 1). When X follows the standard normal distribution, it is often denoted as Z~N(0, 1).

Statistic

A numerical characteristic of the sample. A statistic estimates the corresponding population

parameter. For example, the average number of full-time students in a 7:30 a.m. class for this

term (statistic) is an estimate for the average number of full-time students in any class this term

(parameter).

Student’s-t Distribution

Investigated and reported by William S. Gosset in 1908 and published under the pseudonym Student. The major characteristics of the random variable (RV) are:

• It is continuous and can assume any real value.

• The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at the apex than the normal distribution.

• It approaches the standard normal distribution as n gets larger.

• There is a "family" of t distributions: every member of the family is completely defined by the number of degrees of freedom, which is one less than the number of data values.

Student-t Distribution

See Student’s-t Distribution.


T Tree Diagram

A useful visual representation of a sample space and events in the form of a “tree” whose branches are labeled with the possible outcomes and their associated probabilities (frequencies, relative frequencies).

Type 1 Error

The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.

Type 2 Error

The decision is to not reject the null hypothesis when, in fact, the null hypothesis is false.

U Uniform Distribution

A continuous random variable (RV) that has equally likely outcomes over the domain a < x < b. Often referred to as the Rectangular distribution because the graph of the pdf has the form of a rectangle. Notation: X~U(a, b). The mean is $\mu = \frac{a+b}{2}$ and the standard deviation is $\sigma = \sqrt{\frac{(b-a)^2}{12}}$. The probability density function is $f(x) = \frac{1}{b-a}$ for a < x < b or a ≤ x ≤ b. The cumulative distribution is $P(X \le x) = \frac{x-a}{b-a}$.
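The formulas in this entry translate directly into code. This is a minimal sketch; the function names and the interval from 0 to 12 are illustrative assumptions, and the cumulative distribution is extended to 0 below a and 1 above b.

```python
import math

def uniform_summary(a: float, b: float):
    """Mean and standard deviation of X ~ U(a, b)."""
    mean = (a + b) / 2
    sd = math.sqrt((b - a) ** 2 / 12)
    return mean, sd

def uniform_cdf(x: float, a: float, b: float) -> float:
    """P(X <= x) for X ~ U(a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

# Hypothetical interval from 0 to 12
mean, sd = uniform_summary(0, 12)
quarter = uniform_cdf(3, 0, 12)   # one quarter of the interval lies at or below 3
```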

V Variable (Random Variable)

A characteristic of interest in a population being studied. Variables are commonly denoted by upper case Latin letters X, Y, Z, ...; a specific value from the domain (the set of all possible values of a variable) is commonly denoted by a lower case Latin letter x, y, z, .... For example, if X is the number of children in a family, then x represents a specific integer 0, 1, 2, 3, .... Variables in statistics differ from variables in intermediate algebra in the following two ways.

• The domain of the random variable (RV) is not necessarily a numerical set; the domain may

be expressed in words; for example, if X = hair color then the domain is {black, blond, gray,

green, orange}.

• We can tell what specific value x of the Random Variable X takes only after performing the

experiment.

Variance

Mean of the squared deviations from the mean. Square of the standard deviation. For a set of

data, a deviation can be represented as x − x̄, where x is a value of the data and x̄ is the sample

mean. The sample variance is equal to the sum of the squares of the deviations divided by the

difference of the sample size and 1.
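The sample variance described here can be computed step by step. The sketch below uses a small hypothetical data set and compares the hand-written result with `statistics.variance` from Python's standard library, which also divides by the sample size minus 1.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

# Hypothetical data set with sample mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = sample_variance(data)             # 32 / 7
standard_deviation = math.sqrt(variance)     # square root of the variance
library_variance = statistics.variance(data)
```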

Venn Diagram

The visual representation of a sample space and events in the form of circles or ovals showing

their intersections.

Z z-score

The linear transformation of the form $z = \frac{x-\mu}{\sigma}$. If this transformation is applied to any normal distribution X~N(µ, σ), the result is the standard normal distribution Z~N(0, 1). If this

transformation is applied to any specific value x of the RV with mean µ and standard deviation

σ , the result is called the z-score of x. Z-scores allow us to compare data that are normally

distributed but scaled differently.
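To show how z-scores put differently scaled data on a common footing, here is a short sketch. The two exams, their means, and their standard deviations are hypothetical numbers chosen for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (or below) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# Hypothetical scores on two exams with different scales
z_exam_a = z_score(82, mu=70, sigma=8)       # 12 points above a mean of 70
z_exam_b = z_score(630, mu=500, sigma=100)   # 130 points above a mean of 500
```

Although 630 is the larger raw score, the first score sits further above its own mean when measured in standard deviations.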
