Welcome to the Mississippi State University Virtual Statistics
Laboratory (VSL), creation of which was funded by a Schillig Special
Teaching Project Award. This website provides students with a very limited
introduction to testing hypotheses about means and step-by-step instructions
for performing a few basic tests using STATLETS. The academic version
of STATLETS will analyze up to 100 rows by 10 columns of data and may be
downloaded free of charge. The VSL is intended for students who
are involved in research but who have not taken a formal course in statistics.
However, it is assumed that the student already has some familiarity with descriptive
statistics (e.g., mean and standard deviation).
The purpose of the VSL is to guide students through the process of selecting
an appropriate statistical model and using STATLETS to perform the analysis.
The goal is to provide students with a useful research tool while helping them
to avoid common mistakes. Statlets was specifically chosen for use in
the VSL because it provides substantial interpretation of results, and
help is available on-line.
The
VSL is not meant to replace a traditional course in statistics, and
consultation with a professional statistician is encouraged if results are to be
submitted for publication. Many of the links are to sections of Hyperstat Online, an
online statistics textbook, and I am grateful to Rice University for granting
permission to link to this resource.
-Dr. Steven H. Elder, Assistant
Professor of Agricultural & Biological Engineering
Throughout this tutorial it is assumed that you are conducting or planning an
experiment to test hypotheses about means. If you are
reluctant to use the mean and standard deviation as descriptive statistics
for your data, then you should consider using nonparametric methods,
which are beyond the scope of this tutorial. Discussion is also restricted
to experiments in which the dependent variable is continuous.
To get
started you should review Hyperstat Online,
Chapter 9: Logic of Hypothesis Testing. Once you have covered this
material, you are ready to decide what type of procedure is most appropriate for
your experiment. One assumption underlying the parametric procedures
presented herein is that your samples were selected at random from the
population of interest. Therefore, you should make sure that the condition
of random sampling is satisfied before proceeding.
We
will consider five different procedures for testing hypotheses about means
(assuming the data have been randomly drawn from the population(s) of
interest):
- One Sample t-test
- Two Sample Independent t-test
- Two Sample Paired t-test
- One-way ANOVA
- Two-way ANOVA
The first step in choosing the
appropriate test is to formulate your null hypothesis. The procedure to
use will depend on how many groups (means) are involved.
- If you are going to test a hypothesis about a
single population mean μ, then
you will use a one sample t-test. The null hypothesis is a value of
μ. For example, suppose a manufacturer claims that the mean bursting
pressure for a certain type and size of PVC irrigation tubing is 350
psi. The bursting pressures of ten such pipes are measured
experimentally. A one sample t-test can be used to test the null
hypothesis that μ = 350.
- If you want
to compare 2 means, then you should use an independent samples t-test
or a paired samples t-test.
- The two-sample t-tests are used
for testing differences between means. The null hypothesis is
that the difference between means is some specified value. Usually the null
hypothesis is that the difference is zero (μ1 = μ2). The choice between the independent and paired
procedures depends on the nature of the samples. Independent samples
have no relationship to one another, and there is no systematic way that a
member of one independent sample can be paired with a member of another
independent sample. If such a pairing is possible, the samples are
said to be dependent. Consider a researcher interested in the effects
of caffeine on standardized test performance among eleventh grade
students. He may give one group of 20 students 10 ounces of water to
drink before the test (sample 1), while he gives 10 ounces of coffee to a
different group of 20 (sample 2). In this case the samples are
independent. On the other hand, he may test the same 20 individuals
after drinking water on one day (sample 1) and after drinking coffee on a
different day (sample 2). In this case the samples are dependent,
since the water score for each individual can be paired with the coffee
score for that same individual. Click here to see the
advantage of the paired design over the independent samples design.
- If you want to test a hypothesis about
differences between more than two groups (means), an analysis of
variance (ANOVA) is best. When there are more than two means, it is
incorrect to compare each mean with each other mean using t-tests, because
performing many t-tests inflates the overall chance of a type I error. The
null hypothesis takes
the form μ1 = μ2 = μ3 = ... = μN, where N is the total number of groups. A factor in the ANOVA is
a variable, the effects of which are being studied. Factors also serve
as the basis for classifying data into categories. Factors are under the
control of the investigator. Studies with only one factor are called
one-way analyses. Categories of a factor into which data are
classified are called levels. When an
experiment involves two or more independent variables (factors), it is called
a factorial
design. Studies with exactly two independent variables are called
two-way analyses. For the sake of simplicity, we will consider
only the one-way and two-way ANOVA.
- If you are comparing the means
of more than two groups that differ on one independent variable
(factor), you should use the one-way ANOVA test. For example, one-way
ANOVA would be appropriate for a biologist who wants to compare four
different batches of serum on the proliferation of cultured cells. In
this study there is one systematically manipulated variable (factor) with
four levels. If you are investigating the effects of exactly 2
independent variables, each with two or more levels, you can use the two-way
ANOVA. For example, suppose the biologist comparing batches of serum
introduces a second variable, serum concentration, with 3 levels: low,
medium, and high. Examination of more than two independent variables
simultaneously requires a multifactorial ANOVA, which is beyond the
scope of the VSL. For the two-way ANOVA, it is assumed that the
designs are completely crossed, meaning that each level of one factor occurs
with (i.e., is "crossed" with) each level of all other factors. In the
example above, it would be assumed that each batch of serum is tested at
low, medium, and high concentration.
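The selection rules above can be summarized as a small decision function. This is only a sketch for illustration: the function name `choose_test` and its parameters are invented here and are not part of STATLETS or the VSL.

```python
def choose_test(n_groups, paired=False, n_factors=1):
    """Return the procedure suggested by the selection rules in the text.

    n_groups  -- number of means being compared (1, 2, or more)
    paired    -- two groups only: can each score be matched with one in the other sample?
    n_factors -- number of independent variables (factors) in the design
    """
    if n_groups < 1 or n_factors < 1:
        raise ValueError("n_groups and n_factors must be at least 1")
    if n_factors > 2:
        return "multifactorial ANOVA (beyond the scope of the VSL)"
    if n_factors == 2:
        return "two-way ANOVA"
    if n_groups == 1:
        return "one sample t-test"
    if n_groups == 2:
        return "two sample paired t-test" if paired else "two sample independent t-test"
    return "one-way ANOVA"


print(choose_test(1))                # bursting pressure vs. the claimed 350 psi
print(choose_test(2, paired=False))  # water group vs. a different coffee group
print(choose_test(2, paired=True))   # the same 20 students tested on two days
print(choose_test(4))                # four batches of serum
print(choose_test(12, n_factors=2))  # four batches crossed with three concentrations
```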
Before
proceeding with the analysis, it is important to verify that your data satisfy
the assumptions of the test you plan to use. In addition to random
sampling, some key assumptions for each test are listed below. As will be
explained in subsequent sections, STATLETS can often help you determine whether
your data satisfy the assumptions. If you are not sure about some of the
assumptions, proceed to the next section.
One Sample t-test
Two Sample Independent t-test
- The populations are normally
distributed. See the section below on how to test for normality using
STATLETS. The t-test is robust to moderate departures
from normality. If the distributions are not extremely skewed, and
particularly if the numbers of observations are similar in the two groups,
then the t-test will be valid.
- Scores are independent (each
subject provides only one score).
Two Sample Paired t-test
- Each subject is sampled independently from
each other subject.
- The distribution of
differences (between each pair) is approximately normal. If both
raw scores are normally distributed then the differences will be normally
distributed.
One-way ANOVA
- The treatment groups are independent. There can be no systematic
way of matching or pairing members of one group with members of any other
group.
- Each of the populations is normally distributed.
- Each population has the same
variance (σ²), a
condition called homogeneity of variance. You can use STATLETS to test for
homogeneity of variance.
Research has shown that ANOVA is robust to moderate violations of the normality
and homogeneity of variance assumptions, particularly when sample sizes are equal.
It is therefore preferable to use equal sample sizes whenever
possible. One situation to be wary of is a combination of unequal
sample sizes and a violation of the assumption of homogeneity of variance, in
which case the p-value reported by the ANOVA may be inaccurate.
Two-way ANOVA
The
assumptions of the two-way ANOVA designs parallel those above for the one-way
designs. In addition, the two factors must be independent.
If you
have not already done so, click here to download the academic
version of Statlets. For instructions on how to create a new file, enter
data, etc., click here. To change the
default significance level (alpha) from 5%, click on the 'Edit' tab in the 'Data'
window and select 'Preferences.' Click the radio button next to the desired
confidence level (e.g., 99% to set α = 1%).
One Sample t-test
- Enter your sample data into a
single column on the data spreadsheet.
- Click on 'Analyze' from the menu
at the top of the window.
- Select 'One Sample' > 'One
Variable Analysis'
- On the 'Input' window that
automatically pops up, select the column containing your data and click on the
large right-pointing arrow to enter it into the 'Sample data:' field.
- Click on the 'Stats' tab to see
summary statistics (mean, standard deviation, etc.), as well as measures of
the shape of the distribution used to determine whether the sample comes from
a normal distribution. Read the 'Statistical Interpreter' for more
information.
- To check the normality
assumption, click on the 'P-Plot' tab to see a normal probability plot of your
data. Within the plot window, click on 'Interpret' for more
information. You need not be concerned unless your data display
substantial deviation from normality.
- Once you are satisfied that your data come
from a normal distribution or that the violation of this assumption is not too
serious, click on the 't-test' tab. By default, the alternative hypothesis
is mean ≠ 0.
- Read the 'Statistical
Interpreter' for what decision to make regarding your null
hypothesis.
- If you want to change the
significance level or hypothesis, click on the 'Options' button in the upper
right corner.
- By convention, one usually uses
an alpha level of 10%, 5% or 1%. The choice of alpha level reflects the
degree of "risk of being wrong" that one is willing to accept. Theoretically
speaking, an alpha of 5% means that if the null hypothesis were true and one
were to perform the test 100 times on new samples, one would wrongly reject it
in about 5 of these tests. Arriving at the wrong conclusion in this context
means that one has concluded that the results are not due to chance, when in
fact they were (a type I error). In many fields, 5% is the highest error level
conventionally accepted.
- Click here for information
on conducting a directional test (i.e., choosing 'Less than' or 'Greater than'
for your alternative hypothesis).
- Click here for
descriptions of all other one variable analysis tab options.
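The meaning of the alpha level can be checked with a small simulation. The sketch below is not part of STATLETS; it assumes normally distributed data, samples of 10, and invented values for the mean and standard deviation. It repeatedly draws samples for which the null hypothesis is true and counts how often a two-sided one sample t-test at the 5% level rejects it anyway. The critical value 2.262 is the standard tabled value for 9 degrees of freedom.

```python
import math
import random
import statistics

T_CRIT_DF9 = 2.262  # two-sided 5% critical value of t with 9 degrees of freedom


def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean equals mu0."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))


def false_rejection_rate(n_experiments=10_000, n=10, mu0=350.0, sigma=20.0, seed=1):
    """Fraction of simulated experiments that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # the null really holds
        if abs(one_sample_t(sample, mu0)) > T_CRIT_DF9:
            rejections += 1
    return rejections / n_experiments


rate = false_rejection_rate()
print(f"False rejection rate: {rate:.3f}")  # should land near 0.05
```

Choosing a smaller alpha (with a correspondingly larger critical value) would lower this rate, at the cost of making real effects harder to detect.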
Two Sample Independent
t-test
- Enter your data into two columns, each containing
a separate sample (group). You need not have the same number of entries
(rows) for each sample, but equal sample size is preferable.
- Click on 'Analyze' from the menu at the top of
the window.
- Select 'Two Sample Comparisons'
> 'Independent Samples'
- On the 'Input' window that
automatically pops up, select the groups (columns) containing your data and
click on the large right-pointing arrows to enter groups into the 'Sample 1:'
and 'Sample 2:' fields. It does not matter which group is Sample 1 and
which group is Sample 2.
- Click on the 'Stats' tab to see
summary statistics (mean, standard deviation, etc.), as well as measures of
the shape of the distributions used to determine whether the samples come from
normal distributions. Read the 'Statistical Interpreter' for more
information.
- To further evaluate the normality
assumption, click on the 'P-Plot' tab to see normal probability plots of
your data. Within the plot window, click on 'Interpret' for more
information. You need not be concerned unless your data display
substantial deviation from normality.
- Once you are satisfied that your data come
from normal distributions or that the violation of this assumption is not too
serious, click on the 't-test' tab. By default, the null hypothesis is
μ1 - μ2 = 0; the alternative hypothesis
is μ1 - μ2 ≠ 0;
and the analysis assumes equal sigmas (standard deviations) between the two
groups.
- Read the 'Statistical
Interpreter' for what decision to make regarding your null
hypothesis.
- If you want to change any of the
default settings, click on the 'Options' button in the upper right corner.
- If the standard deviations of
the two groups differ by more than a factor of 2, it is advisable not
to assume equal sigmas. After clicking on the 'Options' button in the
't test' window, deselect 'Assume equal sigmas' by clicking on the checkbox
and click 'OK' to rerun the analysis.
- By convention, one usually uses
an alpha level of 10%, 5% or 1%. The choice of alpha level reflects the
degree of "risk of being wrong" that one is willing to accept. Theoretically
speaking, an alpha of 5% means that if the null hypothesis were true and one
were to perform the test 100 times on new samples, one would wrongly reject it
in about 5 of these tests. Arriving at the wrong conclusion in this context
means that one has concluded that the results are not due to chance, when in
fact they were (a type I error). In many fields, 5% is the highest error level
conventionally accepted.
- Click here for information
on conducting a directional test (i.e., choosing 'Less than' or 'Greater than'
for your alternative hypothesis).
- Click here for
descriptions of all other two sample independent t-test tab options.
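To show what changes when 'Assume equal sigmas' is deselected, here is a minimal sketch of the unequal-variance (Welch) form of the two sample t statistic, together with the factor-of-2 rule of thumb mentioned above. Whether STATLETS uses exactly this formula is an assumption, and the function names and scores are invented for illustration.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Return (t, df) for the two sample t-test without assuming equal sigmas."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared standard error of mean 1
    v2 = statistics.variance(sample2) / n2  # squared standard error of mean 2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def sigmas_look_unequal(sample1, sample2, factor=2.0):
    """Rule of thumb from the text: the sample SDs differ by more than a factor of 2."""
    s1, s2 = statistics.stdev(sample1), statistics.stdev(sample2)
    return max(s1, s2) / min(s1, s2) > factor


# Invented test scores, for illustration only
water = [52.0, 55.0, 53.0, 54.0, 56.0, 54.0]
coffee = [48.0, 66.0, 57.0, 71.0, 44.0, 62.0]

if sigmas_look_unequal(water, coffee):
    t, df = welch_t(water, coffee)
    print(f"Unequal sigmas suspected: t = {t:.3f} on about {df:.1f} degrees of freedom")
```

The approximate degrees of freedom always fall between the smaller sample size minus one and the pooled value n1 + n2 - 2, so the test becomes more conservative as the spreads diverge.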
Two Sample Paired t-test
- Perform the two sample
paired t-test as described above for independent samples, except select 'Two
Sample Comparisons' > 'Paired Samples' from the 'Analyze' menu.
- Click here for
descriptions of all other two sample paired t-test tab
options.
One-way ANOVA
Two-way ANOVA
- Enter the data for the response
(dependent) variable into the first column. In the second column, enter a
factor code (numeric or character) identifying the levels of the first factor.
In the third column, enter a factor code (numeric or character) identifying
the levels of the second factor. For example, the biologist
investigating the influence of serum batch and concentration on cell
proliferation would enter cell number data into the first column. An
integer indicating the batch of serum would be entered in the second
column. Concentration level would be indicated by 'high,' 'medium,' or
'low' entered in the third column. Remember that the design should be
fully crossed, so that all batches of serum are tested at all concentration
levels. Assuming 5 samples in each different testing condition, a
portion of the data spreadsheet might look like
this:
- Click on 'Model' from the menu at the top of the
window.
- Select 'Analysis of Variance'
> 'Twoway ANOVA'
- On the 'Input' window that
automatically pops up, select the column (variable) containing your response
data and click on the large right-pointing arrow to enter it into the 'Data'
field. Similarly enter the columns containing the independent variable
data into the 'Factor 1' and 'Factor 2' fields.
- To check the homogeneity of
variance assumption, click on the 'Resids vs level' tab, located using the
arrows in the upper right corner to scroll through all the tab options.
Click on the 'Interpret' button.
- To check the normality
assumption, click on the 'P-plot' tab and then click the 'Interpret'
button.
- Satisfied that your data do not
seriously violate the ANOVA assumptions, click on the ANOVA tab and read the
'Statistical Interpreter.'
- Click on the 'Options'
button and check the 'Estimate interactions' box, and click 'OK.'
- If the 'Statistical Interpreter'
indicates that either factor (or interaction of the two) has a significant
effect, you want to find out which means are significantly different from
which others. Click on the 'Range tests' tab to see a pairwise
comparison between all different groups.
- As you can see by clicking on
the 'Options' button, there are several types of multiple comparison
procedures. The most popular procedure is Tukey's HSD (Honestly
Significant Difference) procedure. Click the radio button next to
'Tukey HSD' and click on 'OK' to repeat the multiple range tests. The
pairwise comparison is performed on levels of the factor entered in the
'Factor 1:' field in the 'Input' window. To do the pairwise comparison
for levels of the other factor, swap the factors entered into 'Factor 1:'
and 'Factor 2:' and rerun the analysis. This will not affect the test
results.
- Note that you can also select a
different confidence level if desired (if you want to set alpha = 1%,
set the confidence level to 99%).
- Click on the 'Interaction plot'
tab to see how the two factors interact. Click on the 'Interpret'
button for an explanation.
- Click here for all other
Two-way ANOVA tab options.
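The three-column data layout described in the first step can be sketched as follows for the serum example (4 batches, 3 concentrations, 5 samples per condition). This is a hypothetical illustration of the row structure only: no cell counts are supplied, and `None` marks where a measured value would be entered.

```python
from itertools import product

batches = [1, 2, 3, 4]                      # first factor: serum batch code
concentrations = ["low", "medium", "high"]  # second factor: serum concentration
replicates = 5                              # samples per testing condition

# One row per observation: (response, factor 1 code, factor 2 code).
rows = [
    (None, batch, conc)
    for batch, conc in product(batches, concentrations)
    for _ in range(replicates)
]

print(f"{'cells':>8} {'batch':>6} {'conc':>8}")
for response, batch, conc in rows[:7]:
    print(f"{'...':>8} {batch:>6} {conc:>8}")

# A fully crossed design contains every batch/concentration combination.
combos = {(batch, conc) for _, batch, conc in rows}
assert len(combos) == len(batches) * len(concentrations)
```

With these settings the spreadsheet would hold 60 rows, which fits within the 100-row limit of the academic version.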
In the
design of an experiment, sample size is an important issue that is often
neglected. If you are in the planning or early stages of your experiment,
you should consider the question, "What sample size is needed in order to detect
a difference (i.e., to see an effect) of a particular size?" Background
readings on this topic include an online article in
Computing News and the Hyperstat Online chapter on
Power. Having reviewed this material, you are now ready to use
Statlets to compute the sample size required in your study.
One Sample t test
This
procedure determines the sample size required to estimate the mean of a normal
distribution.
- Click on 'Analyze' in the 'Data'
window and select 'Sample Size Determination' > 'Normal Mean.'
- In the 'Hypothesized mean:'
field, enter the hypothesized value of the population mean (null
hypothesis).
- In the 'Assumed sigma:' field,
enter an estimate of the standard deviation.
- Click the radio button next to
'Power.'
- Enter the desired significance
level (e.g., 5%) in the 'Alpha risk:' field.
- In the 'Beta risk:' field, enter
the allowable type II error
probability. Remember that power = 1 - β, so enter 10% if you want to achieve
90% power. Many investigators strive to achieve at least 80% power.
- Enter an alternative hypothesis
in the 'At:' field. STATLETS will calculate the sample size required to
have a 100(1 - β)% chance
of rejecting the null hypothesis when the alternative hypothesis is
true.
- Click in the Two-sided box if you
are planning a nondirectional
test.
- Click on the 'Sample Size' tab
and read the 'Statistical Interpreter.'
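For a rough idea of how the inputs above combine, the sketch below uses the common normal-approximation formula n = ((z_alpha + z_beta) * sigma / difference)^2. This is an assumption, not a description of how STATLETS computes its answer; a t-based calculation usually gives a slightly larger n. The function name and the example sigma and alternative are invented.

```python
import math
from statistics import NormalDist


def n_one_sample(mu0, mu_alt, sigma, alpha=0.05, beta=0.10, two_sided=True):
    """Approximate sample size needed to detect mu_alt when testing H0: mu = mu0."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)  # power = 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / abs(mu_alt - mu0)) ** 2)


# Hypothesized mean 350 psi; sigma = 15 psi and the alternative of 340 psi are invented.
n = n_one_sample(mu0=350, mu_alt=340, sigma=15, alpha=0.05, beta=0.10)
print(n)  # 24
```

Halving the difference to be detected roughly quadruples the required sample size, which is why the choice of alternative hypothesis matters so much.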
Two Sample t test
This
applet allows you to determine what sample size would be required to estimate
the difference between the means of two normal distributions with the desired
precision.
- Click on 'Analyze' in the 'Data'
window and select 'Sample Size Determination' > 'Comparison of Two
Means.'
- In the 'Hypothesized difference:'
field, enter the hypothesized value of the difference between population
means. For example, enter zero if the null hypothesis is
μ1 = μ2.
- In the 'Assumed within-group
sigma:' field, enter an estimate of the assumed common standard deviation
within the samples.
- Click the radio button next to
'Power.'
- Enter the desired significance
level (e.g., 5%) in the 'Alpha risk:' field.
- In the 'Beta risk:' field, enter
the allowable type II error
probability. Remember that power = 1 - β, so
enter 10% if you want to achieve 90% power. Many investigators strive to
achieve at least 80% power.
- Enter an alternative hypothesis
in the 'At:' field. STATLETS will calculate the sample size required to
have a 100(1 - β)% chance of rejecting the null hypothesis when the alternative
hypothesis is true.
- Click in the Two-sided box if you
are planning a nondirectional test.
- Click on the 'Sample Size' tab
and read the 'Statistical Interpreter.'
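The two sample case can be approximated in the same spirit. The sketch below assumes equal group sizes, a common within-group sigma, and the normal approximation n per group = 2 * ((z_alpha + z_beta) * sigma / delta)^2, where delta is the gap between the alternative and the hypothesized difference. STATLETS may use a t-based method and report a slightly larger number. All names and inputs are invented for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, beta=0.20, two_sided=True):
    """Approximate size of EACH group needed to detect a difference of delta."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(1 - beta)  # power = 1 - beta
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / abs(delta)) ** 2)


print(n_per_group(delta=5, sigma=10))             # 80% power: 63 per group
print(n_per_group(delta=5, sigma=10, beta=0.10))  # 90% power: 85 per group
```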
One-way ANOVA
This
applet allows you to determine what sample size would be required to estimate
the maximum difference between the means of many normal distributions with the
desired precision.
- Click on 'Analyze' in the 'Data'
window and select 'Sample Size Determination' > 'Comparison of Several
Means.'
- In the 'Hypothesized difference:'
and 'Assumed within-group sigma:' fields, enter the hypothesized difference
and the assumed common standard deviation within the samples,
respectively. If the null hypothesis is μ1 =
μ2 = ... = μN, enter zero for the hypothesized difference.
- In the 'Number of means:' field,
enter the number of groups (samples) in your experiment.
- Click the radio button next to
'Power.'
- Enter the desired significance
level (e.g., 5%) in the 'Alpha risk:' field.
- In the 'Beta risk:' field, enter
the allowable type II error
probability. Remember that power = 1 - β, so
enter 10% if you want to achieve 90% power. Many investigators strive to
achieve at least 80% power.
- Enter an alternative hypothesis
in the 'At:' field. STATLETS will calculate the sample size required to
have a 100(1 - β)% chance of rejecting the null hypothesis when the alternative
hypothesis is true.
- Click on the 'Sample Size' tab
and read the 'Statistical Interpreter.'
Two-way ANOVA
Power
calculation for the two-way ANOVA design is not included in STATLETS. For
an online sample size calculator for Balanced ANOVAs (including two-way), click
here.
The
following practice questions are adapted from
Levine, David M., Ramsey, Patricia
P., and Smidt, Robert K. Applied Statistics for Engineers and Scientists,
Prentice Hall, Upper Saddle River, NJ, 2001.
1. A manufacturer is
developing a new cellular phone battery and wants to test its performance
against that of the standard nickel-cadmium battery. A random sample of 25
nickel-cadmium batteries and a random sample of 25 newly developed batteries are
placed in cellular phones of the same brand and model, and the talking times (in
minutes) prior to recharging are recorded as follows:
Standard Nickel-Cadmium Battery | Newly Developed Battery
54.5 71.0 67.0 | 78.3 103.0 79.8
67.8 41.7 56.7 | 95.4 81.3 91.1
64.5 69.7 86.8 | 69.4 46.4 82.8
70.4 40.8 74.9 | 87.3 82.3 71.8
72.5 75.4 76.9 | 62.5 83.2 77.5
64.9 81.0 104.4 | 85.0 85.3 74.3
83.3 90.4 82.0 | 85.3 85.5 86.1
72.8 71.8 58.7 | 72.1 112.3 74.1
68.8 | 41.1
At the 5%
significance level, is there evidence of a difference between the two types of
batteries with respect to average talking time prior to recharge?
Ans. Using the two sample
independent t-test, there is evidence of a difference between the two types of
batteries (p-value = .0337 < .05).
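The decision in this answer can be cross-checked without STATLETS. The sketch below computes the pooled (equal sigmas) two sample t statistic and compares it with the tabled two-sided 5% critical value for 48 degrees of freedom (about 2.011). The split of the 50 readings between the two batteries is an assumption: each row of the original table is read as three standard readings followed by three new-battery readings, with one reading each in the last row.

```python
import math
import statistics

# Talking times in minutes (assignment to groups is an assumption, see above)
nicd = [
    54.5, 71.0, 67.0, 67.8, 41.7, 56.7, 64.5, 69.7, 86.8,
    70.4, 40.8, 74.9, 72.5, 75.4, 76.9, 64.9, 81.0, 104.4,
    83.3, 90.4, 82.0, 72.8, 71.8, 58.7, 68.8,
]
new = [
    78.3, 103.0, 79.8, 95.4, 81.3, 91.1, 69.4, 46.4, 82.8,
    87.3, 82.3, 71.8, 62.5, 83.2, 77.5, 85.0, 85.3, 74.3,
    85.3, 85.5, 86.1, 72.1, 112.3, 74.1, 41.1,
]

T_CRIT_DF48 = 2.011  # two-sided 5% critical value of t with 48 degrees of freedom


def pooled_t(a, b):
    """Return (t, df) for the two sample t-test assuming equal sigmas."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


t, df = pooled_t(nicd, new)
decision = "reject" if abs(t) > T_CRIT_DF48 else "do not reject"
print(f"t = {t:.3f}, df = {df}: {decision} the null hypothesis at the 5% level")
```

Under that reading of the table, the statistic exceeds the critical value in magnitude, which agrees with the conclusion stated in the answer.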
2. A machine being used for packaging potato chips
is set so that, on average, 15 ounces of chips will be packaged per bag.
The quality control engineer wishes to test the machine setting and selects a
random sample of 30 packages filled during the production process. Their
weights are recorded as follows:
15.2 | 15.3 | 15.1 | 15.7 | 15.3 | 15.0 | 15.1 | 14.3 | 14.6 | 14.5
15.0 | 15.2 | 15.4 | 15.6 | 15.7 | 15.4 | 15.3 | 14.9 | 14.8 | 14.6
14.3 | 14.4 | 15.5 | 15.4 | 15.2 | 15.5 | 15.6 | 15.1 | 15.3 | 15.1
At the 5%
significance level, is there evidence that the mean weight per bag is different
from 15 ounces?
Ans. Using the one sample t-test, there is not enough evidence to
conclude that the average weight is not equal to 15 ounces.
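As a cross-check on this answer, the one sample t statistic can be computed directly from the 30 weights and compared with the tabled two-sided 5% critical value for 29 degrees of freedom (2.045). This is a sketch outside STATLETS; the variable names are chosen here for illustration.

```python
import math
import statistics

# Package weights in ounces from practice question 2
weights = [
    15.2, 15.3, 15.1, 15.7, 15.3, 15.0, 15.1, 14.3, 14.6, 14.5,
    15.0, 15.2, 15.4, 15.6, 15.7, 15.4, 15.3, 14.9, 14.8, 14.6,
    14.3, 14.4, 15.5, 15.4, 15.2, 15.5, 15.6, 15.1, 15.3, 15.1,
]

MU0 = 15.0           # machine setting under the null hypothesis
T_CRIT_DF29 = 2.045  # two-sided 5% critical value of t with 29 degrees of freedom

n = len(weights)
standard_error = statistics.stdev(weights) / math.sqrt(n)
t = (statistics.mean(weights) - MU0) / standard_error

decision = "reject" if abs(t) > T_CRIT_DF29 else "do not reject"
print(f"t = {t:.3f}, df = {n - 1}: {decision} the null hypothesis at the 5% level")
```

The statistic falls short of the critical value, consistent with the answer above.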
3. A
sputtering machine is used for metalization on wafers in the semiconductor
industry. The reflectance of different sputtering machines (larger numbers
are better) was compared, with the following results:
Sputtering Machine
A | B | C
88.80 | 90.20 | 94.8
90.20 | 91.7 | 93.5
91.30 | 90.00 | 90.90
89.50 | 90.90 | 94.20
90.30 | 92.5 | 94.1
At the 5%
level of significance, is there evidence of a difference in the average
reflectance for the different machines? If your results indicate it is
appropriate, determine which machines differ in average reflectance.
Ans.
Using a one-way ANOVA, there is evidence of a difference in the average
reflectance between the machines (p-value = .00184 < .05).
Machine C has a significantly higher reflectance than the other
machines.
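The F statistic behind this answer can be reproduced with a few lines of arithmetic. The sketch below is not how STATLETS is invoked; it simply computes the between-group and within-group mean squares for the three machines and compares their ratio with the tabled 5% critical value of F for 2 and 12 degrees of freedom (3.89).

```python
import statistics

# Reflectance readings from practice question 3
machines = {
    "A": [88.80, 90.20, 91.30, 89.50, 90.30],
    "B": [90.20, 91.7, 90.00, 90.90, 92.5],
    "C": [94.8, 93.5, 90.90, 94.20, 94.1],
}

F_CRIT_2_12 = 3.89  # 5% critical value of F with 2 and 12 degrees of freedom


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


f_stat, df1, df2 = one_way_f(list(machines.values()))
decision = "reject" if f_stat > F_CRIT_2_12 else "do not reject"
print(f"F = {f_stat:.2f} on ({df1}, {df2}) df: {decision} the null hypothesis")
for name, values in machines.items():
    print(f"Machine {name}: mean reflectance = {statistics.mean(values):.2f}")
```

The printed group means show machine C well above A and B, in line with the multiple comparison result reported in the answer; the formal pairwise comparison still requires a procedure such as Tukey's HSD.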
4. The following data concern the thickness of
nonmagnetic coatings of zinc. Two measurements
are made on the same specimen, the first using
a nondestructive method, the second a destructive method. The data are as
follows (in mm):
Specimen | Nondestructive Method | Destructive Method
1 | 105 | 116
2 | 120 | 132
3 | 85 | 104
4 | 181 | 139
5 | 115 | 114
6 | 127 | 129
7 | 630 | 720
8 | 155 | 174
9 | 25 | 312
10 | 310 | 338
11 | 443 | 465
At the 5%
significance level, is there evidence of a difference in the average measurement
of thickness between the two methods?
Ans.
Using the two sample paired t-test, do not reject the null hypothesis.
There is no evidence of a difference in the average thickness between the two
methods (p-value = .0765 > .05).
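A paired t-test is a one sample t-test applied to the within-pair differences, which the following sketch makes explicit. It uses the values exactly as printed in the table for this question and compares the statistic with the tabled two-sided 5% critical value for 10 degrees of freedom (2.228). It is an illustration outside STATLETS and does not attempt to reproduce the reported p-value.

```python
import math
import statistics

# Coating thickness readings from practice question 4, one pair per specimen
nondestructive = [105, 120, 85, 181, 115, 127, 630, 155, 25, 310, 443]
destructive = [116, 132, 104, 139, 114, 129, 720, 174, 312, 338, 465]

T_CRIT_DF10 = 2.228  # two-sided 5% critical value of t with 10 degrees of freedom

# The paired test works on the difference within each specimen.
diffs = [d - n for n, d in zip(nondestructive, destructive)]
n_pairs = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n_pairs))

decision = "reject" if abs(t) > T_CRIT_DF10 else "do not reject"
print(f"t = {t:.3f}, df = {n_pairs - 1}: {decision} the null hypothesis at the 5% level")
```

The decision matches the answer above: the differences vary too much from specimen to specimen for the mean difference to be distinguished from zero at the 5% level.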
5. The quality-control
director for a clothing manufacturer wants to study the effect of operators and
machines on the breaking strength (in pounds) of wool serge material. A
batch of the material is cut into square-yard pieces and these are randomly
assigned, three each, to all 12 combinations of four operators and three
machines chosen specifically for the experiment. The results are as
follows:
Operator | Machine I | Machine II | Machine III
A | 115 111 109 | 115 108 110 | 119 114 107
B | 117 105 105 | 114 102 113 | 114 106 114
C | 109 100 103 | 110 103 102 | 106 103 105
D | 112 105 108 | 115 107 111 | 111 107 110
For the
5% level of significance, is there an effect due to operator? Is there an
effect due to machine? Is there an interaction due to operator and
machine? If appropriate, use the Tukey procedure to determine which
operators and which machines differ in their effect on average breaking
strength. (Use α =
.05) What can you conclude about the effect of operators and machines on
breaking strength?
Ans.
According to the results of a two-way ANOVA, there is sufficient evidence to
conclude that the breaking strengths of wool serge material differ with which
operator ran the machine. There is also adequate evidence to conclude that
the breaking strengths differ among the three machines used to produce the
material. Furthermore, there is enough evidence to conclude that there is
an interaction between operator and machine (study the interaction plot for
details). With respect to breaking strength of wool serge material,
operator C is significantly different from all others, and each machine differs
significantly from the other two. To summarize, operators create fabric of
varying strengths depending on which machines individuals use.