Virtual Statistics Laboratory

Welcome to the Mississippi State University Virtual Statistics Laboratory (VSL), creation of which was funded by a Schillig Special Teaching Project Award.  This website provides students with a very limited introduction to testing hypotheses about means and step-by-step instructions for performing a few basic tests using STATLETS.  The academic version of STATLETS will analyze up to 100 rows by 10 columns of data and may be downloaded free of charge.  The VSL is intended for students who are involved in research but who have not taken a formal course in statistics.  However, it is assumed that the student already has some familiarity with descriptive statistics (e.g., mean and standard deviation).  The purpose of the VSL is to guide students through the process of selecting an appropriate statistical model and using STATLETS to perform the analysis.  The goal is to provide students with a useful research tool while helping them to avoid common mistakes.  STATLETS was specifically chosen for use in the VSL because it provides substantial interpretation of results, and help is available on-line.

The VSL is not meant to replace a traditional course in statistics, and consultation with a professional statistician is encouraged if results are to be submitted for publication.  Many of the links are to sections of Hyperstat Online, an online statistics textbook, and I am grateful to Rice University for granting permission to link to this resource.

 

-Dr. Steven H. Elder, Assistant Professor of Agricultural & Biological Engineering

Background

            Throughout this tutorial it is assumed that you are conducting or planning an experiment to test hypotheses about means.  If you are reluctant to use the mean and standard deviation as descriptive statistics for your data, then you should consider using nonparametric methods, which are beyond the scope of this tutorial.  Discussion is also restricted to experiments in which the dependent variable is continuous.   

To get started you should review Hyperstat Online, Chapter 9: Logic of Hypothesis Testing.  Once you have covered this material, you are ready to decide what type of procedure is most appropriate for your experiment.  One assumption underlying the parametric procedures presented herein is that your samples were selected at random from the population of interest.  Therefore, you should make sure that the condition of random sampling is satisfied before proceeding. 

Choosing the Right Test

            We will consider five different procedures for testing hypotheses about means (assuming the data have been randomly drawn from the population(s) of interest):

  • One Sample t-test
  • Two Sample Independent t-test
  • Two Sample Paired t-test
  • One-way ANOVA
  • Two-way ANOVA

The first step in choosing the appropriate test is to formulate your null hypothesis.  The procedure to use will depend on how many groups (means) are involved.

  • If you are going to test a hypothesis about a single population mean μ, then you will use a one sample t-test.  The null hypothesis is a value of μ.  For example, suppose a manufacturer claims that the mean bursting pressure for a certain type and size of PVC irrigation tubing is 350 psi.  The bursting pressures of ten such pipes are measured experimentally.  A one sample t-test can be used to test the null hypothesis that μ = 350.
  • If you want to compare 2 means, then you should use an independent samples t-test or a paired samples t-test.
    • The two-sample t-tests are used for testing differences between means.  The null hypothesis is that the difference between means is some specified value. Usually the null hypothesis is that the difference is zero (μ1 = μ2).  The choice between the independent and paired procedures depends on the nature of the samples.  Independent samples have no relationship to one another, and there is no systematic way that a member of one independent sample can be paired with a member of another independent sample.  If such a pairing is possible, the samples are said to be dependent.  Consider a researcher interested in the effects of caffeine on standardized test performance among eleventh grade students.  He may give one group of 20 students 10 ounces of water to drink before the test (sample 1), while he gives 10 ounces of coffee to a different group of 20 (sample 2).  In this case the samples are independent.  On the other hand, he may test the same 20 individuals after drinking water on one day (sample 1) and after drinking coffee on a different day (sample 2).  In this case the samples are dependent, since the water score for each individual can be paired with the coffee score for that same individual.  Click here to see the advantage of the paired design over the independent samples design.
  • If you want to test a hypothesis about differences between more than two groups (means), an analysis of variance (ANOVA) is best.  When there are more than two means, it is incorrect to compare each mean with each other mean using t-tests, because running many t-tests inflates the overall probability of a Type I error. The null hypothesis takes the form μ1 = μ2 = μ3 = ... = μN, where N is the total number of groups.  A factor in the ANOVA is a variable, the effects of which are being studied.  Factors also serve as the basis for classifying data into categories.  Factors are under the control of the investigator.  Studies with only one factor are called one-way analyses.  Categories of a factor into which data are classified are called levels.  When an experiment involves two or more independent variables (factors), it is called a factorial design.  Studies with exactly two independent variables are called two-way analyses.  For the sake of simplicity, we will consider only the one-way and two-way ANOVA.
    • If you are comparing the means of more than two groups that differ on one independent variable (factor), you should use the one-way ANOVA test.  For example, one-way ANOVA would be appropriate for a biologist who wants to compare four different batches of serum on the proliferation of cultured cells.  In this study there is one systematically manipulated variable (factor) with four levels.  If you are investigating the effects of exactly 2 independent variables, each with two or more levels, you can use the two-way ANOVA.  For example, suppose the biologist comparing batches of serum introduces a second variable, serum concentration, with 3 levels: low, medium, and high.  Examination of more than two independent variables simultaneously requires a multifactorial ANOVA, which is beyond the scope of the VSL.  For the two-way ANOVA, it is assumed that the designs are completely crossed, meaning that each level of one factor occurs with (i.e., is "crossed" with) each level of all other factors.  In the example above, it would be assumed that each batch of serum is tested at low, medium, and high concentration. 
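
The selection rules above can be summarized as a small decision function. This is only a sketch of the logic described in this section; the function name and its parameters are hypothetical and are not part of STATLETS.

```python
def choose_test(n_groups, paired=False, n_factors=1):
    """Return the procedure suggested by the rules in this section.

    n_groups  -- number of groups (means) being compared
    paired    -- True if members of two samples can be matched one-to-one
    n_factors -- number of independent variables (factors) in the study
    """
    if n_groups < 1 or n_factors < 1:
        raise ValueError("n_groups and n_factors must be at least 1")
    if n_factors == 2:
        return "Two-way ANOVA"
    if n_factors > 2:
        return "Multifactorial ANOVA (beyond the scope of the VSL)"
    if n_groups == 1:
        return "One Sample t-test"
    if n_groups == 2:
        if paired:
            return "Two Sample Paired t-test"
        return "Two Sample Independent t-test"
    return "One-way ANOVA"


# The examples used in this section:
print(choose_test(1))                   # PVC bursting pressure vs. 350 psi
print(choose_test(2))                   # water group vs. a different coffee group
print(choose_test(2, paired=True))      # the same students tested twice
print(choose_test(4))                   # four batches of serum
print(choose_test(12, n_factors=2))     # four batches at three concentrations
```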


Check Assumptions

Before proceeding with the analysis, it is important to verify that your data satisfy the assumptions of the test you plan to use.  In addition to random sampling, some key assumptions for each test are listed below.  As will be explained in subsequent sections, STATLETS can often help you determine whether your data satisfy the assumptions.  If you are not sure about some of the assumptions, proceed to the next section. 

 

One Sample t-test

 

Two Sample Independent t-test

  • The populations are normally distributed.  See section below on how to test for normality using Statlets. The t test is robust to moderate departures from normality.  If the distributions are not extremely skewed, and particularly if the numbers of observations are similar in the two groups, then the t test will be valid.   

  • Scores are independent (each subject provides only one score).

Two Sample Paired t-test

  •     Each subject is sampled independently from each other subject.
  •     The distribution of differences (between each pair) is approximately normal.  If both raw scores are normally distributed then the differences will be normally distributed.

 

One-way ANOVA

  • The treatment groups are independent.  There can be no systematic way of matching or pairing members of one group with members of any other group.

  • Each of the populations is normally distributed.
  • Each population has the same variance (σ²), a condition called homogeneity of variance. You can use STATLETS to test for homogeneity of variance.

 

            Research has shown that ANOVA is robust to moderate violations of its assumptions.  It is preferable to use equal sample sizes whenever possible.   One situation to be wary of is a combination of unequal sample sizes and a violation of the assumption of homogeneity of variance, in which case the probability value reported by the ANOVA may be inaccurate.

Two-way ANOVA

            The assumptions of the two-way ANOVA designs parallel those above for the one-way designs.  In addition, the two factors must be independent.

Entering Data and Testing Hypotheses using Statlets

            If you have not already done so, click here to download the academic version of Statlets.  For instructions on how to create a new file, enter data, etc., click here.  To change the default significance level (alpha) from 5%, click on the 'Edit' tab in the 'Data' window and select 'Preferences.'  Click the radio button next to the desired confidence level (e.g., 99% to set α = 1%).

One Sample t-test

  • Enter your sample data into a single column on the data spreadsheet.
  • Click on 'Analyze' from the menu at the top of the window.
  • Select 'One Sample' > 'One Variable Analysis'
  • On the 'Input' window that automatically pops up, select the column containing your data and click on the large right-pointing arrow to enter it into the 'Sample data:' field.
  • Click on the 'Stats' tab to see summary statistics (mean, standard deviation, etc.), as well as measures of the shape of the distribution used to determine whether the sample comes from a normal distribution.  Read the 'Statistical Interpreter' for more information.
  • To check the normality assumption, click on the 'P-Plot' tab to see a normal probability plot of your data.  Within the plot window, click on 'Interpret' for more information.  You need not be concerned unless your data display substantial deviation from normality.
  • Satisfied that your data come from a normal distribution or that the violation of this assumption is not too serious, click on the 't-test' tab.  By default, the alternative hypothesis is mean ≠ 0.
    • Read the 'Statistical Interpreter' for what decision to make regarding your null hypothesis.
  • If you want to change the significance level or hypothesis, click on the 'Options' button in the upper right corner.
    • By convention, one usually uses an alpha level of 10%, 5% or 1%. The choice of alpha level reflects the degree of "risk of being wrong" that one is willing to accept. Theoretically speaking, an alpha of 5% means that if the null hypothesis were true and one were to perform the test 100 times, one would reach the wrong conclusion in about 5 of these tests. Arriving at the wrong conclusion in this context means concluding that the results are not due to chance when in fact they are (a Type I error).  The 5% level is generally accepted as the highest acceptable error level.
    • Click here for information on conducting a directional test (i.e., choosing 'Less than' or 'Greater than' for your alternative hypothesis).
  • Click here for descriptions of all other one variable analysis tab options.
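
The meaning of the alpha level can be illustrated with a small simulation. The sketch below is not part of STATLETS; it repeatedly draws ten hypothetical bursting pressures from a normal population whose mean really is 350 psi (the standard deviation of 20 psi is an assumption made only for illustration), runs a two-sided one sample t-test at the 5% level on each sample, and counts how often the true null hypothesis is rejected. The critical value 2.262 is the tabled two-sided 5% value of t with 9 degrees of freedom.

```python
import math
import random
import statistics

T_CRIT_9_DF = 2.262  # two-sided 5% critical value of t with 9 degrees of freedom


def one_sample_t(sample, mu0):
    """t statistic for the null hypothesis that the population mean equals mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / standard_error


def false_rejection_rate(n_tests=4000, mu0=350.0, sigma=20.0, seed=1):
    """Simulate n_tests experiments of 10 pipes each with the null hypothesis true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        sample = [rng.gauss(mu0, sigma) for _ in range(10)]
        if abs(one_sample_t(sample, mu0)) > T_CRIT_9_DF:
            rejections += 1
    return rejections / n_tests


rate = false_rejection_rate()
print(f"Proportion of true null hypotheses rejected: {rate:.3f}")
```

The printed proportion should be close to 0.05, which is the Type I error rate that an alpha of 5% allows.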

 

Two Sample Independent t-test

  • Enter your data into two columns, each containing a separate sample (group).  You need not have the same number of entries (rows) for each sample, but equal sample size is preferable.
  • Click on 'Analyze' from the menu at the top of the window.
  • Select 'Two Sample Comparisons' > 'Independent Samples'
  • On the 'Input' window that automatically pops up, select the groups (columns) containing your data and click on the large right-pointing arrows to enter groups into the 'Sample 1:' and 'Sample 2:' fields.  It does not matter which group is Sample 1 and which group is Sample 2. 
  • Click on the 'Stats' tab to see summary statistics (mean, standard deviation, etc.), as well as measures of the shape of the distributions used to determine whether the samples come from normal distributions.  Read the 'Statistical Interpreter' for more information.

 

  • To further evaluate the normality assumption, click on the 'P-Plot' tab to see normal probability plots of your data.  Within the plot window, click on 'Interpret' for more information.  You need not be concerned unless your data display substantial deviation from normality.
  • Satisfied that your data come from normal distributions or that the violation of this assumption is not too serious, click on the 't-test' tab.  By default, the null hypothesis is μ1 - μ2 = 0; the alternative hypothesis is μ1 - μ2 ≠ 0; and the analysis assumes equal sigmas (standard deviations) between the two groups.
    • Read the 'Statistical Interpreter' for what decision to make regarding your null hypothesis.
  • If you want to change any of the default settings, click on the 'Options' button in the upper right corner.
    • If the standard deviations of the two groups differ by more than a factor of 2, it is advisable not to assume equal sigmas.  After clicking on the 'Options' button in the 't test' window, deselect 'Assume equal sigmas' by clicking on the checkbox and click 'OK' to rerun the analysis.
    • By convention, one usually uses an alpha level of 10%, 5% or 1%. The choice of alpha level reflects the degree of "risk of being wrong" that one is willing to accept. Theoretically speaking, an alpha of 5% means that if the null hypothesis were true and one were to perform the test 100 times, one would reach the wrong conclusion in about 5 of these tests. Arriving at the wrong conclusion in this context means concluding that the results are not due to chance when in fact they are (a Type I error).  The 5% level is generally accepted as the highest acceptable error level.
    • Click here for information on conducting a directional test (i.e., choosing 'Less than' or 'Greater than' for your alternative hypothesis).

    • Click here for descriptions of all other two sample independent t-test tab options.
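
To show what the 'Assume equal sigmas' option changes, here is a minimal sketch of the two standard forms of the independent samples t statistic: the pooled-variance form (equal sigmas) and the Welch form (unequal sigmas). The data are hypothetical, and the code illustrates the textbook formulas rather than the STATLETS implementation. The factor-of-2 rule is the one given in the steps above.

```python
import math
import statistics


def pooled_t(x, y):
    """t statistic and degrees of freedom assuming equal population sigmas."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


def welch_t(x, y):
    """t statistic and approximate degrees of freedom without assuming equal sigmas."""
    n1, n2 = len(x), len(y)
    a = statistics.variance(x) / n1
    b = statistics.variance(y) / n2
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples with clearly different spreads
sample_1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
sample_2 = [10.2, 13.9, 8.7, 14.5, 11.1, 9.6, 15.0, 12.4]

sd_1, sd_2 = statistics.stdev(sample_1), statistics.stdev(sample_2)
if max(sd_1, sd_2) / min(sd_1, sd_2) > 2:
    t, df = welch_t(sample_1, sample_2)
    label = "unequal sigmas"
else:
    t, df = pooled_t(sample_1, sample_2)
    label = "equal sigmas"
print(f"{label}: t = {t:.3f} with {df:.1f} degrees of freedom")
```

When the two samples are the same size, both forms give the same t statistic and differ only in the degrees of freedom.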

     

 

Two Sample Paired t-test

  • Perform the two sample paired t-test as described above for independent samples, except select 'Two Sample Comparisons' > 'Paired Samples' from the 'Analyze' menu.

  • Click here for descriptions of all other two sample paired t-test tab options.
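
The paired t-test is a one sample t-test applied to the within-pair differences. The sketch below uses hypothetical test scores for eight students, each tested once after water and once after coffee (the scores are invented for illustration), and compares the paired statistic with the independent samples statistic computed from the same numbers. The critical values are the tabled two-sided 5% values of t with 7 and 14 degrees of freedom.

```python
import math
import statistics

# Hypothetical scores: position i in both lists belongs to the same student
water = [72, 85, 64, 90, 78, 69, 81, 75]
coffee = [75, 88, 66, 94, 80, 73, 83, 79]


def paired_t(x, y):
    """t statistic for the mean of the pairwise differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def independent_t(x, y):
    """Pooled-variance t statistic that ignores the pairing."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(y) - statistics.mean(x)) / se


t_paired = paired_t(water, coffee)         # 7 degrees of freedom
t_indep = independent_t(water, coffee)     # 14 degrees of freedom
print(f"paired t = {t_paired:.2f} (reject if |t| > 2.365)")
print(f"independent t = {t_indep:.2f} (reject if |t| > 2.145)")
```

Because student-to-student variation cancels out of the differences, the paired statistic is much larger here than the independent samples statistic, even though the mean difference is the same in both analyses.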

 

 

 

One-way ANOVA

  • Enter your data into the spreadsheet such that each column contains the data from a single group (sample).  An equal number of entries (rows) in each group is desirable but not necessary.

  • Click on 'Analyze' from the menu at the top of the window.
  • Select 'Multiple Samples' > 'Completely Randomized Design'
  • On the 'Input' window that automatically pops up, select the groups (columns) containing your data and click on the large right-pointing arrow to enter groups into the 'Samples' field.  Be sure to enter all the groups you want to compare.
  • To check the homogeneity of variance assumption, click on the 'Variance check' tab, located using the arrows in the upper right corner to scroll through all the tab options.  Read the 'Statistical Interpreter.'
  • To check the normality assumption, click on the 'P-plot' tab and then click the 'Interpret' button.
  • Satisfied that your data do not seriously violate the ANOVA assumptions, click on the ANOVA tab and read the 'Statistical Interpreter.'   By default, this test is performed at the 5% significance level.
  • If there is a statistically significant difference between the means of the various groups, you want to find out which means are significantly different from which others.  Click on the 'Range tests' tab to see a pairwise comparison between all different groups.
    • As you can see by clicking on the options button, there are several types of multiple comparison procedures.  In a one-way analysis of variance with equal group sizes, the most popular procedure is Tukey's HSD (Honestly Significant Difference) procedure.  Click the radio button next to 'Tukey HSD' and click on 'OK' to repeat the multiple range tests.
    • Note that you can also select a different confidence level if desired (you are basically setting 1 - α, so the default is 95%).
  • Click here for all other Completely Randomized ANOVA tab options.

 

 

Two-way ANOVA
  • Enter the data for the response (dependent) variable into the first column. In the second column, enter a factor code (numeric or character) identifying the levels of the first factor. In the third column, enter a factor code (numeric or character) identifying the levels of the second factor.  For example, the biologist investigating the influence of serum batch and concentration on cell proliferation would enter cell number data into the first column.  An integer indicating the batch of serum would be entered in the second column.  Concentration level would be indicated by 'high,' 'medium,' or 'low' entered in the third column.  Remember that the design should be fully crossed, so that all batches of serum are tested at all concentration levels.  Assuming 5 samples in each different testing condition, a portion of the data spreadsheet might look like this:

 

 

  • Click on 'Model' from the menu at the top of the window.
  • Select 'Analysis of Variance' > 'Twoway ANOVA'
  • On the 'Input' window that automatically pops up, select the column (variable) containing your response data and click on the large right-pointing arrow to enter it into the 'Data' field.  Similarly enter the columns containing the independent variable data into the 'Factor 1' and 'Factor 2' fields. 
  • To check the homogeneity of variance assumption, click on the 'Resids vs level' tab, located using the arrows in the upper right corner to scroll through all the tab options.  Click on the 'Interpret' button. 
  • To check the normality assumption, click on the 'P-plot' tab and then click the 'Interpret' button.
  • Satisfied that your data do not seriously violate the ANOVA assumptions, click on the ANOVA tab and read the 'Statistical Interpreter.'
  • Click on the 'Options' button  and check the 'Estimate interactions' box, and click 'OK.' 
  • If the 'Statistical Interpreter' indicates that either factor (or interaction of the two) has a significant effect, you want to find out which means are significantly different from which others.  Click on the 'Range tests' tab to see a pairwise comparison between all different groups.
    • As you can see by clicking on the 'Options' button, there are several types of multiple comparison procedures.  The most popular procedure is Tukey's HSD (Honestly Significant Difference) procedure.  Click the radio button next to 'Tukey HSD' and click on 'OK' to repeat the multiple range tests.  The pairwise comparison is performed on levels of the factor entered in the 'Factor 1:' field in the 'Input' window.  To do the pairwise comparison for levels of the other factor, swap the factors entered into 'Factor 1:' and 'Factor 2:' and rerun the analysis.  This will not affect the test results. 
    • Note that you can also select a different confidence level if desired (if you want to set alpha = 1%, set the confidence level to 99%).
    • Click on the 'Interaction plot' tab to see how the two factors interact.  Click on the 'Interpret' button for an explanation. 
  • Click here for all other Two-way ANOVA tab options.
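
The data layout and the arithmetic behind the two-way ANOVA table can be sketched as follows. The rows follow the layout described above (response, first factor code, second factor code), but the values are hypothetical and use only two batches and two samples per condition to keep the example short. The function assumes a balanced, completely crossed design and is an illustration of the textbook sums of squares, not of the STATLETS implementation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (cell count, serum batch, concentration)
rows = [
    (52, 1, "low"), (55, 1, "low"),
    (61, 1, "medium"), (63, 1, "medium"),
    (70, 1, "high"), (74, 1, "high"),
    (48, 2, "low"), (50, 2, "low"),
    (60, 2, "medium"), (58, 2, "medium"),
    (75, 2, "high"), (79, 2, "high"),
]


def two_way_anova(rows):
    """Return F ratios for both factors and their interaction."""
    cells = defaultdict(list)
    for y, a, b in rows:
        cells[(a, b)].append(y)
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    na, nb = len(levels_a), len(levels_b)
    r = len(next(iter(cells.values())))  # samples per condition
    if len(cells) != na * nb or any(len(v) != r for v in cells.values()) or r < 2:
        raise ValueError("design must be balanced, crossed, and replicated")

    grand = mean(y for y, _, _ in rows)
    mean_a = {a: mean(y for y, a2, _ in rows if a2 == a) for a in levels_a}
    mean_b = {b: mean(y for y, _, b2 in rows if b2 == b) for b in levels_b}

    ss_a = nb * r * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = na * r * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = r * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b          # interaction
    ss_err = sum((y - mean(v)) ** 2 for v in cells.values() for y in v)

    df_a, df_b = na - 1, nb - 1
    df_ab, df_err = df_a * df_b, na * nb * (r - 1)
    ms_err = ss_err / df_err
    return {
        "factor 1": (ss_a / df_a) / ms_err,
        "factor 2": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / df_ab) / ms_err,
    }


f_ratios = two_way_anova(rows)
for source, f in f_ratios.items():
    print(f"{source}: F = {f:.2f}")
```

Each F ratio is compared with the critical value of F for its own degrees of freedom; STATLETS reports the corresponding probability values directly.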

 

 

Sample Size

            In the design of an experiment, sample size is an important issue that is often neglected.  If you are in the planning or early stages of your experiment, you should consider the question, "What sample size is needed in order to detect a difference (i.e., to see an effect) of a particular size?"  Background readings on this topic include an online article in Computing News and the Hyperstat Online chapter on Power.  Having reviewed this material, you are now ready to use Statlets to compute the sample size required in your study. 

 

One Sample t test

This procedure determines the sample size required to estimate the mean of a normal distribution.

 

  • Click on 'Analyze' in the 'Data' window and select 'Sample Size Determination' > 'Normal Mean.'
  • In the 'Hypothesized mean:' field, enter the hypothesized value of the population mean (null hypothesis).
  • In the 'Assumed sigma:' field, enter an estimate of the standard deviation.
  • Click the radio button next to 'Power.'
  • Enter the desired significance level (e.g., 5%) in the 'Alpha risk:' field.
  • In the 'Beta risk:' field, enter the allowable type II error probability.  Remember that power = 1 - β, so enter 10% if you want to achieve 90% power.  Many investigators strive to achieve at least 80% power.
  • Enter an alternative hypothesis in the 'At:' field.  Statlets will calculate the sample size required to have a (1 - β)% chance of rejecting the null hypothesis when the alternative hypothesis is true.
  • Click in the Two-sided box if you are planning a nondirectional test.  
  • Click on the 'Sample Size' tab and read the 'Statistical Interpreter.'

 

Two Sample t test

This applet allows you to determine what sample size would be required to estimate the difference between the means of two normal distributions with the desired precision.

 

  • Click on 'Analyze' in the 'Data' window and select 'Sample Size Determination' > 'Comparison of Two Means.'
  • In the 'Hypothesized difference:' field, enter the hypothesized value of the difference between population means.  For example, enter zero if the null hypothesis is μ1 = μ2.
  • In the 'Assumed within-group sigma:' field, enter an estimate of the assumed common standard deviation within the samples.
  • Click the radio button next to 'Power.'
  • Enter the desired significance level (e.g., 5%) in the 'Alpha risk:' field.
  • In the 'Beta risk:' field, enter the allowable type II error probability.  Remember that power = 1 - β, so enter 10% if you want to achieve 90% power.  Many investigators strive to achieve at least 80% power.
  • Enter an alternative hypothesis in the 'At:' field.  STATLETS will calculate the sample size required to have a (1 - β)% chance of rejecting the null hypothesis when the alternative hypothesis is true.   
  • Click in the Two-sided box if you are planning a nondirectional test.
  • Click on the 'Sample Size' tab and read the 'Statistical Interpreter.'
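
The relationship between alpha, beta, sigma, and the difference to be detected can be sketched with the usual normal-approximation formulas for the one sample and two sample cases. This is an approximation written for illustration; STATLETS may use a t-based calculation and report a slightly larger sample size. The function names and arguments are hypothetical; delta is the distance between the hypothesized value and the alternative entered in the 'At:' field.

```python
import math
from statistics import NormalDist


def _z_sum(alpha, power, two_sided):
    """Sum of the standard normal quantiles for the alpha and beta risks."""
    z = NormalDist()
    tail = alpha / 2 if two_sided else alpha
    return z.inv_cdf(1 - tail) + z.inv_cdf(power)


def n_one_sample(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size for a test of a single mean."""
    return math.ceil((_z_sum(alpha, power, two_sided) * sigma / delta) ** 2)


def n_per_group_two_sample(sigma, delta, alpha=0.05, power=0.80, two_sided=True):
    """Approximate size of each group when comparing two means."""
    return math.ceil(2 * (_z_sum(alpha, power, two_sided) * sigma / delta) ** 2)


# Example: sigma = 10, detect a difference of 5, alpha = 5%, beta = 10%
print(n_one_sample(sigma=10, delta=5, power=0.90))
print(n_per_group_two_sample(sigma=10, delta=5, power=0.90))
```

Halving the difference to be detected roughly quadruples the required sample size, which is why it is worth settling on a realistic effect size before collecting data.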

 

 

One-way ANOVA

This applet allows you to determine what sample size would be required to estimate the maximum difference between the means of many normal distributions with the desired precision.

  • Click on 'Analyze' in the 'Data' window and select 'Sample Size Determination' > 'Comparison of Several Means.'
  • In the 'Hypothesized difference:' and 'Assumed within-group sigma:' fields, enter the hypothesized difference and the assumed common standard deviation within the samples, respectively.  If the null hypothesis is μ1 = μ2 = ... = μN, enter zero for the hypothesized difference.
  • In the 'Number of means:' field, enter the number of groups (samples) in your experiment.
  • Click the radio button next to 'Power.'
  • Enter the desired significance level (e.g., 5%) in the 'Alpha risk:' field.
  • In the 'Beta risk:' field, enter the allowable type II error probability.  Remember that power = 1 - β, so enter 10% if you want to achieve 90% power.  Many investigators strive to achieve at least 80% power.
  • Enter an alternative hypothesis in the 'At:' field.  STATLETS will calculate the sample size required to have a (1 - β)% chance of rejecting the null hypothesis when the alternative hypothesis is true.   
  • Click on the 'Sample Size' tab and read the 'Statistical Interpreter.'

 

Two-way ANOVA

            Power calculation for the two-way ANOVA design is not included in STATLETS.  For an online sample size calculator for Balanced ANOVAs (including two-way), click here.

 

 

Self Test

            The following practice questions are adapted from

 

Levine, David M., Ramsey, Patricia P., and Smidt, Robert K. Applied Statistics for Engineers and Scientists, Prentice Hall, Upper Saddle River, NJ, 2001.

 

    1.      A manufacturer is developing a new cellular phone battery and wants to test its performance against that of the standard nickel-cadmium battery.  A random sample of 25 nickel-cadmium batteries and a random sample of 25 newly developed batteries are placed in cellular phones of the same brand and model, and the talking times (in minutes) prior to recharging are recorded as follows:

Standard Nickel-Cadmium Battery
Newly Developed Battery
54.5
71.0
67.0
78.3
103.0
79.8
67.8
41.7
56.7
95.4
81.3
91.1
64.5
69.7
86.8
69.4
46.4
82.8
70.4
40.8
74.9
87.3
82.3
71.8
72.5
75.4
76.9
62.5
83.2
77.5
64.9
81.0
104.4
85.0
85.3
74.3
83.3
90.4
82.0
85.3
85.5
86.1
72.8
71.8
58.7
72.1
112.3
74.1
68.8
41.1

 

At the 5% significance level, is there evidence of a difference between the two types of batteries with respect to average talking time prior to recharge?

 

Ans. Using the two sample independent t-test, there is evidence of a difference between the two types of batteries (p-value = .0337 < .05).

 

    2.      A machine being used for packaging potato chips is set so that, on average, 15 ounces of chips will be packaged per bag.  The quality control engineer wishes to test the machine setting and selects a random sample of 30 packages filled during the production process.  Their weights are recorded as follows:

 


15.2   15.3   15.1   15.7   15.3   15.0   15.1   14.3   14.6   14.5
15.0   15.2   15.4   15.6   15.7   15.4   15.3   14.9   14.8   14.6
14.3   14.4   15.5   15.4   15.2   15.5   15.6   15.1   15.3   15.1

 

At the 5% significance level, is there evidence that the mean weight per bag is different from 15 ounces?

 

Ans. Using the one sample t-test, there is not enough evidence to conclude that the average weight is not equal to 15 ounces.
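
As a check on this answer, the one sample t statistic can be computed by hand from the 30 weights listed in the question. The sketch below uses only the Python standard library; the critical value 2.045 is the tabled two-sided 5% value of t with 29 degrees of freedom. STATLETS reports a probability value instead of comparing against a critical value, but the decision is the same.

```python
import math
import statistics

weights = [
    15.2, 15.3, 15.1, 15.7, 15.3, 15.0, 15.1, 14.3, 14.6, 14.5,
    15.0, 15.2, 15.4, 15.6, 15.7, 15.4, 15.3, 14.9, 14.8, 14.6,
    14.3, 14.4, 15.5, 15.4, 15.2, 15.5, 15.6, 15.1, 15.3, 15.1,
]
MU_0 = 15.0            # null hypothesis: mean weight is 15 ounces
T_CRIT_29_DF = 2.045   # two-sided 5% critical value of t with 29 degrees of freedom

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)
t = (sample_mean - MU_0) / (sample_sd / math.sqrt(n))
reject = abs(t) > T_CRIT_29_DF

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, t = {t:.2f}")
print("reject the null hypothesis" if reject else "do not reject the null hypothesis")
```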

 

3.      A sputtering machine is used for metalization on wafers in the semiconductor industry.  The reflectance of different sputtering machines (larger numbers are better) was compared, with the following results:

 

Sputtering Machine
A        B        C
88.80    90.20    94.8
90.20    91.7     93.5
91.30    90.00    90.90
89.50    90.90    94.20
90.30    92.5     94.1

 

 

 

At the 5% level of significance, is there evidence of a difference in the average reflectance for the different machines?  If your results indicate it is appropriate, determine which machines differ in average reflectance.

 

Ans. Using a one-way ANOVA, there is evidence of a difference in the average reflectance between the machines (p-value = .00184 < .05).  Machine C has a significantly higher reflectance than the other machines.
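
The F ratio and the Tukey HSD comparisons behind this answer can be reproduced by hand. The sketch below assumes the reflectance values in the question are read across the rows as machines A, B, and C. The critical values are tabled 5% values: 3.89 for F with 2 and 12 degrees of freedom and 3.77 for the studentized range with 3 means and 12 degrees of freedom. This illustrates the textbook calculation, not the STATLETS implementation.

```python
import math
import statistics

reflectance = {
    "A": [88.80, 90.20, 91.30, 89.50, 90.30],
    "B": [90.20, 91.7, 90.00, 90.90, 92.5],
    "C": [94.8, 93.5, 90.90, 94.20, 94.1],
}
F_CRIT = 3.89   # 5% critical value of F with 2 and 12 degrees of freedom
Q_CRIT = 3.77   # 5% studentized range value for 3 means, 12 degrees of freedom


def one_way_anova(groups):
    """Return the F ratio and the within-group mean square."""
    values = [v for g in groups.values() for v in g]
    grand = statistics.mean(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_within = ss_within / df_within
    return (ss_between / df_between) / ms_within, ms_within


f_ratio, ms_within = one_way_anova(reflectance)
print(f"F = {f_ratio:.2f} (reject equal means if F > {F_CRIT})")

# Tukey HSD for equal group sizes: two means differ if they are further apart than hsd
n_per_group = 5
hsd = Q_CRIT * math.sqrt(ms_within / n_per_group)
names = sorted(reflectance)
different = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(statistics.mean(reflectance[a]) - statistics.mean(reflectance[b])) > hsd
]
print(f"HSD = {hsd:.2f}; pairs that differ: {different}")
```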

 

 

    4.      The following data concern the thickness of nonmagnetic coatings of zinc.  Two measurements are made on the same specimen, the first using a nondestructive method, the second a destructive method.  The data are as follows (in mm):

 

Specimen    Nondestructive Method    Destructive Method
1           105                      116
2           120                      132
3           85                       104
4           181                      139
5           115                      114
6           127                      129
7           630                      720
8           155                      174
9           25                       312
10          310                      338
11          443                      465

 

At the 5% significance level, is there evidence of a difference in the average measurement of thickness between the two methods?

 

Ans. Using the two sample paired t-test, do not reject the null hypothesis.  There is no evidence of a difference in the average thickness between the two methods (p-value = .0765 > .05).

 

    5.      The quality-control director for a clothing manufacturer wants to study the effect of operators and machines on the breaking strength (in pounds) of wool serge material.  A batch of the material is cut into square-yard pieces and these are randomly assigned, three each, to all 12 combinations of four operators and three machines chosen specifically for the experiment.  The results are as follows:

 

            Machine
Operator    I      II     III
A           115    111    109
            115    108    110
            119    114    107
B           117    105    105
            114    102    113
            114    106    114
C           109    100    103
            110    103    102
            106    103    105
D           112    105    108
            115    107    111
            111    107    110

 

 

For the 5% level of significance, is there an effect due to operator?  Is there an effect due to machine? Is there an interaction due to operator and machine?  If appropriate, use the Tukey procedure to determine which operators and which machines differ in their effect on average breaking strength. (Use α = .05)  What can you conclude about the effect of operators and machines on breaking strength?

 

Ans. According to the results of a two-way ANOVA, there is sufficient evidence to conclude that the breaking strengths of wool serge material differ with which operator ran the machine.  There is also adequate evidence to conclude that the breaking strengths differ among the three machines used to produce the material.  Furthermore, there is enough evidence to conclude that there is an interaction between operator and machine (study the interaction plot for details).  With respect to breaking strength of wool serge material, operator C is significantly different from all others, and each machine differs significantly from the other two.  To summarize, operators create fabric of varying strengths depending on which machines individuals use.