Monarch Lab - Monarchs in the Classroom

 

 

 

An Exciting Journey Into the World Of Statistics
 
 




Two Sample t-Test

Purpose: To compare responses from two groups. These two groups can come from different experimental treatments, or different natural "populations".

Assumptions:

  • each group is considered to be a sample from a distinct population
  • the responses in each group are independent of those in the other group
  • the distributions of the variable of interest are normal

How it works:

  1. The null hypothesis is that the two population means are equal to each other. To test the null hypothesis, you need to calculate the following values: $\bar{x}_1$, $\bar{x}_2$ (the means of the two samples), $s_1^2$, $s_2^2$ (the variances of the two samples), $n_1$, $n_2$ (the sample sizes of the two samples), and $k$ (the degrees of freedom).

[Image: T-test formula]

  2. Compute the t-statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

  3. Compare the calculated t-value, with k degrees of freedom, to the critical t-value from the t distribution table at the chosen confidence level, and decide whether to reject or fail to reject the null hypothesis.

* Reject the null hypothesis when: absolute value of the calculated t-value > critical t-value

  Note: This procedure can be used when the variances of the two populations are not equal, and when the sample sizes are not equal.
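The steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names `welch_t_test` and `reject_null` and the sample data are hypothetical, and the degrees of freedom k are computed with the Welch-Satterthwaite approximation, which is an assumption because the original formula image is not available. The critical value still has to be looked up in a t distribution table.

```python
import math
import statistics


def welch_t_test(sample1, sample2):
    """Return (t, k) for two independent samples; variances may differ."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance divides by (n - 1), i.e. it is the sample variance.
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    # ASSUMPTION: Welch-Satterthwaite approximation for the degrees of freedom.
    k = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, k


def reject_null(t, critical_t):
    """Two-sided decision rule: reject when |t| exceeds the table value."""
    return abs(t) > critical_t


# Hypothetical measurements from two treatment groups.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 2.0, 3.0]
t, k = welch_t_test(group_a, group_b)
# 4.303 is the two-sided 5% critical value for 2 degrees of freedom
# (k is about 2.94 here and is rounded down to use a printed table).
decision = reject_null(t, 4.303)
print(f"t = {t:.3f}, k = {k:.2f}, reject null: {decision}")
```

With these made-up numbers, t is about 1.55 with k about 2.9, so the null hypothesis is not rejected at the 5% level.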

 


Chi-Square Test
(also called Contingency Analysis)

Purpose: To measure the degree of disagreement between the observed data and the null hypothesis. Use this test when both variables are CATEGORICAL and you want to know whether there is a correlation (association) between them.

Assumptions:

  • there are n random samples or trials
  • there are c (and r) possible outcomes for each trial
  • the probabilities of the c (and r) outcomes remain the same between trials
  • the trials are independent
  • the sample size, n, is large enough so that for every cell, the expected cell count, E(n), will be > 1; many textbooks use the stricter rule of thumb that expected counts should be at least 5 (as with most statistical tests, large sample sizes yield more reliable results!)

How it works:

  1. 1 x c table: Suppose c = 3. If the null hypothesis is true then p(c1) = p(c2) = p(c3) = 1/3. If the null hypothesis is false, then at least one of the proportions exceeds 1/3 (a preference exists). Thus, our OBSERVED values are the data, while the EXPECTED values all equal n/c. (In this case, E(n) = n/3.)
  2. r x c table: Suppose c = 3 and r = 2. Make a data table that includes row totals and column totals. This data table contains the OBSERVED values. We can use the row and column totals to calculate the data values we would expect to get if there were no correlations between the variables. The expected values are equal to the ratio of the product of the row total and column total to the total number of samples:

$$E(n_{rc}) = \frac{(\text{row total})(\text{column total})}{\text{total sample size}}$$

    We can use this formula to make a second table, with the same rows and columns as the data table, to hold the expected values.

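  3. Calculate the test statistic, $\chi^2$, as follows, where the sum is taken over every cell of the table:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$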

  4. Compare the calculated $\chi^2$ value, with c – 1 degrees of freedom for a table with only 1 row and (c – 1)(r – 1) degrees of freedom for a table with 2 or more rows, to the critical $\chi^2$ value from the Chi-square distribution table at the chosen level of significance and decide whether to reject the null hypothesis. The farther the observed numbers are from their expected values, the larger $\chi^2$ will become. Therefore, large values of $\chi^2$ are evidence against the null hypothesis.

* Reject the null hypothesis when: calculated $\chi^2$ value > critical $\chi^2$ value
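The procedure above can be sketched in Python for both the 1 x c and the r x c case. This is a minimal sketch, not a definitive implementation: the function names `expected_counts` and `chi_square` and the observed counts are hypothetical, and the critical value must still be taken from a Chi-square distribution table.

```python
def expected_counts(observed):
    """Expected cell counts for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    if len(observed) == 1:
        # 1 x c table: under the null hypothesis every outcome expects n/c.
        c = len(observed[0])
        return [[n / c] * c]
    # r x c table: (row total)(column total) / (total sample size).
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (statistic, degrees of freedom, expected table)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    rows, cols = len(observed), len(observed[0])
    df = cols - 1 if rows == 1 else (rows - 1) * (cols - 1)
    return stat, df, expected


# Hypothetical counts: one row of 3 outcomes, then a 2 x 3 table.
stat1, df1, exp1 = chi_square([[10, 20, 30]])
stat2, df2, exp2 = chi_square([[10, 20, 30], [20, 20, 20]])

# 5.991 is the 5% critical value for 2 degrees of freedom.
print(stat1, df1, stat1 > 5.991)
print(round(stat2, 3), df2, stat2 > 5.991)
```

In this made-up example the 1 x 3 table gives a statistic of 10 with 2 degrees of freedom, which exceeds 5.991, while the 2 x 3 table gives about 5.33 with 2 degrees of freedom, which does not.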

 


Statistical Terms

categorical variable: a variable for which each value falls into one of a set of groups (e.g. gender, political party, plant species, type of behavior)

confidence level: the probability, expressed as a percentage, that a confidence interval encloses the population parameter (We can be 95% confident that this interval encloses the actual population parameter.)

continuous variable: a variable that can assume values corresponding to any of the points contained in one or more intervals (e.g. height, weight, time)

correlation: a relationship between 2 variables

dependent or response variable: a variable of interest to be measured in an experiment; we usually are interested in determining the effect of one or more independent variables on the response variable

independent variable: a predictor variable, one which is not being affected by other variables in the experiment (e.g. in a food choice study, the type of food would be the independent variable and the amount eaten would be the response variable)

mean: the sum of the measurements divided by the number of measurements contained in the data set (average)

median: the middle number when the measurements are arranged in ascending or descending order

normal distribution: a bell-shaped probability distribution

null hypothesis: 1 (statistics) the hypothesis that is being falsified by a specified statistical test (usually that the values being tested are equal); 2 (ecology) the predicted outcome based on the assumption that the resulting pattern is what would occur in the absence of the hypothesized ecological process (essentially, a prediction of ‘no difference’)

random sample: elements selected from a population such that every set of n elements in the population has an equal probability of being selected

range: the largest measurement minus the smallest measurement

standard deviation: the square root of the variance

statistic: a number calculated from a sample of observed data to make an inference about the population to which the sample belongs

statistically significant: implies that you have used statistical methods, which account for means and variances, to conclude that your measurements for different populations or treatments are different

statistics: the science of data – collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information

variance: the sum of the squared distances from the mean divided by (n – 1)
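Several of the terms above (mean, median, range, variance, standard deviation) can be illustrated with a short Python sketch using the standard `statistics` module. The data values are hypothetical and chosen only so that the results are easy to check by hand.

```python
import statistics

data = [2, 4, 4, 5, 10]  # hypothetical measurements

mean = statistics.mean(data)          # sum of measurements / number of measurements
median = statistics.median(data)      # middle value of the sorted data
data_range = max(data) - min(data)    # largest minus smallest

# Variance by the definition above: squared distances from the mean,
# summed and divided by (n - 1).
manual_variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
variance = statistics.variance(data)  # same (n - 1) definition
std_dev = statistics.stdev(data)      # square root of the variance

print(mean, median, data_range, variance, std_dev)
```

For these values the mean is 5, the median is 4, the range is 8, the variance is 9, and the standard deviation is 3.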


Now That I Have Data, What Do I Do With It?

  1. Make sure your data are well-organized, either in a notebook or in a spreadsheet program.
  2. Conduct Exploratory Data Analysis (EDA).
  • EDA is the process of exploring patterns in your data through graphics, before doing any formal statistics. It’s a "first pass" through your data.
  • EDA is a useful method to help you decide what statistical tests are appropriate.
  • When you do EDA, try graphing your data several different ways. You may find patterns that you didn’t expect. EDA can be done with a computer or by hand. You can try bar graphs, scatterplots, line graphs, and any other graphs that might help you visualize patterns in your data.
  3. Use the flow chart to help you decide which statistical test to use to test your hypotheses. Remember that you need to make sure your data fit the assumptions of any test you decide to use. For example, if you are going to use a t-test, your data should be normally distributed.
  4. Utilize all possible resources: math teachers, scientists, statistics books, Internet sites, etc. Most biologists get some help when it comes to statistical analyses. A good web page to use as a starting point for stats help on the Internet is www.stat.ufl.edu/vlib/statistics.html . This page has listings for many other sites that deal with statistics. Some of these sites are geared to undergraduates and other statistics beginners. Others are more advanced.
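As one rough illustration of a computer-based "first pass" through categorical data, the sketch below counts how often each category occurs and draws a simple text bar graph. It is only a sketch: the function name `text_bar_graph` and the observations are hypothetical, and a spreadsheet or graphing program would serve the same purpose.

```python
from collections import Counter


def text_bar_graph(values):
    """Return one text line per category, with one '#' per observation."""
    counts = Counter(values)
    lines = []
    for category in sorted(counts):
        bar = "#" * counts[category]
        lines.append(f"{category:<10} {bar} ({counts[category]})")
    return lines


# Hypothetical observations of a categorical variable.
observations = ["leaf", "stem", "leaf", "flower", "leaf", "stem"]
graph = text_bar_graph(observations)
for line in graph:
    print(line)
```

Each printed line shows a category, a bar whose length equals its count, and the count itself, which is often enough to spot an unexpected pattern before choosing a test.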

[Image: Key to statistical tests (flow chart)]

