Statistitics
MSExcel - Statistical Functions
In the practical class, you should become proficient at using the appropriate windows in Tools, Data Analysis to perform the following statistical functions:
- Create frequency histograms of continuously variable data sets.
- Perform a t-test to determine if there is a significant difference between two sets of data.
- Prepare a scatterplot.
- Calculate a correlation coefficient.
- Calculate a chi-square.
(N.B. These analysis tools are not always loaded when you install Excel, so to use them on your home computer you may need to do a Custom Install from the original software CD.)
1. Create a frequency histogram of continuously variable data
Once you have opened or created a file of data, you need to create a list of values that you want plotted on the x axis of your graph. This is called the Bin range. e.g. for height values ranging from 120 cm to 200 cm you might choose x axis values of 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200. You need to create this list in another column on the worksheet containing your data.
(For a long list you can do this quickly using a calculator that adds a set value to the value above, then copying and pasting this function down as many rows as you need.)
Select Tools, Data Analysis. Scroll down the list, select Histogram and click OK. A new window will appear and you need to enter the following information:
Input range: click in this box then use the mouse to highlight the data you wish to plot. (Use click and drag, or click on the first value in the list, hold down the mouse button while you press Shift then click on the last value in the list.) The cell range will automatically appear in the box.
Bin range: click in this box then highlight the bin range values you entered on your worksheet.
Click on New worksheet ply, enter a name for the new sheet if you wish, click on Chart output then OK.
The histogram will be plotted in a new worksheet on the screen. The Bin values and their frequencies will be displayed in a table at the left hand side of the graph. (To return to your data, click on the appropriate tab at the bottom of the workbook.)
Refer to Help, Contents and Index, Analysing Statistical Data, Histogram analysis tool for further details.
2. Perform a t-test
Select Tools, Data Analysis. Scroll down the list, select t-test: Two sample assuming unequal variances and click OK. A new window will appear and you need to enter the following information:
Variable 1 range: highlight the column of data that are the values for variable 1. (Click on the first value in the list, hold down the mouse button while you press Shift then click on the last value in the list.) The cell range will automatically appear in the box. If you include the column heading in this range, tick the Labels box.
Variable 2 range: highlight the column of data that are the values for variable 2. If you include the column heading in this range, tick the Labels box.
Hypothesized mean difference: Enter 0.
Output range: click in this box then select a region on your worksheet where you want the data table displayed. You can do this by clicking on a single cell, which will become the top left cell of the table.
The table will include the mean and variance for each variable, degrees of freedom, the t-statistic for the data, the critical t and probability value for the one-tailed and two-tailed test. You should look at the two-tailed probability value. If this is less than 0.05, the difference between the means is significant.
Refer to Help, Contents and Index, Analysing Statistical Data, Perform a t-Test analysis for further details.
3. Prepare a scatter plot
Highlight the two columns of data you wish to plot. Make sure the x axis data is in the first column. Click on Chart Wizard, select XY (Scatter) and click on Next. The plot will appear in the Chart Wizard window. This window allows you to select the range of the data, which will automatically be included if you have previously highlighted the data. Click on Next, adjust the chart title if necessary and enter titles (AND units!) for each axis, in the Value (X) Axis and Value (Y) Axis boxes. You can choose to remove gridlines and the legend in the appropriate windows. The changes will be displayed in the preview window. Click on Next when you have finished in this window. Select As new sheet and enter a name for this worksheet if you wish. Click on Finish and the plot will be displayed in a new worksheet in your workbook. (You can view different sheets by clicking on the appropriate tab at the bottom of the workbook.)
You can adjust the origin point on any axis by clicking on the axis then selecting Format, Selected Axis. (Alternatively, hold down Control key, click on the axis and select Format Axis from the pop-up menu.) Select Scale and set the minimum value to the point you want at the start of that axis. Repeat for the second axis if necessary. Experiment with changing the Font and Number style of the axis.
4. Calculate a correlation coefficient
If you have measured two variables in a group of individuals, such as foot-length and height, you can calculate how closely the variables are correlated with each other.
Select Tools, Data Analysis. Scroll down the list, select Correlation and click OK. A new window will appear and you need to enter the following information:
Input range: highlight the two columns of data that are the paired values for the two variables. The cell range will automatically appear in the box. If you include the column headings in this range, tick the Labels box.
Output range: click in this box then select a region on your worksheet where you want the data table displayed. You can do this by clicking on a single cell, which will become the top left cell of the table.
Click OK and a table will be displayed showing the correlation coefficient (r) for the data. (Ignore the values of 1 that give the correlation coefficient of each variable relative to itself!) You can test whether the correlation is significant by examining Table 1. Critical values of correlation coefficients at the 5% significance level, using (n - 2) degrees of freedom. (n = the number of rows of data, e.g. 2 columns of data with 10 rows each gives df = 8) If the observed r is greater than the critical value, the two variables are significantly correlated.
(Note that this does not mean they are dependent - to determine a cause and effect relationship requires regression analysis.)
Refer to Help, Contents and Index, Analysing Statistical Data, Correlation analyisis tool and formulas for further details.
5. Calculate a chi-square
Using your knowledge of Excel workbooks, you should be able to set up a table with calculators to very quickly calculate a chi-square statistic.
Prepare a table with table of your data with the top and bottom rows as follows:
Specimen |
Measured value |
Expected value |
(O-E)^2 |
c ^{2} = S
(O-E)^2 |
Refer to Table 2. Critical values of the chi-square statistic (c 2) at the 5% significance level to determine whether the observed values for your sample vary significantly from the expected values. If the calculated chi-square is greater than the critical value at (n - 1) degrees of freedom, the difference is significant.
Table 1. Critical values of correlation coefficients at the 5% significance level.
Degrees of freedom |
critical value of r at 5% level |
Degrees of freedom |
critical value of r at 5% level |
Degrees of freedom |
critical value of r at 5% level |
Degrees of freedom |
critical value of r at 5% level |
1 |
.997 |
13 |
.514 |
25 |
.381 |
80 |
.217 |
2 |
.950 |
14 |
.497 |
26 |
.374 |
90 |
.205 |
3 |
.878 |
15 |
.482 |
27 |
.367 |
100 |
.195 |
4 |
.811 |
16 |
.468 |
28 |
.361 |
125 |
.174 |
5 |
.754 |
17 |
.456 |
29 |
.355 |
150 |
.159 |
6 |
.707 |
18 |
.444 |
30 |
.349 |
200 |
.138 |
7 |
.666 |
19 |
.433 |
35 |
.325 |
300 |
.113 |
8 |
.632 |
20 |
.423 |
40 |
.304 |
400 |
.098 |
9 |
.602 |
21 |
.413 |
45 |
.288 |
500 |
.088 |
10 |
.576 |
22 |
.404 |
50 |
.273 |
1000 |
.062 |
11 |
.553 |
23 |
.396 |
60 |
.250 |
||
12 |
.532 |
24 |
.388 |
70 |
.232 |
Table 2. Critical values of the chi-square statistic (c 2) at the 5% significance level.
Degrees of freedom |
critical value of c ^{2} at 5% level |
Degrees of freedom |
critical value of c ^{2} at 5% level |
Degrees of freedom |
critical value of c ^{2} at 5% level |
1 |
3.84 |
11 |
19.68 |
21 |
32.67 |
2 |
5.99 |
12 |
21.03 |
22 |
33.92 |
3 |
7.81 |
13 |
22.36 |
23 |
35.17 |
4 |
9.49 |
14 |
23.68 |
24 |
36.42 |
5 |
11.07 |
15 |
25.00 |
25 |
37.65 |
6 |
12.59 |
16 |
26.30 |
26 |
38.89 |
7 |
14.07 |
17 |
27.59 |
27 |
40.11 |
8 |
15.51 |
18 |
28.87 |
28 |
31.34 |
9 |
16.92 |
19 |
30.14 |
29 |
42.56 |
10 |
18.31 |
20 |
31.41 |
30 |
43.77 |