Matlab Training Session 12: Statistics II
Course Website: http://www.queensu.ca/neurosci/Matlab Training Sessions.htm
Course Outline
Term 1
Introduction to Matlab and its Interface
Fundamentals (Operators)
Fundamentals (Flow)
Importing Data
Functions and M-Files
Term 2
9. Term 1 review
10. Loading Binary Data
11. Nonlinear Curve Fitting
12. Statistical Tools in Matlab II
13.
14.
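Week 12 Lecture Outline
Statistics II
A. Basic Matlab Statistics Review
B. Parametric and Non-parametric statistical tests
C. Simple Statistical Plotting
D. Anovas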
Required Toolboxes:
Statistics Toolbox
Part A: Basic Matlab Statistics Review
Mean: Average or mean value of a distribution
Median: Middle value of a sorted distribution
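M = mean(A), M = median(A): Returns the mean or median value of vector A.
If A is a matrix, mean and median work down each column and return a row vector with one value per column.
M = mean(A,dim), M = median(A,dim): Returns the mean or median along dimension dim (dim = 1 works down the columns, dim = 2 works across the rows).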
Examples:
A = [0 2 5 7 20];
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7];
Mean:
mean(A) = 6.8
mean(B) = 3.0 4.5 6.0 (column-wise mean)
mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)
Median:
median(A) = 5
median(B) = 3.5 4.5 6.5 (column-wise median)
median(B,2) = 2.0
3.0
6.0
7.0 (row-wise median)
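The results above can be reproduced with a short script. This is a minimal sketch using the same A and B; the output variable names (meanA, colMeans and so on) are chosen here for illustration only.

```matlab
% Data from the example above
A = [0 2 5 7 20];
B = [1 2 3
     3 3 6
     4 6 8
     4 7 7];

% Mean of a vector, then column-wise and row-wise means of a matrix
meanA    = mean(A);       % single value
colMeans = mean(B);       % 1x3 row vector, one mean per column
rowMeans = mean(B, 2);    % 4x1 column vector, one mean per row

% Same calls for the median
medA    = median(A);
colMeds = median(B);
rowMeds = median(B, 2);

disp(colMeans)
disp(rowMeds)
```

Note that the row-wise results come back as column vectors, one entry per row of B.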
Part B: Parametric and Non-parametric statistical tests
[H,P] = ttest2(X,Y)
Performs a two-sample t-test to determine whether the means of samples X and Y are statistically different.
H returns 0 or 1: 0 means the null hypothesis (that the means are the same) cannot be rejected at the 5% level, 1 means it is rejected
P returns the p-value of the test
Example:
For the data from Week 8, exercise 3:
>> [H,P] = ttest2(var1,var2)
H = 1
P = 0.00000000000014877
[Figure: plot of Variable 1 and Variable 2]
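The Week 8 data are not reproduced in these notes, so the following sketch uses two made-up samples, x and y, to show the same call. The data values are assumptions for illustration; ttest2 requires the Statistics Toolbox.

```matlab
% Hypothetical samples (NOT the Week 8 data): two groups with
% clearly different means
x = (1:20)';         % sample 1, mean 10.5
y = (1:20)' + 30;    % sample 2, mean 40.5

% Two-sample t-test at the default 5% significance level
[H, P] = ttest2(x, y);

% H = 1 means the null hypothesis of equal means is rejected
fprintf('H = %d, P = %g\n', H, P);
```

With samples this far apart, H comes back as 1 and P is far below 0.05.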
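[P,H] = ranksum(X,Y) performs the Wilcoxon rank sum test, a non-parametric test of whether two independent samples X and Y come from distributions with equal medians. Unlike ttest2, it does not assume the data are normally distributed, and P is returned first.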
Example:
For the data from Week 8, exercise 3:
>> [P,H] = ranksum(var1,var2)
P = 1.1431e-014
H = 1
[Figure: plot of Variable 1 and Variable 2]
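As a sketch of the non-parametric test on data that can be checked by eye, the example below uses two small made-up samples that do not overlap. The values are assumptions for illustration, not the Week 8 data; ranksum requires the Statistics Toolbox.

```matlab
% Hypothetical samples (NOT the Week 8 data): every value in y is
% larger than every value in x, and there are no ties
x = [1.1 2.3 2.9 3.8 4.2 5.0 5.7 6.1];
y = [7.4 8.0 8.8 9.5 10.1 11.3 12.2 13.6];

% Wilcoxon rank sum test; note that P is the first output here
[P, H] = ranksum(x, y);

fprintf('P = %g, H = %d\n', P, H);
```

Because the test works on ranks rather than raw values, replacing 13.6 with a much larger number would not change the result.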
Part C: Simple Statistical Plotting
Histogram counts with histc:
[n, bin] = histc(x, binrange)
x = vector of data values (the statistical distribution)
binrange = vector of bin edges, eg: [1:1:10]
n = the number of elements of x that fall in each bin
bin = the bin number that each element of x belongs to
Example:
>> test = round(rand(100,1)*10)
>> n = histc(test,[1:1:10])
% values of 0 fall below the first bin edge and are not counted
>> bar(1:10,n)
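To make the two outputs of histc concrete, here is a small sketch with a fixed vector instead of random data, so the counts can be checked by hand. The vector x is made up for illustration. The code assumes histc is available; newer Matlab releases recommend histcounts instead.

```matlab
% Small fixed data set so the counts can be verified by hand
x = [1 2 2 3 3 3 7 10 10];
binrange = 1:1:10;            % bin edges

% n   = number of elements of x in each bin
% bin = index of the bin each element of x falls into
[n, bin] = histc(x, binrange);

disp(n)      % one count per bin edge
disp(bin)    % same length as x

% Plot the counts (not the raw data) as a bar chart
bar(binrange, n)
xlabel('Bin')
ylabel('Count')
```

The last bin only counts values exactly equal to the final edge, which is why the two 10s land in bin 10.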
Example:
% add outlier to test distribution
>>test(101) = 16
>>boxplot(test)
% the outlier is drawn as a + marker beyond the upper whisker
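Example:
% scale the test distribution by a random factor
>> test2 = test * (rand*10)
% plot both distributions side by side; the second argument (1) draws notched boxes
>> boxplot([test test2],1)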
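The box plot steps above can be collected into one script. This is a sketch under a few assumptions: the random seed is fixed with rng so the run is repeatable, the variable name data is introduced here for illustration, and notches are requested with the 'notch','on' name-value pair rather than the positional argument. boxplot requires the Statistics Toolbox.

```matlab
rng(1);                              % fix the seed (assumption, for repeatability)

% Test distribution: 100 integers between 0 and 10
test = round(rand(100,1) * 10);

% Add an outlier to the test distribution
test(101) = 16;

% Second distribution: the first one scaled by a random factor
test2 = test * (rand * 10);

% One box per column; outliers appear as + markers
data = [test test2];
boxplot(data, 'notch', 'on')
```

Each column of the matrix passed to boxplot is treated as a separate distribution.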
Part D: Anovas
Terminology
Null Hypothesis = The means are the same
Type I error:
Rejecting the null hypothesis when it is true. Eg the means are not actually different, but the test gives p < 0.05
Type II error:
Accepting (failing to reject) the null hypothesis when it is false. Eg the means are actually different, but the test gives p > 0.05
Alpha
Probability of making a Type I error. This is the significance level, conventionally 0.05, so the null hypothesis is rejected when P < 0.05
Beta
Probability of making a Type II error
Family Wise Error:
The probability of making at least one Type I error while making multiple comparisons between means
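A short calculation shows how quickly this error grows. The sketch below assumes the comparisons are independent and each uses the same alpha, in which case the family-wise error rate for m comparisons is 1 - (1 - alpha)^m. The variable names are chosen here for illustration.

```matlab
alpha = 0.05;      % per-comparison Type I error rate
m = 1:10;          % number of comparisons in the family

% Probability of at least one Type I error across m independent comparisons
fwer = 1 - (1 - alpha).^m;

% Print the error rate for 1, 3 and 10 comparisons
fprintf('m = %2d: %.4f\n', [m([1 3 10]); fwer([1 3 10])]);
```

Comparing three means pairwise already needs three comparisons, which under these assumptions gives a family-wise error of about 14% rather than 5%.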
The Matlab function anova1 calculates a 1-way ANOVA
p = anova1(X) performs a balanced 1-way ANOVA comparing the means of the columns of data in the matrix X
** Each column must represent an independent sample containing m mutually independent observations.
The function returns the p-value for the null hypothesis that all column means are the same
p = anova1(X,group)
group = Each row of group contains the data label for the corresponding column of X. If X is a vector, group holds one label per element of X, and the groups may have different sizes.
Assumptions
All sample populations are normally distributed
All sample populations have equal variance
All observations are mutually independent
The ANOVA test is known to be robust to modest violations of the first two assumptions.
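The ANOVA table has six columns:
Source: the source of the variability
SS: the sum of squares due to each source
df: the degrees of freedom associated with each source
MS: the mean squares for each source (SS/df)
F: the F statistic, which is the ratio of the mean squares
Prob>F: the p-value, derived from the F distribution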
Example 1
The following example comes from a study of the material strength of structural beams in Hogg (1987). The vector strength measures the deflection of a beam in thousandths of an inch under 3,000 pounds of force. Stronger beams deflect less. The civil engineer performing the study wanted to determine whether the strength of steel beams was equal to the strength of two more expensive alloys.
Steel is coded 'st' in the vector alloy. The other materials are coded 'al1' and 'al2'.
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
'al1','al1','al1','al1','al1','al1',...
'al2','al2','al2','al2','al2','al2'};
Though alloy is sorted in this example, you do not need to sort the grouping variable.
Solution:
p = anova1(strength,alloy)
p =
1.5264e-004
The p-value indicates that the three alloys are significantly different. The box plot confirms this graphically and shows that the steel beams deflect more than the more expensive alloys.
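To see where the p-value comes from, the F statistic can be computed by hand from the between-group and within-group sums of squares. This is a sketch of the standard one-way ANOVA arithmetic, not a replacement for anova1. The variable names are chosen here for illustration, and fcdf requires the Statistics Toolbox.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
            79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
         'al1','al1','al1','al1','al1','al1',...
         'al2','al2','al2','al2','al2','al2'};
labels = {'st','al1','al2'};

grandMean = mean(strength);
ssBetween = 0;    % variability of the group means around the grand mean
ssWithin  = 0;    % variability of observations around their own group mean
for k = 1:numel(labels)
    y = strength(strcmp(alloy, labels{k}));
    ssBetween = ssBetween + numel(y) * (mean(y) - grandMean)^2;
    ssWithin  = ssWithin  + sum((y - mean(y)).^2);
end

dfBetween = numel(labels) - 1;                 % groups - 1
dfWithin  = numel(strength) - numel(labels);   % observations - groups

% F is the ratio of the two mean squares (SS/df)
F = (ssBetween / dfBetween) / (ssWithin / dfWithin);

% p-value: probability of an F at least this large under the null hypothesis
p = 1 - fcdf(F, dfBetween, dfWithin);
fprintf('F = %.2f, p = %.4e\n', F, p);
```

The group means are 84 for steel, 77 for al1 and 79 for al2, which gives SS values of 184.8 (between) and 102 (within) and an F statistic of 15.4 on 2 and 17 degrees of freedom.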
If a 1-way ANOVA indicates a significant difference between at least one pair of means:
Post Hoc Comparisons: Comparisons between means chosen after a significant 1-way ANOVA has been calculated. When all possible comparisons are made after the fact, the chance of a Type I error becomes high.
A Priori Comparisons: Comparisons decided upon before the 1-way ANOVA is performed, based on the general theory of the study. This minimizes possible Type I error.
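One way to run post hoc comparisons in the Statistics Toolbox is to pass the stats output of anova1 to multcompare, which adjusts for multiple comparisons. The sketch below reuses the beam data from Example 1. Figures are turned off here as an assumption for scripted use; leave the 'off' options out to see the ANOVA table and the interactive comparison plot.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
            79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st',...
         'al1','al1','al1','al1','al1','al1',...
         'al2','al2','al2','al2','al2','al2'};

% 1-way ANOVA; the third output holds what multcompare needs
[p, tbl, stats] = anova1(strength, alloy, 'off');

% Pairwise comparisons of the group means.
% Each row of c is one comparison: columns 1-2 are the group indices,
% column 4 is the estimated difference in means, and columns 3 and 5
% are the lower and upper confidence bounds.
c = multcompare(stats, 'Display', 'off');

disp(stats.gnames')   % group order used for the indices
disp(c)
```

A pair of groups differs significantly when its confidence interval in c does not contain zero.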
Unrelated (Between Groups) Design
p = anovan(X,group) performs a balanced or unbalanced multiway (N-way) ANOVA comparing the means of the observations in vector X with respect to N different factors. group is a cell array with one grouping variable per factor.
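As a sketch of the calling pattern, the example below runs a two-factor ANOVA on a small made-up data set. The data, the factor names g1 and g2, and the choice to suppress the table with 'display','off' are all assumptions for illustration.

```matlab
% Hypothetical data: 8 observations classified by two factors
y  = [10 12 11 13 20 22 21 23]';
g1 = {'a';'a';'a';'a';'b';'b';'b';'b'};   % factor 1: two levels
g2 = [1; 2; 1; 2; 1; 2; 1; 2];            % factor 2: two levels

% N-way ANOVA with main effects only (the default model);
% one grouping variable per factor, collected in a cell array
p = anovan(y, {g1, g2}, 'display', 'off');

% p(1) is the p-value for factor 1, p(2) for factor 2
fprintf('factor 1: p = %g\nfactor 2: p = %g\n', p(1), p(2));
```

Each element of p tests the null hypothesis that the corresponding factor has no effect on the mean of y.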
Related (Repeated Measures) Design
NOT IMPLEMENTED IN THE STATISTICS TOOLBOX!!