Matlab Training Session 12: Statistics II

  • Course Website: http://www.queensu.ca/neurosci/Matlab Training Sessions.htm



Course Outline

Term 1

  • Introduction to Matlab and its Interface

  • Fundamentals (Operators)

  • Fundamentals (Flow)

  • Importing Data

  • Functions and M-Files

  • Plotting (2D and 3D)

  • Plotting (2D and 3D)

  • Statistical Tools in Matlab

    Term 2

    9. Term 1 review

    10. Loading Binary Data

    11. Nonlinear Curve Fitting

    12. Statistical Tools in Matlab II

    13.

    14.



Week 12 Lecture Outline

Statistics II

  • Basic Matlab Statistics Review

    • Mean, Median, Variance

  • Statistics Toolbox

    • Simple Parametric and Non-parametric statistical tests

  • Simple Statistical Plotting

    • Histograms

    • Box Plots

  • Anovas

    • 1 Way Unrelated Design

    • Post Hoc vs A Priori Comparisons

    • N-Way Anovas

    • Related (Repeated Measures) Design

    • Unrelated (Between Groups) Design



    Required Toolboxes:

    Statistics Toolbox



    Part A: Basic Matlab Statistics Review



    Part A: Basics

    • The Matlab installation contains basic statistical tools.

    • These include the mean, median, standard deviation, variance, and correlations.

    • More advanced statistics are available from the Statistics Toolbox and include parametric and non-parametric comparisons, analysis of variance, and curve fitting tools.



    Mean and Median

    Mean: Average or mean value of a distribution

    Median: Middle value of a sorted distribution

    M = mean(A), M = median(A)

    M = mean(A,dim), M = median(A,dim)

    M = mean(A), M = median(A): Returns the mean or median value of vector A.

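    If A is a matrix, mean and median operate on each column and return a row vector of values. The optional dim argument selects the dimension to operate along (dim = 2 gives row-wise results).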

    Example:

    A = [0 2 5 7 20];

    B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];

    mean(A) = 6.8

    mean(B) = 3.0 4.5 6.0 (column-wise mean)

    mean(B,2) = 2.0 4.0 6.0 6.0 (row-wise mean)



    Median:

    median(A) = 5

    median(B) = 3.5 4.5 6.5 (column-wise median)

    median(B,2) = 2.0 3.0 6.0 7.0 (row-wise median)
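The mean and median results for A and B can be reproduced with a short script. This is a minimal sketch; the output variable names (colMeans, rowMeans and so on) are chosen here for illustration and do not come from the course material.

```matlab
% Data from the mean and median example
A = [0 2 5 7 20];
B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];

meanA      = mean(A);        % mean of a vector
colMeans   = mean(B);        % column-wise mean, 1x3 row vector
rowMeans   = mean(B, 2);     % row-wise mean, 4x1 column vector

medianA    = median(A);      % middle value of the sorted vector
colMedians = median(B);      % column-wise median, 1x3 row vector
rowMedians = median(B, 2);   % row-wise median, 4x1 column vector
```

Note that the row-wise results come back as column vectors, one entry per row of B.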



    Standard Deviation and Variance

    • Standard deviation is calculated using the std() function

    • std(X) : Calculate the standard deviation of vector X

    • If X is a matrix, std() will return the standard deviation of each column

    • Variance (defined as the square of the standard deviation) is calculated using the var() function

    • var(X) : Calculate the variance of vector X

    • If X is a matrix, var() will return the variance of each column
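A brief sketch of std() and var() on a vector and on a matrix follows. The vector x is illustrative data made up for this example; by default both functions use the sample (N-1) normalization.

```matlab
x = [2 4 4 4 5 5 7 9];      % illustrative data, not from the course
s = std(x);                 % sample standard deviation (normalized by N-1)
v = var(x);                 % sample variance, equal to s^2

B = [1 2 3; 3 3 6; 4 6 8; 4 7 7];
colStd = std(B);            % one standard deviation per column
colVar = var(B);            % one variance per column
```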



    Standard Error of the Mean

    • When reporting the uncertainty of an estimated mean, the most appropriate measure is often the standard error of the mean

    • Matlab does not contain a standard error function so it is useful to create your own.

    • The standard error of the mean is defined as the standard deviation divided by the square root of the number of samples
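One way to write such a function is as an anonymous function, sketched below for vector input only. The name sem and the data in x are assumptions made for this illustration.

```matlab
% Standard error of the mean for a vector: std / sqrt(number of samples)
sem = @(x) std(x) / sqrt(numel(x));

x = [2 4 4 4 5 5 7 9];      % illustrative data
semX = sem(x);
fprintf('SEM = %.4f\n', semX);
```

For matrix input a column-wise version would divide by sqrt(size(x,1)) instead of sqrt(numel(x)).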



    Part B: Parametric and Non-parametric statistical tests



    Comparison of Means

    • A wide variety of mathematical methods exist for determining whether the means of different groups are statistically different

    • Methods for comparing means can be either parametric (assumes the data are normally distributed) or non-parametric (does not assume a normal distribution)



    Parametric Tests - TTEST

    [H,P] = ttest2(X,Y)

    Performs a two-sample t-test to determine whether the means of samples X and Y are statistically different.

    H returns 0 or 1: 0 means the null hypothesis (that the means are the same) cannot be rejected at the 5% significance level, and 1 means it is rejected.

    P returns the p-value of the test



    Example:

    For the data from Week 8, exercise 3:

    >> [H,P] = ttest2(var1,var2)

    H = 1

    P = 0.00000000000014877

    [Figure: Variable 1 and Variable 2]
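Since the Week 8 data are not reproduced here, the following sketch runs ttest2 on two small made-up samples. The vectors x and y are assumptions for illustration only, and the Statistics Toolbox is required.

```matlab
% Two small illustrative samples (not the Week 8 data)
x = [1 2 3 4 5];
y = [11 12 13 14 15];

% H = 1 means the null hypothesis of equal means is rejected at the 5% level
[H, P] = ttest2(x, y);
fprintf('H = %d, P = %.3g\n', H, P);
```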



    Non-Parametric Tests - RankSum

    • The Wilcoxon rank sum test assesses whether two independent samples come from distributions with equal medians.

    • This test is non-parametric and should be used when the data are not normally distributed

    • Matlab implements the Wilcoxon rank sum test using the ranksum() function

      ranksum(X,Y) tests whether the two data distributions X and Y have equal medians



    Example:

    For the data from Week 8, exercise 3:

    >> [P,H] = ranksum(var1,var2)

    P = 1.1431e-014

    H = 1

    [Figure: Variable 1 and Variable 2]
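The same kind of check can be sketched for ranksum using made-up samples (x and y below are illustrative, not the Week 8 data). Note that ranksum returns the p-value first and the test decision second, the reverse of ttest2.

```matlab
% Two small illustrative samples (not the Week 8 data)
x = [1 2 3 4 5];
y = [11 12 13 14 15];

% P is the p-value; H = 1 means equal medians is rejected at the 5% level
[P, H] = ranksum(x, y);
fprintf('P = %.3g, H = %d\n', P, H);
```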



    Part C: Simple Statistical Plotting

    • Histograms

    • Box Plots



    Histograms

    • Histograms are useful for showing the pattern of the whole data set

    • Allows the shape of the distribution to be easily visualized



    • The Matlab hist(y,m) command generates a frequency histogram of vector y distributed among m bins

    • hist(y,x) can also be used, where x is a vector defining the bin centers

      Example (assuming t is an existing time vector):

      >> b = sin(2*pi*t)

      >> hist(b,10);

      >> hist(b,[-1 -0.75 0 0.25 0.5 0.75 1]);
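A self-contained sketch of both calling forms is given below. The time vector t is not defined in the slides, so the definition used here (0 to 1 in steps of 0.01) is an assumption for illustration.

```matlab
t = 0:0.01:1;                 % assumed time vector (hypothetical; not given in the slides)
b = sin(2*pi*t);

counts10 = hist(b, 10);       % counts in 10 equally spaced bins, no plot drawn
centers  = [-1 -0.75 0 0.25 0.5 0.75 1];
countsC  = hist(b, centers);  % counts in bins centred on the given values

hist(b, 10);                  % with no output argument, hist draws the histogram
```

Requesting an output argument returns the bin counts instead of drawing, which is handy for checking that every sample was counted.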



    • The histc function is a bit more powerful and allows bin edges to be defined

      [n, bin] = histc(x, binrange)

      x = data vector (statistical distribution)

      binrange = vector of bin edges, e.g. [1:1:10]

      n = the number of elements of x in each bin

      bin = the bin number to which each element of x belongs

    • Use the bar function to plot the histogram



      Example:

      >> test = round(rand(100,1)*10)

      >> n = histc(test,[1:1:10])

      >> bar([1:1:10],n)
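To make the two outputs of histc concrete, here is a small deterministic sketch. The data in x are made up for illustration; the counts n, not the raw data, are what get passed to bar.

```matlab
x     = [1 2 2 3 5 10];       % illustrative data
edges = 1:1:10;               % bin edges

% n(k) counts values with edges(k) <= x < edges(k+1);
% the last bin counts values equal to edges(end)
[n, bin] = histc(x, edges);

bar(edges, n);                % plot the counts, not the raw data
```

Here bin reports, for each element of x, the index of the bin it fell into, so x = 10 lands in bin 10.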



    Box Plots

    • Box plots are useful to graphically display the median and spread of distributions, as well as the interquartile range and outliers



    • Matlab function boxplot(x) will generate a boxplot of the distribution defined by x

      Example:

      % add outlier to test distribution

      >>test(101) = 16

      >>boxplot(test)



    • The box has lines at the lower quartile, median, and upper quartile values.

    • The whiskers are lines extending from each end of the box to show the extent of the rest of the data.

    • Outliers are data with values beyond the ends of the whiskers.

    • If there is no data outside the whisker, a dot is placed at the bottom whisker.




    • boxplot(X,notch) with notch = 1 produces a notched-box plot.

    • Notches graph a robust estimate of the uncertainty about the medians for box-to-box comparison. The default, notch = 0, produces a rectangular box plot.

      Example:

      >>test2 = test * (rand*10)

      >>boxplot([test test2],1)
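The quantities a box plot displays can also be computed directly, which helps when reading the figure. The sketch below uses made-up data and assumes the default whisker length of 1.5 times the interquartile range; the variable names are illustrative.

```matlab
x = [1 2 3 4 5 6 7 8 9 30]';           % illustrative data with one extreme value

q      = quantile(x, [0.25 0.5 0.75]); % lower quartile, median, upper quartile
iqrX   = q(3) - q(1);                  % interquartile range (height of the box)
lowLim = q(1) - 1.5 * iqrX;            % whisker limits, assuming the default length
upLim  = q(3) + 1.5 * iqrX;
outliers = x(x < lowLim | x > upLim);  % points drawn individually beyond the whiskers

boxplot(x);                            % draw the box plot itself
```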



    Part D: Anovas

    • 1 Way Unrelated Design

    • Post Hoc vs A Priori Comparisons

    • N-Way Anovas

    • Unrelated (Between Groups) Design

    • Related (Repeated Measures) Design



    Anovas

    • ANOVAs are tests that compare the amount by which the sample means vary (between-group variability) with the amount by which the values in each sample vary around their group mean (within-group variability)



    Terminology

    Null Hypothesis = All means are the same

    Type I error:

    Reject the Null Hypothesis when it is true. E.g. the means are not actually different, but p < 0.05

    Type II error:

    Accept the Null Hypothesis when it is false. E.g. the means are actually different, but p > 0.05



    Beta

    Probability of making a Type II error

    Alpha

    Probability of making a Type I error, conventionally set to 0.05 so that results with P < 0.05 are declared significant



    Family Wise Error:

    The probability of making at least one Type I error while making multiple comparisons
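A quick calculation shows how fast this error grows. The sketch below assumes the comparisons are independent, which is a simplification, and uses illustrative values for alpha and the number of comparisons m.

```matlab
alpha = 0.05;                    % per-comparison Type I error rate
m     = 3;                       % number of comparisons (e.g. 3 pairwise tests among 3 groups)
fwer  = 1 - (1 - alpha)^m;       % chance of at least one Type I error, assuming independence
fprintf('Family-wise error rate: %.4f\n', fwer);
```

With three comparisons at alpha = 0.05 the family-wise rate is already about 0.14.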



    1 way Anovas

    The Matlab function anova1 calculates a 1 way anova

    p = anova1(X) performs a balanced 1-way ANOVA comparing the means of the columns of data in the matrix X

    ** Each column must represent an independent sample containing m mutually independent observations.

    The function returns the p-value for the null hypothesis that all column means are equal

    p = anova1(X,group)

    group = Each row of group contains the data label for the corresponding column of X
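A minimal sketch of the matrix form of anova1 is shown below. The data in X and the label names are made up for illustration; the third argument 'off' suppresses the ANOVA table and box plot figures.

```matlab
% Illustrative data: each column is one group of 3 observations
X = [1 4 7;
     2 5 8;
     3 6 9];
labels = {'g1', 'g2', 'g3'};       % one label per column of X (hypothetical names)

% 'off' suppresses the ANOVA table and box plot figures
p = anova1(X, labels, 'off');
fprintf('p = %.4g\n', p);
```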



    Assumptions

    All sample populations are normally distributed

    All sample populations have equal variance

    All observations are mutually independent

    The ANOVA test is known to be robust to modest violations of the first two assumptions.



    • The standard ANOVA table divides the variability of the data in X into two parts:

    • Variability due to the differences among the column means (variability between groups)

    • Variability due to the differences between the data in each column and the column mean (variability within groups)



    The ANOVA table has six columns:

    • Source of the variability

    • The Sum of Squares (SS) due to each source.

    • The degrees of freedom (df) associated with each source.

    • Mean Squares (MS) for each source, which is the ratio SS/df.

    • F statistic, which is the ratio of the MS's.

    • The p-value, which is derived from the cdf of F. As F increases, the p-value decreases.
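The table entries can be computed by hand for a balanced design, which makes the SS, df, MS and F columns concrete. This is a sketch on made-up data, not the toolbox's internal implementation; all variable names are illustrative.

```matlab
X = [1 4 7; 2 5 8; 3 6 9];                 % illustrative data, columns are groups
[m, k] = size(X);                          % m observations per group, k groups

grandMean = mean(X(:));
colMeans  = mean(X);

ssBetween = m * sum((colMeans - grandMean).^2);          % SS between groups
ssWithin  = sum(sum((X - repmat(colMeans, m, 1)).^2));   % SS within groups
dfBetween = k - 1;
dfWithin  = k * (m - 1);

msBetween = ssBetween / dfBetween;         % MS = SS/df
msWithin  = ssWithin / dfWithin;
F = msBetween / msWithin;                  % ratio of the mean squares
p = 1 - fcdf(F, dfBetween, dfWithin);      % upper tail of the F distribution
```

For this data the between-group mean square is much larger than the within-group mean square, so F is large and p is small.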



    Example 1

    The following example comes from a study of the material strength of structural beams in Hogg (1987). The vector strength measures the deflection of a beam in thousandths of an inch under 3,000 pounds of force. Stronger beams deflect less. The civil engineer performing the study wanted to determine whether the strength of steel beams was equal to the strength of two more expensive alloys.



    Steel is coded 'st' in the vector alloy. The other materials are coded 'al1' and 'al2'.

    strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 ...
                79 77 78 82 79];

    alloy = {'st','st','st','st','st','st','st','st',...
             'al1','al1','al1','al1','al1','al1',...
             'al2','al2','al2','al2','al2','al2'};

    Though alloy is sorted in this example, you do not need to sort the grouping variable.



    Solution:

    p = anova1(strength,alloy)

    p =

    1.5264e-004

    The p-value indicates that the three alloys are significantly different. The box plot confirms this graphically and shows that the steel beams deflect more than the more expensive alloys.





    Post Hoc and A Priori Comparisons

    If a 1 way anova test indicates a significant difference between at least two means:

    Post Hoc Comparisons: The decision to compare means is made after a significant 1 way anova is calculated. When all possible comparisons are made after the fact, the chances of Type I error become high.

    A Priori Comparisons: Comparisons decided upon before the 1 way anova is performed, based on the general theory of the study. This minimizes possible Type I error.
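One possible way to run post hoc comparisons in the Statistics Toolbox is the multcompare function, which takes the stats structure returned by anova1 and adjusts for multiple comparisons. The sketch below applies it to the beam strength data; it is an illustration, not necessarily the procedure used in the course.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};

% The third output holds the statistics that multcompare needs
[p, tbl, stats] = anova1(strength, alloy, 'off');

% Each row of c compares one pair of groups; the default method is Tukey-Kramer
c = multcompare(stats, 'Display', 'off');
```

With three groups there are three pairwise comparisons, so c has three rows; each row lists the two group indices and an interval for the difference in their means.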



    N-way Anovas

    Unrelated (Between Groups) Design

    p = anovan(X,group) performs a balanced or unbalanced multi-way ANOVA for comparing the means of the observations in vector X with respect to N different factors.

    • The factors and factor levels of the observations in X are assigned by the cell array group.

    • Each of the N cells in group contains a list of factor levels identifying the observations in X with respect to one of the N factors.

    • The list within each cell can be a vector, character array, or cell array of strings, and must have the same number of elements as X.
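A small two-factor sketch shows how the group cell array is built. The data and factor names (y, g1, g2) are illustrative; one factor is given as a numeric vector and the other as a cell array of strings to show that both forms are accepted.

```matlab
% Illustrative data: 8 observations classified by two factors
y  = [52.7 57.5 45.9 44.5 53.0 57.0 45.9 44.0]';
g1 = [1 2 1 2 1 2 1 2];                              % factor 1 as a numeric vector
g2 = {'hi';'hi';'lo';'lo';'hi';'hi';'lo';'lo'};      % factor 2 as a cell array of strings

% One p-value per factor (main effects only by default); 'display','off' hides the table
p = anovan(y, {g1, g2}, 'display', 'off');
```

The returned p has one entry per factor, in the order the factors appear in the group cell array.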



    Related (Repeated Measures) Design

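    Not implemented in the version of the Statistics Toolbox used for this course. Later releases of the toolbox provide fitrm and ranova for repeated measures models.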



    Exercise

    • Load testdata2.txt from week 8

    • Assume the data columns represent independent normally distributed variables

    • Perform a 1 way ANOVA on the data and interpret the results



    Getting Help

    • Help and Documentation

    • Digital

    • Accessible Help from the Matlab Start Menu

    • Updated online help from the Matlab Mathworks website:

    • http://www.mathworks.com/access/helpdesk/help/techdoc/matlab.html

    • Matlab command prompt function lookup

    • Built-in demos

    • Websites

    • Hard Copy

    • Books, Guides, Reference

    • The Student Edition of Matlab pub. Mathworks Inc.

