### Statistics Overview

Some New, Some Old… Some to come

• Descriptive Statistics – methods of summarizing or describing a set of data: tables, graphs, numerical summaries

• Inferential Statistics – methods of making inference about a population based on the information in a sample

• Nominal: The numerical values just "name" the attribute uniquely; no ordering of the cases is implied.

• Ordinal: Attributes can be rank-ordered; here, distances between attributes do not have any meaning.

• Interval: The distance between attributes does have meaning.

• Ratio: There is always an absolute zero that is meaningful; this means that you can construct a meaningful ratio.

• It's important to recognize that there is a hierarchy implied in the level of measurement idea. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement.

• Individuals are the objects described by a set of data; they may be people, animals, or things

• Variable is any characteristic of an individual

• Categorical variable places an individual into one of several groups or categories

• Quantitative variable takes numerical values for which arithmetic operations make sense

• Distribution of a variable tells us what values it takes and how often it takes these values
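
As a rough illustration of what "distribution" means for a categorical variable, the sketch below tabulates which values occur and how often. The function name `distribution` and the blood-type data are hypothetical, made up only for this example.

```python
from collections import Counter


def distribution(values):
    """Return (counts, relative_frequencies) for the observed values."""
    values = list(values)
    if not values:
        raise ValueError("need at least one observation")
    counts = Counter(values)
    n = len(values)
    # Relative frequency = count of a value divided by the number of observations.
    relative = {value: count / n for value, count in counts.items()}
    return dict(counts), relative


if __name__ == "__main__":
    # Hypothetical categorical variable: blood type of eight individuals.
    blood_types = ["A", "O", "O", "B", "A", "O", "AB", "O"]
    counts, relative = distribution(blood_types)
    for value in sorted(counts):
        print(value, counts[value], round(relative[value], 3))
```

The counts answer "how often", and the relative frequencies put variables measured on samples of different sizes on a common scale.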

• Correlation can be used to summarize the amount of linear association between two continuous variables x and y.

• A positive association between the x and y variables is indicated by an increase in x accompanied by an increase in y.

• A negative association is indicated by an increase in x accompanied by a decrease in y.
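
The sign of the association can be checked numerically. The sketch below assumes the correlation meant here is Pearson's r (the slides do not name a specific coefficient); the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y_rising = [2, 4, 5, 4, 6]    # tends to increase with x
    y_falling = [9, 7, 6, 4, 2]   # tends to decrease with x
    print(pearson_r(x, y_rising))   # positive value
    print(pearson_r(x, y_falling))  # negative value
```

A value near +1 or -1 indicates a strong linear association; a value near 0 indicates little linear association, though a nonlinear relationship may still exist.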

• A chi square statistic is used to investigate whether distributions of categorical variables differ from one another.

• The chi square distributions, like the t distributions, form a family described by a single parameter, the degrees of freedom.

• df = (r – 1) × (c – 1), where r is the number of rows and c the number of columns in the table of counts

• For a detailed example, see http://math.hws.edu/javamath/ryan/ChiSquare.html
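
A minimal sketch of how the statistic and its degrees of freedom could be computed for a table of counts, assuming the usual Pearson form (sum of (observed - expected)² / expected over all cells). The function name `chi_square` and the counts are hypothetical.

```python
def chi_square(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    rows = len(table)
    cols = len(table[0])
    if rows < 2 or cols < 2 or any(len(row) != cols for row in table):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count if row and column variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    df = (rows - 1) * (cols - 1)
    return statistic, df


if __name__ == "__main__":
    # Hypothetical counts: rows are two groups, columns are two outcomes.
    observed = [[10, 20],
                [20, 10]]
    statistic, df = chi_square(observed)
    print(statistic, df)
```

The statistic is then compared with the chi square distribution that has the computed degrees of freedom; a large value suggests the distributions differ.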

• Hypothesis testing in science is a lot like the criminal court system in the United States… consider – How do we decide guilt?

• Assume innocence until "proven" guilty.

• Proof has to be "beyond a reasonable doubt."

• Two possible decisions: guilty or not guilty

• Jury cannot declare someone innocent

• Statistical Hypotheses are statements about population parameters.

• Hypotheses are not necessarily true.

• The hypothesis that we want to prove is called the alternative hypothesis, Ha.

• The hypothesis that contradicts Ha is called the null hypothesis, Ho.

• After taking the sample, we must either reject Ho and believe Ha, or fail to reject Ho because there was not sufficient evidence to reject it.

• Consider the jury trial…

• If a person is really innocent, but the jury decides (s)he's guilty, then they've sent an innocent person to jail.

• Type I error.

• If a person is really guilty, but the jury finds him/her not guilty, a criminal is walking free on the streets.

• Type II error.

• In our criminal court system, a Type I error is considered more serious than a Type II error, so we protect against a Type I error to the detriment of a Type II error. This is typically the same in statistics.

• The choice of alpha is subjective.

• The smaller alpha is, the smaller the critical region, and thus the harder it is to reject Ho.

• The p-value of a hypothesis test is the smallest value of alpha at which Ho would be rejected.

• If the p-value is less than or equal to alpha, reject Ho.

• If the p-value is greater than alpha, do not reject Ho.
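
The decision rule can be sketched as follows. The two-sided z-test p-value is an assumption added for illustration (the slides do not specify a particular test), and the function names `two_sided_p_value` and `decide` are hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z.

    Assumes a z-test; P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p_value, alpha):
    """Apply the rule: reject Ho when the p-value is <= alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"


if __name__ == "__main__":
    p = two_sided_p_value(2.2)  # hypothetical observed test statistic
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, round(p, 4), decide(p, alpha))
```

Running the same p-value against several alpha levels shows why a smaller alpha makes Ho harder to reject.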

• Statisticians prefer interval estimates.

• Point Estimate ± Critical Value × Standard Error

• The degree of certainty that we are correct is known as the level of confidence.

• Common levels are 90%, 95%, and 99%.

• Increasing the level of confidence:

• decreases the probability of error

• increases the critical point

• widens the interval

• Increasing n decreases the width of the interval
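
The effect of the confidence level on the interval can be seen in a small sketch. It assumes an interval for a mean with a normal (z) critical value and standard error s / sqrt(n); for small samples a t critical value would normally be used instead. The function name `z_interval` and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def z_interval(sample, confidence):
    """Point estimate +/- critical value * standard error, for a mean."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    point_estimate = mean(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin


if __name__ == "__main__":
    data = [4.1, 5.0, 5.5, 6.2, 4.8, 5.9, 5.3, 4.7]  # hypothetical sample
    for level in (0.90, 0.95, 0.99):
        low, high = z_interval(data, level)
        print(level, round(low, 3), round(high, 3), round(high - low, 3))
```

The printed widths grow with the confidence level, and because the standard error has sqrt(n) in the denominator, a larger sample with similar spread gives a narrower interval.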

• This is a statistic used in cross-tabulation tables.

• Typically viewed as a nonparametric statistic.

• The Gamma statistic is preferable to Spearman R or Kendall tau when the data contain many tied observations. Gamma is defined in terms of probabilities; specifically, it is computed as the probability that the rank orderings of the two variables agree minus the probability that they disagree, divided by 1 minus the probability of ties.

• It is basically equivalent to Kendall tau, except that ties are explicitly taken into account.

• Detailed discussions of the Gamma statistic can be found in Goodman and Kruskal (1954, 1959, 1963, 1972), Siegel (1956), and Siegel and Castellan (1988).
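
The definition of Gamma given above can be sketched by counting pairs of observations. Pairs ordered the same way on both variables are concordant (agree), pairs ordered in opposite ways are discordant (disagree), and pairs tied on either variable are dropped, which corresponds to dividing by 1 minus the probability of ties. The function name `goodman_kruskal_gamma` and the ratings are hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(xs, ys):
    """Gamma = (concordant - discordant) / (concordant + discordant)."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        # product == 0 means a tie on at least one variable; the pair is skipped
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # Hypothetical ordinal ratings (1 = low, 3 = high) with many ties.
    satisfaction = [1, 1, 2, 2, 3, 3, 3]
    loyalty = [1, 2, 2, 3, 2, 3, 3]
    print(goodman_kruskal_gamma(satisfaction, loyalty))
```

Because tied pairs are excluded from the denominator, Gamma ranges from -1 to +1 even when many observations share the same rank.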

• This statistic also tells us about the strength of a relationship.

• It can be used with ordinal or higher-level data.

• For a more detailed discussion of Lambda, Gamma and Tau, see http://72.14.209.104/search?q=cache:8ZS4_FvVqrgJ:ms.cc.sunysb.edu/~mlebo/_private/Classes/POL501/Lecture%252012.pdf+gamma+AND+lambda+AND+tau+AND+statistics&hl=en&gl=us&ct=clnk&cd=39

• A sample is expected to mirror the population from which it comes; however, there is no guarantee that any sample will be precisely representative of that population. The difference between the sample and the population is referred to as bias.

• Sampling Bias: A tendency to favor selecting people who have a particular characteristic or set of characteristics. Sampling bias is usually the result of a poor sampling plan. The most notable form is non-response bias, which arises when people with specific characteristics have no chance of appearing in the sample.

• Non-Sampling Error: In surveys of personal characteristics, unintended errors may result from:

• The manner in which the response is elicited

• The social desirability of the persons surveyed

• The purpose of the study

• The personal biases of the interviewer or survey writer