Analyzing data

imaran

Presentation Transcript


  1. Analyzing data

  2. Science: “It is a systematic activity of acquiring knowledge about the world and universe and condensing that knowledge into testable laws and theories.” Observations

  3. Statistics: the branch of mathematics that deals with the collection, organization, analysis, and interpretation of numerical data.
     Common uses of statistics:
     1. Description
        • Provide a data summary
        • Help discover trends and patterns
     2. Design
        • Assist in the design of experiments and field studies
        • A priori decisions about the usefulness of experiments
     3. Test hypotheses
        • Test hypotheses to see whether observed patterns are consistent with predictions

  4. Concept of population, samples and testing of hypothesis
     We wish to study the students at Texas A&M University with respect to gender, height, weight, IQ and ethnic representation.
     • Population: the collection of all the elements we are studying and about which we are trying to draw conclusions, e.g. all the students at Texas A&M (over 50,000).
     • Sample: a subset of the population, e.g. a random sample of 500 students.
     We record the gender, height, weight, IQ and ethnic background of the 500 randomly selected students and, based on that, draw conclusions about the entire student population.
     [Figure: a sample (S) of 500 students drawn from the population of 55,000 students at Texas A&M University]
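
The idea of drawing a random sample from a population can be sketched in Python. This is only an illustration: the heights are synthetic values generated for the sketch (mean 170 cm and standard deviation 10 cm are assumptions, not figures from the slides), and only the population size of 55,000 and the sample size of 500 come from the slide.

```python
import random
import statistics

# Fixed seed so the sketch is repeatable; the seed value is arbitrary.
rng = random.Random(42)

POPULATION_SIZE = 55_000  # students at the university (from the slide)
SAMPLE_SIZE = 500         # randomly selected students (from the slide)

# Synthetic heights in cm, invented purely for illustration.
population = [rng.gauss(170, 10) for _ in range(POPULATION_SIZE)]

# Simple random sample without replacement.
sample = rng.sample(population, SAMPLE_SIZE)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
```

With a sample of this size, the sample mean should usually fall close to the population mean, which is what lets us draw conclusions about all students from only 500 of them.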

  5. Testing of hypothesis
     1. We wish to study the students at Texas A&M University with respect to gender, height, weight, IQ and ethnic representation.
        • Population: all the students at Texas A&M (over 50,000)
        • Sample: a subset of the population, e.g. a random sample of 500 students
        • We record the gender, height, weight, IQ and ethnic background of the 500 randomly selected students and, based on that, draw conclusions about the entire student population.
        • From these data, we can make some charts and interesting comparisons:
          a) All traits in female vs. male students: only two means for each trait
          b) All traits in different ethnic groups (Asian, Hispanic, White American, African American, American Indian): five means for each trait
     2. We conduct a field experiment to compare the yield performance of all the sorghum varieties at Texas A&M. The comparisons can be made on:
          a) Yield performance of landraces vs. improved varieties: only two means
          b) Yield performance of varieties from Asia, Africa, South America, Europe and North America: five means
     • The working hypothesis H0: no difference between the groups.
     • We use appropriate statistical tests to accept or reject the hypothesis.

  6. Data processing and analysis
     • Data can be grouped in:
       a) discrete classes, e.g. the number of male and female students and the number of students in different ethnic classes
       b) continuous classes, e.g. height, weight, IQ
     • Data can be presented in graphical form, e.g. histograms, pie charts and scatter graphs,
     • and analyzed using appropriate statistics, such as the chi-square test, t-test and F-test.

  7. Testing of hypothesis
     • Chi-square test: for use with qualitative or numerical data with a discrete distribution.
     • t-test: for comparing means between two groups for continuous data.
     • F-test (ANOVA): for comparing means across multiple groups for continuous data.

  8. The chi-square statistic
     For example:
     i) Are the numbers of male and female students as expected based on the male-to-female ratio in the state?
     ii) Is there an association between ethnic group and height, weight or IQ?
     iii) Are the observed genetic ratios as expected?

  9. An example
     • Assume that the random sample of 500 students had 300 males and 200 females.
     • Observed: 300 : 200
     • Expected: 250 : 250, based on the equal population of males and females in the state
     Let us plug the values into the formula $\chi^2 = \sum (O - E)^2 / E$, where $O$ is the observed and $E$ the expected count in each class:
     $$\chi^2 = \frac{(300 - 250)^2}{250} + \frac{(200 - 250)^2}{250} = \frac{(50)^2}{250} + \frac{(-50)^2}{250} = \frac{2500}{250} + \frac{2500}{250} = 10 + 10 = 20$$
     • This is the chi-square value for one degree of freedom.
     • The probability values of chi-square have been compiled in tabular form to test the level of significance. The table value of chi-square for significance at the 1% level with one degree of freedom is 6.6. Since 20 is larger than this value, the observed numbers differ significantly from the expected ones: the number of female students is significantly less than expected.
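
The same arithmetic can be checked with a short Python sketch. The function name `chi_square_statistic` is a hypothetical helper written for this illustration; the critical value 6.635 is the standard table entry for one degree of freedom at the 1% level, which the slide rounds to 6.6.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [300, 200]  # males, females in the sample of 500
expected = [250, 250]  # equal numbers expected from the state ratio

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # degrees of freedom = number of classes - 1

# Standard table value for 1 df at the 1% significance level.
CRITICAL_1_PERCENT_DF1 = 6.635

print(f"chi-square = {chi2}, df = {df}")
print("significant at 1%:", chi2 > CRITICAL_1_PERCENT_DF1)
```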

  10. Chi-Square Table

  11. Characteristics of a normal population
      • The mean
      • The variance: a measure of the dispersion of data from the mean. It is measured as the mean of the squared differences between individual data points and the mean of the array.
      • The standard deviation: the square root of the variance.
      • The standard error of the mean
      • The coefficient of variation
      [Figure: normal curve with probability 0.68 within µ ± σ, 0.95 within µ ± 2σ and 0.99 within µ ± 3σ]
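
As a rough sketch, these quantities can be computed with Python's standard library. The data are the Var 1 yields from the cowpea example on slide 15. The function name `describe` is hypothetical, and the sketch assumes the sample form of the variance (divisor n − 1), which is what the variances reported on slide 16 correspond to; the standard error is taken as the standard deviation divided by √n and the coefficient of variation as the standard deviation expressed as a percentage of the mean.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, divisor n - 1
    sd = math.sqrt(variance)                # standard deviation
    se = sd / math.sqrt(n)                  # standard error of the mean
    cv = 100 * sd / mean                    # coefficient of variation, in %
    return {"n": n, "mean": mean, "variance": variance,
            "sd": sd, "se": se, "cv": cv}


# Seed yield per plant of Var 1 (slide 15).
var1_yields = [10, 9, 10, 8, 9, 8, 9, 7, 7, 6]

stats = describe(var1_yields)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```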

  12. Logic of the t-test
      $$t = \frac{M_1 - M_2}{SE_{diff}}$$
      where $M_1 - M_2$ is the difference between the two means and $SE_{diff}$ is the standard error of the difference between the means.
      • The larger the difference between the means and the smaller $SE_{diff}$, the larger the t value.

  13. Thus, for the t-test, we estimate the value of t as the ratio of the difference between the two means to the standard error of the mean difference:
      $$t = \frac{\text{difference between group means}}{\text{variability of groups}} = \frac{\bar{X}_T - \bar{X}_C}{SE(\bar{X}_T - \bar{X}_C)} = \frac{d}{s_d}$$
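
A minimal sketch of this ratio in Python shows how t responds to its two ingredients. The function name `t_statistic` and the numbers passed to it are hypothetical and chosen only to make the effect visible.

```python
def t_statistic(mean_t, mean_c, se_diff):
    """Difference between two group means divided by its standard error."""
    if se_diff <= 0:
        raise ValueError("the standard error must be positive")
    return (mean_t - mean_c) / se_diff


# Hypothetical values: same standard error, larger difference between means.
print(t_statistic(10.0, 8.0, 1.0))
print(t_statistic(12.0, 8.0, 1.0))

# Hypothetical values: same difference between means, smaller standard error.
print(t_statistic(10.0, 8.0, 0.5))
```

Doubling the difference between the means or halving the standard error both double the t value.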

  14. Critical Values of the t-Distribution

  15. Example: seed yield per plant of two cowpea varieties

      Plant            Var 1   Var 2
      1                10      8
      2                9       8
      3                10      7
      4                8       9
      5                9       6
      6                8       4
      7                9       5
      8                7       5
      9                7       6
      10               6       4
      ------------------------------
      Variety means    8.3     6.2

      Is there a significant difference in the yield potential of the two varieties? This can be ascertained by a t-test.

  16. t-test for seed yield per plant between two cowpea varieties

      Plant              Var 1   Var 2
      1                  10      8
      2                  9       8
      3                  10      7
      4                  8       9
      5                  9       6
      6                  8       4
      7                  9       5
      8                  7       5
      9                  7       6
      10                 6       4
      --------------------------------
      Variety means      8.3     6.2
      Variety SS         16.1    27.6
      Variety variances  1.78    3.06
      Standard dev.      1.33    1.75

      t-test between Var 1 and Var 2, where 3.16 is the square root of the 10 plants per variety:
      $$t = \frac{8.3 - 6.2}{1.33/3.16 + 1.75/3.16} = 2.16^{*} \quad (\text{for 9 df, } p = .025\text{ to }.05)$$
      This indicates that the varieties differ significantly in yield potential.
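
The calculation can be reproduced with a short Python sketch. Two variants are shown. The first follows the slide and uses the sum of the two standard errors as the denominator. The second is not from the slides: it is the more conventional form for two independent groups, in which the standard error of the difference is the square root of the sum of the squared standard errors. The helper name `standard_error` is hypothetical.

```python
import math
import statistics

# Seed yield per plant for the two varieties (from the slide).
var1 = [10, 9, 10, 8, 9, 8, 9, 7, 7, 6]
var2 = [8, 8, 7, 9, 6, 4, 5, 5, 6, 4]


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


mean_diff = statistics.mean(var1) - statistics.mean(var2)

# Variant 1 (as on the slide): denominator is the sum of the two SEs.
se_sum = standard_error(var1) + standard_error(var2)
t_slide = mean_diff / se_sum

# Variant 2 (assumption, not on the slide): square root of the sum of
# the squared SEs, the usual form for two independent groups.
se_root = math.sqrt(standard_error(var1) ** 2 + standard_error(var2) ** 2)
t_conventional = mean_diff / se_root

print(f"difference between means: {mean_diff:.2f}")
print(f"t (slide's denominator):        {t_slide:.2f}")
print(f"t (conventional denominator):   {t_conventional:.2f}")
```

At full precision the slide's form gives a t of about 2.15 (the slide's 2.16 reflects rounding of the intermediate values), while the conventional form gives about 3.01. The sum of the standard errors is never smaller than the square root of the sum of their squares, so the slide's denominator is the more conservative of the two.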
