Question • What are data and what do they mean to a scientist?
Dinner at the Urquhart House • Brought to you by the Briggs Multiracial Alliance • Sunday night • All food provided (probably Chinese) • Contact Mimi Reddy, reddydee@msu.edu for details
Data, Statistics, and Spreadsheets • What are data? • What are statistics? • What are spreadsheets? • How can you analyze data with spreadsheets?
Data • Data are pieces of information • Data can be numbers, words, descriptions • Data have UNITS • The word data is PLURAL, datum is singular • Data about Willoughby: • Age: 5 (years) • Height: 47 (inches) • Weight: 66 (pounds) • Eyes: Blue • Favorite word: Wrestle • Favorite letter: W
Types of Data • Numbers – two types • Real #s – numbers that can have a fractional part – 28.75 lbs • Integers – whole numbers – 18 months • Letters – called characters in programming • W is a character • Words – called strings in programming • “No thanks” is a string; strings can be individual words or phrases
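As a rough illustration of these types, the sketch below stores the slide's examples in Python variables. The variable names are made up for this example, and Python is only one possible language: it has no separate character type, so a single letter is stored as a string of length 1.

```python
# Examples from the slide, stored with Python's built-in types.
# Variable names are hypothetical, chosen only for this sketch.
example_weight_lbs = 28.75   # float: Python's stand-in for a real number
example_age_months = 18      # int: a whole number
favorite_letter = "W"        # a character (in Python, a string of length 1)
reply = "No thanks"          # a string: a word or a phrase

for value in (example_weight_lbs, example_age_months, favorite_letter, reply):
    # Show each value next to the name of its type.
    print(repr(value), "is of type", type(value).__name__)
```

Note that the units (lbs, months) live only in the variable names here; the numbers themselves carry no units, which is why data tables need labeled columns.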
Statistics and Data • Test Scores: • Jeff: 88 • Mollie: 92 • Marcie: 88 • Dave: 47 • Karim: 99 • Willoughby: 42 • Benjamin: 0 • What statistics can you calculate to describe these data? • Try to think of four things to describe the data
Statistics • Statistics are derived from the data • Statistics are descriptions of data • Statistics are meant to simplify the data • Statistics can be misleading
Typical Statistics • Sample Size - number of individuals measured = n • Sum = S • Average or Mean = S/n • Median • Value of the 50th percentile: half of the values fall above, half below • Maximum, Minimum, Range (Max-Min) • Mode - most common value • Standard deviation (SD) • Variance (SD²)
Mean, max, min, range, median, mode • Analyze these data: 18, 33, 4, 47, 49, 38, 29, 4, 55 • Sample size (n) • Sum (S) • Mean = average = S/n, denoted x̄ • Median = halfway value • Mode = most common value
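One way to check an answer to this exercise is with a short script. The sketch below uses Python's standard `statistics` module rather than a spreadsheet. It assumes the sample form of the standard deviation (dividing by n - 1), which the slides do not specify.

```python
import statistics

# The nine values from the slide.
data = [18, 33, 4, 47, 49, 38, 29, 4, 55]

n = len(data)                      # sample size
total = sum(data)                  # sum, S
mean = total / n                   # mean = S / n
median = statistics.median(data)   # middle value once the data are sorted
mode = statistics.mode(data)       # most common value
maximum, minimum = max(data), min(data)
value_range = maximum - minimum    # range = max - min
sd = statistics.stdev(data)        # ASSUMPTION: sample SD (divides by n - 1)
variance = sd ** 2                 # variance = SD squared

print(f"n={n} sum={total} mean={mean:.2f} median={median} mode={mode}")
print(f"max={maximum} min={minimum} range={value_range}")
print(f"SD={sd:.2f} variance={variance:.2f}")
```

For these data the script reports n = 9, sum = 277, mean of about 30.78, median 33, mode 4, maximum 55, minimum 4, and range 51.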
Spreadsheets • Spreadsheets are tables • Spreadsheets allow calculations and manipulations of data • Calculations: mean, standard deviation • Manipulations: sort,
Make a data table: • Fly 1, length 13.4 mm, velocity 27 km/h, age 21 days • Fly 2, length 9.4 mm, velocity 0 km/h, age 220 days • Fly 3, length 9.3 mm, velocity 44 km/h, age 1 day • Fly 4, length 13.4 mm, velocity 17 km/h, age 32 days • Fly 5, length 17.4 mm, velocity 33 km/h, age 11 days • How many columns? • How many rows? • #s go down or across?
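One possible layout, sketched here in Python instead of a spreadsheet, puts each fly in its own row and each measured variable in its own column, with the units written into the column names. The column names are assumptions made for this example.

```python
# One row per fly, one column per variable; units are part of the column name.
# Column names are hypothetical, chosen only for this sketch.
flies = [
    {"fly": 1, "length_mm": 13.4, "velocity_kmh": 27, "age_days": 21},
    {"fly": 2, "length_mm": 9.4,  "velocity_kmh": 0,  "age_days": 220},
    {"fly": 3, "length_mm": 9.3,  "velocity_kmh": 44, "age_days": 1},
    {"fly": 4, "length_mm": 13.4, "velocity_kmh": 17, "age_days": 32},
    {"fly": 5, "length_mm": 17.4, "velocity_kmh": 33, "age_days": 11},
]

columns = list(flies[0].keys())

# Print the table: header first, then one line per fly.
print("\t".join(columns))
for row in flies:
    print("\t".join(str(row[name]) for name in columns))

print(len(flies), "rows,", len(columns), "columns")

# A typical manipulation: sort the rows by length, shortest first.
by_length = sorted(flies, key=lambda row: row["length_mm"])
print("Shortest fly:", by_length[0]["fly"])
```

With this layout the numbers for one variable run down a column, which is the arrangement most spreadsheet functions expect.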
Microsoft Excel • Typical spreadsheet program • Lotus 1-2-3 was an early commercial spreadsheet (VisiCalc was the first) • Has similar controls to MS Word • Now allows graphing (charts) • very restricted formats, hard to get exactly what you want • Excel tables and graphs can be copied into MS Word
Friday’s Assignment • We will work with Microsoft Excel to analyze some data • Groups of two will submit one finished spreadsheet for the assignment
Graphs • Many different types of graphs • Points • Lines • Bars • Pies
Point Graphs • Called X-Y Scatter in MS Excel • Plot points based on X and Y value • Can fit a “REGRESSION LINE” to the data • Line that best fits the data
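To make "line that best fits the data" concrete, here is a minimal sketch of an ordinary least-squares fit, which is the usual meaning of a regression line. The function name `fit_line` and the sample points are made up for illustration; they are not from the slides.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# HYPOTHETICAL points, invented for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept is computed from the two means.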
Bar Graphs • Categorize data into counts or percents • Categories can be descriptive categories (Windows 98, Windows 2000, …) • Can also be numeric categories • Height: 60-63, 63-66, etc. or just 61, 62, 63… • Count up number of people in each group • Histograms are a particular type of bar graph
Histogram • X axis is categories • Y axis is a number or proportion of observations in that category
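As a small worked example, the sketch below builds a text histogram from the seven test scores listed earlier in the deck. The bin width of 10 points is an assumption made for this example; a different width would give a different picture of the same data.

```python
from collections import Counter

# Test scores from the "Statistics and Data" slide.
scores = {"Jeff": 88, "Mollie": 92, "Marcie": 88, "Dave": 47,
          "Karim": 99, "Willoughby": 42, "Benjamin": 0}

bin_width = 10  # ASSUMPTION: 10-point categories (0-9, 10-19, ...)

# Assign each score to the category that starts at the nearest lower multiple of 10.
counts = Counter((score // bin_width) * bin_width for score in scores.values())

# X axis = categories (printed down the page), Y axis = number of observations.
for start in range(0, 100, bin_width):
    print(f"{start:2d}-{start + bin_width - 1:2d} | " + "#" * counts[start])
```

Each `#` is one student, so the length of a bar is the count of observations in that category.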
[Figure: example histogram and bar graph; axis label: Number of Crashes]
Distributions • Special type of histogram with continuous numeric scale at bottom • Normal distribution is a key concept in statistics • Skewed distribution is one that is unbalanced
Sample distribution histograms • Sources: Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt • Robert D. Duval, PS 400 Lecture, www.polsci.wvu.edu/duval/ps400/Notes/400Notes.ppt
The NORMAL Distribution • A NORMAL DISTRIBUTION is the theoretical distribution of values given natural variation around a MEAN • It is a balanced, humped (bell-shaped) distribution
Distributions • Skew is an imbalance in the distribution • Source: Danyoungyoo, Katanchalee, and Srichawla, www.s-t.au.ac.th/handout/st2204/week5-Univariate-Des.ppt
Hypothesis Testing • Statistical Tests are how scientists decide if data support their hypothesis • (NOT PROVE their hypothesis) • Four major statistical tests: T-test, χ² (chi-square) test, Regression, ANOVA
Hypothesis • Processor speed has an effect on the performance of the computer. • Null Hypothesis • H0: Processor speed has NO EFFECT on the performance of a computer.
Statistical Tests and Probability • Statistical tests give a value • That value can be related to a probability (P) • P is the probability of getting data at least as extreme as yours if the NULL hypothesis were true • If P < 0.05 (1/20), you reject the NULL hypothesis
T-Test • Compares differences between two means • Formula: T = (x̄1 − x̄2)/SE • SE is the standard error of the difference between the two means, built from each sample's Standard Error of the Mean (SEM = SD/√N) • T Values: Difference between the means in comparison to the amount of spread in your data
T-Values • If |T| > 2.5 or 3.0, the difference is usually significant (this depends on your sample sizes)
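The T value can be computed by hand or in a few lines of code. The sketch below assumes the unequal-variance (Welch) form of the two-sample test, in which the denominator combines each group's standard error of the mean, SD/√N. The benchmark numbers are invented for illustration and are not real measurements.

```python
import math
import statistics

def sem(sample):
    """Standard error of the mean: sample SD divided by the square root of N."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def t_statistic(a, b):
    """Difference between two means, scaled by the standard error of that difference."""
    diff = statistics.mean(a) - statistics.mean(b)
    se_diff = math.sqrt(sem(a) ** 2 + sem(b) ** 2)  # ASSUMPTION: Welch form
    return diff / se_diff

# HYPOTHETICAL benchmark scores for computers with fast and slow processors.
fast_cpu = [102, 98, 110, 105, 99]
slow_cpu = [88, 91, 85, 94, 90]

t = t_statistic(fast_cpu, slow_cpu)
print(f"T = {t:.2f}")
```

A large |T| means the two means are far apart relative to the spread within each group. Turning T into a probability also requires the sample sizes, which is why the threshold for significance varies.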