## Datasets and Variables


**Datasets and Variables**

- We want to answer questions
- We want to use data for this purpose
- Observations of characteristics of cases
- Case: person, city, organization, etc.
- Characteristic or Variable: age, size, sector of economy, etc.
- Dataset: data arranged in case by variable format

**Datasets**

- Cases
- Variables

**Variables**

- Measures or observations of a case’s
  - traits
  - characteristics
  - qualities
  - attributes
  - amounts
  - quantities
  - etc.

**Errors in variables**

- Missing values
- Measurement errors
  - mistakes in
    - Reporting
    - Remembering
    - Recording
  - Lies, etc.
- Quality of answer ~ quality of data

**Types of variables**

- Categorical: nominal (name)
- Ordinal: (name and order)
- Measurement: interval and ratio
  - Interval (name, order, and unit of measure)
  - Ratio (name, order, unit, and true zero point)

**Summarizing Data**

- Frequency Distributions
  - For measurement variables
  - For categorical variables

**Frequency Distribution**

- For Categorical variables (table)

| Variable Value | Frequency (f) | Proportion (f/n) | Percent (f/n * 100) |
|---|---|---|---|
| Dem | 17 | .425 | 42.5% |
| Rep | 7 | .175 | 17.5 |
| Ind | 16 | .400 | 40.0 |
| | n=40 | 1.000 | 100.0% |

**Frequency Distribution**

- For Measurement variables (table)

| Variable Value | Frequency | Proportion | Percent |
|---|---|---|---|
| 0 | 19 | .44 | 44 |
| 1 | 10 | .23 | 23 |
| … | | | |
| 7 | 2 | .05 | 5 |
| | n=43 | 1.00 | 100 |

**Frequency Distribution**

- For Categorical variable (bar chart)

**Frequency Distribution**

- For Measurement variable (histogram)

**Frequency Distribution**

- Frequency, f - count the number of cases that have the same value of a variable
- Total cases, n - count all the cases
- Proportion, p = f/n
- Percentage, % = 100 * p

**Dataset**

- Individual cases
  - e.g. stats.dta dataset: characteristics of individuals: age, msat, gender
- Aggregate data (groups of individual cases)
  - e.g. college1.dta dataset: characteristics of individuals: age, msat, gender averaged for groupings of students by college

**Populations and Samples**

- Population: all the relevant cases. The entire set.
- Sample: some portion of the population
  - haphazard (e.g., whomever you meet)
  - systematic (e.g., every 10th person)
  - representative (all possibilities included)
  - random (every case in the population has a fixed probability of being included in the sample, often equal probability)

**Populations**

- Best information
- Expensive or impossible to observe

**Samples**

- Easier to get
- Less expensive
- Less accurate - but, can be very accurate depending upon type of sample

**Hints for good grade**

- Do all assigned exercises and turn them in on time
- Do all other exercises for yourself to be sure you understand
- Read and study the text, before and after lectures -- several times
- Statistics is a language -- learn (memorize) the vocabulary and concepts
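
**Frequency Distribution (worked sketch)**

The definitions on the Frequency Distribution slides (f, n, p = f/n, % = 100 * p) can be tried out in a few lines of Python. This is only an illustrative sketch, not part of the original slides: the function name `frequency_distribution` and the list `party` are hypothetical, and the list is rebuilt from the counts in the categorical table (17 Dem, 7 Rep, 16 Ind) rather than taken from a real dataset.

```python
from collections import Counter


def frequency_distribution(values):
    """Return one (value, f, p, percent) row per distinct value.

    f       = number of cases sharing that value
    n       = total number of cases
    p       = f / n
    percent = 100 * p
    """
    n = len(values)
    counts = Counter(values)  # maps each distinct value to its frequency f
    return [(value, f, f / n, 100 * f / n) for value, f in counts.items()]


# Hypothetical data: one entry per case, rebuilt from the table's counts.
party = ["Dem"] * 17 + ["Rep"] * 7 + ["Ind"] * 16

for value, f, p, percent in frequency_distribution(party):
    print(f"{value:<5} f={f:<3} p={p:.3f} percent={percent:.1f}")
```

For the `Dem` row this gives f = 17, p = .425, and 42.5 percent, matching the table. The proportions always sum to 1 and the percentages to 100, which is a quick check on any frequency table.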