## populations vs. samples

**populations vs. samples**
• we want to describe both samples and populations
• the latter is a matter of inference…

**“outliers”**
• minority cases, so different from the majority that they merit separate consideration
• are they errors?
• are they indicative of a different pattern?
• think about possible outliers with care, but beware of mechanical treatments…
• significance of outliers depends on your research interests

**summaries of distributions**
• graphic vs. numeric
• graphic may be better for visualization
• numeric are better for statistical/inferential purposes
• resistance to outliers is usually an advantage in either case

**general characteristics**
• kurtosis [“peakedness”]: ‘leptokurtic’ (more peaked), ‘platykurtic’ (flatter)
• skew (skewness): right (positive) skew, left (negative) skew

**central tendency**
• measures of central tendency
• provide a sense of the value expressed by multiple cases, overall…
• mean
• median
• mode

**mean**
• center of gravity
• evenly partitions the sum of all measurements among all cases; average of all measures

**mean – pro and con**
• crucial for inferential statistics
• mean is not very resistant to outliers
• a “trimmed mean” may be better for descriptive purposes

**mean**
R: mean(x)

**trimmed mean**
R: mean(x, trim=.1)

**median**
• 50th percentile…
• less useful for inferential purposes
• more resistant to effects of outliers…

**mode**
• the most numerous category
• for ratio data, often implies that data have been grouped in some way
• can be more or less created by the grouping procedure
• for theoretical distributions, simply the location of the peak on the frequency distribution

[Figure: site classes (isolated scatters, hamlets, villages, regional centers); modal class = ‘hamlets’]

**dispersion**
• measures of dispersion
• summarize degree of clustering of cases, esp. with respect to central tendency…
• range
• variance
• standard deviation

**range**
R: range(x)
• would be better to use midspread…

**variance**
R: var(x)
• analogous to average deviation of cases from mean
• in fact, based on sum of squared deviations from the mean (“sum-of-squares”)

**variance**
• computational form: [equation image not preserved]
• note: units of variance are squared…
• this makes variance hard to interpret
• ex.: projectile point sample: mean = 22.6 mm, variance = 38 mm²
• what does this mean???

**standard deviation**
• square root of variance: sd = √variance

**standard deviation**
• units are in same units as base measurements
• ex.: projectile point sample: mean = 22.6 mm, standard deviation = 6.2 mm
• mean ± sd (16.4–28.8 mm)
• should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
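
The measures above can be tried out in a few lines of R. This is a minimal sketch, not the slides' own worked example: the vector x is invented for illustration (it is not the projectile point sample quoted above), and the variance formulas assume the sample (n − 1) form, which is what R's var(x) computes. IQR() is used here as one way to get the midspread; the slides do not name a function for it.

```r
# Hypothetical lengths in mm, with one deliberate outlier (60).
# Invented for illustration; not the slides' projectile point data.
x <- c(15, 18, 20, 21, 22, 23, 24, 25, 27, 60)
n <- length(x)

# Central tendency
m      <- mean(x)              # 25.5, pulled upward by the outlier
m_trim <- mean(x, trim = 0.1)  # 22.5, lowest and highest 10% of cases dropped
med    <- median(x)            # 22.5, the 50th percentile

# Dispersion
rng       <- range(x)          # smallest and largest value
midspread <- IQR(x)            # interquartile range, resistant to the outlier

# Variance, definitional form (assumed n - 1 denominator):
# sum of squared deviations from the mean, divided by n - 1
ss      <- sum((x - m)^2)
var_def <- ss / (n - 1)

# Variance, computational form (same assumption):
# (sum of x^2 - (sum of x)^2 / n) / (n - 1)
var_comp <- (sum(x^2) - sum(x)^2 / n) / (n - 1)

# Standard deviation: square root of variance, back in mm
s <- sqrt(var_def)

# Both hand-computed forms should agree with the built-ins
all.equal(var_def, var(x))
all.equal(var_comp, var(x))
all.equal(s, sd(x))

# Interval mean +/- sd, as in the projectile point slide
c(m - s, m + s)
```

With the outlier present, the mean sits well above the trimmed mean and the median, and the range is far wider than the midspread, which is the resistance point the slides make.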