Download Presentation
## AGEC 784

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Data Analysis for Modeling**AGEC 784**Data Analysis in the Context of Modeling**• Supports the modeling process • Improves accuracy of model • Improves usefulness of conclusions • Modeling is the primary goal. • Data analysis is a means to that goal.**Topics for Chapter**• Finding facts in databases • Editing, searching, sorting, filtering, and tabulating • Sampling • Estimating parameters • Point estimates and interval estimates**Finding Facts from Databases**• Tables of information • Each row is a record in the database. • Each column is a field for the records. • Excel calls such a table a list.**Excel Lists**• First row contains names for each field • Each successive row contains one record. • Lists may be: • Searched and edited • Sorted • Filtered • Tabulated**Searching and Editing Lists**• First assign a range name to entire list. • Include column titles. • With list selected Form on the Quick Access Toolbar. • Examine records one at a time: • Find Prev. • Find Next. • Enter new record with New button. • Delete record with Delete button.**Criteria Button**• Selected Form on the Quick Access Toolbar • Allows for searching of records • Enter data into a field. • Click Find Next.**Alternate Excel Search Techniques**• Highlight entire database. • Home"Editing►Find&Select►Find. • Use Find and Replace to edit entries. • In Find and Replace • “?” stands for any single symbol • “*” stands for any sequence of symbols**Filtering**• Choose Home►Editing►Sort&Filter►Filter. • Will filter lists based on values • Found under arrow at the title of each column • Arrow on title turns blue to remind list is filtered • Can remove filter by: • Choose Home►Editing►Sort & Filter►Filter►Clear**More Filtering**• Top 10 option returns records with smallest or largest value of a numerical record • Custom option allows filtering with compound criteria • More complicated compound criteria can be achieved by clicking Custom Filter option under Number Filters or Text Filters.**Tabulating**• Select Data – Pivot Table. • Creates summary tables • Layout button on third step of wizard creates the format for the table**Analyzing Sample Data**• Data is unlikely to cover whole population • Work with sample from population • Statistics are summary measures about sample • Want to construct statistics that represent population • Convenience sampling • Have easy access to information on subset of population • Subset may not be representative • Random sampling • All objects in population have equal chance of appearing in sample**Descriptive Statistics**• Summarizes information in sample • Gives numerical picture of observations • Excel Tools – Data Analysis • Descriptive Statistics table produced based on data given as input**Inferential Statistics**• Use information in sample to make inferences about population • Systematic Error • If sample not representative of population • Avoid by careful sampling • Sampling Error • Sample is merely subset of population • Mitigated by taking large samples**Estimating Parameters: Point Estimates**• The sample average is calculated as: • The sample variance is calculated as: • and its square root is the sample standard deviation:**(Optional) Estimating Parameters: Interval Estimates**• We can estimate parameters in two ways, with point estimates and with interval estimates. • The interval estimate approach produces a range of values in which we are fairly sure that the parameter lies, in addition to a single-value point estimate. • A range of values for a parameter allows us to perform sensitivity analysis in a systematic fashion, and it provides input for tornado charts or sensitivity tables.**Interval Estimates for the Mean**• P(L <= m <= U) = 1 – a. • L and U represent the lower and upper limits of the interval. • 1 – a represents the confidence level. • Usually a large percentage like 95 or 99% • m represents the (unknown) true value of the parameter.**Sampling Theory**• Working with a population described by a Normal probability model • Mean m and standard deviation s. • Take repeated samples of n items from population • Calculate the sample average each time • The sample averages will follow a Normal distribution with a mean of m and a variance of s2/n.**Estimates**• Standard error: the standard deviation of some function being used to provide an estimate. • Use the sample average to estimate the population mean. • The standard deviation of the sample average is called the standard error of the mean:**Z-scores**• The z-score measures the number of standard deviations away from the mean. • The z-score corresponding to any particular sample average is: • Tells how many standard errors from the mean • 90% of the sample averages will have z-scores between –1.64 and +1.64. • The chances are 90% that the sample average will fall no more than 1.64 standard errors from the true mean.**Confidence Intervals for Means**• Upper and lower limits on estimate for mean: • n>30 recommended unless original population resembles Normal • z can be computed using NORMSINV(1-a/2) • Replace s by the sample standard deviation s • Provided that sample is larger than n = 30 • Excel Descriptive Statistics also will calculate half-width of confidence interval**Interval Estimates for a Proportion**• To estimate the sample proportion p, the interval estimate is: • Sample size should be at least 50 for this formula to be reliable**Sample Size Determination**• Suppose we want to estimate mean of sample to within a range of ±R n = (zs / R)2 • Assumes: • Sampling from Normal distribution • Known variance – can begin with small sample to estimate standard deviation**Sample Size Determination for Proportions**• Suppose want to estimate a proportion to within a range of ±R n = z2p(1 – p) / R2 • Value maximized at p = 0.5 • Conservative value: n = (z/2)2/ R2**Summary**• Data collection and analysis support the modeling task where appropriate. • When early sensitivity testing indicates that certain parameters must be estimated precisely, we turn to data analysis for locating relevant information and for estimating model parameters. • The process of finding facts in data is aided by a facility with Excel and in particular with its database capabilities. • Excel provides an array of commands for searching, sorting, filtering, and tabulating data. • Excel’s Data Analysis tool for calculating descriptive statistics enables rapid construction of point estimates and interval estimates from raw data.