Introduction to elementary quantitative concepts and methods Guest lecture Carl Henrik Knutsen, 14/5-2008
Motivation • Social sciences, and science in general: We are generally interested in: • “How” questions • “Why” questions. • Social scientists seek descriptions of empirical phenomena and try to come up with causal explanations. Both quantitative and qualitative methodology try to answer such questions. • The nature of the research question is important for the choice of methodology, even if in the real world of social science researchers often choose a method according to their knowledge and “taste”. • Knowledge of different methodologies allows researchers and students to fit the methodology to the research question → improved analysis. • Triangulation can often be a good idea: using different methodologies to illuminate a problem in a more comprehensive fashion. • Knowledge of elementary quantitative methods enables you to read different types of research.
Causality and the control problem • Independent of choice of methodology • Theory and clever design needed • Three causal structures that might lead to correlation between X and Y: X → Y (X causes Y); Y → X (Y causes X); X ← Z → Y (a third factor Z causes both)
Generalization • The big advantage of quantitative methods • They provide stringent criteria for when we can be relatively certain that our generalizations hold true and are not driven by coincidence. • Remember that in the social sciences we do not face deterministic relationships between factors. Quantitative methods take into account the stochastic structure of social life.
Data • There exist a vast number of data sources constructed by different agencies or researchers: You do not need to construct your own data for many purposes. But: Know the data you use in order to avoid pitfalls. • Sources on the web: World Development Indicators, Penn World Tables, World Governance Indicators, Polity, Freedom House, OECD, UNESCO, UNCTAD etc.
Descriptive statistics • Descriptive vs inferential statistics • Descriptive statistics: Draw out comprehensible information about the structure of your data • 1) Central tendencies, 2) variation, 3) correlation
Central tendency of variable • Mean • Median • Mode
Variation • Range • Variance (S^2 = (Σ(X-M)^2)/(N-1)) • Standard deviation
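The central-tendency and variation measures on the two slides above can be sketched with Python's standard `statistics` module. The growth figures are made up for illustration; the manual variance line follows the slide's formula with N-1 in the denominator.

```python
import statistics

# Hypothetical sample: yearly GDP growth rates (percent) for nine countries.
growth = [2.0, 3.5, 3.5, 1.0, 4.0, 3.5, -0.5, 2.5, 6.0]

# Central tendency
mean = statistics.mean(growth)
median = statistics.median(growth)   # middle value of the sorted data
mode = statistics.mode(growth)       # most frequent value

# Variation
value_range = max(growth) - min(growth)
variance = statistics.variance(growth)   # sample variance, divides by N-1
std_dev = statistics.stdev(growth)       # square root of the sample variance

# The same variance computed directly from S^2 = (Σ(X-M)^2)/(N-1)
n = len(growth)
manual_variance = sum((x - mean) ** 2 for x in growth) / (n - 1)

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
```

Comparing the mean with the median is a quick check for skew: one very large or very small observation moves the mean but barely moves the median.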
Correlation • Covariance: cov(x,y) = (Σ(X-Xm)(Y-Ym))/(N-1) • Correlation coefficients • Pearson’s r = cov(x,y)/(S(x)*S(y)): Always between -1 and 1. NB: Measures only the degree of linear relationship.
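A minimal sketch of covariance and Pearson's r as defined on this slide, written out by hand. The helper names and the tiny data sets are illustrative assumptions. The second data set shows why r only captures linear relationships: Y is fully determined by X, yet r is zero.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance with N-1 in the denominator."""
    xm, ym = mean(xs), mean(ys)
    return sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / (len(xs) - 1)

def std_dev(values):
    """Sample standard deviation: square root of the variance cov(x, x)."""
    return math.sqrt(covariance(values, values))

def pearson_r(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

x = [-2, -1, 0, 1, 2]
y_linear = [2 * v + 1 for v in x]   # perfectly linear in x
y_curved = [v ** 2 for v in x]      # perfectly determined by x, but not linearly

r_linear = pearson_r(x, y_linear)
r_curved = pearson_r(x, y_curved)
print("r for linear relationship:", r_linear)
print("r for U-shaped relationship:", r_curved)
```

This is one reason to look at a scatter plot before trusting a correlation coefficient.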
Presentation of data • Tables • Histogram • Bar- and pie-charts • Scatter plots • Important to think about the reader: Comprehensible and informative. Need to strike a balance on the amount of information presented in a chart. Label charts.
Inferential statistics • The aim is solid inference from an observed sample to a larger (unobserved) universe. Generalization about populations or about effects. • For effects: Can we say that trajectories we observe are due to “real” effects or are they likely only a product of chance?
Law of large numbers... • Population and samples • Estimates and underlying mean. • Random selection? Selection bias is ALWAYS a possibility. • Sampling techniques: • Experiment • Random draws • Stratification
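The idea behind the law of large numbers can be illustrated with a small simulation. It is a sketch under stated assumptions: the "population" is 100,000 simulated incomes from a skewed (exponential) distribution, and samples are simple random draws. Larger samples tend to give estimates closer to the underlying mean, although any single draw can still miss.

```python
import random
import statistics

random.seed(42)  # fixed seed so the draws are reproducible

# Hypothetical population: 100,000 incomes with an average of about 30,000.
population = [random.expovariate(1 / 30000) for _ in range(100_000)]
population_mean = statistics.mean(population)

errors = {}
for n in (10, 100, 1000, 10000):
    sample = random.sample(population, n)   # random draw without replacement
    estimate = statistics.mean(sample)      # sample mean estimates the population mean
    errors[n] = abs(estimate - population_mean)
    print(f"n={n:>5}  estimate={estimate:10.1f}  error={errors[n]:9.1f}")
```

Random draws protect against selection bias only if every unit has a chance of being selected. A large sample drawn from the wrong sub-population converges to the wrong mean.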
Hypothesis test • Democracy and economic growth as example. • H0: Democracy has no effect on growth • H_alt: Democracy has an effect on growth • In general, H0 is often a hypothesis which claims that there is no effect. We often want to investigate whether we can claim with relative certainty that H_alt is valid. • The burden of proof is on the alternative hypothesis. Conservative bias: we need relatively strong results to claim that a relationship is not due to pure chance. • The central limit theorem is the underlying result. How do we know the distribution given H0? Use the given distribution to find out what one is likely to arrive at by pure chance. The normal distribution.
Central limit theorem • “The central limit theorem is one of the most remarkable results of the theory of probability. In its simplest form, the theorem states that the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution. Moreover, the approximation steadily improves as the number of observations increases. The theorem is considered the heart of probability theory, although a better name would be normal convergence theorem.” http://davidmlane.com/hyperstat/normal_distribution.html (Berrie Zielman)
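The theorem quoted above can be checked informally by simulation. The sketch below assumes draws from an exponential distribution with mean 1 and standard deviation 1, which is far from normal. It standardizes the mean of n draws and counts how often the result lands within ±1.96. For a normal distribution that share is about 95%.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

def standardized_mean(n):
    """Mean of n exponential(1) draws, minus its expected value, divided by its standard error."""
    draws = [random.expovariate(1.0) for _ in range(n)]
    return (statistics.mean(draws) - 1.0) / (1.0 / math.sqrt(n))

def share_within_1_96(n, repetitions=5000):
    """Share of standardized sample means that fall inside [-1.96, 1.96]."""
    z_values = [standardized_mean(n) for _ in range(repetitions)]
    return sum(abs(z) <= 1.96 for z in z_values) / repetitions

shares = {n: share_within_1_96(n) for n in (2, 30, 200)}
for n, share in shares.items():
    print(f"n={n:>3}  share within ±1.96: {share:.3f}")
```

As n grows, the share should approach 0.95, in line with the statement that the approximation improves with the number of observations.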
Significance levels and p-values • Significance level: If we take H0 as true, we want a critical level beyond which it is unlikely that we will see results. For example 5%: there is only a 5% probability that we will see a relationship this strong if H0 is true. Important to have a large sample. • P-value: The lowest significance level that will give rejection of H0. If H0 is true: what is the probability that we will see a result at least this extreme?
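A minimal sketch of how a p-value can be computed for a difference in means, assuming samples large enough for the normal approximation to hold. The function name and the simulated growth rates for "democracies" and "autocracies" are hypothetical and are not real data.

```python
import math
import random
import statistics
from statistics import NormalDist

def two_sided_p_value(sample_a, sample_b):
    """Large-sample z-test of H0: both groups have the same mean."""
    mean_diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    standard_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    z = mean_diff / standard_error
    # Probability, under H0, of a |z| at least as large as the one observed.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

random.seed(7)
# Simulated yearly growth rates (percent); the numbers are assumptions for illustration.
democracies = [random.gauss(2.5, 3.0) for _ in range(200)]
autocracies = [random.gauss(2.0, 3.0) for _ in range(200)]

z, p = two_sided_p_value(democracies, autocracies)
alpha = 0.05  # significance level chosen before looking at the data
print(f"z = {z:.2f}, p-value = {p:.3f}, reject H0 at 5%: {p < alpha}")
```

Rejecting H0 at the 5% level means that a difference at least this large would occur in fewer than 5% of samples if there were no real difference. It does not say how large or how important the effect is.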
Models • Stockburger: “A model is a representation containing the essential structure of some object or event in the real world.” • 1. Models are necessarily incomplete • (2. The model may be changed or manipulated with relative ease.)
Regression analysis • How to fit a straight line through a scatterplot! • Best fit: one criterion is to minimize the sum of squared residuals → Ordinary Least Squares (OLS) • Bivariate regression equation: Y = a + bX + ε • Regression analysis recognizes that the world is not deterministic. The role of the error term: ε. Large error terms in general imply large uncertainty • Interpretation of a: Mean value of Y when X is equal to zero. Often no substantive interpretation. Not so interesting • Interpretation of b: Increase in the mean of Y when X increases by one unit. Effect of X on Y?
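For the bivariate case, the OLS estimates have a closed form: b is the covariance of X and Y divided by the variance of X, and a makes the line pass through the point of means. A minimal sketch follows; the function name `fit_ols` and the data are illustrative assumptions.

```python
def fit_ols(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx              # slope: change in mean Y per one-unit change in X
    a = y_mean - b * x_mean    # intercept: mean Y when X equals zero
    return a, b

# Hypothetical data: years of schooling (X) and income in thousands (Y).
schooling = [8, 10, 12, 12, 14, 16, 18]
income = [21, 25, 30, 27, 35, 41, 44]

a, b = fit_ols(schooling, income)
residuals = [y - (a + b * x) for x, y in zip(schooling, income)]
sum_squared_residuals = sum(e ** 2 for e in residuals)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"sum of squared residuals = {sum_squared_residuals:.2f}")
```

The residuals are the sample counterpart of the error term ε. With an intercept in the model they sum to zero by construction, so their spread, not their average, indicates how uncertain the fit is.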
Assumptions about the distribution of the error term when using OLS: • Homoskedastic • No autocorrelation • Normally distributed
Multivariate regression • Y = a + b1X1 + b2X2 + b3X3 + ε • New interpretation of b: The mean increase in Y when the relevant X increases by one unit, given that all other variables are held constant. • R-square: How much of the variation in the data is “explained by the model” (a very imprecise interpretation). Goes from 0 to 1. • “Control variables” • Extensions of regression analysis: Generalized Least Squares, Systems of equations, Instrumental Variables, Logit and Probit models and many more.
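A sketch of multivariate OLS and R-square using only plain Python. It solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination and assumes the X variables are not perfectly collinear. All function names and the data are hypothetical. In practice one would use a statistics package.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gauss-Jordan elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multivariate_ols(rows, ys):
    """rows holds [x1, x2, ...] per observation; returns [a, b1, b2, ...]."""
    design = [[1.0] + list(r) for r in rows]   # leading 1.0 gives the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def r_squared(rows, ys, coefs):
    """Share of the variation in Y around its mean that the fitted values account for."""
    predictions = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], r)) for r in rows]
    y_mean = sum(ys) / len(ys)
    ss_residual = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_residual / ss_total

# Hypothetical data: growth (Y) explained by a democracy score (X1) and investment share (X2).
observations = [[2, 0.15], [4, 0.22], [6, 0.18], [7, 0.30], [8, 0.25], [9, 0.28], [10, 0.20]]
growth = [1.1, 2.4, 2.0, 4.1, 3.3, 3.9, 2.6]

coefs = fit_multivariate_ols(observations, growth)
fit_quality = r_squared(observations, growth, coefs)
print("a, b1, b2 =", [round(c, 3) for c in coefs])
print("R-square =", round(fit_quality, 3))
```

Here b1 is read as the mean change in growth for a one-unit change in the democracy score with the investment share held constant, which is what including a control variable buys.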
Extensions • Dummy variable • Squared X • Logarithmic specifications • Splitting the sample
Problems • 1) “Simultaneity bias”: Reverse causation. Exogeneity vs endogeneity of X-variables. • 2) “Omitted variable bias” • 3) Measurement error. • Reliability. Where does the data come from? GDP in developing countries. • Validity (TFP and technological change) • Operationalization of variables: They have to be observable, quantifiable and measurable.