Market Structure, Trading, and Liquidity FIN 2340 Dr. Michael Pagano, CFA Econometric Topics Adapted and Excerpted from Slides by: Dr. Ian W. Marsh Cass College, Cambridge U. and CEPR
Overview of Key Econometric Topics • Two-Variable Regression: Estimation & Hypothesis Testing • Extensions of the Two-Variable Model: Functional Form • Estimating Multivariate Regressions • Multivariate Regression Inference Tests & Dummy Variables
Introduction • Introduction to Financial Data and Financial Econometrics • Ordinary Least Squares Regression Analysis - What is OLS? • Ordinary Least Squares Regression Analysis - Testing Hypotheses • Ordinary Least Squares Regression Analysis - Diagnostic Testing
Econometrics • Literally means “measurement in economics” • More practically it means “the application of statistical techniques to problems in economics” • In this course we focus on problems in financial economics • Usually, we will be trying to explain the behavior of a financial variable
Financial Data • What sorts of financial variables do we usually want to explain? • Prices - stock prices, stock indices, exchange rates • Returns - stock returns, index returns, interest rates • Volatility • Trading volumes • Corporate finance variables • Debt issuance, use of hedging instruments
Time Series Data • Time-series data are data arranged chronologically, usually at regular intervals • Examples of Problems that Could be Tackled Using a Time Series Regression • How the value of a country’s stock index has varied with that country’s macroeconomic fundamentals. • How a company’s stock returns have varied when it announced the value of its dividend payment. • The effect on a country’s currency of an increase in its interest rate
Cross Sectional Data • Cross-sectional data are data on one or more variables collected at a single point in time • e.g. A sample of bond credit ratings for UK banks • Examples of Problems that Could be Tackled Using a Cross-Sectional Regression • The relationship between company size and the return to investing in its shares • The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt.
Panel Data • Panel data have the dimensions of both time series and cross-sections • e.g. the daily prices of a number of blue chip stocks over two years. • It is common to denote each observation by the letter t and the total number of observations by T for time series data, • and to denote each observation by the letter i and the total number of observations by N for cross-sectional data.
Econometrics versus Financial Econometrics • Little difference between econometrics and financial econometrics beyond emphasis • Data samples • Economics-based econometrics often suffers from paucity of data • Financial economics often suffers from infoglut and signal to noise problems even in short data samples • Time scales • Economic data releases often regular calendar events • Financial data are likely to be real-time or tick-by-tick
Economic Data versus Financial Data • Financial data have some defining characteristics that shape the econometric approaches that can be applied • outliers • trends • mean-reversion • volatility clustering
Basic Data Analysis • All pieces of empirical work should begin with some basic data analysis • Eyeball the data • Summarize the properties of the data series • Examine the relationship between data series • Most powerful analytic tools are your eyes and your common sense • Computers still suffer from “Garbage in - garbage out”
Basic Data Analysis • Eyeballing the data helps establish presence of • trends versus mean reversion • volatility clusters • key observations • outliers • data errors? • turning points • regime changes
Basic Data Analysis • Summary statistics • Average level of variable • Mean, median, mode • Variability around this central tendency • Standard deviations, variances, maxima/minima • Distribution of data • Skewness, kurtosis • Number of observations, number of missing observations
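The summary statistics listed above can be computed in a few lines. The following is a minimal sketch using only the Python standard library; the function name `summarize` and the return series are hypothetical, and skewness and kurtosis use population moments, which is one common convention among several.

```python
import statistics

def summarize(series):
    """Return basic summary statistics for a list of floats (None marks a missing value)."""
    observed = [v for v in series if v is not None]
    n = len(observed)
    mean = statistics.fmean(observed)
    # Central moments with divisor n (population convention) for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in observed) / n
    m3 = sum((v - mean) ** 3 for v in observed) / n
    m4 = sum((v - mean) ** 4 for v in observed) / n
    return {
        "n": n,
        "missing": len(series) - n,
        "mean": mean,
        "median": statistics.median(observed),
        "std": statistics.stdev(observed),  # sample standard deviation (divisor n - 1)
        "min": min(observed),
        "max": max(observed),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,           # equals 3 for a normal distribution
    }

# Hypothetical daily returns in percent; None is a missing observation.
returns = [0.5, -1.2, 0.3, None, 2.1, -0.4, 0.0, -3.0, 0.9]
stats = summarize(returns)
for name, value in stats.items():
    print(name, value)
```

A negative skewness, as in this made-up sample, signals that large negative returns pull the distribution to the left.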
Basic Data Analysis • Since we are usually concerned with explaining one variable using another • “trading volume depends positively on volatility” • relationships between variables are important • cross-plots, multiple time-series plots • correlations (covariances) • multi-collinearity
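As a rough illustration of measuring the relationship between two series, the sketch below computes a sample covariance and a Pearson correlation by hand. The volatility and volume figures are invented for the example and the helper names are hypothetical.

```python
import math

def covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance scaled by the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: daily volatility (percent) and trading volume (millions of shares).
volatility = [1.0, 1.5, 0.8, 2.2, 1.9, 1.1]
volume = [10.0, 14.0, 9.0, 21.0, 17.0, 12.0]

rho = correlation(volatility, volume)
print(round(rho, 3))
```

A correlation close to +1 in such a sample would be consistent with the statement that trading volume depends positively on volatility, although correlation alone says nothing about the direction of causation.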
Basic Data Manipulations • Taking natural logarithms • Calculating returns • Seasonally adjusting • De-meaning • De-trending • Lagging and leading
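Most of these manipulations are one-liners in practice. The sketch below shows logarithms, log returns, de-meaning, lagging, and a simple linear de-trend on a hypothetical price series; seasonal adjustment is left out because it depends on the method chosen.

```python
import math

# Hypothetical closing prices for six consecutive periods.
prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]

# Natural logarithms of the price levels.
log_prices = [math.log(p) for p in prices]

# Continuously compounded (log) returns: r_t = ln(P_t) - ln(P_{t-1}).
log_returns = [log_prices[t] - log_prices[t - 1] for t in range(1, len(prices))]

# De-meaning: subtract the sample mean so the series averages zero.
mean_return = sum(log_returns) / len(log_returns)
demeaned = [r - mean_return for r in log_returns]

# Lagging by one period: pair each return r_t with the previous return r_{t-1}.
lagged_pairs = list(zip(log_returns[1:], log_returns[:-1]))

# De-trending: remove a linear time trend fitted to the prices by least squares.
T = len(prices)
t_bar = (T - 1) / 2
p_bar = sum(prices) / T
slope = (sum((t - t_bar) * (p - p_bar) for t, p in enumerate(prices))
         / sum((t - t_bar) ** 2 for t in range(T)))
detrended = [p - (p_bar + slope * (t - t_bar)) for t, p in enumerate(prices)]
```

Note that taking returns or lagging shortens the series by one observation, which matters when aligning several series for a regression.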
The basic story • y is a function of x • y depends on x • y is determined by x “the spot exchange rate depends on relative price levels and interest rates…”
Terminology • y is the / the x’s are the • predictand / predictors • regressand / regressors • explained variable / explanatory variables • dependent variable / independent variables • endogenous variable / exogenous variables • left hand side variable / right hand side variables
Data • Suppose we have n observations on y and x: • cross section: $y_i = \alpha + \beta x_i + u_i$, i = 1, 2, …, n • time series: $y_t = \alpha + \beta x_t + u_t$, t = 1, 2, …, n
Errors • Where does the error come from? • Randomness of (human) nature • men and markets are not machines • Omitted variables • men and markets are more complex than the models we use to describe them. Everything else is captured by the error term • Measurement error in y • unlikely in financial applications
Objectives • to get good point estimates of α and β given the data • to understand how confident we should be in those estimates • both will allow us to make statistical inferences on the true form of the relationship between y and x (“test the theory”)
Simple Regression: An Example • We have the following data on the excess returns on a fund manager’s portfolio (“fund XXX”) together with the excess returns on a market index: • We want to find whether there is a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.
Finding the Line of Best Fit • We can use the general equation for a straight line, $y = \alpha + \beta x$, to get the line that best “fits” the data. • But this equation ($y = \alpha + \beta x$) is completely deterministic. • Is this realistic? No. So what we do is to add a random disturbance term, u, into the equation: $y_t = \alpha + \beta x_t + u_t$, where t = 1, 2, 3, 4, 5
Determining the Regression Coefficients • So how do we determine what $\alpha$ and $\beta$ are? • Choose $\hat{\alpha}$ and $\hat{\beta}$ so that the distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible) • The most common method used to fit a line to the data is known as OLS (ordinary least squares).
Ordinary Least Squares • What we actually do is 1. take each vertical distance between the data point and the fitted line 2. square it and 3. minimize the total sum of the squares (hence least squares).
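Algebra Alert!!!!! • Tightening up the notation, let • $y_t$ denote the actual data point t • $\hat{y}_t$ denote the fitted value from the regression line • $\hat{u}_t$ denote the residual, $y_t - \hat{y}_t$
How OLS Works • So minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares. • But what was $\hat{u}_t$? It was the difference between the actual point and the line, $y_t - \hat{y}_t$. • So minimizing $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimizing $\sum_t \hat{u}_t^2$ with respect to $\hat{\alpha}$ and $\hat{\beta}$.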
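For the two-variable model the minimisation has a well-known closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below applies it to five observations. The data values are an assumption made for illustration (the slide's own data table is not reproduced in the text), and the function names are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates for y_t = alpha + beta * x_t + u_t."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def rss(alpha, beta, x, y):
    """Residual sum of squares for a candidate line y = alpha + beta * x."""
    return sum((b - (alpha + beta * a)) ** 2 for a, b in zip(x, y))

# Assumed illustrative data: percent excess returns over five periods.
market = [13.7, 23.2, 6.9, 16.8, 12.3]  # x: excess return on the market index
fund = [17.8, 39.0, 12.8, 24.2, 17.2]   # y: excess return on the fund

alpha_hat, beta_hat = ols_fit(market, fund)
best = rss(alpha_hat, beta_hat, market, fund)

# Any other line should give a larger residual sum of squares.
worse = rss(alpha_hat + 0.5, beta_hat - 0.1, market, fund)

print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64
print(best < worse)                             # True
```

Comparing `best` with `worse` is a quick sanity check that the closed-form estimates really do minimise the sum of squared vertical distances.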
What do we Use $\hat{\alpha}$ and $\hat{\beta}$ For? • In the CAPM example used above, optimising would lead to the estimates • $\hat{\alpha}$ = -1.74 and • $\hat{\beta}$ = 1.64. • We would write the fitted line as: $\hat{y}_t = -1.74 + 1.64 x_t$
What do we Use $\hat{\alpha}$ and $\hat{\beta}$ For? • If an analyst tells you that she expects the market to yield a return 20% higher than the risk-free rate next year, what would you expect the return on fund XXX to be? • Solution: We can say that the expected value of y = “-1.74 + 1.64 * value of x”, so plug x = 20 into the equation to get the expected value for y: $\hat{y} = -1.74 + 1.64 \times 20 = 31.06$
Is Using OLS a Good Idea? • Yes, since given some assumptions (see later) least squares is BLUE • best, linear, unbiased estimator • OLS is consistent • as sample size increases, estimated coefficients tend towards true values • OLS is unbiased • Even in small samples, estimated coefficients are on average equal to true values
Is Using OLS a Good Idea? (cont.) • OLS is efficient • no other linear unbiased estimator has a smaller variance • some non-linear estimators may be more efficient
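Unbiasedness and consistency can be illustrated with a small simulation. The sketch below assumes a true model with α = 1, β = 2 and standard normal errors (all hypothetical choices), draws many samples, and re-estimates the slope each time. It is an illustration under those assumptions, not a proof.

```python
import random

def ols_slope(x, y):
    """Closed-form OLS slope estimate for the two-variable model."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    return s_xy / s_xx

def simulate_slopes(true_alpha, true_beta, x, n_samples, rng):
    """Draw repeated samples y = alpha + beta * x + u, u ~ N(0, 1), and re-estimate beta."""
    slopes = []
    for _ in range(n_samples):
        y = [true_alpha + true_beta * a + rng.gauss(0.0, 1.0) for a in x]
        slopes.append(ols_slope(x, y))
    return slopes

def mean(values):
    return sum(values) / len(values)

def spread(values):
    """Standard deviation of the estimates across simulated samples."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

rng = random.Random(42)                        # fixed seed so the run is repeatable
x_small = [float(i) for i in range(10)]        # 10 observations
x_large = [float(i % 10) for i in range(200)]  # same x values, 200 observations

slopes_small = simulate_slopes(1.0, 2.0, x_small, 2000, rng)
slopes_large = simulate_slopes(1.0, 2.0, x_large, 2000, rng)

print(mean(slopes_small), spread(slopes_small))
print(mean(slopes_large), spread(slopes_large))
```

The average estimate should sit close to the true slope of 2 even with only 10 observations (unbiasedness), while the spread of the estimates should shrink markedly with 200 observations (consistency).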
Testing Hypotheses • Once you have regression estimates (assuming the regression is a “good” one) you take the results to the theory: “Theory says that the intercept should be zero” “Theory says that the coefficient on prices should be unity” “Theory says that the coefficient on domestic money should be unity and the coefficient on foreign money should be minus unity”
Testing Hypotheses (cont.) • Testing these statements is called hypothesis testing • This involves comparing the estimated coefficients with what theory suggests • In order to say whether the estimates are “too far” from theory we need some measure of the precision of the estimated coefficients
Standard Errors • Based on a sample of data, you have estimated the coefficients $\hat{\alpha}$ and $\hat{\beta}$ • How much are these estimates likely to alter if different samples are chosen? • The usual measure of this degree of uncertainty is the standard error of the coefficient estimates
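Standard Errors (cont.) • Algebraically, given some crucial assumptions, standard errors can be computed as follows: • $SE(\hat{\alpha}) = s \sqrt{\dfrac{\sum_t x_t^2}{T \sum_t (x_t - \bar{x})^2}}$ • $SE(\hat{\beta}) = s \sqrt{\dfrac{1}{\sum_t (x_t - \bar{x})^2}}$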
Error Variance • $\sigma^2$ is the variance of the error or disturbance term, u • this is unobservable • we approximate it with the variance of the residual terms, $s^2$
Standard Errors • SE are smaller as • T increases, • more data makes precision of estimated coefficients higher • the variance of x increases, • more dispersion of the explanatory variable about its mean makes estimated coefficients more precise • s decreases • the better the fit of the regression (smaller residuals), the more precise are the estimates
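A minimal sketch of the standard error calculation follows. It assumes the usual textbook formulas for the two-variable model: the residual variance is the residual sum of squares divided by T - 2, SE(β̂) is s divided by the square root of Σ(x_t - x̄)², and SE(α̂) is s times the square root of Σx_t² / (T Σ(x_t - x̄)²). The five data points are illustrative assumptions and the function name is hypothetical.

```python
import math

def ols_with_standard_errors(x, y):
    """Return (alpha_hat, beta_hat, se_alpha, se_beta, s) for y_t = alpha + beta * x_t + u_t."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    # Residual variance: two degrees of freedom are used up by alpha_hat and beta_hat.
    residuals = [b - (alpha_hat + beta_hat * a) for a, b in zip(x, y)]
    s = math.sqrt(sum(u ** 2 for u in residuals) / (T - 2))

    se_beta = s * math.sqrt(1.0 / s_xx)
    se_alpha = s * math.sqrt(sum(a ** 2 for a in x) / (T * s_xx))
    return alpha_hat, beta_hat, se_alpha, se_beta, s

# Assumed illustrative data: percent excess returns over five periods.
market = [13.7, 23.2, 6.9, 16.8, 12.3]  # x
fund = [17.8, 39.0, 12.8, 24.2, 17.2]   # y

alpha_hat, beta_hat, se_alpha, se_beta, s = ols_with_standard_errors(market, fund)
print(round(se_alpha, 2), round(se_beta, 2))
```

With only five observations the intercept in particular is estimated very imprecisely, which is the practical meaning of the point that standard errors fall as T rises.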
Null and Alternative Hypotheses • So now you have the coefficient estimates and the associated standard errors. • You now want to test the theory. • Five-Step Process: • Step 1: Draw up the null hypothesis (H0) • Step 2: Draw up the alternative hypothesis (H1 or HA)
Null Hypothesis • Usually, the null hypothesis is what theory suggests: • e.g. testing the ability of fund managers to outperform the index • EMH suggests αj = 0, so H0: αj = 0 (fund managers earn zero risk adjusted excess returns)
Alternative Hypothesis • The alternative is more tricky • Usually the alternative is just that the null is wrong: • H1: α ≠ 0 (fund managers earn non-zero risk adjusted excess returns; fund managers underperform or out-perform) • But sometimes it is more specific: • H1: α < 0 (fund managers underperform)
Confidence Intervals • Suppose our point estimate for α is 0.058 for fund XXXX and the associated standard error is 0.025 based on 20 observations • Has fund XXXX outperformed? • Can we be confident that the true α is different to zero? Step 3: Choose your level of confidence Step 4: Calculate confidence interval
Confidence Interval (cont.) • Convention is to use 95% confidence levels • Confidence interval is then $(\hat{\alpha} - t_{critical} \cdot SE(\hat{\alpha}),\ \hat{\alpha} + t_{critical} \cdot SE(\hat{\alpha}))$ • $t_{critical}$ is the appropriate percentile (e.g. 97.5th) of the t-distribution with T-2 degrees of freedom • 97.5th percentile since two-sided test • 2 degrees of freedom were lost in estimating 2 coefficients
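Applying this to the fund example (point estimate 0.058, standard error 0.025, 20 observations) gives the sketch below. The critical value 2.101 is the 97.5th percentile of the t-distribution with 18 degrees of freedom, taken from standard tables and hard-coded here rather than computed; the function name is hypothetical.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Two-sided interval: estimate plus or minus t_critical times the standard error."""
    half_width = t_critical * std_error
    return estimate - half_width, estimate + half_width

alpha_hat = 0.058   # point estimate of alpha from the example
se_alpha = 0.025    # its standard error
T = 20              # number of observations
dof = T - 2         # two coefficients estimated, so 18 degrees of freedom

# 97.5th percentile of t with 18 degrees of freedom, from standard tables (assumed input).
t_critical = 2.101

lower, upper = confidence_interval(alpha_hat, se_alpha, t_critical)
zero_inside = lower <= 0.0 <= upper

print(round(lower, 4), round(upper, 4))  # 0.0055 0.1105
print(zero_inside)                       # False
```

Under these numbers zero lies just outside the 95% interval, which would suggest that the true α differs from zero at the 5% level, though only narrowly.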