Software, Hardware, and Database Structure Options for Research in Financial Economics
SAS and Computing Speed
Michael Boldin, WRDS, University of Pennsylvania
boldinm@wharton.upenn.edu
Main Questions
• How can researchers take advantage of modern computing technology?
• Which econometric software packages would you recommend to students?
• How do SAS features and computing speed stack up?
Q1. How can researchers take advantage of modern computing technology?
Observations:
• Today’s PCs are better than yesterday’s ‘supercomputer’ (for single users).
• The system (hardware, software, and network connections) needs to work as a whole.
• Database management (DBMS) matters.
Q2. Which econometric software packages would you recommend to students?
Observations:
• Advice for undergraduates and graduate students differs.
• Power, flexibility, and user-friendliness are not mutually exclusive.
• There are almost too many choices (and change is hard).
• Few students care about good programming practice, and they keep bad habits.
Q3. How do SAS features and computing speed stack up?
• Is SAS fast enough in raw computing speed?
• Does the SAS Data step framework create performance handicaps?
• How does SAS/IML stack up to MATLAB and GAUSS in functionality?
Other issues:
• Does SAS need a better interface to C/C++ and FORTRAN modules?
• What does SAS offer as an RDBMS complement to MATLAB?
• Is greater compatibility with open-source software such as MySQL and PHP possible?
Statistical Software Evaluations
Reviewing the Reviews
Noteworthy:
Jeffrey MacKie-Mason (1992), ‘Econometric Software: A User’s View’
• Could not select an unqualified winner among Gauss, Limdep, RATS, SAS, SST, Stata, and TSP.
• Preferred TSP. Saw advantages to SAS, but found problems in PC SAS (of 1991).
• Correctly predicted movement toward matrix-algebra-oriented software such as GAUSS.
John Rust (1993), ‘GAUSS and MATLAB: A Comparison’
• Highlighted the advantages of matrix-oriented programming for econometrics.
• Correctly predicted that users would soon be moving away from DOS.
• Incorrectly predicted that the move would be toward UNIX workstations.
Problems:
• Most other reviews just count features.
• Or worse, they stress speed over all other issues.
• Within 2 years, a review is largely obsolete.
• After 5 years, it is likely to be completely misleading, if not irrelevant.
Speed Comparisons (by Stefan Steinhaus)
                  Speed Score          |       Overall Score
               1997    1999    2002    |    1997    1999    2002
GAUSS         49.94   47.96   47.90    |   64.38   63.64   64.80
Mathematica    7.67   31.95   31.32    |   48.76   54.93   57.34
Matlab        39.98   34.64   65.89    |   60.03   55.85   69.74
Ox            66.21   68.12   62.22    |   47.30   49.22   58.45
O-Matrix      70.80   67.29   69.80    |   48.72   43.68   45.83
S-Plus        37.18   30.51   38.56    |   54.28   44.90   48.61
Source: http://www.scientificweb.com/ncrunch/index.html
Higher scores are better. 100 is the highest possible score in each year’s evaluation. Speed scores are not comparable across years.
The overall score includes breadth of functionality and other usability considerations, using these weights: Mathematical functions 38%, Graphical functions 10%, Programming environment 9%, Data import/export 5%, Available operating systems 2%, Speed comparison 36%.
In pure speed comparisons (made comparable across years), a faster PC combined with a newer software version turns a poor performer into the top performer relative to the ‘best’ pairing of old hardware and software.
And how about SAS?
A Helicopter View of PC Technology
1981: IBM PC = $5,500 in today’s prices
64 KB memory, no hard disk, monochrome monitor, no networking capabilities
Today: Dell Pentium IV < $1,000
1 GB memory, 3000x faster, 80 GB hard drive, DVD/CD burner, flat-screen color monitor, and built-in networking
The Speed Issue: Moore’s Law in Action
Pentium   Clock speed   Year   MWIPS   Time index
I         120 MHz       1995      79      100.0
II        266 MHz       1997     218       36.2
III       550 MHz       1999     448       17.6
IV        1.8 GHz       2001     638       12.4
IV        3.6 GHz       2003    1342        5.9
IV        3.8 GHz       2004    3899        2.0
MWIPS = Millions of Whetstone Instructions Per Second. A higher MWIPS score is better (i.e. a faster chip), and doubling MWIPS roughly halves the time needed for an average numerical calculation.
Source: http://homepage.virgin.net/roy.longbottom/whetstone.htm
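The time index column is consistent with scaling each chip's MWIPS score against the 1995 Pentium I baseline, that is, 100 × 79 / MWIPS. The Python sketch below reproduces that arithmetic from the MWIPS figures in the table. The formula is inferred from the numbers rather than stated on the slide, and the name `time_index` is hypothetical.

```python
# MWIPS scores copied from the table (chip label -> MWIPS).
MWIPS = {
    "Pentium I   120 MHz (1995)": 79,
    "Pentium II  266 MHz (1997)": 218,
    "Pentium III 550 MHz (1999)": 448,
    "Pentium IV  1.8 GHz (2001)": 638,
    "Pentium IV  3.6 GHz (2003)": 1342,
    "Pentium IV  3.8 GHz (2004)": 3899,
}

BASELINE_MWIPS = 79  # the 1995 Pentium I is assumed to be the reference chip


def time_index(mwips, baseline=BASELINE_MWIPS):
    """Relative time per calculation, with the baseline chip set to 100."""
    return 100.0 * baseline / mwips


for chip, score in MWIPS.items():
    print(f"{chip}  {time_index(score):6.1f}")
```

Because time is taken to be inversely proportional to MWIPS, a chip with twice the score gets half the index.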
Evaluation of Statistical Software
Three categories:
• Traditional programming languages: FORTRAN, C/C++, and Basic. Relatively new: Perl, Python, and Java.
• Statistical packages: EViews, SAS, Stata, and TSP.
• Matrix-algebra-oriented computing software: GAUSS, Mathematica, MATLAB, R, and S-Plus.
Speed & User Friendliness
Computation Speed: Fortran > C > C++ > Matlab > SAS > Perl
User Friendliness: SAS > Matlab > Perl > C++ > C > Fortran
Rankings of other languages/packages? Java, VBasic, Stata, SPSS, SPlus/R
Are the speed differences significant?
Are ‘user’ elements only a matter of taste?
How can user friendliness and computation speed be combined in an evaluation?
Computing Speed: Only One Part of the Equation
Total Research Project Time
• Planning
• Data Management
• Programming
• Computation
• Analysis of Results
• Re-Evaluation (revisit & repeat prior steps)
Simple Model of Cost/Benefit (Time) Tradeoffs
Programming = (b0 + b1*x + b2*x^2) / (ease-factor)
Computation = (a0j + a1*x + a2*x^2) / (speed)
Both programming and computing time depend on the complexity of the task (x), and the computing speed advantage of Package 2 may overwhelm the ease-of-use issue for modestly complex tasks.
Package 2 is preferred for complexity levels above 6.
Simple Model of Cost/Benefit (Time) Tradeoffs
Programming = (b0 + b1*x + b2*x^2) / (ease-factor*2)
Computation = (a0j + a1*x + a2*x^2) / (speed*10)
An increase in computing speed (relative to the ease-factor) makes Package 1 a better choice for a larger range of tasks.
The threshold for preferring Package 2 rises.
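The two slides above can be sketched numerically in Python. Every number here is an illustrative assumption, since the slides do not give values: the coefficients, the (ease-factor, speed) pairs for the two packages, and the grid search are all hypothetical. For simplicity the same coefficients are used for both packages, so the package-specific a0j term is not modeled. Package 1 is taken to be easier to program and Package 2 faster to compute.

```python
def total_time(x, ease, speed, b=(10.0, 2.0, 0.1), a=(0.0, 0.0, 1.0)):
    """Programming time plus computation time at complexity level x."""
    b0, b1, b2 = b
    a0, a1, a2 = a
    programming = (b0 + b1 * x + b2 * x * x) / ease
    computation = (a0 + a1 * x + a2 * x * x) / speed
    return programming + computation


def threshold(pkg1, pkg2, max_x=30.0, step=0.01):
    """First grid point where Package 2 has the lower total time, or None."""
    for i in range(int(max_x / step) + 1):
        x = i * step
        if total_time(x, *pkg2) < total_time(x, *pkg1):
            return x
    return None


def scaled(pkg, ease_gain=2.0, speed_gain=10.0):
    """Apply the technology shift of the second slide to one package."""
    ease, speed = pkg
    return (ease * ease_gain, speed * speed_gain)


# Hypothetical (ease-factor, speed) pairs.
PKG1 = (2.0, 1.0)  # easier to program, slower to compute
PKG2 = (1.0, 4.0)  # harder to program, faster to compute

before = threshold(PKG1, PKG2)
after = threshold(scaled(PKG1), scaled(PKG2))
print(f"Package 2 preferred above x = {before:.2f} before the shift")
print(f"Package 2 preferred above x = {after:.2f} after the shift")
```

With these made-up values the crossover moves to a higher complexity level once speed improves tenfold and ease only twofold, which is the direction of the effect the slides describe. The specific crossover levels are artifacts of the assumed coefficients and do not correspond to the "above 6" threshold on the slide.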
Black-Scholes Calculation Speeds
* SAS code -- Black Scholes Option Value calculation;
* S = Spot price, X = Exercise price, sigma = Stock return volatility,
  r = Risk free bond rate, q = Dividend rate, tau = Time till maturity;
d1 = ( log(S/X) + ( r - q + 0.5*sigma*sigma ) * tau ) / ( sigma*sqrt(tau) );
d2 = d1 - sigma * sqrt(tau);
* Normal curve cumulative distribution function values;
N1 = cdf('normal', d1);
N2 = cdf('normal', d2);
Vc = ( S * exp(-q*tau) * N1 ) - ( X * exp(-r*tau) * N2 );
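For readers without SAS, the same call-value formula can be checked with a Python translation that uses only the standard library. This is a sketch, not the author's code: the function names `norm_cdf` and `bs_call` are hypothetical, and the normal CDF is computed from `math.erf` in place of SAS's `cdf('normal', ...)`.

```python
from math import erf, exp, log, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def bs_call(S, X, r, q, sigma, tau):
    """Black-Scholes value of a European call with continuous dividend yield q.

    S = spot price, X = exercise price, r = risk-free rate,
    q = dividend rate, sigma = return volatility, tau = time to maturity.
    """
    d1 = (log(S / X) + (r - q + 0.5 * sigma * sigma) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return S * exp(-q * tau) * norm_cdf(d1) - X * exp(-r * tau) * norm_cdf(d2)


# Illustrative inputs (assumed, not from the slides): at-the-money call,
# 5% rate, no dividends, 20% volatility, one year to maturity.
print(round(bs_call(100.0, 100.0, 0.05, 0.0, 0.2, 1.0), 4))
```

The variable names mirror the SAS statements line for line, so timing either version over a large set of options would exercise the same arithmetic.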
SAS vs. MATLAB Computation Speed Comparison
Basic Statistics Example
Simulated data: 1 million observations, 10 variables, in 10 groups
Time in seconds:
                   SAS    MATLAB
Data creation      3.6      1.4
Mean & std         1.6      1.4
Frequency          0.3      0.3
REG module         0.8      2.2
Sort by group      8.4      2.4
REG by group       1.1      1.4
Sum               15.8      9.1
Bottom line:
• In relative terms MATLAB is much faster (it takes 42% less time in this example, about 1.7 times as fast), but in absolute terms it is only 6.7 seconds faster.
• Most applications have fewer than 1 million observations, so the absolute difference is even smaller.
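The relative and absolute comparisons in the bottom line follow directly from the timing table. A short Python sketch of the arithmetic, using the slide's figures (the variable names are hypothetical):

```python
# Timings in seconds from the table: step -> (SAS, MATLAB).
timings = {
    "Data creation": (3.6, 1.4),
    "Mean & std": (1.6, 1.4),
    "Frequency": (0.3, 0.3),
    "REG module": (0.8, 2.2),
    "Sort by group": (8.4, 2.4),
    "REG by group": (1.1, 1.4),
}

sas_total = sum(sas for sas, _ in timings.values())
matlab_total = sum(matlab for _, matlab in timings.values())

absolute_gap = sas_total - matlab_total    # seconds saved by MATLAB
relative_saving = absolute_gap / sas_total  # share of SAS time saved
speed_ratio = sas_total / matlab_total      # how many times as fast

print(f"SAS {sas_total:.1f} s, MATLAB {matlab_total:.1f} s")
print(f"absolute gap {absolute_gap:.1f} s, "
      f"relative saving {relative_saving:.0%}, ratio {speed_ratio:.2f}x")
```

Separating the two measures makes the slide's point concrete: a large percentage saving can still amount to only a few seconds of wall-clock time.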
SAS vs. MATLAB Computation Speed Comparison
Is MATLAB’s speed advantage due to its matrix-based programming? No. SAS also has an Interactive Matrix Language module (IML). Using SAS IML shows how alternative programming methods can matter (within the same package).
OLS Regression Example: 1 million observations, 10 variables
                                         SAS    MATLAB
B = inv(X’X)*(X’y) (IML in SAS)          2.6      0.4
REG module (regress(.) in MATLAB)        0.8      2.2
Programming the OLS matrix algebra equation in MATLAB beats MATLAB’s regress(.) function in terms of speed, while the opposite is true for SAS.
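For reference, the matrix-algebra route to OLS can be written out in a few lines. The Python sketch below uses only the standard library and is a stand-in for the IML and MATLAB code, which the slides do not show. It also solves the normal equations (X’X) b = X’y by Gaussian elimination instead of forming inv(X’X) explicitly; that is a deliberate substitution, and the function name `ols_normal_equations` is hypothetical.

```python
def ols_normal_equations(X, y):
    """Solve (X'X) b = X'y for b. X is a list of rows, y a list of values."""
    k = len(X[0])
    # Accumulate X'X (k x k) and X'y (length k) over the observations.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    a = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; regressors are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]

    # Back substitution.
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        partial = sum(a[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (a[i][k] - partial) / a[i][i]
    return b


# Tiny made-up example: y = 2 + 3*x exactly, with a constant column in X.
X = [[1.0, float(x)] for x in range(5)]
y = [2.0 + 3.0 * x for x in range(5)]
print(ols_normal_equations(X, y))
```

A pure-Python loop like this is far too slow for 1 million observations; it only illustrates the calculation that the matrix languages perform in optimized linear-algebra routines.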
Finance Research Example
CAPM (Beta) Test: R_i,t = α_i + β_i * R_m,t
A true CAPM test would estimate multi-factor betas (β_i) for 5,000 to 25,000 stocks over different sample periods. Summarizations require sorting into portfolios and applying two-stage estimation and testing techniques.
Run time on a multi-user UNIX system varies depending on load. MATLAB run time varies depending on program design: optimal vectorized code versus an inefficient loop.
Example: SAS run = 40 minutes // MATLAB = 35 minutes
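The first-stage regression in the CAPM test reduces to a simple per-stock calculation: β_i is the covariance of the stock and market returns divided by the variance of the market return, and α_i follows from the means. The Python sketch below shows that single-factor step only. The function name and the return series are made up for illustration, and the multi-factor betas, portfolio sorts, and second-stage tests mentioned on the slide are not covered.

```python
from statistics import fmean


def capm_alpha_beta(stock_returns, market_returns):
    """OLS estimates of alpha and beta in R_i,t = alpha_i + beta_i * R_m,t."""
    ri_bar = fmean(stock_returns)
    rm_bar = fmean(market_returns)
    cov = sum((ri - ri_bar) * (rm - rm_bar)
              for ri, rm in zip(stock_returns, market_returns))
    var = sum((rm - rm_bar) ** 2 for rm in market_returns)
    beta = cov / var
    alpha = ri_bar - beta * rm_bar
    return alpha, beta


# Made-up data: one market series and two stocks built from it.
market = [0.01, -0.02, 0.03, 0.00, 0.015]
stocks = {
    "AAA": [0.001 + 1.5 * rm for rm in market],
    "BBB": [-0.002 + 0.8 * rm for rm in market],
}

# A plain loop over stocks; with thousands of stocks this is the step
# where vectorized code and an inefficient loop can differ most in run time.
for ticker, returns in stocks.items():
    alpha, beta = capm_alpha_beta(returns, market)
    print(f"{ticker}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```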
Conclusions:
Changes in technology change the equation for determining the best system, and personal preferences are important.
Absolute speed (not relative speed) may matter, but programming time is overwhelmingly the larger component anyway (in > 90% of cases).
Software is not an either/or situation.
Advice:
Learn and use two or more software packages as complements.
Database management and connectivity are the key to the greatest possible flexibility.
Almost Counterintuitive General Conclusion: Technological progress makes human factors and personal preferences most important.