Grid Computing at Texas Tech University using SAS Ron Bremer Jerry Perez Phil Smith Peter Westfall* Director, Center for Advanced Analytics and Business Intelligence Texas Tech University
What is Grid Computing? • Grid computing means using multiple computing resources connected by a network to perform demanding calculations. • Example:
Economics of High Performance Computing • Current fastest machine: ~40 Tflops ($300M) • 10 Tflops machines (~$50M) • Fastest cluster at TTU: 0.1 Tflops (~$0.1M) • Speed of a PC: 0.003 Tflops (~$0.001M)
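Dividing each cost by its speed makes the economic argument explicit. A minimal Python sketch using the slide's approximate figures; `cost_per_tflop` is an illustrative name, not part of any library:

```python
# Approximate figures from the slide: (system, Tflops, cost in millions of dollars)
systems = [
    ("Current fastest machine", 40.0, 300.0),
    ("10 Tflops machine", 10.0, 50.0),
    ("Fastest cluster at TTU", 0.1, 0.1),
    ("Single PC", 0.003, 0.001),
]

def cost_per_tflop(tflops: float, cost_musd: float) -> float:
    """Return cost in millions of dollars per Tflop of peak speed."""
    return cost_musd / tflops

for name, tflops, cost in systems:
    print(f"{name:25s} ${cost_per_tflop(tflops, cost):6.2f}M per Tflop")
```

On these numbers the price falls from about $7.5M per Tflop for the fastest machine to about $0.33M per Tflop for a single PC. The PC figure is per-machine peak speed and ignores networking and administration costs.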
Underused Resources • Computers are everywhere, and mostly idle! • Grid computing harnesses unused resources to create an effective “Supercomputer” • Teraflops ≈ (N computers) × (Tflops per computer) • For Free! (Almost)
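The aggregate-speed formula can be applied directly. A minimal sketch using the PC speed quoted above (0.003 Tflops); the `efficiency` factor is a hypothetical knob for time lost to owners, communication, and scheduling, and the default of 1.0 gives the ideal upper bound:

```python
import math

def aggregate_tflops(n_computers: int, tflops_per_computer: float,
                     efficiency: float = 1.0) -> float:
    """Idealized pooled speed: N computers x Tflops per computer.

    efficiency is a hypothetical factor in (0, 1]; 1.0 means no overhead.
    """
    return n_computers * tflops_per_computer * efficiency

def computers_needed(target_tflops: float, tflops_per_computer: float) -> int:
    """Smallest N whose ideal pooled speed reaches target_tflops."""
    return math.ceil(target_tflops / tflops_per_computer)

print(aggregate_tflops(100, 0.003))   # ~0.3 Tflops from 100 PCs, ideal case
print(computers_needed(0.1, 0.003))   # 34 PCs to match the 0.1 Tflops cluster
print(computers_needed(10.0, 0.003))  # 3334 PCs to match a 10 Tflops machine
```

With 100 PCs, the size of the student-lab farm, the ideal pooled speed is already about three times that of the fastest TTU cluster.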
Grid Initiatives at TTU and in Texas • HipCAT – High Performance Computing Across Texas • TIGRE – Texas Internet Grid for Research and Education • SORCER – Service-ORiented Computing EnviRonment (TTU CS dept.) • SAS/Connect grid
HipCAT • Consortium of Texas institutions working together on: • High performance computing • Clusters • Massive data storage • Scientific visualization • Grid computing • Director: Phil Smith, Texas Tech University • Members: • Baylor College of Medicine • Rice University • Texas A&M University • Texas Tech University • University of Houston • University of Texas • University of Texas at Austin • University of Texas at Arlington • University of Texas at El Paso • University of Texas Southwestern Medical Center
TIGRE • Texas Internet Grid for Research & Education • Two-year project involving UT, TTU, UH, Rice, and TAMU • Funding announced by the Governor in September • TIGRE will develop a grid software stack, along with policies and procedures, to facilitate grid computing efforts in Texas.
Grid Software Products Used at TTU • AVAKI • Globus • Jini Networking Technology • SAS/Connect (MPConnect), %Distribute macro
Benefits of SAS • Ease of use (relative to other grid products) • Available to, and applicable for, many scientists in their respective fields • Flexibility • Database (DATA step, PROC SQL) • Math/Optimization (SAS/IML, SAS/OR) • Statistics (SAS/STAT, SAS/ETS)
Problems Amenable to SAS Grid • Many replicates of a fundamental task • Each fundamental task is time-consuming, and there are many replicates • Examples • Simulation • Astrophysics • Bioinformatics • Ensembles of predictive models
Success Story: Financial Event Studies • Developed a simulation tool to detect events • Simulated its performance • 25 hours of computing finished in 40 minutes • Published in J. Fin. Econometrics • Old system: “Sneaker grid”
Another Success Story: Portfolio Analysis • 300 portfolios of 50 securities each, formed by randomly sampling securities from the CRSP daily database (7.23 GB) • 15 models created for each of the 50 securities (PROC AUTOREG of SAS/ETS) under 169 treatment settings • 126,750 models and associated DATA steps per portfolio • 500 days of continuous computing time reduced to two weeks
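The counts and timings on this slide can be checked with plain arithmetic. In this sketch "two weeks" is taken as 14 days (an assumption), the total across portfolios assumes every portfolio uses the same design, and the Financial Event Studies timing (25 hours vs. 40 minutes) is included for comparison:

```python
# Figures from the portfolio-analysis slide
MODELS_PER_SECURITY = 15
SECURITIES_PER_PORTFOLIO = 50
TREATMENT_SETTINGS = 169
PORTFOLIOS = 300

def speedup(serial_time: float, grid_time: float) -> float:
    """Ratio of single-machine time to grid time (same units for both)."""
    return serial_time / grid_time

models_per_portfolio = MODELS_PER_SECURITY * SECURITIES_PER_PORTFOLIO * TREATMENT_SETTINGS
total_models = models_per_portfolio * PORTFOLIOS  # assumes the same design per portfolio

portfolio_speedup = speedup(500, 14)        # days; "two weeks" assumed to be 14 days
event_study_speedup = speedup(25 * 60, 40)  # minutes; 25 hours vs. 40 minutes

print(models_per_portfolio)         # 126750
print(total_models)                 # 38025000
print(round(portfolio_speedup, 1))  # 35.7
print(event_study_speedup)          # 37.5
```

Both projects land at roughly a 36- to 38-fold speedup, and 15 × 50 × 169 reproduces the 126,750 models per portfolio quoted above.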
Publicity • Web articles appeared from SAS, GRIDtoday, and the Next-Gen Data Forum • Interviewed by Database Trends and Applications
SAS Grid Structure • Client connects to host machines • Client sends replicates of the fundamental task (“chunks”) to hosts • Hosts process the chunks and send results back to the client • Client combines the results and summarizes them
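The client/host flow above can be sketched in Python, with threads standing in for remote hosts. This illustrates only the chunk pattern under stated assumptions; it is not SAS/CONNECT or %Distribute code, and `fundamental_task` is a hypothetical placeholder for one replicate:

```python
from concurrent.futures import ThreadPoolExecutor

def make_chunks(n_replicates: int, n_chunks: int) -> list[range]:
    """Split replicate indices 0..n_replicates-1 into n_chunks contiguous ranges."""
    size, extra = divmod(n_replicates, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)  # spread the remainder
        chunks.append(range(start, stop))
        start = stop
    return chunks

def fundamental_task(replicate: int) -> float:
    """Hypothetical stand-in for one replicate (e.g., one simulated data set)."""
    return float(replicate % 7)

def host_process(chunk: range) -> float:
    """Host side: run every replicate in the chunk and return a partial sum."""
    return sum(fundamental_task(r) for r in chunk)

def client_run(n_replicates: int, n_hosts: int) -> float:
    """Client side: send one chunk per host, gather partial sums, combine, summarize."""
    chunks = make_chunks(n_replicates, n_hosts)
    with ThreadPoolExecutor(max_workers=n_hosts) as hosts:  # threads play the hosts
        partial_sums = list(hosts.map(host_process, chunks))
    return sum(partial_sums) / n_replicates  # summary: mean over all replicates

print(make_chunks(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
print(client_run(1000, n_hosts=4))
```

Returning a partial sum per chunk, rather than every replicate, keeps traffic back to the client small: the client only needs the sums and the replicate count to form the overall mean.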
SAS Farm • 100 SAS machines in a student lab • 2.66 GHz per node • All have SAS software installed • The SAS “Spawner” must be started on every machine • Avaki also installed (used to diagnose problems)
Load Balancing • Automatically supports load balancing by farming out independent tasks to the next available resource. • Students never noticed that their machines were being used!
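“Next available resource” scheduling can be sketched with a shared work queue: each host takes a new chunk only when it has finished the previous one, so slow chunks do not hold up the rest. Threads again stand in for lab machines, and the per-chunk run times are hypothetical values; this is a sketch of the idea, not how SAS implements it:

```python
import queue
import threading
import time

def load_balanced_run(chunk_costs: list[float], n_hosts: int) -> dict[str, list[int]]:
    """Simulate hosts that each pull the next chunk as soon as they are free.

    chunk_costs[i] is a hypothetical processing time (seconds) for chunk i.
    Returns the chunk indices processed by each host.
    """
    work = queue.Queue()
    for i in range(len(chunk_costs)):
        work.put(i)  # all chunks wait in one shared queue on the client

    assignments: dict[str, list[int]] = {f"host{h}": [] for h in range(n_hosts)}

    def host(name: str) -> None:
        while True:
            try:
                i = work.get_nowait()  # take the next chunk nobody has claimed yet
            except queue.Empty:
                return  # no work left; this host is done
            time.sleep(chunk_costs[i])  # stand-in for running the chunk
            assignments[name].append(i)  # each host writes only to its own list

    threads = [threading.Thread(target=host, args=(name,)) for name in assignments]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments

print(load_balanced_run([0.05, 0.01, 0.01, 0.01, 0.01, 0.01], n_hosts=2))
```

Typically the host that draws the slow chunk finishes fewer chunks while the other host works through the rest; every chunk is still processed exactly once.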
Simulation-Based Methods • PROC MULTTEST of SAS/STAT (first hard-coded bootstrap?)
Simulation-Based Methods, II • ADJUST=SIMULATE (LSMEANS statement) in PROC GLM and PROC MIXED • Posterior simulation in PROC MIXED
Toy Example – Testing Random Number Generators • Random number generators can fail to provide independent numbers. • Test case: U1, U2 are Uniform on (0,1). • If they are independent, then E{6(U1 − U2)²} = 1. • Check: generate many pairs and report the average (should be close to 1.000000)
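Why the target is 1: E(U1 − U2) = 0 and Var(U) = 1/12 for a Uniform(0,1) variable, so E{6(U1 − U2)²} = 6[Var(U1) + Var(U2) − 2 Cov(U1, U2)] = 1 − 12 Cov(U1, U2). The check therefore flags correlated pairs, although dependence with zero covariance would slip through. A minimal Python sketch of the check, using Python's `random` module in place of the SAS generator; `rng_statistic`, `good_pairs`, and `bad_pairs` are illustrative names:

```python
import random
from typing import Callable

def rng_statistic(draw_pair: Callable[[], tuple[float, float]], n_pairs: int) -> float:
    """Average of 6*(U1 - U2)**2 over n_pairs draws; near 1 if U1, U2 are
    independent Uniform(0,1), and equal to 1 - 12*Cov(U1, U2) in expectation."""
    total = 0.0
    for _ in range(n_pairs):
        u1, u2 = draw_pair()
        total += 6.0 * (u1 - u2) ** 2
    return total / n_pairs

rng = random.Random(2005)  # arbitrary seed, for reproducibility

def good_pairs() -> tuple[float, float]:
    # Two consecutive draws from Python's Mersenne Twister generator
    return rng.random(), rng.random()

def bad_pairs() -> tuple[float, float]:
    # Deliberately broken "generator": U2 copies U1, so Cov(U1, U2) = 1/12
    u = rng.random()
    return u, u

print(rng_statistic(good_pairs, 200_000))  # close to 1.0
print(rng_statistic(bad_pairs, 1_000))     # 0.0: perfectly correlated pairs fail
```

Each call to `rng_statistic` is an independent replicate of the fundamental task, which is what makes the check a natural fit for splitting across grid hosts.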
Startup (Windows) 1. Start the Spawner: C:\Program Files\SAS\SAS 9.1>spawner -i -comamid tcp 2. Activate the Spawner: 3. Set batch logon permissions:
The %Distribute Macro • Written by Cheryl Doninger and Randy Tobias • File: http://support.sas.com/rnd/scalability/papers/distribute.zip • Supporting document: http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf
Problems We Have Experienced • Random crashes (client as well as hosts) • Diagnosing errors • I/O problems • Windows Service Pack 2 Firewall • Social issues (grid involves people!)
Future Plans • Support from business and government: • grid-enabled bioinformatics • business intelligence/data mining • Support HPC at TTU and in Texas