Cyberinfrastructure for Scalable and High Performance Geospatial Computation

Xuan Shi
Graduate assistants supported by the CyberGIS grant: Fei Ye (2011) and Zhong Chen (2012)
School of Computational Science and Engineering (CSE), College of Computing, Georgia Institute of Technology

Overview
• Keeneland and Kraken: the cyberinfrastructure for our research and development
• Scalable and high performance geospatial software modules developed in the past 1 year and 7 months

Keeneland: a hybrid computer architecture and system
• A five-year Track 2D cooperative agreement awarded by the National Science Foundation (NSF) in 2009
• Developed by GA Tech, UT-Knoxville, and ORNL
• 120 nodes [240 CPUs + 360 GPUs]
• Integrated into XSEDE in July 2012
• Blue Waters: hybrid computer systems at full scale

Kraken: a Cray XT5 supercomputer
• As of November 2010, Kraken is the 8th fastest computer in the world
• The world’s first academic supercomputer to enter the petascale
• Peak performance of 1.17 PetaFLOPs
• 112,896 computing cores (18,816 2.6 GHz six-core AMD Opteron processors)
• 147 TB of memory

Scalable and High Performance Geospatial Computation (1) Interpolation Using IDW Algorithm on GPU and Keeneland
• Performance is compared across different data scales (i.e. number of sample points) and computing resources (time is measured in seconds)
• Speedup is the time used on a single CPU divided by the time used on the GPU(s)
• Each interpolated value is calculated from the values of the 12 nearest neighbors
• Output grid size: 1M+ cells
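The IDW step described on this slide can be sketched as follows. This is a minimal CPU-side illustration, not the GPU implementation from the talk: the function names `idw_interpolate` and `speedup`, the brute-force neighbor search, and the distance exponent `power=2.0` are all assumptions made for the example. Only the neighbor count of 12 and the speedup definition come from the slide.

```python
def idw_interpolate(samples, x, y, k=12, power=2.0):
    """Estimate the value at (x, y) from the k nearest sample points.

    samples: list of (sx, sy, value) tuples.
    power:   distance exponent (2.0 is a common choice, assumed here).
    """
    # Brute-force search: squared distance from the query to every sample.
    # A spatial index would normally replace this for large inputs.
    nearest = sorted(
        ((sx - x) ** 2 + (sy - y) ** 2, value) for sx, sy, value in samples
    )[:k]

    weighted_sum = 0.0
    weight_total = 0.0
    for dist_sq, value in nearest:
        if dist_sq == 0.0:
            return value  # the query point coincides with a sample
        weight = 1.0 / dist_sq ** (power / 2.0)  # 1 / distance**power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def speedup(cpu_seconds, gpu_seconds):
    """Speedup as defined on the slide: single-CPU time over GPU time."""
    return cpu_seconds / gpu_seconds
```

Every output cell is computed independently of the others, which is what makes a grid of 1M+ cells a natural fit for one-thread-per-cell GPU execution.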

Scalable and High Performance Geospatial Computation (2) Interpolation Using Kriging Algorithm on GPU and Keeneland
• Performance is compared across different data scales (i.e. number of sample points) and computing resources (time is measured in seconds)
• Speedup is the time used on a single CPU divided by the time used on the GPU(s)
• Each interpolated value is calculated from the values of the 10 nearest neighbors
• Output grid size: 1M+ cells
Three Kriging approaches, a) Spherical, b) Exponential, and c) Gaussian, have been implemented on GPU/Keeneland
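The three approaches named above correspond to three semivariogram models. The sketch below writes them out in one common textbook parameterization; the slide does not say which form was used, and some packages scale the range by a factor of 3 in the exponential and Gaussian models, so the exact formulas and the parameter names `nugget`, `partial_sill`, and `range_a` are assumptions.

```python
import math


def spherical(h, nugget, partial_sill, range_a):
    """Spherical model: rises to the sill and stays flat beyond the range."""
    if h == 0.0:
        return 0.0
    if h >= range_a:
        return nugget + partial_sill
    ratio = h / range_a
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)


def exponential(h, nugget, partial_sill, range_a):
    """Exponential model: approaches the sill asymptotically."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-h / range_a))


def gaussian(h, nugget, partial_sill, range_a):
    """Gaussian model: very smooth (parabolic) behavior near the origin."""
    if h == 0.0:
        return 0.0
    return nugget + partial_sill * (1.0 - math.exp(-(h / range_a) ** 2))
```

In ordinary kriging, the chosen model fills a small linear system for each output cell (11 x 11 for 10 neighbors plus the unbiasedness constraint), so each cell can again be solved independently.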

Scalable and High Performance Geospatial Computation (3) Parallelizing Cellular Automata (CA) on GPU and Keeneland (1)
• Cellular Automata (CA) is the foundation for geospatial modeling and simulation, such as SLEUTH for urban growth simulation
• Game of Life (GOL), invented by Cambridge mathematician John Conway, is a well-known generic CA that consists of a collection of cells which, based on a few mathematical rules, can live, die or multiply.
• The Rules:
• For a space that is 'populated':
• Each cell with one or no neighbors dies, as if by loneliness.
• Each cell with four or more neighbors dies, as if by overpopulation.
• Each cell with two or three neighbors survives.
• For a space that is 'empty' or 'unpopulated':
• Each cell with three neighbors becomes populated.
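The rules above translate directly into a one-generation update. The following is a minimal serial sketch in which the function name `life_step` and the choice to treat cells outside the grid as unpopulated are assumptions; the talk's GPU code is not shown in the slides.

```python
def life_step(grid):
    """Apply one Game of Life generation to a grid of 0/1 values."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count populated cells among the eight surrounding positions.
            neighbors = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        neighbors += grid[rr][cc]
            if grid[r][c] == 1:
                # Populated: survives only with two or three neighbors.
                new_grid[r][c] = 1 if neighbors in (2, 3) else 0
            else:
                # Empty: becomes populated with exactly three neighbors.
                new_grid[r][c] = 1 if neighbors == 3 else 0
    return new_grid
```

Because every new cell value depends only on the previous generation, the two inner loops can be replaced by one GPU thread per cell, reading from one buffer and writing to another.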

Scalable and High Performance Geospatial Computation (3) Parallelizing Cellular Automata on GPU and Keeneland (2)
A cell is “born” if it has exactly 3 living neighbors, stays alive if it has 2 or 3 living neighbors, and dies otherwise.
• Size of CA: 10,000 x 10,000
• Number of iterations: 100
• CPU time: ~ 100 minutes
• GPU [desktop] time: ~ 6 minutes
• Keeneland [20 GPUs]: 20 seconds
CPU: Intel Xeon CPU 5110 @ 1.60 GHz, 3.25 GB of RAM
GPU: NVIDIA GeForce GTX 260 with 27 streaming multiprocessors (SM)
• A simple SLEUTH model has been implemented on a single GPU
• Implementation on Kraken and Keeneland using multiple GPUs is under development
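Applying the speedup definition from the interpolation slides (single-CPU time divided by GPU time) to the approximate timings reported here gives the figures below. The variable names are illustrative, and the inputs are the rounded values from the slide, so the results are rough.

```python
# Approximate timings reported on the slide, converted to seconds.
cpu_seconds = 100 * 60          # ~100 minutes on a single CPU
desktop_gpu_seconds = 6 * 60    # ~6 minutes on the desktop GPU
keeneland_seconds = 20          # 20 seconds on 20 Keeneland GPUs

# Total work: every cell is updated once per iteration.
cell_updates = 10_000 * 10_000 * 100

desktop_speedup = cpu_seconds / desktop_gpu_seconds
keeneland_speedup = cpu_seconds / keeneland_seconds
keeneland_updates_per_second = cell_updates / keeneland_seconds

print(f"Desktop GPU speedup: about {desktop_speedup:.1f}x")    # about 16.7x
print(f"Keeneland speedup:   about {keeneland_speedup:.0f}x")  # about 300x
print(f"Keeneland rate:      {keeneland_updates_per_second:.1e} cell updates/s")
```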

Scalable and High Performance Geospatial Computation (4) Parallelizing ISODATA for Unsupervised Image Classification on Kraken (1)
Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA)
Performance comparison: ERDAS takes 3:44:37 (13,477 seconds) to read the image file [~ 2 minutes] and classify one tile of 18 GB imagery data [0.5 m resolution in three bands]
• 20+ hours to load data from GT into Kraken @ ORNL
• The more cores are requested, the longer the waiting time will be
• ~ 10 seconds to complete the classification process
• I/O needs to be further optimized
Our solution on Kraken, using different numbers of cores with optimized stripe count and stripe size:

Tue Jun 12 16:06:31 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.919
Iteration 3: convergence = 0.937
Iteration 4: convergence = 0.953
---- Classification completed ----
The reading file time is 47.8197
The classification time is 9.6519
The total ISODATA algorithm running time is 57.4716
Histogram:
Class 0: 2811537623
Class 1: 14137169249
Class 2: 18231156326
Class 3: 17844190199
Class 4: 14839032207
Class 5: 8936914396
Application 1440335 resources: utime ~275810s, stime ~6377s
Tue Jun 12 16:07:33 EDT 2012

Tue Jun 12 15:39:10 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.919
Iteration 3: convergence = 0.936
Iteration 4: convergence = 0.953
---- Classification completed ----
The reading file time is 53.5952
The classification time is 9.1167
The total ISODATA algorithm running time is 62.7119
Histogram:
Class 0: 2811537615
Class 1: 8743937711
Class 2: 12122628756
Class 3: 11850984345
Class 4: 9714452352
Class 5: 5956459221
Application 1440071 resources: utime ~208415s, stime ~4110s
Tue Jun 12 15:40:18 EDT 2012

Tue Jun 12 12:48:37 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.918
Iteration 3: convergence = 0.938
Iteration 4: convergence = 0.954
---- Classification completed ----
The reading file time is 15.4807
The classification time is 9.2374
The total ISODATA algorithm running time is 24.7181
Histogram:
Class 0: 1124674113
Class 1: 1970406180
Class 2: 2845484626
Class 3: 2897947070
Class 4: 2298948648
Class 5: 1662539363
Application 1436660 resources: utime ~30211s, stime ~1215s
Tue Jun 12 12:49:06 EDT 2012

Tue Jun 12 14:24:23 EDT 2012
Iteration 1: convergence = 0.000
Iteration 2: convergence = 0.915
Iteration 3: convergence = 0.935
Iteration 4: convergence = 0.952
---- Classification completed ----
The reading file time is 28.6973
The classification time is 8.9810
The total ISODATA algorithm running time is 37.6782
Histogram:
Class 0: 2811537615
Class 1: 3715199078
Class 2: 5660559329
Class 3: 5766104126
Class 4: 4652035362
Class 5: 2994564490
Application 1439048 resources: utime ~78392s, stime ~2164s
Tue Jun 12 14:25:05 EDT 2012

Data sizes shown on the slide: 72 GB, 216 GB, 36 GB, 144 GB
Core counts shown on the slide: 1,800 cores, 3,600 cores, 7,200 cores, 10,800 cores
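The iteration pattern visible in the run logs (a convergence value per iteration, stopping once it passes roughly 0.95, then a per-class histogram) can be sketched as below. This is a simplified illustration, not the code that ran on Kraken: it omits ISODATA's cluster split and merge steps, and it assumes that "convergence" means the fraction of pixels whose class did not change since the previous iteration. The function names and the 0.95 default are hypothetical.

```python
def nearest_center(pixel, centers):
    """Return the index of the class center closest to the pixel."""
    best_index, best_dist = 0, float("inf")
    for index, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(pixel, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def recompute_centers(pixels, labels, centers):
    """Move each center to the mean of its pixels (empty classes stay put)."""
    bands = len(centers[0])
    sums = [[0.0] * bands for _ in centers]
    counts = [0] * len(centers)
    for pixel, label in zip(pixels, labels):
        counts[label] += 1
        for b in range(bands):
            sums[label][b] += pixel[b]
    return [
        tuple(s / counts[k] for s in sums[k]) if counts[k] else tuple(centers[k])
        for k in range(len(centers))
    ]


def classify(pixels, centers, threshold=0.95, max_iterations=20):
    """Iterate assignment and center update until labels stabilize."""
    labels = [-1] * len(pixels)  # no pixel is labeled before iteration 1
    history = []
    for _ in range(max_iterations):
        new_labels = [nearest_center(p, centers) for p in pixels]
        unchanged = sum(1 for old, new in zip(labels, new_labels) if old == new)
        convergence = unchanged / len(pixels)
        history.append(convergence)
        labels = new_labels
        centers = recompute_centers(pixels, labels, centers)
        if convergence >= threshold:
            break
    histogram = [labels.count(k) for k in range(len(centers))]
    return labels, centers, history, histogram
```

In a distributed version, each core would plausibly label its own chunk of pixels and contribute per-class sums and counts to a global reduction before the centers are updated. That decomposition is an assumption about the general approach, not a description of the authors' implementation.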

Scalable and High Performance Geospatial Computation (4) Parallelizing ISODATA for Unsupervised Image Classification on Kraken (2)
Iterative Self-Organizing Data Analysis Technique Algorithm (ISODATA)
Performance comparison: to classify one 18 GB image tile into 10, 15, and 20 classes, ERDAS takes about 5.5, 6.5, and 7.5 hours, respectively, to complete 20 iterations, while the convergence value remains below 0.95

Scalable and High Performance Geospatial Computation (5) Near-repeat calculation for spatial-temporal analysis on crime events over GPU and Keeneland
• Through a re-engineering process, the near-repeat calculation was first parallelized on an NVIDIA GeForce GTX 260 GPU, which takes about 48.5 minutes to complete one calculation and 999 simulations on two-event chains over 30,000 events.
• By combining MPI and GPU programs, we can dispatch the simulation work onto multiple nodes in Keeneland to accelerate the simulation process.
• Using 100 GPUs on Keeneland, 1,000 simulations complete in about 264 seconds.
• With more GPUs, the simulation time can be reduced further.
One run of a 4+ event chain calculation can easily approach or go beyond petascale (10^15) and exascale (10^18)
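The structure described above, one calculation on the observed events plus 999 simulations, matches a permutation test over event pairs. The sketch below assumes a Knox-style formulation in which a pair counts as a near repeat when it falls within a distance band and a time band, and each simulation shuffles the event times across the locations. The function names, thresholds, and the p-value formula are assumptions, not details taken from the slides.

```python
import math
import random


def count_near_pairs(events, max_dist, max_days):
    """Count event pairs that are close in both space and time.

    events: list of (x, y, day) tuples.
    """
    count = 0
    for i in range(len(events)):
        xi, yi, ti = events[i]
        for j in range(i + 1, len(events)):
            xj, yj, tj = events[j]
            close_in_space = math.hypot(xi - xj, yi - yj) <= max_dist
            close_in_time = abs(ti - tj) <= max_days
            if close_in_space and close_in_time:
                count += 1
    return count


def near_repeat_test(events, max_dist, max_days, simulations=999, seed=0):
    """Compare the observed pair count with counts under shuffled times."""
    rng = random.Random(seed)
    observed = count_near_pairs(events, max_dist, max_days)
    times = [t for _, _, t in events]
    at_least_as_extreme = 0
    for _ in range(simulations):
        rng.shuffle(times)  # break the space-time link, keep both marginals
        shuffled = [(x, y, t) for (x, y, _), t in zip(events, times)]
        if count_near_pairs(shuffled, max_dist, max_days) >= observed:
            at_least_as_extreme += 1
    p_value = (at_least_as_extreme + 1) / (simulations + 1)
    return observed, p_value
```

Each simulation is independent of the others, so they can be divided among MPI ranks, with the pair loop inside each simulation handled by a GPU. For 30,000 events, a single pass over all pairs already covers 449,985,000 combinations.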

Thank you
Questions?
