
Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler

Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler. Michael P. Finn. High Performance Computing and Geospatial Analytics Workshop, Argonne National Laboratory, 29 – 30 Apr 2014. Collaborators: Shaowen Wang, Anand Padmanabhan, Yan Liu



Presentation Transcript


  1. Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler Michael P. Finn High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014

  2. Collaborators • Shaowen Wang, Anand Padmanabhan, Yan Liu • University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory • David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel • USGS, Center of Excellence for Geospatial Information Science (CEGIS) • Kristina H. Yamamoto • USGS, National Geospatial Technical Operations Center • Babak Behzad • UIUC, Department of Computer Science • Eric Shook • Kent State University, Department of Geography • Qingfeng (Gene) Guan • China University of Geosciences

  3. Where Do We Want to Go? • Geospatial Analytics • Spatial Modeling • Geovisualization (GeoViz/ Visual Analytics) • For Decision Makers (agencies/ citizens) • Protect natural resources • Empower cultures • Provide for our future

  4. Geospatial Analytics • Spatial Modeling/ Geovisualization

  5. So: • Where have we been? • Where are we now? • Where do we want to go?

  6. Data • Analog → Digital • “Big” Data • Spatial Data (geometric structure) • Data: Open? – mostly • Findable, Accessible, Exploitable (standard format) • Example: USGS Data holdings • 8 Layers of the National Map • Soon: Hyperspectral cubes and LiDAR point clouds

  7. The National Map- Elevation: Quality Levels http://nationalmap.gov/3DEP/neea.html

  8. Big Spatial Data • High-resolution geographic data covering large areas create big spatial data • Remotely-sensed images • One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage space (more than 4 PB equivalent for the U.S.) • Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data • Satellite images with finer than one meter resolution • LiDAR data at quality level 1 (8 points per square meter) and level 2 (2 points per square meter)
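
The "more than 4 petabytes" figure on slide 8 can be checked with a back-of-envelope calculation. The sketch below scales the Dent County storage figure by area. The U.S. area of roughly 9.8 million km² is an assumption that does not appear on the slide, and the function name is hypothetical.

```python
# Back-of-envelope check of the NAIP storage estimate on slide 8.
DENT_COUNTY_AREA_KM2 = 1_955    # from the slide
DENT_COUNTY_STORAGE_GB = 800    # from the slide
US_AREA_KM2 = 9_800_000         # ASSUMPTION: approximate total U.S. area

def scaled_storage_pb(storage_gb, area_km2, target_area_km2):
    """Scale a storage figure linearly by area; return decimal petabytes."""
    gb_per_km2 = storage_gb / area_km2
    return gb_per_km2 * target_area_km2 / 1_000_000  # 1 PB = 1e6 GB

us_storage_pb = scaled_storage_pb(
    DENT_COUNTY_STORAGE_GB, DENT_COUNTY_AREA_KM2, US_AREA_KM2
)
print(f"Estimated U.S. NAIP storage: {us_storage_pb:.2f} PB")
```

Linear scaling by area ignores compression, overlap between image tiles, and band count, so the result is only an order-of-magnitude estimate.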

  9. Big Spatial Data • USGS 3DEP – Level 2 LiDAR for all of the U.S. except Alaska, which is acquiring level 5 IfSAR • Data volume for point cloud, intensity images, and bare Earth elevation model – 7 to 9 petabytes • Processing and file creation usually doubles to triples the storage requirements • Other geospatial data – USGS National Hydrography Dataset based on 1:24,000 scale is about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE) • New project to extract hydrography from level 2 LiDAR • How big will the resulting vector dataset (< 1 m resolution) be?
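
The storage figures on slide 9 combine into a simple range. This minimal sketch assumes that "doubles to triples" means the total footprint is 2 to 3 times the raw data volume; the function name is hypothetical.

```python
# Range of total storage implied by slide 9, under the stated assumption.
def total_storage_range_pb(raw_low_pb, raw_high_pb, factor_low, factor_high):
    """Return (lowest, highest) total storage in petabytes."""
    return raw_low_pb * factor_low, raw_high_pb * factor_high

low_pb, high_pb = total_storage_range_pb(7, 9, 2, 3)
print(f"Total 3DEP storage need: {low_pb} to {high_pb} PB")
```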

  10. Software • Computer languages: compiled/ scripting • Manipulate data • Software • Commercial? Open? Modifiable code? Functional? • Tools: SAS (SPSS)/ R/ MATLAB, etc. • GIS Software: Esri ArcGIS/ QGIS • and image processing S/W: Imagine/ ENVI • Libraries: GDAL • Example software: mapIMG (based on GCTP; open)

  11. Geospatial Methods, Technologies, and Applications • Analytical Cartography • Mathematical Cartography • Since roughly the 18th Century • Quantitative Geography • Since 1960s • GIS (and image processing S/W) • Since about the 1970s • combining data & software → GIS Packages • Legacy of primarily commercial software • Open Source Software • Since roughly 1980s • OpenGIS? • early wide-spread but often spotty “open” GIS • Foundation for maturity, expansion, and further openness

  12. Here we are/ where are we going? • Open GIS: Technology and Applications (exploitable) • Hardware and Operating Systems evolving • Data Storage trying to keep pace with Big Data • Advanced GeoViz on cusp of exploding • HPC → High-Performance Spatial Computing • Increasing Spatiotemporal fidelity • Cyberinfrastructure

  13. CyberGIS • Cyberinfrastructure (eScience) • HPC & GIScience • A balance/ interaction between theory/ data (Rey, 2014) • Collaborative Research • Standards (for interoperability)

  14. NSF CyberGIS Project • NSF Software Infrastructure for Sustained Innovation Award • http://cybergis.org • USGS/ CEGIS Participation • Cyberinfrastructure resources • XSEDE • Blue Waters supercomputer allocation • Open Science Grid • Integration • CyberGIS Toolkit • CyberGIS Gateway • GISolve middleware services

  15. CyberGIS Software Environment From Liu et al. (2014)

  16. CyberGIS Toolkit Software Components • PABM – Parallel Agent-Based Modeling • pRasterBlaster – Parallel Map Reprojection • Parallel PySAL (Python Spatial Analysis Library) • Spatial Text • An open and reliable software toolbox for high-end users • Hide compute complexity • A rigorous software building, testing, packaging, and deployment framework • Focused on computational intensity, performance, scalability, and portability in various CI environments • Easy to configure and use

  17. Scalable Raster Processing • Need for scalable map reprojection in CyberGIS analytics • Spatial analysis and modeling • Distance calculation on raster cells requires appropriate projection • Visualization • Reprojection for faster visualization on Web Mercator base maps • pRasterBlaster integration in CyberGIS Toolkit and Gateway • Software componentization: librasterblaster, pRasterBlaster, MapIMG • Build, test, and documentation • Gateway user interface
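
Slide 17's point that distance calculation on raster cells needs an appropriate projection can be illustrated with the spherical Web Mercator forward equations. The sketch below is a standalone illustration, not code from pRasterBlaster or librasterblaster. The function names are hypothetical, and the sphere radius is the value conventionally used for Web Mercator.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # sphere radius conventionally used by Web Mercator

def web_mercator_forward(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator metres."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lam
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Linear scale distortion of the Mercator projection at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

x_edge, _ = web_mercator_forward(180.0, 0.0)
print(f"x at 180 degrees longitude: {x_edge:.2f} m")
print(f"scale factor at 60 degrees latitude: {mercator_scale_factor(60.0):.3f}")
```

Because the scale factor grows with latitude, distances measured directly on Web Mercator cells are inflated away from the equator. That is why the projection suits base-map display but not distance analysis.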

  18. Performance Profiling • Performance profiling is an important tool for developing scalable and efficient high performance applications • Performance profiling identified computational bottlenecks in pRasterBlaster • Demonstration of one example of the value of profilers for pRasterBlaster in the next slides

  19. A Computational Bottleneck: Symptom

  20. A Computational Bottleneck: Symptom

  21. A Computational Bottleneck: Cause

  22. A Computational Bottleneck: Analysis • Spatial data-dependent performance anomaly • The anomaly is data dependent • The four corners of the raster dataset were processed by processors whose indexes are close to the two ends • Exception handling in C++ is costly • Coordinate transformation on the nodata area was handled as an exception • Solution • Remove the C++ exception handling

  23. A Computational Bottleneck: Performance Improvement

  24. A Computational Bottleneck: Summary • Symptom • Processors responsible for polar regions spent more time than those processing the equatorial region • Cause • Corner cells were mapped to invalid input raster cells, generating exceptions • C++ exception handling was expensive • Solution • Removed C++ exception handling • Corner cells need not be processed • They now contribute less computation time
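
The slides state only that the C++ exception handling was removed; they do not show what replaced it. One common alternative is an explicit bounds check that marks invalid cells as nodata without raising. The Python sketch below illustrates that pattern under this assumption. All names are hypothetical and none of it is taken from pRasterBlaster.

```python
NODATA = -9999  # ASSUMPTION: illustrative nodata marker

def source_cell_or_none(row, col, n_rows, n_cols):
    """Return (row, col) if it lies inside the input raster, else None."""
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return row, col
    return None

def resample_nearest(src, mapping, nodata=NODATA):
    """Fill an output raster from precomputed source indices.

    src is a 2D list of values; mapping holds, for every output cell,
    the (row, col) of the input cell it was transformed to.
    Cells that map outside the input are set to nodata, with no exception.
    """
    n_rows, n_cols = len(src), len(src[0])
    out = []
    for map_row in mapping:
        out_row = []
        for row, col in map_row:
            cell = source_cell_or_none(row, col, n_rows, n_cols)
            out_row.append(nodata if cell is None else src[cell[0]][cell[1]])
        out.append(out_row)
    return out

src = [[1, 2], [3, 4]]
mapping = [[(0, 0), (5, 5)], [(-1, 0), (1, 1)]]
print(resample_nearest(src, mapping))
```

The check costs a few comparisons per cell regardless of where the cell lies, so processors holding the corner regions no longer pay an extra per-cell penalty.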

  25. pRasterBlaster Component View • CyberToolkit • pRasterBlaster • librasterblaster • MapIMG via API • Cyberinfrastructure Service Providers • GIS Programmers • End Users

  26. Performance • Test: • On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center) • Using a parallel file system (Lustre) and MPI I/O (vs. traditional Network File System (NFS)) • 40 GB data • Processor cores were increased from 256 to 1024

  27. Obstacles, Issues, Challenges • Parallel I/O (particularly raster) is the proverbial long pole in the tent • Raster decomposes nicely (embarrassingly parallel) • File I/O (especially output file re-composition) is a huge bottleneck • Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster • GeoTIFF (SPTW – Simple Parallel TIFF Writer) led by David Mattli, USGS • HDF5 parallel work by Babak Behzad, UIUC
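
The "raster decomposes nicely" point on slide 27 can be sketched as a row-block partition: each process gets a contiguous band of rows and, for an uncompressed row-major file, a disjoint byte range it could write independently. This is a generic illustration with hypothetical function names. It assumes an uncompressed row-major layout and does not describe how SPTW or the HDF5 work lay out data.

```python
def row_block(n_rows, n_procs, rank):
    """Return the half-open row range [start, stop) owned by one process."""
    base, extra = divmod(n_rows, n_procs)
    start = rank * base + min(rank, extra)       # first `extra` ranks get one more row
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def byte_range(start_row, stop_row, n_cols, bytes_per_cell):
    """Byte offsets of a row block in an uncompressed row-major raster."""
    row_bytes = n_cols * bytes_per_cell
    return start_row * row_bytes, stop_row * row_bytes

blocks = [row_block(10, 4, rank) for rank in range(4)]
print(blocks)
print(byte_range(*blocks[1], n_cols=100, bytes_per_cell=4))
```

The compute step needs no communication between blocks. The difficulty the slide describes arises afterwards, when all processes write their blocks into one output file.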

  28. Computational Challenges • Converting legacy (linear) code to HPC (parallel) environment requires a lot of skilled manpower • Scaling to large-scale analysis using HPC resources is difficult • Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise on computational performance profiling and analysis

  29. Geospatial Analytics • Spatial Modeling/ Geovisualization • Solving “Changing World” Problems • Smart Decisions • Protecting Natural Resources • Democratizing Science • Empowering cultures • Products and Services for society and its citizens • Data & Software → Solving (Geospatial) Problems

  30. Geospatial Analytics • Spatial Modeling/ Geovisualization

  31. References • Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH. • Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag. • Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia. • Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland. • Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256. • Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.
• http://cegis.usgs.gov/ • http://nationalmap.gov/3DEP/ • http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php • http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page • http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster

  32. Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler Questions? http://cegis.usgs.gov/index.html High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014
