Enabling User-Oriented Data Access in a Satellite Data Portal

Rajesh Kalyanam, Lan Zhao, Taezoon Park, Carol X. Song (RCAC, Purdue University, West Lafayette, IN 47907)
Larry Biehl (PTO, Purdue University, West Lafayette, IN 47907)

Presentation Transcript


  1. Enabling User-Oriented Data Access in a Satellite Data Portal • Rajesh Kalyanam, Lan Zhao, Taezoon Park, Carol X. Song (RCAC, Purdue University, West Lafayette, IN 47907) • Larry Biehl (PTO, Purdue University, West Lafayette, IN 47907)

  2. Outline • Background • Motivation • System Design • Data Production • Data Subscription • Data Delivery • Future Work

  3. Background • Overview of Purdue Terrestrial Observatory (PTO) • Remote-sensing research facility • GOES-12 GVAR, AVHRR, and MVISR sensor systems; Aqua/Terra satellites • Component of the TeraGrid data provider framework • Satellite data products • Land, ocean and atmosphere data • Provide trends on local or continental scales • Used in climatology, hydrology, agriculture and transportation

  4. Example MODIS Products • Level 1A (MOD01) • Level 1B (MOD02) with/without bowtie correction • Geolocation (MOD03) • Aerosol (MOD04) • Water Vapor (MOD05) • Clouds (MOD06) • Atmospheric Profiles (MOD07) • Reflectance (MOD09) • Snow (MOD10) • Fire Detection (MOD14) • Ocean Color (MOD18) • Sea Surface Temperature (MOD28) • Sea Ice (MOD29) • Cloud Mask (MOD35) • Also multiday composites of the above • Note: each data product may contain a few to many variables.

  5. Variables in MOD04 Product • Aerosol_Type_Land • Angstrom_Exponent_1_Ocean • Angstrom_Exponent_2_Ocean • Angstrom_Exponent_Land • Asymmetry_Factor_Average_Ocean • Asymmetry_Factor_Best_Ocean • Backscatter_Ratio_Average_Ocean • Backscatter_Ratio_Best_Ocean • Cloud_Condensation_Nuclei_Ocean • Cloud_Fraction_Land • Cloud_Fraction_Ocean • Cloud_Mask_QA • Continental_Optical_Depth_Land • Corrected_Optical_Depth_Land • Critical_Reflectance_Land • Effect_Optical_Depth_Ave_Ocean • Effect_Optical_Depth_Best_Ocean • Effect_Radius_Ocean • Error_Critical_Reflectance_Land • Error_Path_Radiance_Land • Estimated_Uncertainty_Land • Least_Squares_Error_Ocean • Mass_Concentration_Land • Mass_Concentration_Ocean • Mean_Reflectance_Land • Mean_Reflectance_Land_All • Mean_Reflectance_Ocean • Number_Pixels_Percentile_Land • Number_Pixels_Used_Ocean • OptDepth_Ratio_Small_Land • OptDepth_Ratio_Small_Land_Ocean • OptDepth_Ratio_Small_Ocean • Optical_Depth_Land_And_Ocean • Optical_Depth_Large_Ave_Ocean • Optical_Depth_Large_Best_Ocean • Optical_Depth_Small_Ave_Ocean • Optical_Depth_Small_Best_Ocean • Optical_Depth_by_models_ocean • Path_Radiance_Land • QualityWt_Critical_Reflect_Land • QualityWt_Path_Radiance_Land • Quality_Assurance_Crit_Ref_Land • Quality_Assurance_Land • Quality_Assurance_Ocean • Reflected_Flux_Average_Ocean • Reflected_Flux_Best_Ocean • Reflected_Flux_Land • Reflected_Flux_Land_And_Ocean • STD_Reflectance_Land • STD_Reflectance_Ocean • Scan_Start_Time • Scattering_Angle • Sensor_Azimuth • Sensor_Zenith • Solar_Azimuth • Solar_Zenith • Solution_Index_Ocean_Large • Solution_Index_Ocean_Small • Std_Dev_Reflectance_Land_All • Transmitted_Flux_Average_Ocean • Transmitted_Flux_Best_Ocean • Transmitted_Flux_Land • Latitude • Longitude

  6. Motivation • User Requirement • Custom-tailored data configurations • Receive continuous data updates • Real-time or near-real-time access • Current Systems • Impossible to generate complete range of data products • Have to route through the support staff • Manual process which is time consuming and error-prone

  7. Motivation “Web-based data configuration, subscription and delivery system”

  8. System Design • Processing and Storage Backbone • PTO infrastructure • PTO data processing cluster • SDSC SRB middleware • Publish-Subscribe manager • Interface between the client side and the data processing backend • Manages user subscriptions • Handles enabling/disabling data production • Client side applications • Subscription interface • Data access portal

  9. System Design

  10. System Design • User-driven publish/subscribe model • Dynamic data generation • User specifies, controls, and receives custom-tailored data • Continuous data updates in near-real-time • Multiple ways to access the data

  11. Data Production • Data production software • SeaSpace TeraScan software • Configuration variables • Various projections and output formats • On-demand data production • Production driven by user choices • “configproc” file mechanism • Automatic enabling and disabling • scp-based data transfer to SRB archive and webserver

  12. Data Production • Example configproc file:
      input_directory: products/tdf/Local/modis/ndvi
      input_files: %yyyy.%mmdd.%hhmm.%satel.MYD_NDVI
      image_variable: EVI
      image_format: jpeg
      scale_range: -0.25 1.00
      color_palette: modis_ndvi
      grid_delta: 0
      boundaries: dcw.coast dcw.states
      max_width: 256
      output_template: %yyyy.%mmdd.t_evi.jpg
      save_directory: products/images/modis
      save_files: 20??.????.t_evi.jpg
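The configproc example on this slide is a list of "key: value" settings, so a small reader for that layout can be sketched as follows. This is only an illustrative Python sketch, not TeraScan's own parser: the function names `parse_configproc` and `expand_template` are hypothetical, and the meaning assumed for the `%yyyy`, `%mmdd`, `%hhmm` and `%satel` placeholders (year, month and day, hour and minute, satellite name) is a guess based on their spelling.

```python
from datetime import datetime


def parse_configproc(text):
    """Parse 'key: value' lines into a dict; values stay raw strings."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def expand_template(template, when, satellite=""):
    """Fill the placeholders with date parts (assumed semantics)."""
    return (template
            .replace("%yyyy", when.strftime("%Y"))
            .replace("%mmdd", when.strftime("%m%d"))
            .replace("%hhmm", when.strftime("%H%M"))
            .replace("%satel", satellite))


# A shortened copy of the slide's example, used only for illustration.
EXAMPLE = """\
input_directory: products/tdf/Local/modis/ndvi
image_variable: EVI
scale_range: -0.25 1.00
output_template: %yyyy.%mmdd.t_evi.jpg
"""

config = parse_configproc(EXAMPLE)
scale_range = [float(v) for v in config["scale_range"].split()]
output_name = expand_template(config["output_template"],
                              datetime(2007, 3, 5, 18, 30))
print(config["image_variable"], scale_range, output_name)
```

Keeping every value as a string and converting only where needed (as with `scale_range`) avoids guessing types for settings whose format is not documented on the slide.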

  13. Data Subscription • Data Subscription Components • Publish-Subscribe based subscription manager • Subscription Interface • Publish-Subscribe subscription manager • Simulates operation of a PubScribe system • Implemented through an Apache Axis webservice • Subscription Interface • Available on a web-based scientific gateway portal • Naïve and advanced user interfaces

  14. Data Subscription • Advanced user interface • Requires knowledge of variables involved in data product • Choice-list based configuration • AJAX dynamic filtering of choice lists • Will allow advanced configuration variables with strict logical composition rules • Naïve user interface • Plain English description: “bimonthly composite of vegetation data” • Scoring mechanism for selecting possible products • Learning mechanism for improving performance over time • Work in progress

  15. Data Subscription • Predicate matching • Keyword definitions for each data product: “BIMONTHLYCOMPOSITE of VEGETATION data” • Score captures the degree of correlation between descriptions and products • Additional keywords are added to a list for further consideration; scores are updated based on repetition frequency • Successful product descriptions are tagged • Tags can be reused by other users to search for common products
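The slide does not give the scoring formula, so the following Python sketch only illustrates the general idea of ranking products by keyword overlap with a plain-English description. The product keys, keyword sets, and the overlap-fraction score are all assumptions made for the example; the frequency-based learning step mentioned on the slide is not modelled.

```python
import re

# Hypothetical keyword definitions; the real product catalogue is not shown.
PRODUCT_KEYWORDS = {
    "ndvi_bimonthly": {"BIMONTHLY", "COMPOSITE", "VEGETATION"},
    "sst_daily": {"DAILY", "SEA", "SURFACE", "TEMPERATURE"},
}


def tokenize(description):
    """Upper-case the description and split it into alphanumeric words."""
    return set(re.findall(r"[A-Z0-9]+", description.upper()))


def score_products(description, product_keywords):
    """Rank products by the fraction of their keywords found in the text."""
    words = tokenize(description)
    scores = {
        product: len(words & keywords) / len(keywords)
        for product, keywords in product_keywords.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = score_products("bimonthly composite of vegetation data",
                        PRODUCT_KEYWORDS)
print(ranked)
```

Words in the description that match no product (here "OF" and "DATA") are simply ignored by this sketch; in the design described on the slide they would instead be collected as candidate keywords.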

  16. Data Subscription

  17. Data Subscription • Subscription Manager • Subscription data management • Receives updates from data generator • Distributes notifications to subscribed users • Enabling and disabling data generation • Subscription data management • MySQL database • Product information – product key, generation frequency, configuration variables, filename pattern, webserver path • User subscription information – userid, product key, date range, email address
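The two kinds of records listed on this slide map naturally onto two tables. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the MySQL database so that it runs without a server; the table names, column names, column types, and sample rows are assumptions derived from the field list, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_key          TEXT PRIMARY KEY,
    generation_frequency TEXT,
    config_variables     TEXT,
    filename_pattern     TEXT,
    webserver_path       TEXT
);
CREATE TABLE subscription (
    user_id     TEXT,
    product_key TEXT REFERENCES product(product_key),
    start_date  TEXT,  -- ISO dates compare correctly as strings
    end_date    TEXT,
    email       TEXT
);
""")

# Made-up sample data for illustration only.
conn.execute(
    "INSERT INTO product VALUES (?, ?, ?, ?, ?)",
    ("t_evi", "daily", "image_variable=EVI",
     "20??.????.t_evi.jpg", "products/images/modis"),
)
conn.executemany(
    "INSERT INTO subscription VALUES (?, ?, ?, ?, ?)",
    [
        ("alice", "t_evi", "2007-01-01", "2007-12-31", "alice@example.org"),
        ("bob", "t_evi", "2006-01-01", "2006-12-31", "bob@example.org"),
    ],
)


def subscribers_for(conn, product_key, on_date):
    """Return e-mail addresses subscribed to a product on a given date."""
    rows = conn.execute(
        "SELECT email FROM subscription "
        "WHERE product_key = ? AND start_date <= ? AND end_date >= ? "
        "ORDER BY email",
        (product_key, on_date, on_date),
    )
    return [row[0] for row in rows]


def production_enabled(conn, product_key, on_date):
    """A product needs generating only while someone is subscribed."""
    return bool(subscribers_for(conn, product_key, on_date))


print(subscribers_for(conn, "t_evi", "2007-03-05"))
```

The same subscriber lookup could serve both the notification step and the enable/disable decision, which is one plausible reason to keep the date range on the subscription row.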

  18. Data Subscription • Pull-based notifications • Simpler approach • Perl script tracks updates to data repository • Loops through all data products based on the highest generation frequency • Trade-off between performance and notification delays
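The original tracker is a Perl script that is not shown, so the following Python sketch only illustrates one polling pass of a pull-based design: list the repository directory, keep the files that match a product's filename pattern, and report those not seen before. The function name `find_new_files` and the use of a temporary directory as the "repository" are assumptions for the example.

```python
import fnmatch
import os
import tempfile


def find_new_files(directory, pattern, seen):
    """Return files matching the pattern that were not seen in earlier passes.

    The `seen` set is updated in place so the next pass skips these files.
    """
    current = {
        name for name in os.listdir(directory)
        if fnmatch.fnmatch(name, pattern)
    }
    new_files = sorted(current - seen)
    seen.update(current)
    return new_files


# Demonstration against a throwaway directory.
with tempfile.TemporaryDirectory() as repo:
    seen = set()
    open(os.path.join(repo, "2007.0305.t_evi.jpg"), "w").close()
    open(os.path.join(repo, "notes.txt"), "w").close()
    first_pass = find_new_files(repo, "20??.????.t_evi.jpg", seen)
    second_pass = find_new_files(repo, "20??.????.t_evi.jpg", seen)

print(first_pass, second_pass)
```

In a long-running tracker this pass would be repeated in a loop with a sleep between passes. Sleeping for the interval of the most frequently generated product bounds the notification delay, at the cost of many passes that find nothing for slower products, which is the trade-off the slide mentions.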

  19. Data Subscription • Push-based notifications • Requires tight integration with data generation process • Included as an entry in the configproc file • Product name argument is used to query list of users • Constraints on the execution node and environment

  20. Data Delivery • HTTP access • Users can download images off the webserver • Users cannot verify whether an image interests them before downloading it • Images cannot be stored for a long time on the webserver • RSS-feed-based access • Thumbnails are sent as RSS feeds when new images are available • Users can download the actual image from the feed link based on the thumbnail • Data portal access of archive data • Can access archived data from the SRB server • Difficult to sift through the large number of images
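One way to picture the RSS-based delivery is a feed in which each item links to the full image and carries the thumbnail as an enclosure. The Python sketch below builds such a feed with the standard library; the URLs, titles, and the choice of the RSS 2.0 `enclosure` element for the thumbnail are assumptions, since the slides do not show the actual feed layout.

```python
import xml.etree.ElementTree as ET


def build_feed(channel_title, channel_link, items):
    """Build an RSS 2.0 document with one item per new image."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = (
        "New images for subscribed products"
    )
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        # The item link points at the full-size image.
        ET.SubElement(node, "link").text = item["image_url"]
        # The thumbnail travels as an enclosure (url, length, type).
        ET.SubElement(
            node, "enclosure",
            url=item["thumbnail_url"],
            length=str(item["thumbnail_bytes"]),
            type="image/jpeg",
        )
    return ET.tostring(rss, encoding="unicode")


# Hypothetical item; example.org URLs are placeholders.
feed_xml = build_feed(
    "t_evi updates",
    "http://example.org/products/images/modis",
    [{
        "title": "2007.0305.t_evi.jpg",
        "image_url": "http://example.org/products/images/modis/2007.0305.t_evi.jpg",
        "thumbnail_url": "http://example.org/thumbs/2007.0305.t_evi.jpg",
        "thumbnail_bytes": 4096,
    }],
)
print(feed_xml)
```

Sending only the thumbnail in the feed keeps each notification small while still letting the user judge an image before fetching the full file.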

  21. RSS Feed notification

  22. Future Work • Future Direction • Explore advantages of standard PubScribe models • Utilise current state of the art in ontology based methods for predicate mapping • Performance studies for scalability • Transfer data automatically to user specified location

  23. Conclusion • “A user-oriented subscription framework that will encourage broader access from the grid user community”

  24. Acknowledgements This work was made possible by the National Science Foundation, TeraGrid Resource Partners grant OCI-0503992


  26. Questions?
