
Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review



  1. Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review November 5-6, 2007 Honolulu

  2. CDR – Day 2

  3. Topics: November 6 / Day 2 • Welcome back (Heasley) • ODM Continued (JHU team) • The PSPS Data Retrieval Layer (Referentia team) • Lunch • The Web Based Interface (Heasley) • Pan-STARRS External Interfaces (Heasley) • PSPS Test Plan (Heasley) • PSPS Schedule (Heasley) • System Level Risk Assessment (Heasley) • Executive Session (Committee only) • Recap Meeting of Review Panel with Subsystem and component leads • Adjourn

  4. The Object Data Manager System (Continued) The Johns Hopkins Team

  5. The PSPS Data Retrieval Layer The Referentia Team

  6. Pan-STARRS PS1 Published Science Products Subsystem Critical Design Review, Data Retrieval Layer (DRL), Nov 5-6, 2007. Referentia Systems, Inc. Matt Shawver, mshawver@referentia.com, 808-423-1900 x111; Kenn Yuen, kyuen@referentia.com; Chris Richmond, crichmond@referentia.com; Jay Knight, jknight@referentia.com

  7. Outline • Software Architecture • Requirements and Implementation Plans • Key Design Modifications • Test Plan • Development Schedule • DRL Development Status • Demo

  8. High Level Software Architecture

  9. DRL Software Architecture

  10. DRL Requirements • Query Analysis • Query Queuing and Execution • Result Caching • Result Retrieval • Administrative • Performance Monitoring • User Administration • Logging • Support multiple Data Managers • JHU Query Manager • MySQL • SQL Server • PostgreSQL

  11. Req: Query Analysis From previous design requirements: • Syntax validation • Current DM Resource Load • Query processing time estimate • Schema information

  12. Query Analysis Implementation • For syntax validation, SQL Server PARSEONLY command • For performance status, SQL Server sp_monitor procedure • Highly database implementation dependent • Exact prediction of query time is impossible. Instead show query execution plan. • Schema information will be retrieved by querying database metadata views and functions • For ODM, use Query Manager functionality when available
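
For concreteness, here is a minimal Java sketch of the syntax-only check, running a user query under SQL Server's SET PARSEONLY setting through JDBC. The class and method names are illustrative, not part of the DRL design.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/** Minimal sketch: syntax-only validation against SQL Server over JDBC. */
public class QuerySyntaxChecker {

    /** Returns null if the query parses cleanly, otherwise the parser's error message. */
    public static String validate(Connection conn, String userSql) {
        try (Statement stmt = conn.createStatement()) {
            // PARSEONLY makes SQL Server check syntax without compiling or executing.
            stmt.execute("SET PARSEONLY ON");
            try {
                stmt.execute(userSql);       // parsed only, never run
                return null;                 // no syntax errors reported
            } catch (SQLException parseError) {
                return parseError.getMessage();
            } finally {
                stmt.execute("SET PARSEONLY OFF");
            }
        } catch (SQLException e) {
            throw new RuntimeException("Could not reach the Data Manager for validation", e);
        }
    }
}
```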

  13. Req: Query Queuing and Execution From previous design requirements: • Query any of the Data Managers • Provide status and progress information • Provide priority level with validation status results • Set query priority based on validation priority level

  14. Query Queuing and Execution Implementation • Issue: database query execution plans are not always accurate. • Alternative implementation: treat all queries the same at first. • Short, medium and long queues each have their own connections allocated • If a short or medium query takes longer than a certain amount of time, it will be moved to a longer queue • Queue sizes and expiration times will be user configurable • If the long queue runs out of query slots, the most recent query will be cancelled and restarted when a slot becomes available. • For ODM, use Query Manager queuing functionality (user chooses which queue to use)
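
A rough Java sketch of the promotion scheme above. The pool sizes and time limits stand in for the user-configurable values, and cancelling the Future is only a stand-in for cancelling the underlying database statement; as the slide notes, the query is effectively restarted in the next queue.

```java
import java.util.concurrent.*;

/**
 * Sketch of tiered query queues: every query starts in the short queue and,
 * if it exceeds that queue's time limit, is cancelled and restarted in the
 * next longer queue. All sizes and limits here are placeholder values.
 */
public class TieredQueryQueues {
    private final ExecutorService shortQueue  = Executors.newFixedThreadPool(8);
    private final ExecutorService mediumQueue = Executors.newFixedThreadPool(4);
    private final ExecutorService longQueue   = Executors.newFixedThreadPool(2);

    public <T> T execute(Callable<T> query) throws Exception {
        try {
            return runWithLimit(shortQueue, query, 60);        // short queue: 60 s limit
        } catch (TimeoutException tooSlowForShort) {
            try {
                return runWithLimit(mediumQueue, query, 600);  // medium queue: 10 min limit
            } catch (TimeoutException tooSlowForMedium) {
                return longQueue.submit(query).get();          // long queue: no limit
            }
        }
    }

    private <T> T runWithLimit(ExecutorService queue, Callable<T> query, long limitSeconds)
            throws Exception {
        Future<T> future = queue.submit(query);
        try {
            return future.get(limitSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // free the slot; caller restarts the query in the next queue
            throw e;
        }
    }
}
```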

  15. Req: Result Caching From previous design requirements: • Query result sets are stored in the DRL Cache until they have been retrieved by a PDC. • Purge retrieved result sets if space is needed

  16. Result Caching Implementation • Maintain results in result set cache as long as possible to allow repeated retrieval of results • With a large enough cache (terabyte), results should typically be held for a week or more • Link to past results via query history • Performance of the result set cache is critical for PSPS responsiveness • Hybrid memory / disk cache • LRU for memory and disk • In-memory index for fast disk retrieval • Retrieval of partial results • Efficiently support writing and reading multiple concurrent result sets

  17. Result Caching (continued) • Java Caching System (JCS) Implementation • Web server caching system • Uses Java serialization • In-memory storage, with swapping to an indexed file on disk • Built-in capability for distributing cache across multiple machines (untested) • Modified JCS to support synchronous puts when memory is full (wait for space to be freed via disk write) • Store Result Set as a list of objects each made up of a block of rows • Support many result sets (each result set can use as little as one block of rows in memory) • Adding memory speeds up the cache
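
As an illustration of the block-of-rows scheme, a hedged sketch against the JCS API of that era. The region name, block size and key format are assumptions, and the LRU memory cache / indexed disk cache behaviour comes from the JCS configuration file (cache.ccf), not from this code.

```java
import java.util.ArrayList;
import org.apache.jcs.JCS;
import org.apache.jcs.access.exception.CacheException;

/**
 * Sketch: a query's result set is stored as many small cache entries,
 * one per block of rows, keyed by (queryId, blockIndex).
 */
public class ResultSetCache {
    public static final int ROWS_PER_BLOCK = 1000;    // illustrative block size
    private final JCS cache;

    public ResultSetCache() throws CacheException {
        this.cache = JCS.getInstance("resultSets");    // region configured in cache.ccf
    }

    /** Store one block of rows; the list must be serializable so the disk cache can hold it. */
    public void putBlock(String queryId, int blockIndex, ArrayList<Object[]> rows)
            throws CacheException {
        cache.put(queryId + "#" + blockIndex, rows);
    }

    /** Retrieve a block, or null if it has been purged from both memory and disk. */
    @SuppressWarnings("unchecked")
    public ArrayList<Object[]> getBlock(String queryId, int blockIndex) {
        return (ArrayList<Object[]>) cache.get(queryId + "#" + blockIndex);
    }
}
```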

  18. Req: Result Retrieval From previous design requirements: The PS1 DRL shall return query results in response to a query request.

  19. Result Set Retrieval Implementation • Don’t slow down fast queries • Return results immediately if query is very fast • Enable incremental results for queries with large data volumes • Status updates with number of rows retrieved • Execution status if supported by database • Support streaming CSV file download • Stream file directly from cache rather than creating on disk
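
A sketch of the streaming CSV download, writing blocks straight from the illustrative ResultSetCache above to the HTTP response so that no file is ever created on disk.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;
import javax.servlet.http.HttpServletResponse;

/** Sketch: stream a cached result set to the client as CSV, block by block. */
public class CsvStreamer {

    public void streamCsv(ResultSetCache cache, String queryId,
                          HttpServletResponse response) throws IOException {
        response.setContentType("text/csv");
        response.setHeader("Content-Disposition", "attachment; filename=\"" + queryId + ".csv\"");

        PrintWriter out = response.getWriter();
        int blockIndex = 0;
        List<Object[]> block;
        // Walk the cached blocks in order; partial results reach the client
        // while later blocks may still be arriving from the Data Manager.
        while ((block = cache.getBlock(queryId, blockIndex++)) != null) {
            for (Object[] row : block) {
                out.println(toCsvLine(row));
            }
            out.flush();   // push each block to the client immediately
        }
    }

    private String toCsvLine(Object[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(row[i] == null ? "" : row[i].toString());
        }
        return sb.toString();
    }
}
```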

  20. Req: Performance Monitoring From previous design requirements: • Allow administrators to monitor the performance of the DRL functions • I/O statistics • CPU statistics • memory statistics • process statistics • at configurable levels of detail

  21. Performance Monitoring Implementation • JMX over RMI to provide management interface to all JVM information collected at runtime • Does not provide CPU information • Use cross platform third party library if more detailed information is required • YourKit (http://www.yourkit.com) is one good third party option • Tradeoff: non-JVM profiling libraries incur overhead • Provide user configurable logs to a database to store historical information
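
The JVM-level statistics in question are the ones exposed by the standard platform MXBeans, which is also what JMX over RMI makes remotely accessible. A small sketch; as the slide notes, CPU detail beyond the system load average is where a third-party profiler such as YourKit comes in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Sketch: one snapshot of memory, thread and load statistics from the platform MXBeans. */
public class JvmStats {
    public static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threadCount = ManagementFactory.getThreadMXBean().getThreadCount();
        // System load average is the closest thing to CPU usage available without extra libraries.
        double loadAvg = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return String.format("heap=%d/%d MB, threads=%d, loadAvg=%.2f",
                heap.getUsed() >> 20, heap.getMax() >> 20, threadCount, loadAvg);
    }
}
```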

  22. Req: User Administration From previous design requirements: • The PS1 DRL shall provide a computer security system to protect the DRL and PSPS DM Components from unauthorized access. • The PS1 DRL shall provide and authenticate at least two levels of access. • The PS1 DRL shall provide and authenticate privileged access for use of the private Administrative API. • The PS1 DRL shall provide and authenticate standard access for use of the public API.

  23. User Administration Implementation • Initial plan: Tomcat Realm JDBC based security with an in-process database • Straightforward • Independent of other components • Allows administrator to create and modify user accounts through web service • Allows association of additional information with user account • Role • Query log • Available result sets • Running queries
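
Once Tomcat's Realm has authenticated a request against the in-process user database, distinguishing the two access levels reduces to a role check. A hedged sketch using the standard servlet API; the role name is illustrative.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/** Sketch: an endpoint that only privileged (administrative) users may reach. */
public class AdminOnlyServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // The container has already checked the user name/password against the configured Realm.
        String user = req.getRemoteUser();
        if (user == null || !req.isUserInRole("drl-admin")) {   // illustrative role name
            resp.sendError(HttpServletResponse.SC_FORBIDDEN, "Administrative access required");
            return;
        }
        resp.getWriter().println("Welcome, administrator " + user);
    }
}
```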

  24. Req: Logging From previous design requirements: • Log major system events • Query events • Unsuccessful Authentication Attempts • Server restarts • Any errors

  25. Logging Implementation • Log results via JDBC to in-process database • Move to external database if DRL is clustered in the future • Logs linked to user accounts (stored in same database)
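
A minimal sketch of logging events over JDBC into the in-process database; the table and column names are assumptions, and linking to user accounts simply means the user id column references the account table kept in the same database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

/** Sketch: write major system events to an event_log table in the in-process database. */
public class EventLogger {
    private final Connection db;

    public EventLogger(Connection db) {
        this.db = db;
    }

    public void log(String userId, String eventType, String detail) throws SQLException {
        String sql = "INSERT INTO event_log (event_time, user_id, event_type, detail) "
                   + "VALUES (?, ?, ?, ?)";
        try (PreparedStatement ps = db.prepareStatement(sql)) {
            ps.setTimestamp(1, new Timestamp(System.currentTimeMillis()));
            ps.setString(2, userId);      // ties the event to a user account
            ps.setString(3, eventType);   // e.g. QUERY_SUBMITTED, AUTH_FAILED, SERVER_RESTART
            ps.setString(4, detail);
            ps.executeUpdate();
        }
    }
}
```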

  26. Key Modifications: DRL – DM ICD Changes • For non-ODM Data Managers, DRL should utilize JDBC directly rather than RMI and MBeans for performance and flexibility reasons • JDBC optimized for transfer of result set data • JDBC already abstracts much of database implementation details • Eliminate the RMI step, increase performance, and reduce complexity for Database developers • Use database security for DM rather than custom J2EE security

  27. Driver Specifics • Performance • Result set batching • Data Types • Schema Information Retrieval • Performance information retrieval
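
As an example of the driver-level settings involved, a sketch using the SQL Server JDBC URL format with a forward-only statement and an explicit fetch size as a batching hint; host, credentials, table and columns are placeholders, and the exact batching behaviour differs between the SQL Server, MySQL and PostgreSQL drivers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Sketch: stream a large result set with driver-side row batching instead of full buffering. */
public class DriverTuningExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:sqlserver://odm-host;databaseName=PS1",   // placeholder URL
                     "drl_user", "secret");                          // placeholder credentials
             Statement stmt = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {

            stmt.setFetchSize(1000);   // hint: fetch rows in batches rather than all at once
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT objID, ra, dec FROM Objects")) {         // illustrative table/columns
                while (rs.next()) {
                    // hand each batch of rows to the result set cache here
                }
            }
        }
    }
}
```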

  28. Key Modifications: Caching Changes • Instead of purging results as soon as they are retrieved, associate results with the query history and keep them around as long as possible

  29. Key Modifications: Session Management • Connection and data persistence across web service calls • Get UUID back on login to identify session • UUID generator security (Java randomUUID for cryptographically strong randomness) • Web Services don’t usually save state • In this case, UUID tied to JHU Query Manager session
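
A small sketch of the session scheme: login returns a token from java.util.UUID.randomUUID(), which uses a cryptographically strong random source, and every later web service call presents it to recover the server-side state (for the ODM, the associated Query Manager session). The registry class itself is illustrative.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: map opaque session tokens to server-side session state across stateless calls. */
public class SessionRegistry {
    private final Map<String, Object> sessions = new ConcurrentHashMap<String, Object>();

    /** Called on successful login; the returned token identifies the session to the client. */
    public String createSession(Object sessionState) {
        String token = UUID.randomUUID().toString();   // cryptographically strong randomness
        sessions.put(token, sessionState);
        return token;
    }

    /** Called by every subsequent web service request to recover its state. */
    public Object lookup(String token) {
        return sessions.get(token);
    }

    /** Called on logout or session expiry. */
    public void invalidate(String token) {
        sessions.remove(token);
    }
}
```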

  30. Test Plan • Initial test plan draft developed • Includes more test details than existing DRL test plan • Key realization: • Need to define minimum requirements for integration of new Data Managers • Data Manager acceptance testing needed • Will be updated as we continue to make design decisions and as software is implemented

  31. Performance Testing • Performance Critical Components • Result Set Persistence • Stream large result sets directly to/from disk • HTTP for Data Transfer • Zip Data Compression • JDBC Drivers • Optimize use of JDBC Driver • Server Threading • Test with many distributed clients downloading • Connection persistence across web service calls • Division of machines / processes

  32. Software Delivery • Implementation will be provided using a version of the SDSS database as an example backend. MySQL and PostgreSQL will also be supported. • Example WBI will be provided with software • Example Java and .NET client applications will also be provided • Automated test suite will also be delivered

  33. Schedule

  34. Status • Completed review of specification • No DRL problems identified • Technologies chosen for implementation • Tomcat • Axis2 Web Service • Java Caching System for Result Set caching • Microsoft JDBC SQL Server Driver • Initial web service proof of concept developed • Draft test plan document

  35. Demo

  36. The PSPS Web Based Interface & User Interface Clients Jim Heasley

  37. The WBI • The WBI provides an interface for a human user to request published science data from the data stores via the DRL. It is one example of a Published Data Client (PDC). Note that there can be more than one PDC providing the same functionality. • The WBI provides both a Standard User API and an Administrative User API. • The WBI is in fact a combination of the infrastructure needed to connect to the DRL and some number of clients that access the PSPS data stores via the DRL. • Driving requirement – SECURITY • Preventing unauthorized access • Not about encrypting data for privacy! • REALITY CHECK – it’s a web server with some clients attached!

  38. WBI Components

  39. WBI Components • The WBI Software Design Description is in the SAIC generated document PSDC-630-WBI-SDD • The WBI Components are • Administrative User View • The user interface for an authenticated WBI administrator • Administrative Web Service Driver • Programming interface that converts a method call to its mapped SOAP message and sends it to the DRL. • There is a 1-to-1 mapping of requests handled by the Administrative Service Driver to SOAP messages defined in the DRL WSDL • Documented in PSDC-630-DRL-PDC-Private-ICD.html • Request Controller • Provides stateful management of user requests that may persist longer than a user’s WBI session.

  40. WBI Components • The WBI Components (continued): • Standard User View • Provides the user interface to an authenticated non-administrative WBI user. • Standard User Web Service Driver • Provides a programming interface that converts a method call to its mapped SOAP message and transmits it to the DRL. It also performs the reverse function for responses/faults received from the DRL. There is a 1-to-1 mapping of requests to the SOAP messages defined in the DRL WSDL. • Documented in PSDC-630-DRL-PDC-Public-ICD.html • WBI Account Manager • Responsible for authenticating users and granting access permissions to WBI functionality. Users will be identified by a user name and password, which serve as the authentication credentials.

  41. WBI Components • The WBI Components (continued): • WBI Log Configuration Manager • Permits an administrator to define logs, define the level & verbosity of event reporting, and identify events reported to administrators. • WBI Log Manager • Initializes logs on startup as defined in a configuration file. • Coordinates logs from multiple WBI components to ensure that only the specified level of logging is performed

  42. WBI Detailed Design • Main challenge – negotiation of the Web Services Interface to the DRL. • These web services make use of concepts outside the realm familiar to traditional scientific programmers, e.g., • XML • SOAP • WSDL • X.509 Certificates • Digital signatures • To simplify access for the WBI and other PDCs, the Standard Web Service Driver has been encapsulated in an optional Java-based component named PDC-Core. • Documentation of this reference implementation is provided in the SAIC generated document PSDC-630-WBI-SDD-Addendum-A_Detailed-Design.html
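
To give a sense of what PDC-Core hides from the scientific programmer, here is a hedged sketch of a raw SOAP call made with the Axis2 client API; the endpoint URL, namespace and operation name are placeholders, not the actual DRL WSDL contents, and WS-Security (certificates, signatures) is omitted.

```java
import org.apache.axiom.om.OMAbstractFactory;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.OMFactory;
import org.apache.axiom.om.OMNamespace;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.client.ServiceClient;

/** Sketch: the raw Axis2 plumbing that PDC-Core wraps in ordinary Java method calls. */
public class DrlSoapSketch {
    public static void main(String[] args) throws Exception {
        ServiceClient client = new ServiceClient();
        Options options = new Options();
        options.setTo(new EndpointReference("http://drl.example.org/services/DRL")); // placeholder
        client.setOptions(options);

        // Build the request payload by hand; PDC-Core generates this from a method call.
        OMFactory factory = OMAbstractFactory.getOMFactory();
        OMNamespace ns = factory.createOMNamespace("http://example.org/drl", "drl");  // placeholder
        OMElement request = factory.createOMElement("getSchema", ns);                 // placeholder op

        OMElement response = client.sendReceive(request);
        System.out.println(response);
    }
}
```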

  43. WBI User Interfaces • The components described to this point are there to provide the low-level functionality necessary to ask for and return data from the DRL and the data managers connected to it. The astronomers won’t interact with them directly. • The USER INTERFACES, the web applications that use the web services provided by the WBI and DRL, are the tools with which the astronomers will interact. • As mentioned yesterday, we have followed the advice of the PDR committee and are providing access via “recycled” web applications (the SDSS Casjobs web interface, hereafter Query Manager = Qm), reused tools (from the MOPS), and a work-alike clone of another existing web app (IPAC’s Gator).

  44. The SDSS Casjobs Web Interface

  45. PS1 Casjobs Interface = Qm

  46. A PS1 Menu Driven Web Application • Following the PDR, I developed a prototype of a menu driven web application for accessing the tables in the PS1 database, modeled on IPAC’s Infrared Science Archive Gator interface. • This application was developed using PHP, a server-side, HTML-embedded scripting language. There are PHP APIs available for most major databases. • The user interface allows generating SQL commands from scratch in a roll-your-own window or automated SQL generation from check box selection of database attributes and user specified spatial constraints. • The interface is configured using information stored in a MySQL database. This allows easy modification of schema, help files, etc.
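
The interface itself is written in PHP, but the automated SQL generation is language independent; the following Java sketch shows the idea with an illustrative table name, column list and a deliberately simplified box constraint (no cos(dec) correction, and no schema validation of the selected columns, both of which a real interface would need).

```java
import java.util.List;

/** Sketch: build a SELECT from checked attributes plus a simple spatial box constraint. */
public class ConeSearchSqlBuilder {

    public static String build(String table, List<String> selectedColumns,
                               double raCenter, double decCenter, double radiusDeg) {
        // In the real interface the column names come from the schema stored in MySQL,
        // which also guards against arbitrary user-supplied identifiers.
        String columns = String.join(", ", selectedColumns);
        return "SELECT " + columns
             + " FROM " + table
             + " WHERE ra BETWEEN "  + (raCenter - radiusDeg)  + " AND " + (raCenter + radiusDeg)
             + " AND dec BETWEEN "   + (decCenter - radiusDeg) + " AND " + (decCenter + radiusDeg);
    }
}
```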

  47. Menu Driven Web Interface Modules • Menu Driven Queries: Main Window, catalogs, collections, schema, sqlpage, makeMenu, glossary • Roll your own SQL queries: generateSQL, submitSQL

  48. Crocodile Demo • This first demonstration shows Crocodile set up to use the 2MASS point source and extended source databases, the USNO UCAC astrometric catalog, the UCAC bright star supplemental catalog, the USNO-B catalog, and the Tycho 2 catalog. (Only 1% of the 2MASS PSC is implemented for the demo, along with the 2MASS XSC and the UCAC bright star supplement.) • This second demonstration shows an implementation of the Crocodile user interface configured to use an early version of the Pan-STARRS schema. There’s no back end database attached to this demo.

  49. MOPS Tools • Within the MOPS subsystem the group (in particular Larry Denneau) has developed an extensive set of software tools for interacting with the MOPS DB. As the SSDM will be a copy (and hot spare) of the MOPS DB, these tools can be interfaced to the WBI to provide access to the SSDM for use by astronomers without impacting the MOPS functional DB. • As Larry noted yesterday, the MOPS tools have been developed in PERL. • The next 3 slides are screen shots of • A summary page of a single night’s coverage with links to additional information • A section of the tracklet and linking efficiency page for a lunation • An a (semimajor axis) vs. e (eccentricity) plot of orbits discovered by MOPS
