1 / 18

Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

Supersite Relational Database Project: (Data Portal?) a sub- project of St. Louis Midwest Supersite Project. Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001. Web Service – input and output for the computer (not for the human)

selina
Download Presentation

Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Supersite Relational Database Project:(Data Portal?)a sub- project of St. Louis Midwest Supersite Project Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

  2. Web Service – input and output for the computer (not for the human) • Data integration facilitated by output format suitable for merging • Rudimentary Query and Viewing facilities

  3. RPO FLM RPO FLM RPO FLM Regional Planning Orgs Federal Land Managers EPA EPA EPA EPA Regul. & Research Prominent Members of ‘PM Information Federation’ • sdhd SuperSite Industry NARSTO Academic Other: Private, Academic

  4. Resource and Tools Sharing among RPOs and Others. Scientific Rationale: • Regional haze and its precursors have a 1000-10000 km air shed. • Emissions in one RPO domain influences the the haze in surrounding RPOs • Cross-ROS knowledge sharing yield better science support to AQ management Administrative Rationale: • Some data and analyses are inherently national; breaking them up is extra work • Data sharing • and analysis tools can be re-used

  5. Data Management at Supersites • Most Supersites manage their respective data using SQL RDMS • St. Louis, LA and Fresno coordinate their RDMS in the format of the CCAQS • Other organizations follow similar approach (Houston, IMPROVE) • CAPITA has a wide range of regional auxiliary data set on open SQL Server

  6. Proposed Strategy for the Supersite Relational Data System: Data Portal • Make the individual Supersite SQL servers accessible from a Data Portal Server • Devise a simple, uniform virtual schema that is common to all databases and allows query by location, time, parameter and method • Use the Data Portal to query and retrieve data for cross-Supersite integrative analysis.

  7. Database Schema Design • Fact Table: A fact table (yellow) contains the main data of interest, i.e. the pollutant concentration by location, day, pollutant and measurement method. • Star Schema consists of a central fact table surrounded by de-normalized dimensional tables (blue) describing the sites, parameters, methods.. • Snowflake Schema is an extension of the star schema where each point of the star ‘explodes’ into further fully normalized tables, expanding the description of each dimension. • Snowflake schema can capture all the key data content and relationships if full detail. It is well suited for capturing and encoding complex monitoring data into a robust relational database.

  8. Abstract (Minimal) Star Schema forIntegrative, Cross-Supersite, Spatio-Temporal Analysis For integrative, cross-Supersite analysis, data queries by time, location and parameter, the database has to have time, location and parameter as dimensions • The minimal Site table includes SiteID, Name and Lat/Lon. • The minimal Parameter table consists of ParamterID, Description and Unit • The time dimensional table is usually skipped since time is self-describing • The minimal Fact (Data) table consists of the Obs_Value and the three dimensional codes for Obs_DateTime, Site_ID and Parameter_ID The above minimal (multidimensional) schema was used in the CAPITA data exploration software, Voyager for the past 22 years, encoding 1000+ datasets. Most Supersite data require a more elaborate schema to fully capture the content

  9. Extended Star Schema for SRDS The Supersite program employs a variety of instrument/sampling/procedures Hence, at least one additional dimension table is needed for Methods A example extended star schema encodes the IMPROVE relational database (B. Schichtel)

  10. Snowflake Example: Central Calif. AQ Study, CCAQS CCAQS schema incorporates a rich set of parameters needed for QA/QC (e.g. sample tracking) as well as for data analysis. The fully relational CCAQS schema permits the enforcing of integrity constraints and it has been demonstrated to be useful for data entry/verification. However, no two snowflakes are identical. The rich snowflake schemata for one sampling/analysis environment cannot be easily transplanted elsewhere. More importantly, many of the recorded parameters ‘on the fringes’ are not particularly useful for integrative, cross-supersite, regional analyses.

  11. Abstract Schema • A common feature of all SQL databases for AQ data is that they can be queried by along spatial, temporal and parameter dimensional queries. • However, the query to retrieve the same information depends on the of the particular database. • A way to homogenize the distributed data is access all the data through an abstract virtual schema. • The abstract schema would use only a subset of the tables/fields from any particular database (red) • The proposed abstract schema is the Minimal Star Schema, possibly expanded …. • The final form of the abstract schema will be arrived at by consensus Subset used for Abstract Schema

  12. Query to abstractstar schema Custom Components Web Services Query & Data Abstract DataAdapter & Schema SQLDataAdapter1 SQLServer1 SQLDataAdapter2 SQLServer2 CustomDataAdapter LegacyServer Data as universal DataSet Universal Query and DataSet using Abstract DataAdopter • The User selects an existing SQL database affiliated with the Supersite system • A Data Query is submitted to the database through a simple Star Schema (virtual). • The Abstract DataAdapter translates the abstract query into a specific query for to the selected database • The Abstract DataAdapter packages the returned data into self-describing ‘universal’ DataSet package • A ‘universally readable ‘DataSet’ is returned to the consuming data viewer/processor

  13. Federated Data Warehouse Features • Data reside in their respective home environment where it can mature. ‘Uprooted’ data in separated databases are not easily updated, maintained, enriched. • Abstract (universal) query/retrieval facilitates integration and comparison along the key dimensions (space, time, parameter, method) • The open data query based on Web Services promotes the building of further value chains: Data Viewers, Data Integration Programs, Automatic Report Generators etc.. • The data access through the Proxy server protects the data providers and the data users from security breaches, excessive detail

  14. Federated Data Warehouse Proxy Tier Data homogenization, transformation Provider Tier Heterogeneous data in distributed SQL Servers User Tier Data presentation, processing Tree-Tier Federated Data Warehouse Architecture • Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members • Proxy Tier: Retrieves designated Provider data and homogenizes it into common, uniform Datasets • User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or processing

  15. Federated Data Warehouse Proxy Tier Data Homogenization, etc. Provider TierHeterogeneous Data User Tier Data Consumption Presentation SQLDataAdapter1 SQLServer1 SQLDataAdapter2 SQLServer2 Processing CustomDataAdapter LegacyServer Integration Proxy Server Data Access & Use Member Servers Fire Wall, Federation Contract Web Service, Uniform Query & Data Federated Data Warehouse Interactions • The Provider servers interact only with the Proxy Server in accordance with the Federation Contract • The contract sets the rules of interaction (accessible data subsets, types of queries) • Strong server security measures enforced, e.g. through Secure Socket layer • The data User interacts only with the generic Proxy Server using flexible Web Services interface • Generic data queries, applicable to all data in the Warehouse (e.g. data sub-cube by space, time, parameter) • The data query is addressed to the Web Service provided by the Proxy Server • Uniform, self-describing data packages are passed to the user for presentation or further processing

  16. Data Query and Returnthrough ‘Universal’ I/O DataQuery: Universal for all data GUI using Maps, Time..charts Explore the prototype Federated PM/Haze Data Warehouse IMPROVE @ CIRA DataAdapter: Specific to data source Here it is just a SQL statement DataReturn: Universal format for all data Can be merged, rendered, processed Surface Bext @CAPITA

  17. Technology of the Data Portal • SQL server, DTS • .NET

  18. Data Portal Prototype • Queries to IMPRVOE, CAPITA datasets

More Related