Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

Supersite Relational Database Project:(Data Portal?)a sub- project of St. Louis Midwest Supersite Project Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

Web Service – input and output for the computer (not for the human) • Data integration facilitated by output format suitable for merging • Rudimentary Query and Viewing facilities

RPO FLM RPO FLM RPO FLM Regional Planning Orgs Federal Land Managers EPA EPA EPA EPA Regul. & Research Prominent Members of ‘PM Information Federation’ • sdhd SuperSite Industry NARSTO Academic Other: Private, Academic

Resource and Tools Sharing among RPOs and Others. Scientific Rationale: • Regional haze and its precursors have a 1000-10000 km air shed. • Emissions in one RPO domain influences the the haze in surrounding RPOs • Cross-ROS knowledge sharing yield better science support to AQ management Administrative Rationale: • Some data and analyses are inherently national; breaking them up is extra work • Data sharing • and analysis tools can be re-used

Data Management at Supersites • Most Supersites manage their respective data using SQL RDMS • St. Louis, LA and Fresno coordinate their RDMS in the format of the CCAQS • Other organizations follow similar approach (Houston, IMPROVE) • CAPITA has a wide range of regional auxiliary data set on open SQL Server

Proposed Strategy for the Supersite Relational Data System: Data Portal • Make the individual Supersite SQL servers accessible from a Data Portal Server • Devise a simple, uniform virtual schema that is common to all databases and allows query by location, time, parameter and method • Use the Data Portal to query and retrieve data for cross-Supersite integrative analysis.

Database Schema Design • Fact Table: A fact table (yellow) contains the main data of interest, i.e. the pollutant concentration by location, day, pollutant and measurement method. • Star Schema consists of a central fact table surrounded by de-normalized dimensional tables (blue) describing the sites, parameters, methods.. • Snowflake Schema is an extension of the star schema where each point of the star ‘explodes’ into further fully normalized tables, expanding the description of each dimension. • Snowflake schema can capture all the key data content and relationships if full detail. It is well suited for capturing and encoding complex monitoring data into a robust relational database.

Abstract (Minimal) Star Schema forIntegrative, Cross-Supersite, Spatio-Temporal Analysis For integrative, cross-Supersite analysis, data queries by time, location and parameter, the database has to have time, location and parameter as dimensions • The minimal Site table includes SiteID, Name and Lat/Lon. • The minimal Parameter table consists of ParamterID, Description and Unit • The time dimensional table is usually skipped since time is self-describing • The minimal Fact (Data) table consists of the Obs_Value and the three dimensional codes for Obs_DateTime, Site_ID and Parameter_ID The above minimal (multidimensional) schema was used in the CAPITA data exploration software, Voyager for the past 22 years, encoding 1000+ datasets. Most Supersite data require a more elaborate schema to fully capture the content

Extended Star Schema for SRDS The Supersite program employs a variety of instrument/sampling/procedures Hence, at least one additional dimension table is needed for Methods A example extended star schema encodes the IMPROVE relational database (B. Schichtel)

Snowflake Example: Central Calif. AQ Study, CCAQS CCAQS schema incorporates a rich set of parameters needed for QA/QC (e.g. sample tracking) as well as for data analysis. The fully relational CCAQS schema permits the enforcing of integrity constraints and it has been demonstrated to be useful for data entry/verification. However, no two snowflakes are identical. The rich snowflake schemata for one sampling/analysis environment cannot be easily transplanted elsewhere. More importantly, many of the recorded parameters ‘on the fringes’ are not particularly useful for integrative, cross-supersite, regional analyses.

Abstract Schema • A common feature of all SQL databases for AQ data is that they can be queried by along spatial, temporal and parameter dimensional queries. • However, the query to retrieve the same information depends on the of the particular database. • A way to homogenize the distributed data is access all the data through an abstract virtual schema. • The abstract schema would use only a subset of the tables/fields from any particular database (red) • The proposed abstract schema is the Minimal Star Schema, possibly expanded …. • The final form of the abstract schema will be arrived at by consensus Subset used for Abstract Schema

Query to abstractstar schema Custom Components Web Services Query & Data Abstract DataAdapter & Schema SQLDataAdapter1 SQLServer1 SQLDataAdapter2 SQLServer2 CustomDataAdapter LegacyServer Data as universal DataSet Universal Query and DataSet using Abstract DataAdopter • The User selects an existing SQL database affiliated with the Supersite system • A Data Query is submitted to the database through a simple Star Schema (virtual). • The Abstract DataAdapter translates the abstract query into a specific query for to the selected database • The Abstract DataAdapter packages the returned data into self-describing ‘universal’ DataSet package • A ‘universally readable ‘DataSet’ is returned to the consuming data viewer/processor

Federated Data Warehouse Features • Data reside in their respective home environment where it can mature. ‘Uprooted’ data in separated databases are not easily updated, maintained, enriched. • Abstract (universal) query/retrieval facilitates integration and comparison along the key dimensions (space, time, parameter, method) • The open data query based on Web Services promotes the building of further value chains: Data Viewers, Data Integration Programs, Automatic Report Generators etc.. • The data access through the Proxy server protects the data providers and the data users from security breaches, excessive detail

Federated Data Warehouse Proxy Tier Data homogenization, transformation Provider Tier Heterogeneous data in distributed SQL Servers User Tier Data presentation, processing Tree-Tier Federated Data Warehouse Architecture • Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members • Proxy Tier: Retrieves designated Provider data and homogenizes it into common, uniform Datasets • User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or processing

Federated Data Warehouse Proxy Tier Data Homogenization, etc. Provider TierHeterogeneous Data User Tier Data Consumption Presentation SQLDataAdapter1 SQLServer1 SQLDataAdapter2 SQLServer2 Processing CustomDataAdapter LegacyServer Integration Proxy Server Data Access & Use Member Servers Fire Wall, Federation Contract Web Service, Uniform Query & Data Federated Data Warehouse Interactions • The Provider servers interact only with the Proxy Server in accordance with the Federation Contract • The contract sets the rules of interaction (accessible data subsets, types of queries) • Strong server security measures enforced, e.g. through Secure Socket layer • The data User interacts only with the generic Proxy Server using flexible Web Services interface • Generic data queries, applicable to all data in the Warehouse (e.g. data sub-cube by space, time, parameter) • The data query is addressed to the Web Service provided by the Proxy Server • Uniform, self-describing data packages are passed to the user for presentation or further processing

Data Query and Returnthrough ‘Universal’ I/O DataQuery: Universal for all data GUI using Maps, Time..charts Explore the prototype Federated PM/Haze Data Warehouse IMPROVE @ CIRA DataAdapter: Specific to data source Here it is just a SQL statement DataReturn: Universal format for all data Can be merged, rendered, processed Surface Bext @CAPITA

Technology of the Data Portal • SQL server, DTS • .NET

Data Portal Prototype • Queries to IMPRVOE, CAPITA datasets

Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

Draft of the November 16, 2001 Presentation to the Supersite Program Nov 13, 2001

Presentation Transcript

FIMH 2001 Helsinki, November 15-16, 2001 Program

November 16 th 2001

Structural Analysis Jeff Bolognese 16 November 2001

November 1, 2001

July 16, 2001

2001 MTBPS Division of Revenue 6 November 2001 Presentation to Budget Committee

FASB 2001 Report FEI CFRI, November 2001

Presentation to MAPLD 2001 11-13 September 2001

13 June 2001

Nov 20, 2001, RBH

Margarida Baptista November 16, 2001

CoAX 2002 Demo - Agent Domains (Draft 30 Nov 2001)

(Jerusalem Post, Nov 2001)

CMU TDT Report 12-13 November 2001

The Problem Ballona Creek November 2001

Jeff Schmitt November 16, 2001

Subsystem Reports , TOF , Nov/16/2001 Hiroshi Tsuruoka

2001 Final Presentation

Thesis presentation Yakham NDIAYE November, 13 the 2001

National Geographic, Nov. 2001

Nov 8, 2001

Internetworking Nov 20, 2001