This document provides a comprehensive overview of the Gateway Implementation discussed at the ESG-CET Meeting in Boulder, CO, in April 2008. It covers key aspects such as implementation technologies, database integration, and the use of PostgreSQL, Java, and Hibernate. The presentation highlights data integrity, schema organization, and future features, including user-interactive annotations and tagging. Additionally, it explores the principles behind RDF search integration and metrics systems, ensuring a robust and efficient approach to managing and accessing scientific metadata.
Gateway Implementation 4/30/2008 ESG-CET Meeting, Boulder, CO, April 2008
Overview
• Implementation Technologies / Tools
• Science Metadata Implementation
• Browse Interface
• RDF Search Integration
• Data Downloading
• Metrics Integration
Database Driven Approach
• All metadata and associated elements stored in a single database
• Data integrity for all elements enforced at the database level
• Normalization reduces the amount of duplicated data compared with the previous system
• Concurrency and transaction control spanning all related elements
• Hot backups supported
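To make "integrity enforced at the database level" and "transaction control spanning all related elements" concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in for the gateway's database, and the table and column names (dataset, logical_file) are hypothetical, not taken from the gateway schema.

```python
import sqlite3


def make_db():
    # In-memory SQLite is only a stand-in for the real database engine (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
    conn.execute(
        "CREATE TABLE dataset ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE logical_file ("
        " id INTEGER PRIMARY KEY,"
        " dataset_id INTEGER NOT NULL REFERENCES dataset(id),"
        " name TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def add_dataset_with_files(conn, dataset_name, file_names):
    """Insert a dataset and its files atomically: every row commits, or none do."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO dataset (name) VALUES (?)", (dataset_name,)
            )
            for name in file_names:
                conn.execute(
                    "INSERT INTO logical_file (dataset_id, name) VALUES (?, ?)",
                    (cur.lastrowid, name),
                )
        return True
    except sqlite3.IntegrityError:
        # A constraint rejected one of the rows; the whole unit was rolled back.
        return False
```

Because the constraints live in the schema, a bad row is rejected no matter which application code path submits it, and the surrounding transaction keeps a half-written dataset from ever becoming visible.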
Database Implementation
• PostgreSQL 8.3 selected as the database engine
  • Better performance and scalability than MySQL
  • Feature-rich, with good SQL standard compliance
  • Full transactional support
  • BSD-style open source license, no dual licensing issues
Gateway Implementation
• Java based
• Spring Framework:
  • Lightweight Inversion of Control (IoC) container
  • Acegi (Spring Security)
  • Web application support
  • Database access abstractions (transactions, exception handling, etc.)
  • Full application support, integration of many useful libraries
Gateway Implementation
• Hibernate: Object Relational Mapping
  • Maps Java objects to the database
  • Greatly reduces the amount of database code that needs to be written
  • Built-in caching, optimized join lookups, and other performance enhancements
Database Schema
• Still under very active development
• Currently 92 tables
• Database is separated into 4 logical schemas
  • Metadata
  • Metrics
  • Security
  • Workspace
Science Metadata Schema (subset)
Browse Interface
• Driven completely from the database
• Efficient queries and data structures
• Straightforward to cache queries and results
• Relatively static structures involved
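Because the browse structures are relatively static, query results can be cached and cleared only when the hierarchy changes. The sketch below illustrates that idea with functools.lru_cache over a sqlite3 query. The collection table, its columns, and both function names are hypothetical, and SQLite again stands in for the real database.

```python
import sqlite3
from functools import lru_cache

# Hypothetical browse hierarchy: each collection optionally points at a parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO collection VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "ocean"), (3, 1, "atmosphere")],
)
conn.commit()


@lru_cache(maxsize=1024)
def child_collections(parent_id):
    """Return the child collection names of parent_id, cached per parent."""
    rows = conn.execute(
        "SELECT name FROM collection WHERE parent_id = ? ORDER BY name",
        (parent_id,),
    ).fetchall()
    # Return an immutable tuple so callers cannot alter the cached value.
    return tuple(name for (name,) in rows)


def invalidate_browse_cache():
    """Call after the hierarchy changes so later lookups hit the database again."""
    child_collections.cache_clear()
```

Repeated calls to child_collections(1) are served from memory after the first one; invalidate_browse_cache() is the single hook that a write path would need to call.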
Future Features
• Annotations
  • User submitted comments on resources
  • Can be applied to collections and logical files
  • Notifications sent to resource owners and admins for review
• Tagging
  • User defined and assigned keywords
  • Can be assigned at the collection level
  • Browsable and searchable
  • Notifications sent to resource owners and admins for review
RDF Integration
• Database is the authoritative source for the RDF search data
• Event mechanism to trigger RDF updates when the underlying database changes
• Database contains detailed information beyond what is stored in RDF
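The slides do not describe how the event mechanism is built, so the following is only an illustrative sketch of the pattern: an authoritative store notifies listeners after each change, and a search index mirrors a subset of the fields as triples. The class names, the event names, and the choice of searchable fields are all assumptions, and a plain Python set stands in for an RDF store.

```python
class MetadataStore:
    """Toy authoritative store that notifies listeners after every change."""

    def __init__(self):
        self._rows = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def save(self, dataset_id, fields):
        self._rows[dataset_id] = dict(fields)
        for listener in self._listeners:
            listener("saved", dataset_id, self._rows[dataset_id])

    def delete(self, dataset_id):
        self._rows.pop(dataset_id, None)
        for listener in self._listeners:
            listener("deleted", dataset_id, None)


class TripleIndex:
    """In-memory set of (subject, predicate, object) triples; a stand-in for an RDF store."""

    SEARCHABLE = ("title", "project")  # only a subset of the stored fields is mirrored

    def __init__(self):
        self.triples = set()

    def on_change(self, event, dataset_id, fields):
        subject = f"dataset:{dataset_id}"
        # Drop stale triples for this subject, then rebuild from the current row.
        self.triples = {t for t in self.triples if t[0] != subject}
        if event == "saved":
            for key in self.SEARCHABLE:
                if key in fields:
                    self.triples.add((subject, key, fields[key]))
```

Wiring the two together is one call, store.subscribe(index.on_change). The store keeps every field, while the index holds only what search needs, which mirrors the point that the database contains more detail than the RDF data.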
Data Download
• Data can be retrieved directly from data nodes or the gateway when data is local
• Files can be directly downloaded through the gateway interface
• Bulk data retrieval scripts can be created through the user interface
  • WGET is currently supported
  • Additional options such as DML to come
• Deep storage retrieval requests generated from the same interface
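As a rough illustration of a generated bulk retrieval script, the sketch below turns a list of file URLs into a shell script of wget commands. The function name, the script layout, and the example.org URLs in the usage note are hypothetical; the gateway's actual script format is not shown in the slides. The only wget options used are --continue and --directory-prefix.

```python
import shlex


def build_wget_script(urls, output_dir="."):
    """Return shell script text that downloads each URL with wget."""
    lines = ["#!/bin/sh", "# Generated download script (illustrative sketch)"]
    for url in urls:
        # shlex.quote guards against spaces and shell metacharacters in values.
        lines.append(
            "wget --continue"
            f" --directory-prefix={shlex.quote(output_dir)}"
            f" {shlex.quote(url)}"
        )
    return "\n".join(lines) + "\n"
```

For example, build_wget_script(["http://example.org/data/a.nc"], "downloads") yields a script whose download line resumes partial transfers and writes into the downloads directory.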
Authorization Tokens
• Lightweight tokens are used to allow users to download restricted files using standard tools, such as standard HTTP clients
  • Limited lifetime
  • Grants a particular user access to only a specific resource
• Currently implemented for direct gateway downloads and appropriately configured TDS servers
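The slides do not say how the tokens are constructed. One common way to get the two stated properties (limited lifetime, one user and one resource) is an HMAC-signed token, sketched below under that assumption. The token layout, the function names, and SECRET_KEY are hypothetical and are not the gateway's actual format.

```python
import base64
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side secret


def issue_token(user, resource, lifetime_seconds, now=None):
    """Create a signed token binding one user to one resource until it expires."""
    issued = time.time() if now is None else now
    expires = int(issued + lifetime_seconds)
    # Sketch limitation: resource must not contain the "|" separator.
    payload = f"{user}|{resource}|{expires}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def check_token(token, resource, now=None):
    """Return the user name if the token is valid for resource, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
        expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
        if not hmac.compare_digest(signature, expected):
            return None  # forged or altered token
        user, token_resource, expires_text = payload.decode().rsplit("|", 2)
        expires = int(expires_text)
    except (ValueError, UnicodeDecodeError):
        return None  # malformed token
    current = time.time() if now is None else now
    if current > expires or token_resource != resource:
        return None  # expired, or issued for a different resource
    return user
```

Because the token is self-contained, a plain HTTP client can present it (for instance as a URL parameter) and the server can validate it without a session, which fits the goal of supporting standard download tools.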
Authorization Tokens
Metrics System
• Metrics data integrated with access control and metadata schemas
  • Associated with user accounts and inventory metadata
  • Accurate associations of activities without duplication of data
• Use of JasperReports to allow more flexible options for creating new metrics reports in the system
• Evaluating the use of star schemas to allow for better report query performance / options
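For readers unfamiliar with star schemas, the sketch below shows the general shape: one narrow fact table of download events that references small dimension tables, so a report becomes a single join plus an aggregate. All table and column names are hypothetical, and sqlite3 is used only as a stand-in; this is not the schema under evaluation.

```python
import sqlite3


def build_star(conn):
    """Create a tiny star schema: two dimension tables and one fact table."""
    conn.executescript(
        """
        CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE dim_dataset (dataset_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_download (
            user_key INTEGER REFERENCES dim_user(user_key),
            dataset_key INTEGER REFERENCES dim_dataset(dataset_key),
            bytes INTEGER
        );
        """
    )


def bytes_per_dataset(conn):
    """Report total downloaded bytes per dataset, ordered by dataset name."""
    return conn.execute(
        "SELECT d.name, SUM(f.bytes)"
        " FROM fact_download f"
        " JOIN dim_dataset d ON d.dataset_key = f.dataset_key"
        " GROUP BY d.name"
        " ORDER BY d.name"
    ).fetchall()
```

The fact rows store only keys and measures, so user and dataset details are not duplicated per event, and a new report is usually one more GROUP BY over the same fact table.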