
U.S. Geological Survey National Biological Information Infrastructure

Technical Overview: NBII Metadata Clearinghouse. May 2008. Mike Frame. Topics for discussion: Metadata CH background; new Metadata CH design and demo; underlying architecture.





Presentation Transcript


  1. U.S. Geological Survey National Biological Information Infrastructure. Technical Overview: NBII Metadata Clearinghouse. May 2008. Mike Frame

  2. Topics for discussion • Metadata CH Background • New Metadata CH Design & Demo • Underlying Architecture

  3. Services Overview (diagram)
      • Portal: www.NBII.gov and My.NBII.gov, offering an integrated view, content management, collaboration services, and integrated/federated search
      • Distributed services: database and web services, geospatial services, model services, Geo-ITIS, thesaurus, DIGR, catalog, mapping, geoparsing, and geospatial text referencing
      • Describe and discover: resource, dataset, geospatial service, web service, and model catalogs, plus the clearinghouse, using Dublin Core (plus), FGDC/ISO, OGC/ISO, and UDDI/WSDL ??
      • Consume: distributed resources, including distributed applications, databases, websites, tools, and models

  4. NBII Metadata Resources: http://metadata.nbii.gov

  5. Metadata Resources: • FGDC Metadata Program • NBII Clearinghouse • Resources for using the Standard • Tool reviews • Training opportunities

  6. Some basic metadata facts about the FGDC Standard. Seven sections make up the FGDC Standard: • Identification Information • Data Quality Information • Spatial Data Organization Information • Spatial Reference Information • Entity and Attribute Information • Distribution Information • Metadata Reference Information
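The seven-section structure lends itself to a quick completeness check. The sketch below is illustrative only: it assumes an FGDC record serialized as XML with the standard's short element names (`idinfo`, `dataqual`, `spdoinfo`, `spref`, `eainfo`, `distinfo`, `metainfo`) as direct children of the root, and the `SAMPLE` record and the `missing_sections` helper are hypothetical, not part of the NBII tooling.

```python
import xml.etree.ElementTree as ET

# Short element names for the seven top-level sections of an FGDC record.
FGDC_SECTIONS = {
    "idinfo": "Identification Information",
    "dataqual": "Data Quality Information",
    "spdoinfo": "Spatial Data Organization Information",
    "spref": "Spatial Reference Information",
    "eainfo": "Entity and Attribute Information",
    "distinfo": "Distribution Information",
    "metainfo": "Metadata Reference Information",
}


def missing_sections(xml_text):
    """Return the names of top-level sections absent from a record."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for tag, name in FGDC_SECTIONS.items() if tag not in present]


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """<metadata>
  <idinfo><citation/></idinfo>
  <metainfo><metd>20080501</metd></metainfo>
</metadata>"""

if __name__ == "__main__":
    for name in missing_sections(SAMPLE):
        print("missing:", name)
```

Not every section is mandatory for every dataset, so a report like this is better read as a prompt for review than as a validation failure.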

  7. NBII Metadata CH

  8. Rationale for Metadata CH Redesign • User feedback • Metadata creation • Metadata management • Metadata integration with data • Open architecture framework • Speed and reliability • Data quality • Data visualization • License costs

  9. NBII Metadata CH provides: • A single portal to information contained in disparate data management systems • Free-text, fielded, spatial, and temporal search capabilities • A way for individuals and database managers to distribute their data while maintaining complete control and ownership • Leverage of the investment in existing information systems and research • NBII is part of the Mercury Consortium at ORNL

  10. NBII CH: New Functionalities • Rich client interface • Combined search results (status page) • Filtering of search results (facets) • Dynamic sorting of search results • Bookmarking of brief and full metadata pages • Based on open source technologies: • Lucene • Solr
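As a rough illustration of how faceting, filtering, and sorting map onto a Solr request, the sketch below assembles a query URL from Solr's standard request parameters (`q`, `fq`, `facet`, `facet.field`, `sort`, `rows`). The base URL and the field names (`datasource`, `title`) are assumptions made for the example; the slides do not give the clearinghouse's actual schema or endpoint.

```python
from urllib.parse import urlencode

# Assumed endpoint; the real deployment's URL is not given in the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query(text, facet_fields=(), filters=(), sort=None, rows=10):
    """Build a Solr select URL with optional facets, filters, and sorting."""
    params = [("q", text), ("rows", str(rows))]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field whose value counts are wanted.
        params.extend(("facet.field", field) for field in facet_fields)
    # Each fq narrows the result set without affecting relevance scoring.
    params.extend(("fq", clause) for clause in filters)
    if sort:
        params.append(("sort", sort))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(
        "wetlands",
        facet_fields=["datasource"],       # hypothetical field name
        filters=['datasource:"USGS"'],     # hypothetical filter value
        sort="title asc",                  # hypothetical sortable field
    ))
```

Clicking a facet value in a user interface would typically add one more `fq` clause and re-issue the same query.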

  11. NBII CH: New Functionalities (cont.) • SOA-based design • Web services • RSS services for search results • Portlet support • Search sharing support • Thesaurus support • Seamless data ordering/data extraction with various data partners • Seamless data visualization integration with external visualization tools • Improved user statistics collection
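An RSS service for search results can be pictured as a function that wraps a result list in an RSS 2.0 channel. This is a minimal sketch under stated assumptions: the result dictionaries with `title`, `url`, and `abstract` keys and the `results_to_rss` helper are hypothetical and do not describe the clearinghouse's actual feed format.

```python
import xml.etree.ElementTree as ET


def results_to_rss(query, results, feed_link):
    """Serialize search results as an RSS 2.0 document (string)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = f"Search results for: {query}"
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = (
        f"{len(results)} matching metadata records"
    )
    for record in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["url"]
        ET.SubElement(item, "description").text = record["abstract"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical result used only for illustration.
    demo = [{
        "title": "Bird survey",
        "url": "http://example.org/metadata/1",
        "abstract": "Annual point counts.",
    }]
    print(results_to_rss("birds", demo, "http://example.org/search?q=birds"))
```

Because the feed is keyed to a query, a subscriber sees new matching records as they are harvested, without repeating the search by hand.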

  12. The NBII Clearinghouse • Operated for NBII by the Oak Ridge National Laboratory • Over 38,000 records • 41 partners contributing metadata records • Ability to search in a variety of ways • Redesigned in 2008

  13. NBII CH Demo • NBII Clearinghouse interface: http://mercdev3.ornl.gov/nbii3/

  14. How does the NBII Clearinghouse work?

  15. How does the NBII Clearinghouse work?

  16. How does the NBII Clearinghouse work?

  17. How does the NBII Clearinghouse work?

  18. Metadata CH RSS: World Data Center http://wdc.nbii.gov

  19. NBII Metadata Clearinghouse Architecture

  20. Metadata CH Architecture • CH function of the NBII Metadata Program, operated by ORNL • NBII is one organization in the Mercury Consortium • Relationship established in 2001 • Formerly based on “Blue Angel Technologies” • Currently based on Lucene/Solr open source technologies

  21. Distributed Data Discovery and Access System (diagram)
      1. Principal investigators create detailed metadata and data files using local applications or ORNL-OME.
      2. NBII Mercury collects metadata and key data from contributing agencies’ servers distributed around the country and builds a centralized index (the "Virtual Internet Database").
      3. Remote users query the index via a Web-based browser.
      4. Metadata summaries are returned to the remote users, including links back to detailed information and data at the PIs’ server or data repository.
      5. Remote users select links to data of interest.
      6. Highly detailed data and documentation are downloaded directly from the contributing agency.
      Index fields: P.I. Name, Product Number, Product Title, Site, Subject Area, Thematic Area, Keywords, etc.
      Example summary: P.I. Summary – John Smith: Product A (Container 1, 10/12/2003; Container 2, 01/20/2002; Container 3, 07/05/2001); Product B (Container 1, 03/05/1999; ….)

  22. A Virtual Aggregate Database (diagram)
      • Existing databases: metadata exists in remote legacy databases using any platform, OS, or RDBMS. Databases can be of different structures and content. No re-programming of existing systems is required; it is business as usual for contributing databases.
      • Custom export programs (or Z39.50 or WS): export programs are easily written and automated. Metadata are extracted into XML files, yielding standardized data objects.
      • Encrypted XML: these files can be remotely harvested via the Internet.
      • Index: harvested metadata are combined at the central site, transformed (if needed), and indexed. Frequent, automated harvesting and complete re-building of the index keep the aggregate database up to date.
      • Users work with a single, simple, web-like interface to access all data simultaneously.
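The export-then-harvest pattern on this slide can be sketched in a few lines. Everything named here is an assumption made for illustration: the `datasets` table, its columns, the `record` XML layout, and the helper functions are hypothetical, an in-memory SQLite database stands in for a contributor's legacy system, and the encryption and network transfer mentioned on the slide are omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_db(rows):
    """Create an in-memory stand-in for a contributor's legacy database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets (id INTEGER PRIMARY KEY, title TEXT, investigator TEXT)"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?)", rows)
    return conn


def export_records(conn):
    """Export each row as a small XML document, keyed by file name."""
    docs = {}
    query = "SELECT id, title, investigator FROM datasets ORDER BY id"
    for rec_id, title, investigator in conn.execute(query):
        root = ET.Element("record", id=str(rec_id))
        ET.SubElement(root, "title").text = title
        ET.SubElement(root, "investigator").text = investigator
        docs[f"record_{rec_id}.xml"] = ET.tostring(root, encoding="unicode")
    return docs


def harvest(*exports):
    """Combine exports from several providers into one list of records."""
    combined = []
    for docs in exports:
        for name in sorted(docs):
            root = ET.fromstring(docs[name])
            combined.append({child.tag: child.text for child in root})
    return combined


if __name__ == "__main__":
    site_a = export_records(demo_db([(1, "Bird survey", "J. Smith")]))
    site_b = export_records(demo_db([(7, "Stream chemistry", "A. Jones")]))
    for record in harvest(site_a, site_b):
        print(record)
```

The contributing databases are only read, never modified, which mirrors the slide's point that no re-programming of existing systems is required.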

  23. NBII CH Design Diagram. Components shown:
      • External metadata (FGDC-BIO), reached via http, ftp, or web crawl
      • NBII CH Harvester
      • MySQL database Mercury3_harvests_nbii, with a DB updater tool (custom Java)
      • Transformed files
      • Solr schema for defining the fields
      • Solr Indexer tool (custom Java), which indexes the metadata records
      • Extended Lucene index
      • Solr search server
      • XML Beans to extract the contents
      • Solr Searcher (custom Java Spring)
      • Front ends: portlets, web service, RSS, UI
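The indexing step in this design can be approximated by converting transformed metadata records into Solr's XML update message (`<add><doc><field name="...">`). The slides describe the actual indexer as a custom Java tool; the Python sketch below only illustrates the message shape, and the field names (`id`, `title`, `keyword`) and the `to_solr_add` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_solr_add(records):
    """Render records as a Solr XML <add> update message (string)."""
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            # A multi-valued field is sent as repeated <field> elements.
            values = value if isinstance(value, list) else [value]
            for item in values:
                ET.SubElement(doc, "field", name=name).text = str(item)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record; field names must match the Solr schema in use.
    print(to_solr_add([{
        "id": "nbii-1",
        "title": "Bird survey",
        "keyword": ["birds", "survey"],
    }]))
```

A message like this would be posted to the Solr update handler and followed by a commit before the new records become searchable.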

  24. Future Development
      Phase II (May 2008 to September 2008): • Harvester engine to use open source tools (remove COTS) (Phase I & II) • Portal integration through the JSR-168 Portlet standard • Search portlets, portlets for recent datasets, most-searched words, etc. • Web service implementation (Phase I & II): • Thesaurus support (semantic web integration support) • Gazetteer web service implementation • OGC Catalog Service (include Web Mapping/Coverage/Feature Servers in search) • Universal Description, Discovery, and Integration (UDDI) Directory Services • Dynamic RSS support, including Geo-RSS support • ISO 19115 support • OpenSearch support • Documentation and Help (Phase I & II) • User Statistics Application modifications
      Phase III (October 2008 to January 2009): • Save, retrieve, and email user queries • Possible integration with OPeNDAP • Web service harvesting (OAI) • Internationalization • ????

  25. Search technology using Lucene/SOLR • Lucene • Overview • Who uses Lucene • Solr • Overview • Who uses Solr

  26. Lucene Overview • High-performance, full-featured text search engine library written entirely in Java • Mature Apache open source Java project • Index speed and integrity, search speed • Uses file-based full-text and inverted indexing • Extremely fast, with built-in caching • Can easily handle millions of documents • Very active mailing list for support
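The inverted indexing mentioned above can be shown with a toy example: each term maps to the set of documents that contain it, so a query only touches the postings for its own terms. This is a sketch of the general idea, not of Lucene's implementation, which adds on-disk segments, scoring, and text analysis; the function names and sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


if __name__ == "__main__":
    docs = {
        1: "Bird survey of wetlands",
        2: "Wetlands water quality",
        3: "Bird banding records",
    }
    index = build_index(docs)
    print(search(index, "bird wetlands"))
```

Lookup cost depends on the number of query terms and the size of their postings rather than on the total number of documents, which is what lets this kind of index scale to large collections.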

  27. Who uses Lucene • Wikipedia • MediaWiki • European Bioinformatics Institute • Liferay • Bigsearch.ca • Monster • Academic Archive On-line • Complete list: • http://en.wikipedia.org/wiki/Lucene • http://wiki.apache.org/lucene-java/PoweredBy

  28. Solr Overview • Open source enterprise search server based on the Lucene Java search library • Apache project, sub-project of Lucene • Advanced full-text search capabilities • Optimized for high-volume web traffic • Standards-based open interfaces: XML and HTTP • Solr uses the Lucene search library and extends it

  29. Solr Overview (cont.) • A real data schema, with numeric types, date fields, and dynamic fields • Dynamic faceted browsing and filtering • Advanced, configurable text analysis • Highly configurable and user-extensible caching • External configuration via XML • Scalability: efficient replication to other Solr search servers • Administration interface is available

  30. Who uses Solr • CNET Reviews • shopper.com • AOL Music • Netflix • search.com • The Digital Commonwealth • Mindquarry • Complete list: http://wiki.apache.org/solr/PublicServers

  31. Mercury Instances Demo • NBII Clearinghouse interface: http://mercdev3.ornl.gov/nbii3/ • ORNL DAAC interface: http://daac.ornl.gov/ • LBA Mercury interface: http://mercdev3.ornl.gov/lba3/ • DADDI Mercury interface: http://mercdev3.ornl.gov/daddi3/ • GFIS RSS Portal interface: http://www.gfis.net/gfis/home.faces

  32. User Statistics Report Generation Tool

  33. Open source Harvester Re-design (Aperture)

  34. Questions, Comments • Mike Frame, 865 576-3605, mike_frame@usgs.gov • Thanks to: • Giri Palanisamy, Systems Architect and Team Leader, Mercury Consortium, palanisamyg@ornl.gov • Vivian Hutchison, NBII Metadata Program Manager, vhutchison@usgs.gov
