The SDMX Registry Model
April 2, 2009
Arofan Gregory, Open Data Foundation

Background
• SDMX (Statistical Data and Metadata eXchange) provides a number of standards and guidelines that support the standard exchange of statistics:
  • Standard models/XML/EDIFACT formats for data
  • Standard models/XML formats for metadata
  • A standard architecture based on a set of registry services
  • Guidelines for the use of standard statistical concepts across domain boundaries
  • A framework for establishing domain standards within each statistical domain

SDMX Registries
• This talk focuses on the SDMX Registry Services
  • These are key to fully automating statistical discovery and exchange
  • They are the primary means of enhancing the visibility and discovery of data and metadata within statistical communities
  • They are designed to provide a connection point between SDMX and other related standards

Existing Problems
• Duplication of effort
  • There is a lot of duplicated work within statistics, because there is little awareness of other data collection efforts in the same area
  • This is wasteful
• Even with a large amount of public statistical data available on the Internet…
  • It is difficult to find good data with good metadata
  • This affects end-users (researchers, students, journalists) more than policy makers, who have dedicated access to the data
• Using existing data can be difficult
  • Too many formats, and too much emphasis on Web-site presentation (as opposed to download)
  • Too little metadata for existing data sets
  • Combining data from different sources is difficult or impossible
  • Access to data sources is difficult or impossible (not even the documentation is accessible)
  • Understanding concepts and definitions can be challenging, which affects the comparability of data

The Case for Infrastructure Support
• New standards allow broader visibility and re-use of data and metadata
  • This produces greater transparency
  • It also produces higher quality and efficiency in data access through automation
• Statistical domains cannot be governed by individual organizations
  • The mission of most organizations (even international ones) is too narrow
  • This is the role of governments, supra-national initiatives, and public-private consortia
• Most public data is paid for by taxpayers
  • Yet taxpayers are the least well served by that investment

Emerging Solutions
• Web-services technology can address many of the generic problems of distributing data sources and applications across the Internet
• Standards such as DDI, SDMX, and ISO/IEC 11179 provide specific models and formats for use within the domains of statistics and research
• SDMX provides a powerful registry model for establishing a research infrastructure
  • It is designed to integrate with, and support the use of, many other related standards (DDI, ISO 11179, METS, XBRL, etc.)
  • SDMX registry tools are available today, free of charge and as open source

How Do the SDMX Registry Services Work?
• An SDMX Registry (that is, an implementation of the standard registry services) provides applications with:
  • A repository of metadata about the structures and concepts of data and metadata sets
  • A repository of information about who provides what data and metadata to whom
    • This helps to manage data across a broad network
  • A registry of available data and metadata sets in standard formats
    • This lists the information needed to find and use standard data and metadata throughout a community network

SDMX Registry/Repository: SDMX Registry Interfaces
• REGISTRY (Data Set/Metadata Set): indexes data and metadata; interfaces: Register, Query
• REPOSITORY (Provisioning Metadata): describes data and metadata sources and reporting processes; interfaces: Submit, Query
• REPOSITORY (Structural Metadata): describes data and metadata structures; interfaces: Submit, Query
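
The three stores in the diagram can be illustrated with a small in-memory model. This is a minimal Python sketch, not the SDMX registry API: the class and method names (`RegistryRepository`, `submit_structure`, `submit_provision`, `register`, `query`), the flow id `EXT_DEBT`, and the rule that data may only be registered against an existing provisioning entry are all illustrative assumptions. Real registries expose these operations as web-service interfaces that exchange SDMX-ML messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Registration:
    """One registered data or metadata set: who provides it, for which flow, and where it lives."""
    provider: str
    flow: str
    url: str


@dataclass
class RegistryRepository:
    # Structural metadata: flow id -> the concepts its structure uses.
    structures: Dict[str, List[str]] = field(default_factory=dict)
    # Provisioning metadata: (provider, flow) pairs a provider has agreed to supply.
    provisions: Set[Tuple[str, str]] = field(default_factory=set)
    # The registry proper: an index of available data and metadata sets.
    registrations: List[Registration] = field(default_factory=list)

    def submit_structure(self, flow: str, concepts: List[str]) -> None:
        self.structures[flow] = list(concepts)

    def submit_provision(self, provider: str, flow: str) -> None:
        if flow not in self.structures:
            raise ValueError(f"unknown flow {flow!r}")
        self.provisions.add((provider, flow))

    def register(self, provider: str, flow: str, url: str) -> Registration:
        # Assumption: only data covered by a provisioning entry may be registered.
        if (provider, flow) not in self.provisions:
            raise PermissionError(f"{provider} is not provisioned for {flow}")
        reg = Registration(provider, flow, url)
        self.registrations.append(reg)
        return reg

    def query(self, flow: Optional[str] = None, provider: Optional[str] = None) -> List[Registration]:
        # Filter the index; None means "any".
        return [r for r in self.registrations
                if (flow is None or r.flow == flow)
                and (provider is None or r.provider == provider)]


# Usage: hypothetical structure, provider, and URL.
repo = RegistryRepository()
repo.submit_structure("EXT_DEBT", ["REF_AREA", "INDICATOR", "TIME_PERIOD"])
repo.submit_provision("IMF", "EXT_DEBT")
repo.register("IMF", "EXT_DEBT", "https://example.org/imf/ext_debt.xml")
print([r.url for r in repo.query(flow="EXT_DEBT")])  # ['https://example.org/imf/ext_debt.xml']
```

Keeping provisioning separate from registration mirrors the diagram: the repository records who is expected to supply what, and the registry only indexes what has actually been published.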

SDMX Registry/Repository: SDMX Registry Interfaces (with Subscription/Notification)
• The Registry (Data Set/Metadata Set) and the Repositories (Provisioning Metadata, Structural Metadata) expose the same Register, Submit, and Query interfaces
• Subscription/Notification: applications can subscribe to notification of new or changed objects
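
The subscription/notification interface is essentially a publish/subscribe pattern. The sketch below assumes in-process callbacks and an object-id filter; a real registry would deliver notifications to a subscriber's endpoint instead, and `NotifyingRegistry`, `subscribe`, and `put` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[str], bool]
Callback = Callable[[str, str, Any], None]


@dataclass
class NotifyingRegistry:
    """Hypothetical registry that notifies subscribers of new or changed objects."""
    objects: Dict[str, Any] = field(default_factory=dict)                 # object id -> content
    subscribers: List[Tuple[Predicate, Callback]] = field(default_factory=list)

    def subscribe(self, predicate: Predicate, callback: Callback) -> None:
        # The predicate selects which object ids this subscriber cares about.
        self.subscribers.append((predicate, callback))

    def put(self, obj_id: str, content: Any) -> None:
        if obj_id in self.objects:
            if self.objects[obj_id] == content:
                return  # nothing changed, so nothing to notify
            event = "changed"
        else:
            event = "new"
        self.objects[obj_id] = content
        for predicate, callback in self.subscribers:
            if predicate(obj_id):
                callback(event, obj_id, content)


registry = NotifyingRegistry()
received = []
registry.subscribe(lambda oid: oid.startswith("FAO:"),
                   lambda ev, oid, content: received.append((ev, oid)))
registry.put("FAO:CROPS", "v1")
registry.put("FAO:CROPS", "v1")  # unchanged: no notification
registry.put("FAO:CROPS", "v2")
registry.put("IMF:DEBT", "v1")   # filtered out by the predicate
print(received)  # [('new', 'FAO:CROPS'), ('changed', 'FAO:CROPS')]
```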

Deploying SDMX Registry Services Within Domains
• It is anticipated that each organization leading a statistical domain will deploy a set of registry services to support exchanges within that domain
  • This is also possible within national statistical systems and individual organizations
• It is possible to have generic, "public" registries as well
  • This model has not been widely explored
• SDMX-type registries within research domains also make sense
  • They would supplement existing data archives and research data centers (RDCs)
  • They would significantly lower the cost of developing research infrastructure
  • They would greatly increase the visibility of, and access to, data and sourcing information

The Old JEDH (Joint External Debt Hub) Site
[Diagram: BIS, IMF, OECD, and the World Bank each supply data to the JEDH website in various formats, on a 3-month production cycle]

JEDH with SDMX
[Diagram: BIS, IMF, OECD, and the World Bank each publish SDMX-ML, and information about their data is registered in an SDMX Registry. An SDMX "agent" queries the registry to discover data and URLs, retrieves the SDMX-ML data from the providers' sites, and loads it into the JEDH DB (debtor database). Data are provided to the JEDH site in real time.]
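
The agent's role can be sketched as a loop over what the registry reports. This assumes, hypothetically, that `discover()` returns `(provider, url)` pairs from a registry query and that `fetch(url)` returns observations already parsed out of SDMX-ML; HTTP and SDMX-ML parsing are stubbed out, `run_agent` is an illustrative name, and the figures in the usage lines are dummy values.

```python
from typing import Callable, Dict, Iterable, Tuple

# (country, indicator, period, value), assumed already parsed from SDMX-ML.
Observation = Tuple[str, str, str, float]
Key = Tuple[str, str, str, str]


def run_agent(discover: Callable[[], Iterable[Tuple[str, str]]],
              fetch: Callable[[str], Iterable[Observation]]) -> Dict[Key, float]:
    """Pull every registered data set and load it into one table."""
    db: Dict[Key, float] = {}
    for provider, url in discover():          # ask the registry what exists and where
        for country, indicator, period, value in fetch(url):
            # Keying on the provider keeps each source's figures side by side
            # instead of letting one source overwrite another.
            db[(provider, country, indicator, period)] = value
    return db


# Usage with stubbed registry and fetch functions (dummy data).
registered = [("BIS", "https://example.org/bis.xml"), ("IMF", "https://example.org/imf.xml")]
payloads = {
    "https://example.org/bis.xml": [("COUNTRY_A", "DEBT", "2008-Q4", 10.0)],
    "https://example.org/imf.xml": [("COUNTRY_A", "DEBT", "2008-Q4", 12.5)],
}
db = run_agent(lambda: registered, lambda url: payloads[url])
print(len(db))  # 2
```

Because the agent learns URLs from the registry rather than from a hard-coded list, a new provider only has to register its data to be picked up.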

SDMX in Action: Prototype System (Food and Agriculture Organization of the United Nations)
[Diagram: Flow of the FAO CountrySTAT-RegionSTAT implementation, showing CountrySTAT (national publication servers), the FAO SDMX Registry, and RegionSTAT (regional publication server), with steps 1, 2, 3a, 3b, and 4]
Slide courtesy of the FAO

Prototype System: Explanation (Food and Agriculture Organization of the United Nations)
• (1) CountrySTAT National Publication Server
  • The web site is published from the files in CountrySTAT
• (2) SDMX Publication
  • The new CountrySTAT files are converted to SDMX-ML data sets and made web-accessible on the CountrySTAT web site
  • These files are registered in the FAO SDMX Registry
• RegionSTAT Regional Publication Server
  • (3a) Queries the registry for new registrations; the registry responds with registration details, including the URL of the new data sets
  • (3b) Retrieves the new data sets from the CountrySTAT web site
  • Converts the SDMX-ML files to an internal format and integrates the new data sets with existing RegionSTAT data sets
  • (4) Re-publishes the RegionSTAT web site
Slide courtesy of the FAO
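
The RegionSTAT side of this flow amounts to incremental polling against the registry. A minimal sketch, assuming the registry can be asked for registrations made after a given timestamp; `poll_once`, `query_registry`, `retrieve`, and `convert` are hypothetical stand-ins, and re-publishing the web site is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable, List


@dataclass(frozen=True)
class Registration:
    url: str
    registered_at: datetime


def poll_once(query_registry: Callable[[datetime], Iterable[Registration]],
              retrieve: Callable[[str], str],
              convert: Callable[[str], List[Any]],
              store: List[Any],
              since: datetime) -> datetime:
    """One polling cycle; returns the new high-water mark for the next call."""
    latest = since
    # Assumption: query_registry(since) returns only registrations made after `since`.
    for reg in sorted(query_registry(since), key=lambda r: r.registered_at):
        sdmx_ml = retrieve(reg.url)        # fetch the data set from the national site
        store.extend(convert(sdmx_ml))     # convert to the internal format and integrate
        latest = max(latest, reg.registered_at)
    return latest


# Usage with stubs: only the registration after the mark is processed.
regs = [Registration("https://example.org/c1.xml", datetime(2009, 3, 1)),
        Registration("https://example.org/c2.xml", datetime(2009, 3, 5))]
store: List[Any] = []
mark = poll_once(lambda t: [r for r in regs if r.registered_at > t],
                 lambda url: f"<data from {url}>",
                 lambda doc: [doc],
                 store, datetime(2009, 3, 2))
print(mark, len(store))  # 2009-03-05 00:00:00 1
```

Returning the high-water mark lets the caller persist it between runs, so each cycle asks the registry only for what is new.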

Federation of SDMX Registries
• SDMX uses a selective approach to replicating the resources held in domain SDMX registries
  • Each domain registry can become a recognized user of other domain registries
  • Subscription/notification can drive real-time replication of registry metadata around the network
• With a coordinated "hub" registry, a more formal registry network could be established
  • This would require no extension to existing technologies
  • This would require a major feat of organization (!)
• This is a very "light" federation mechanism
  • Other, more intensive schemes have failed in other technology domains (UDDI, etc.)
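
The light federation idea, in which registries subscribe to each other and notifications drive replication, can be sketched as follows. Assumptions: each subscriber supplies an id prefix that selects what it wants replicated, and a registry ignores an object it already holds in the same version, which is what stops notifications from circulating forever between mutually subscribed registries. `DomainRegistry` and `add_subscriber` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DomainRegistry:
    name: str
    objects: Dict[str, str] = field(default_factory=dict)  # object id -> version/content
    # Registries that receive our changes, each with the id prefix it selected.
    subscribers: List[Tuple["DomainRegistry", str]] = field(default_factory=list)

    def add_subscriber(self, subscriber: "DomainRegistry", prefix: str = "") -> None:
        self.subscribers.append((subscriber, prefix))

    def put(self, obj_id: str, content: str) -> None:
        if self.objects.get(obj_id) == content:
            return  # already have this version: breaks replication cycles
        self.objects[obj_id] = content
        for subscriber, prefix in self.subscribers:
            if obj_id.startswith(prefix):       # selective replication
                subscriber.put(obj_id, content)  # the notification drives replication


imf, fao, hub = DomainRegistry("IMF"), DomainRegistry("FAO"), DomainRegistry("HUB")
imf.add_subscriber(hub)           # the hub receives everything from both domains
fao.add_subscriber(hub)
hub.add_subscriber(imf, "FAO:")   # IMF only wants FAO objects from the hub
imf.put("IMF:DEBT_DSD", "v1")
fao.put("FAO:CROP_DSD", "v1")
print(sorted(hub.objects))  # ['FAO:CROP_DSD', 'IMF:DEBT_DSD']
print(sorted(imf.objects))  # ['FAO:CROP_DSD', 'IMF:DEBT_DSD']
print(sorted(fao.objects))  # ['FAO:CROP_DSD']
```

When the hub forwards `FAO:CROP_DSD` to IMF, IMF notifies the hub in turn, but the hub already holds that version and stops there.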

SDMX Registries and Other Standards
• The SDMX Registry Services are designed to support related standards
  • SDMX "reference metadata" reports can provide links to metadata and data in other standard formats
  • Needed metadata fields from other standards can be indexed natively within the SDMX registry
  • The registry can provide access to native, non-SDMX XML resources (DDI, Dublin Core, METS, XBRL, etc.)
• Benefits include:
  • Clarifying data and metadata ownership
  • Making sourcing transparent by linking aggregates to source data and metadata
  • Providing capabilities, not typically available today, to support comparison (for example, integration with ISO/IEC 11179 metadata registries to deal with terminology issues)
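
To illustrate indexing fields from other standards, here is a hypothetical reference metadata report that copies a few fields for searching (named here after Dublin Core elements such as `creator` and `title`) and links to the native XML resource. The structure, the `search` helper, and the target strings are assumptions for illustration, not the SDMX reference metadata format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReferenceMetadataReport:
    """Hypothetical report: indexed fields plus links to native non-SDMX resources."""
    target: str                      # the SDMX object the report describes
    fields: Dict[str, str]           # fields copied from, e.g., a DDI or Dublin Core record
    links: Dict[str, str] = field(default_factory=dict)  # format -> URL of the native XML


def search(reports: List[ReferenceMetadataReport], name: str, value: str) -> List[ReferenceMetadataReport]:
    """Find reports whose indexed field `name` equals `value` (case-insensitive)."""
    return [r for r in reports if r.fields.get(name, "").lower() == value.lower()]


reports = [
    ReferenceMetadataReport("DATAFLOW=EXT_DEBT",
                            {"creator": "IMF", "title": "External debt"},
                            {"DublinCore": "https://example.org/dc/ext_debt.xml"}),
    ReferenceMetadataReport("DATAFLOW=LFS",
                            {"creator": "Example NSI"},
                            {"DDI": "https://example.org/ddi/lfs.xml"}),
]
hits = search(reports, "creator", "imf")
print([r.links for r in hits])  # [{'DublinCore': 'https://example.org/dc/ext_debt.xml'}]
```

Only the fields needed for discovery are copied into the registry; the full record stays in its native format at the linked URL.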

Clarification
• Not all registries are the same
  • UDDI and ebXML registries are much more generic in purpose, and are compatible with SDMX
  • ISO/IEC 11179 metadata registries are not mechanistic web-services registries
    • They are specialized repositories of metadata about semantics, concepts, and terminology
    • They are compatible with, not duplicative of, SDMX registry technology
    • ISO/IEC 11179 could be implemented as an SDMX registry (!)

ODaF (Open Data Foundation) Vision: Standards
[Diagram: Federated registries (based on SDMX, ebXML, and web services) at the center, with aggregated data/metadata (SDMX) registered in them and organized using standard classifications. Around them: ISO 11179 (semantic definitions), and references to source data and metadata in other standards: DDI (microdata sets), METS (packaging), XBRL (business reports), Dublin Core (citations), and ISO 19115 (geographies)]