
Accessing the Outputs of Scientific Projects


  1. Accessing the Outputs of Scientific Projects. Brian Matthews, Michael Wilson (Business & Information Technology Dept, CLRC); Kerstin Kleese-van Dam (E-Science Centre, CLRC). b.m.matthews@rl.ac.uk. Brian Matthews, CRIS 2002, 31/08/02

  2. Overview • Science produces two outputs: • Conventional publications • Science data sets • In traditional science, the first is used as a measure of success • The second is locked away. • In this talk I shall discuss: • A general-purpose science data portal for allowing access to data sets • Potential links to publications. • To make all the outputs of science available.

  3. Who we are (CLRC) • Central Laboratory of the Research Councils • 1700 staff - supporting 12000 scientists and engineers from universities and industry • Based at 3 sites: • Daresbury Laboratory • Rutherford Appleton Laboratory • Chilbolton Observatory • A Multidisciplinary Laboratory

  4. A Multidisciplinary Laboratory: Spallation Neutron and Muon Source (ISIS); Synchrotron Radiation Source (SRS); Lasers; Microstructures; Space Science and Technology; Molecular Spectroscopy; Earth Observation; Atmospheric Science; Computational Science; Energy Research; Information Technology; Particle Physics; Radio Communications; Surfaces Transforms and Interfaces

  5. The Problem • Scientific institutions generate vast quantities of data: CLRC - ISIS, SRS, Space Science, Particle Physics, Computational Science, ... • More data coming on stream all the time: CERN-LHC, Diamond, CASIM, HGP, ... • Very good at handling large amounts of data • Diverse approaches to organising and distributing it. • Need a usable way of gaining access to the data

  6. User Scenarios • Lecturer: • This published study would be a good example for teaching; is the raw data publicly available? • Researcher: • This is an interesting paper - can I check the data? • Experiment Proposer: • Have there been any neutron or X-ray studies of this molecule at 100 K? What reports and papers have been published on them? • Instrument Scientist: • The instrument seems a bit unstable recently; fetch me the results of all calibration runs from the last 3 months. Is there a report on this instrument? • Need a usable way of gaining access to publications with data

  7. The Data Portal Concept • Single point of access to the CLRC data resources • Encompasses a wide range of data holdings • Describes what data is available from the facilities • Links to the data held at the facility • Different archiving methods • Caters for a wide range of users • from the general community to data curators • Supports a wide range of queries • employing data mining, thesauri, …

  8. Combine Diverse Users & Searches ... [Diagram labels: Discovery; Excavation; Experimenter; Data curator; General community; Wider science community; Specialist user]

  9. … with Distributed Data Silos … [Diagram labels: Facility 1; Facility 2; Facility 3; Facility 4]

  10. … using a central common metadata index ... [Diagram labels, in extraction order: Client; http; CLRC Data Access Server; XML wrapper; XML wrapper; Local metadata; Common metadata catalogue database; Local data; Facility 1]

  11. … and a Web-based interface • Exploit the existing Web infrastructure. • Use new technologies (XML/RDF): • rapidly disseminated; • widely accessible; • database and user platform independent; • can be developed now, but with the GRID in mind. • Every user who needs to can get to the information.

  12. Science Metadata Model • A generic metadata model for all scientific applications, with specialisation for each domain • Can answer questions across domains • Can answer questions about specific domains [Diagram labels: Social Science; ISIS; SRS; HEP; Space Science; Env. Science; Metadata]

  13. Metadata Model • Topic: keywords providing an index on what the study is about. • Study Description: provenance about what the study is, who did it and when. • Access Conditions: conditions of use, providing information on who can access the data and how. • Data Description: detailed description of the organisation of the data into datasets and files. • Data Location: locations providing navigation to where the data on the study can be found. • Related Material: references into the literature and community providing context about the study.
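As a rough illustration of the six metadata objects on this slide, the record could be sketched as a single Python data class. The class name, field names and types below are assumptions made for illustration; they are not taken from the CLRC schema, and the example values are placeholders loosely based on the copper/palladium study mentioned elsewhere in the talk.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StudyMetadata:
    """One record of the general metadata model (field names are hypothetical)."""
    topic: List[str] = field(default_factory=list)             # keywords indexing what the study is about
    study_description: str = ""                                # provenance: what, who, when
    access_conditions: str = ""                                # who may access the data and how
    data_description: str = ""                                 # organisation into data sets and files
    data_location: List[str] = field(default_factory=list)     # where the data can be found
    related_material: List[str] = field(default_factory=list)  # literature and community references


def matches_keyword(record: StudyMetadata, keyword: str) -> bool:
    """Case-insensitive lookup against the Topic keywords."""
    return keyword.lower() in (k.lower() for k in record.topic)


# Illustrative record; the values are placeholders, not real catalogue content.
record = StudyMetadata(
    topic=["Chemistry", "Crystal Structure", "Copper"],
    study_description="Structure of copper and palladium co-ordination complexes",
    access_conditions="Restricted to named investigators",
)
print(matches_keyword(record, "copper"))
```

Keeping Topic as a plain keyword list is the simplest choice that supports the cross-domain searches the model is meant to answer; a real system would more likely draw the keywords from a thesaurus.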

  14. Study Description • The Study is the basic unit for a scientific activity. • Can be further divided into: • Programmes: for connected studies. • Investigations: for a single measurement, experiment or simulation.

  15. Hierarchy of Data Holdings • With investigations, there are associated data holdings. • These are themselves arranged in a hierarchy: data sets, and files, with links between them • Logical organisation – identity separated from location. [Diagram labels: Investigation; Data Holding (×3); Data-Set 1 (Raw); Data-Set 2 (Inter); Data-Set 3 (Final); File 1, name:, date: (×3)]
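The hierarchy on this slide, and the idea of keeping a file's identity separate from its physical location, might be sketched as follows. All class names, the `locations` registry, the `resolve` helper and the example URLs are hypothetical; the slide only names the levels (investigation, data holding, data set, file) and the raw/intermediate/final stages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    name: str  # logical identity of the file
    date: str


@dataclass
class DataSet:
    name: str
    stage: str  # e.g. "Raw", "Inter", "Final"
    files: List[DataFile] = field(default_factory=list)


@dataclass
class DataHolding:
    data_sets: List[DataSet] = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    holdings: List[DataHolding] = field(default_factory=list)

    def file_names(self) -> List[str]:
        """Walk holdings -> data sets -> files and collect logical names."""
        return [f.name for h in self.holdings for ds in h.data_sets for f in ds.files]


# Locations live in a separate registry, keyed by logical file name,
# so a file can move or be replicated without changing its identity.
locations: Dict[str, List[str]] = {
    "run42.raw": ["http://facility1.example/archive/run42.raw"],
}


def resolve(name: str) -> List[str]:
    """Return every known location for a logical file name (may be empty)."""
    return locations.get(name, [])


inv = Investigation(
    name="Example investigation",
    holdings=[DataHolding(data_sets=[
        DataSet("Data-Set 1", "Raw", [DataFile("run42.raw", "1999-04-21")]),
        DataSet("Data-Set 3", "Final", [DataFile("run42.final", "1999-05-01")]),
    ])],
)
print(inv.file_names())
print(resolve("run42.raw"))
```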

  16. Metadata example
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE CLRCMetadata SYSTEM "clrcmetadata.dtd">
      <CLRCMetadata><MetadataRecord metadataID="N000001">
      <Topic> <Discipline>Chemistry</Discipline> <Subject>Crystal Structure</Subject> <Subject>Copper</Subject>...
      <Experiment> <StudyName>Crystal Structure: Copper : Palladium: :complex: 150K ...
      <Investigator><Name><Surname>Porter...<Institution>University of Peebles ...
      <Funding>EPSRC ...
      <TimePeriod><StartDate><Date>21/04/1999….
      <Purpose><Abstract> To study the structure of Copper and Palladium co-ordination complexes at a 150K.
      <DataManager><Name><Surname>Teat...
      <Instrument>SRS Station 9.8, BRUKER AXS SMART 1K...
      <Condition>...Wavelength...<Units>Angstrom...<ParamValue>0.6890...
      <Condition>…Crystal-to-detector distance<Units>cm...<ParamValue>5.00...
      <AccessConditions>The user has to be one of: Prof. F. Porter….
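A record like the one on this slide could be read with Python's standard `xml.etree.ElementTree`. The fragment below is a minimal sketch: it uses only the `Topic` part of the example, and the closing tags and nesting are assumed, since the slide elides most of the record. The `summarise` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Well-formed fragment built from the Topic section of the slide.
# The closing tags are an assumption; the slide truncates the record.
RECORD = """
<CLRCMetadata>
  <MetadataRecord metadataID="N000001">
    <Topic>
      <Discipline>Chemistry</Discipline>
      <Subject>Crystal Structure</Subject>
      <Subject>Copper</Subject>
    </Topic>
  </MetadataRecord>
</CLRCMetadata>
"""


def summarise(xml_text):
    """Return id, discipline and subjects for each MetadataRecord element."""
    root = ET.fromstring(xml_text)
    summaries = []
    for rec in root.iter("MetadataRecord"):
        summaries.append({
            "id": rec.get("metadataID"),
            "discipline": rec.findtext("Topic/Discipline"),
            "subjects": [s.text for s in rec.findall("Topic/Subject")],
        })
    return summaries


print(summarise(RECORD))
```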

  17. Metadata collection • Metadata collection and maintenance is a big problem. • But doing science is a process. • Collecting the metadata can then become part of the experimental support environment. [Diagram labels: Submit proposal; Prepare experiment; Generate results; Analyse results; Write report; Provenance metadata; access conditions; data description; data location; Related material; + + + +]

  18. Architecture [Diagram labels: Users; Other Data Portals; Grid middleware; CLRC Data Portal; CLRC broker; Common metadata catalogue database; XML wrapper (×5); Local metadata (×4); Local data (×4); Facility 1; Facility 2; Facility 3; Facility 4]
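The broker-and-wrapper arrangement in this diagram can be approximated by a small fan-out search. This is a sketch under assumptions, not the portal's implementation: the `Broker` class, the `make_wrapper` helper, the record fields and the facility contents are all invented for illustration, and each wrapper is reduced to a function that searches an in-memory list instead of a facility's local metadata store.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Wrapper = Callable[[str], List[Record]]


def make_wrapper(facility: str, local_records: List[Record]) -> Wrapper:
    """Wrap one facility's local metadata behind a common search interface."""
    def search(keyword: str) -> List[Record]:
        kw = keyword.lower()
        # Tag each hit with its facility so results can be collated centrally.
        return [dict(r, facility=facility)
                for r in local_records if kw in r["subject"].lower()]
    return search


class Broker:
    """Central point of access that forwards a query to every registered wrapper."""

    def __init__(self) -> None:
        self.wrappers: Dict[str, Wrapper] = {}

    def register(self, facility: str, wrapper: Wrapper) -> None:
        self.wrappers[facility] = wrapper

    def search(self, keyword: str) -> List[Record]:
        results: List[Record] = []
        for facility in sorted(self.wrappers):  # deterministic order
            results.extend(self.wrappers[facility](keyword))
        return results


broker = Broker()
broker.register("Facility 1", make_wrapper("Facility 1", [
    {"study": "S1", "subject": "Copper crystal structure"},
]))
broker.register("Facility 2", make_wrapper("Facility 2", [
    {"study": "S2", "subject": "Atmospheric ozone"},
    {"study": "S3", "subject": "Copper oxide surfaces"},
]))
hits = broker.search("copper")
print([(h["facility"], h["study"]) for h in hits])
```

The point of the design is that users see one query interface while each facility keeps its own storage and archiving method behind its wrapper.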

  19. Server Architecture [Diagram labels, in extraction order: USER; Key: User input interpreter; User output generator; Internal http; pre-set XSL Script; Query Generator; Response Generator; module; XML Parser; External agent; XML File; XML File; Central metadata repository; Local metadata repository; Ascii file]

  20. Example Result of searching: search across facilities - returns XML to session and displays summary

  21. Expand Results - give more details from the same XML

  22. Going Deeper - Can browse the data sets

  23. Select data - pick the required data files and download from convenient location.

  24. Current developments • Pilot completed • Consolidate and broaden existing system • move towards a development system • handle a greater diversity of data sources – e.g. Max Planck Institute for Meteorology • Enhance the Technology • Web services (SOAP, WSDL, OGSA, XML Query) • Provide links to other information sources: • Library systems • Thesauri

  25. Interface with existing archives • CLRC maintains existing data archives • Atmospheric, earth observation, STP, astronomy. • Existing access mechanisms (Web, Z39.50) • Existing metadata catalogues and formats • Can we use the Data Portal to access them? • Use the Metadata format as a framework to be specialised to express existing metadata frameworks • XML Query as a query layer on the archive

  26. Re-architect system • Break up the portal middleware into components. [Diagram labels, in extraction order: User service; Grid Enable with Web Services; Results collation; RDF+DAML+OIL ontology service; DP Query generation; Security service; Globus GSI; XML Query; Replication service; Data source location replication service; Globus GIS - MDS]

  27. Access to Data and Publications • The Data Portal offers the potential to integrate the outputs of scientific research: data and publications. • Need to have a common search mechanism over library and data portals. • Can abstract the science metadata to Dublin Core. • Links to CERIF would further deepen connection. • Access to common thesauri for classification. • Common web service interface • Data Portal provides this. • XML Query as a communication mechanism

  28. Mapping between Dublin Core and Science Metadata
      Dublin Core element | Science metadata
      Title | Study: Name
      Creator | Study: Investigator: Name (Role is principal investigator)
      Subject | Topic: Keyword
      Description | Study: Study Information: Purpose
      Publisher | Investigation: Data Manager
      Contributor | Study: Investigator: Name; Investigation: Data Manager
      Date | Study: Study Information: Time
      ResourceType | Collection; or Dataset
      Format | Data Description: File Format
      ResourceIdentifier | Study: Study Id (whole study); Data description: File: URI (for individual data files)
      Source | Data description: Data sets: Related Data sets; Related Material: Related work
      Language | Not covered in the current metadata format, but a simple extension
      Relation | Related Material: Related work
      Coverage | Data description: Logical Description: Coverage
      RightsManagement | Access Conditions
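Part of the mapping on this slide could be applied mechanically, as in the sketch below. It covers only the one-to-one rows, and it assumes a science metadata record flattened into a dictionary keyed by the path strings used on the slide; that flattened form, the `to_dublin_core` helper and the example values are illustrative assumptions.

```python
from typing import Dict

# One-to-one rows of the slide's mapping: Dublin Core element -> science metadata path.
DC_MAPPING: Dict[str, str] = {
    "Title": "Study: Name",
    "Creator": "Study: Investigator: Name",
    "Subject": "Topic: Keyword",
    "Description": "Study: Study Information: Purpose",
    "Publisher": "Investigation: Data Manager",
    "Date": "Study: Study Information: Time",
    "Format": "Data Description: File Format",
    "RightsManagement": "Access Conditions",
}


def to_dublin_core(science_record: Dict[str, str]) -> Dict[str, str]:
    """Abstract a flattened science metadata record to Dublin Core elements.

    Elements whose source path is absent from the record are omitted.
    """
    return {
        element: science_record[path]
        for element, path in DC_MAPPING.items()
        if path in science_record
    }


# Placeholder record; the values are invented for illustration.
example = {
    "Study: Name": "Example study",
    "Topic: Keyword": "Copper",
    "Access Conditions": "Named investigators only",
}
print(to_dublin_core(example))
```

Abstracting to Dublin Core in this way loses the domain-specific detail, which is the trade-off for being searchable alongside library catalogues.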

  29. Where are we? • Data Portal up and running • Being developed in the E-Science Centre in CLRC • http://esc.dl.ac.uk:9000/index.html • Science metadata proving very robust • Trying to extend its use into other areas of science – materials science, environmental science. • Beginning to approach the problem of integrating with electronic library resources. b.m.matthews@rl.ac.uk
