
QCDgrid Technology

James Perry, George Beckett, Lorna Smith. EPCC, The University of Edinburgh.


Presentation Transcript


  1. QCDgrid Technology
     James Perry, George Beckett, Lorna Smith
     EPCC, The University of Edinburgh

  2. QCDgrid Technology
     • Currently a 4-site data grid
       • provides reliable distributed data storage, including a searchable metadata catalogue
     • A job submission system is also deployed
     • Key technologies used
       • Globus Toolkit 2.4
       • European Data Grid
       • eXist XML database
     • Custom QCDgrid software builds on all these technologies, adding extra functionality and providing a convenient user interface

  3. The Data Grid
     • The data storage grid has been up and running for several months and is currently managing a few hundred gigabytes of data
     • Replication of files is managed by custom-written software
       • built on Globus 2.4
     • Central control thread
       • ensures there are always at least two copies of each file, stored at different sites
     • Replica catalogue
       • maps logical filenames to actual physical locations
     • Command line interface
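The replication rule on this slide (at least two copies of each file, at different sites, tracked through a replica catalogue) can be sketched in a few lines of Python. This is an illustration only, not the QCDgrid implementation: the catalogue layout, the site names, the file names and the `plan_replication` helper are all assumptions made for the example.

```python
MIN_COPIES = 2  # the slide states: at least two copies, at different sites


def sites_holding(catalogue, logical_name):
    """Return the set of distinct sites that hold a copy of the file."""
    return {site for site, _path in catalogue.get(logical_name, [])}


def plan_replication(catalogue, all_sites, min_copies=MIN_COPIES):
    """Return the (logical_name, source_site, target_site) copies still needed."""
    actions = []
    for name in sorted(catalogue):
        held = sites_holding(catalogue, name)
        if not held:
            continue  # no surviving copy to replicate from
        missing = max(min_copies - len(held), 0)
        source = sorted(held)[0]
        # Only sites that do not already hold the file count as new copies.
        candidates = [site for site in all_sites if site not in held]
        for target in candidates[:missing]:
            actions.append((name, source, target))
    return actions


# Hypothetical replica catalogue: logical filename -> [(site, physical path)].
catalogue = {
    "config_0001.dat": [
        ("site_a", "/data/config_0001.dat"),
        ("site_b", "/store/config_0001.dat"),
    ],
    "config_0002.dat": [("site_a", "/data/config_0002.dat")],
}
sites = ["site_a", "site_b", "site_c", "site_d"]

actions = plan_replication(catalogue, sites)
print(actions)
```

In this sketch a control loop would call `plan_replication` periodically and carry out each returned copy, then record the new physical location in the catalogue.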

  4. Metadata
     • Raw logical filenames may not be meaningful, so data could be hard to find
     • Metadata catalogue associates some important information with each file, making it easy to search the grid
     • Each file on the datagrid may have an associated XML metadata document
     [Figure: Application, Metadata Service and Data Grid, linked by numbered arrows (1) to (4) labelled "Attributes of Desired Data", "Logical File Names" and "Nearest file copy"]
     • These documents are stored in the eXist open source XML database, where they can be searched using the XPath query language
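As a rough illustration of the lookup described on this slide (attributes of the desired data go in, logical filenames come out), the sketch below runs an XPath-style query over a few metadata documents. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in for the eXist database. The element names (`metadata`, `parameters`, `beta`, `lattice`), the logical filenames and the `search` helper are invented for the example and are not the QCDgrid schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata documents, keyed by logical filename.
METADATA_DOCS = {
    "ukqcd/cfg_0001": (
        "<metadata><parameters><beta>5.2</beta>"
        "<lattice>16x16x16x32</lattice></parameters></metadata>"
    ),
    "ukqcd/cfg_0002": (
        "<metadata><parameters><beta>5.29</beta>"
        "<lattice>16x16x16x32</lattice></parameters></metadata>"
    ),
    "ukqcd/cfg_0003": (
        "<metadata><parameters><beta>5.2</beta>"
        "<lattice>24x24x24x48</lattice></parameters></metadata>"
    ),
}


def search(docs, xpath):
    """Return the logical filenames whose metadata document matches xpath."""
    matches = []
    for logical_name, xml_text in sorted(docs.items()):
        root = ET.fromstring(xml_text)
        if root.findall(xpath):
            matches.append(logical_name)
    return matches


# Find every file whose metadata records beta = 5.2.
matches = search(METADATA_DOCS, ".//parameters[beta='5.2']")
print(matches)
```

The returned logical filenames would then be passed to the data grid, which resolves each one to a physical copy.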

  5. Metadata Browser
     • A browser GUI written in Java provides a user-friendly interface to the XML metadata
     • Originally developed by OGSA-DAI, extended and now maintained by the QCDgrid team

  6. Metadata/Datagrid Integration
     • Browser also integrates with lower level datagrid software through the Java Native Interface
     • Data can be fetched from the grid simply using the GUI
     • A simple GUI exists for data/metadata submission

  7. Job Submission: Requirements
     Next stage of the project is to allow data generation/analysis jobs to be easily submitted to grid machines
     • Integration with the existing datagrid is desirable
     • Resource brokering is not particularly important; with a few exceptions, users normally know in advance on which machine a job should run
     • Real-time job status monitoring would be useful
     • Must work with a diverse range of machines, from normal (Linux) PCs to the QCDOC supercomputer
     • User-friendly GUI or web portal if time permits

  8. Job Submission: Technology
     • As with datagrid, requirements dictate a combination of existing software and purpose-built middleware
     • Globus toolkit used for low level access to grid resources and data
     • European Data Grid software used for virtual organisation management and security
     • Batch systems such as PBS integrated with the system
     • QCDgrid job submission software builds on these components, providing the interface and features that users need

  9. Job Submission: Status
     • Job submission system was developed on a test grid before being deployed on the main datagrid
     • Jobs can now be submitted to grid resources using a command line tool
     • Input files can be fetched automatically from datagrid
     • Job output and input can be streamed to and from the user’s console, allowing jobs to be monitored, and even interactive jobs to run on grid resources (useful for debugging)
     • All output files generated by the job are automatically brought back to the user’s local machine, or optionally stored on the datagrid
     • Can also submit to machines with only Globus (no QCDgrid installation)
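The job lifecycle listed on this slide (fetch inputs from the datagrid, run with console streaming, then return or store the outputs) can be pictured as an ordered plan. The sketch below is purely illustrative: `JobSpec`, its fields and `plan_job` are hypothetical names, and no real QCDgrid or Globus command is invoked.

```python
from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Hypothetical description of a job; not the QCDgrid job format."""
    executable: str
    machine: str
    input_lfns: list = field(default_factory=list)  # logical filenames
    store_output_on_grid: bool = False


def plan_job(spec):
    """Return the ordered steps that the slide describes for one job."""
    steps = [
        f"fetch {lfn} from datagrid to {spec.machine}"
        for lfn in spec.input_lfns
    ]
    steps.append(
        f"run {spec.executable} on {spec.machine}, streaming I/O to console"
    )
    if spec.store_output_on_grid:
        steps.append("store output files on datagrid")
    else:
        steps.append("return output files to local machine")
    return steps


job = JobSpec("analyse", "site_a", input_lfns=["ukqcd/cfg_0001"])
for step in plan_job(job):
    print(step)
```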

  10. QCDgrid 2
     • A follow-on project, QCDgrid 2, is beginning
       • part of the larger GridPP 2 collaboration
     • Will build on what was created by QCDgrid
       • creating well-defined web service interfaces to grid functionality
       • assisting international QCD collaboration efforts
       • strengthening the software to cope with increased loads in future
       • general maintenance and support for existing grid

  11. Beyond UKQCD: ILDG
     • ILDG stands for International Lattice DataGrid
     • A collaboration of scientists involved in lattice QCD from all over the world (UK, Japan, USA, France, Germany, Australia and other countries)
     • Working on standards to allow national datagrids to interoperate, for easier data sharing
     • Two working groups looking at different aspects of this goal: metadata and middleware
     • QCDgrid 2 has time specifically allocated for ILDG work

  12. ILDG/QCDgrid 2 Technology
     • ILDG is setting standards for interoperability between grids for QCD; the QCDgrid 2 project will implement them on the UK’s grid
       • web service interface to metadata catalogue functionality
       • web service interface to data storage grid functionality, possibly based on Storage Resource Manager (SRM)
       • common XML schema for metadata. A schema for describing gauge configuration metadata has already been defined and will be extended to other data types
     • Security will be a bigger issue for international collaboration

  13. Other QCDgrid 2 Work
     • Maintain the existing QCDgrid software and provide support
     • Make any necessary additions to cope with future changes in usage of the grid
       • particularly QCDOC coming online later this year, which will likely produce unprecedented volumes of data
       • a risk analysis is underway to identify possible limitations of the software and how to deal with them
     • Provide tools to assist in the generation of metadata documents
       • a web-based form is the likely interface

  14. Summary
     • QCDgrid project has developed a grid for use by the UKQCD collaboration for storing data and performing computations
     • This consists of three software components: data grid, metadata catalogue and job submission tool
     • Software based on Globus toolkit and European Data Grid middleware
     • QCDgrid 2 project starting
       • will focus on international standards and web service interfaces
       • as well as maintaining and improving what already exists

  15. References
     • QCDgrid Web Site: http://www.gridpp.ac.uk/qcdgrid
     • ILDG Web Site: http://www.lqcd.org/ildg
     • European Data Grid Project: http://www.eu-datagrid.org/
