Grid Activity at CCS. Toshiyuki Amagasa Center for Computational Sciences, University of Tsukuba. About Myself. Name Toshiyuki Amagasa Affiliation: Division of Computational Informatics, Center for Computational Sciences
Grid Activity at CCS Toshiyuki Amagasa Center for Computational Sciences, University of Tsukuba
About Myself • Name • Toshiyuki Amagasa • Affiliation: • Division of Computational Informatics, Center for Computational Sciences • Department of Computer Science, Graduate School of Systems and Information Engineering • Area of research • Data engineering • Database system • Recent topics • XML databases • Parallel XML query processing • OLAP analysis for XML • Web information extraction for XML • Databases in scientific applications • Faceted navigation for QCDml • Meteorological database
ILDG-JP Members • Prof. Mitsuhisa Sato (Director, CCS) • Prof. Tomoteru Yoshie (CCS) • Prof. Osamu Tatebe (CCS) • Dr. Naoya Ukita (CCS) • Prof. Toshiyuki Amagasa (CCS)
Talk Outline • Current Status of ILDG • A Brief History of JLDG • An Overview of JLDG • A Development of New ILDG Client • Faceted Navigation of QCDml • Conclusions and Future Work
A Brief History of JLDG (1/3) • Hepnet-J/sc 2002- (SINET GbE private network) • Widely-distributed file system • Network backbone: Super SINET VPN • Institutes / Universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U. • Objective and Implementation • Data sharing among institutes / universities whose administrative policies are not homogeneous, while maintaining security • Mirroring among FSs attached to SCs with administrative
[Figure: Hepnet-J/sc network connecting file servers at CCP @ Tsukuba, CRC @ KEK, YITP @ Kyoto, and RCNP @ Osaka; labeled machines: CP-PACS, SR8000, SX-5]
A Brief History of JLDG (2/3) • Problems • Growing cost of managing data location • A dataset may be distributed across several disks. • It is hard for users to remember the locations of data and mirrors. • No concept of users and user groups • Hard to support multiple research groups. • Necessary functionalities • A flat data sharing system with no space limit (or one that can be extended at any time) • User and user group management across several organizations • Japan Lattice Data Grid (JLDG) • Project launched in November 2005 • Operation started in March 2007
A Brief History of JLDG (3/3) • JLDG v1 started operation in May 2008 • Available datasets • CP-PACS Nf=2 QCD configuration • 8,000 files, 1.5 TBytes • CP-PACS/JLQCD Nf=2+1 QCD configuration • 21,000 files, 6 TBytes • PACS-CS Nf=2+1 32^3 x 64 lattice QCD configuration • 2,600 files, 3 TBytes • JLDG v2 started operation in December 2009 • Storing and sharing research data generated in daily research activities • Data sharing within a research group
An Overview of JLDG • A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics • Shares simulation data computed on SCs over several months to several years. • Data files are distributed; replicas are created if necessary. • Users do not need to know file locations. Files can be accessed very quickly if the site has replicas. • Storage space can be added incrementally during operation.
[Figure: JLDG sites (Tsukuba, KEK, Kyoto, Osaka, Hiroshima, Kanazawa) sharing a Gfarm file system over the SINET3 network, with a connection to ILDG; www.jldg.org]
Software Components • Globus Toolkit V4 (ANL) www.globus.org • GSI authentication, Proxy user certificate creation • GridFTP server / client • VOMS (EDG) • VO management • Naregi-CA (Naregi) www.naregi.org • User / host certificate creation • Gfarm file system (U. of Tsukuba) datafarm.apgrid.org • Widely-distributed file system • Uberftp (NCSA) • http://dims.ncsa.uiuc.edu/set/uberftp/ • Interactive GridFTP client
Gfarm Distributed File System • An open-source distributed file system • A global namespace to unify storage systems • Scalable I/O performance exploiting data access locality • Automated replica selection for fault-tolerance and load-balancing
[Figure: Gfarm File System. A global namespace tree under /gfarm (nodes: ggf, jp, aist, gtrc, file1 to file4) is mapped to physical storage, with replica creation.]
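The ideas of a global namespace and automated replica selection can be illustrated with a toy model. This is only a sketch and not Gfarm's implementation or API: the `Replica` class, the `select_replica` function, the host names, and the load values are all hypothetical, and the policy (prefer a live replica at the client's own site, otherwise the least-loaded live replica) is an assumption chosen to show locality, fault-tolerance, and load-balancing in a few lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    host: str    # storage node holding a copy (hypothetical name)
    site: str    # site where that node lives
    alive: bool  # False simulates a failed or unreachable node
    load: float  # made-up load indicator, lower is better


# Global namespace: one logical path maps to any number of physical replicas.
NAMESPACE: Dict[str, List[Replica]] = {
    "/gfarm/jp/file1": [
        Replica("fs1.tsukuba.example", "Tsukuba", True, 0.7),
        Replica("fs1.kek.example", "KEK", True, 0.2),
        Replica("fs1.kyoto.example", "Kyoto", False, 0.0),
    ],
}


def select_replica(path: str, client_site: str) -> Optional[Replica]:
    """Pick a replica for `path` on behalf of a client at `client_site`."""
    # Fault-tolerance: ignore replicas on nodes that are down.
    candidates = [r for r in NAMESPACE.get(path, []) if r.alive]
    if not candidates:
        return None
    # Locality: prefer replicas stored at the client's own site.
    local = [r for r in candidates if r.site == client_site]
    # Load-balancing: among the remaining choices, take the least loaded.
    return min(local or candidates, key=lambda r: r.load)
```

A client at Tsukuba gets the local copy even though it is busier, while a client at Kyoto, whose local replica is down, falls back to the least-loaded remote copy at KEK.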
Summary • JLDG • A brief history • An overview • Used as an infrastructure for daily research activity • Hands-on meeting on 27 Jan. 2009, successfully held with 19 attendees
Int’l Lattice Data Grid (ILDG) • A data grid for sharing Lattice QCD configurations • File Formats in ILDG • Configuration binary • LIME (Lattice QCD Interchange Message Encapsulation) • Metadata (QCDml) • ensemble XML • configuration XML • LFN (Logical File Name) • Identifier for configuration binary
[Figure: one ensemble XML, identified by its markovChainURI, associated with several configuration XML documents, each linked through an LFN to a configuration binary]
QCDml Ensemble XML
<markovChain xmlns="…">
  <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI>
  <management>
    <revisions>1</revisions>
    <collaboration>CP-PACS</collaboration>
    <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName>
    <ensembleLabel>B1800</ensembleLabel>
    <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference>
    <archiveHistory>
      <elem>
        <revision>1</revision>
        <revisionAction>add</revisionAction>
        <participant>
          <name>T.Yoshie</name>
          <institution>Center for Computational Sciences, University of Tsukuba</institution>
Typical Use Case of ILDG
[Figure: use-case diagram; labels: LFN (Logical File Name), SURL (Site URL), TURL (Transfer URL), VOMS authentication]
Difficulties in Finding Desired Configuration • Directly use a query language (XQuery / XPath) • A simple example: /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']] • Knowledge of XML, QCDml, and XQuery (XPath) is needed. • Hard to get the whole picture of available data. • Hierarchical list • Easy to use. • Needs a huge screen to show the entire list. • Still difficult to get the whole picture of the data.
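To make the XPath example concrete, the same condition (some `beta` descendant greater than 4 and some `collaboration` descendant equal to 'CSSM') can be spelled out in plain Python. This is a sketch using only the standard library, since `xml.etree.ElementTree` does not support this XPath expression directly. The sample document, its namespace URI, and the `matches` helper are hypothetical; real QCDml ensemble files are much larger.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ensemble document; the namespace URI is a placeholder.
SAMPLE = """\
<markovChain xmlns="urn:example:qcdml-ensemble">
  <management>
    <collaboration>CSSM</collaboration>
  </management>
  <physics>
    <action>
      <gluon>
        <DBW2GluonAction>
          <beta>8.5</beta>
        </DBW2GluonAction>
      </gluon>
    </action>
  </physics>
</markovChain>
"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def matches(root: ET.Element, min_beta: float, collaboration: str) -> bool:
    """Rough equivalent of the XPath predicate on a markovChain root element."""
    if local_name(root.tag) != "markovChain":
        return False
    beta_ok = False
    collab_ok = False
    for el in root.iter():
        name = local_name(el.tag)
        # Unlike XPath's text() = '...', surrounding whitespace is ignored here.
        text = (el.text or "").strip()
        if name == "beta":
            try:
                beta_ok = beta_ok or float(text) > min_beta
            except ValueError:
                pass  # non-numeric beta text never satisfies the comparison
        elif name == "collaboration":
            collab_ok = collab_ok or text == collaboration
    return beta_ok and collab_ok
```

Calling `matches(ET.fromstring(SAMPLE), 4, "CSSM")` returns `True` for the sample. Even in this form the user still has to know the element names, which is the difficulty the slide points out.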
Basic Idea • Applying faceted-navigation interface to browse QCDml ensemble XML data.
Faceted-Navigation • What is “faceted-navigation”? • A scheme for browsing objects with attributes. • Successfully used in some applications, such as Apple iTunes. • Procedure • A user selects a value in a facet to choose a set of objects of interest • The system updates the list of objects, the list of facets, and their respective values • (Repeat) • Example • The Flamenco Search http://flamenco.berkeley.edu/
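The procedure above can be sketched in a few lines: selecting a value filters the object list, and the facet values and their counts are then recomputed from what remains. The ensemble records, the facet names, and the helpers `select` and `facet_counts` are illustrative assumptions, not the data or code of the actual system.

```python
from collections import Counter
from typing import Dict, List

# Made-up objects; each one is a flat mapping from facet name to value.
ENSEMBLES: List[Dict[str, str]] = [
    {"collaboration": "CSSM", "size": "12/12/12/24", "nf": "2"},
    {"collaboration": "CSSM", "size": "8/8/8/16", "nf": "0"},
    {"collaboration": "CP-PACS", "size": "12/12/12/24", "nf": "2"},
]


def select(objects: List[Dict[str, str]], facet: str, value: str) -> List[Dict[str, str]]:
    """Step 1: the user picks a value in a facet; keep only matching objects."""
    return [o for o in objects if o.get(facet) == value]


def facet_counts(objects: List[Dict[str, str]], facets: List[str]) -> Dict[str, Counter]:
    """Step 2: recompute, for each facet, the available values and their population."""
    return {f: Counter(o[f] for o in objects if f in o) for f in facets}


# One navigation step: choose collaboration = CSSM, then refresh the other facets.
remaining = select(ENSEMBLES, "collaboration", "CSSM")
refreshed = facet_counts(remaining, ["size", "nf"])
```

Repeating `select` on `remaining` with another facet narrows the set further, and the facets can be chosen in any order.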
Faceted-Navigation • Good features • Users are free to choose any facet • cf. Hierarchical list • Gives a big picture of the dataset • Available values along with their population • Effective • Busch’s Law: 4 facets consisting of 10 values are enough to deal with 10,000 objects.
Technical Challenges • How to define facets? • How to extract values according to the facets? • How to achieve quick response from the database for improving user experience?
Choosing the Facets • Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe. • Selected elements from QCDml ensemble XML • Regional grid • Collaboration • Project name • Number of flavors • Time • Parameters • Lattice size • Gluon action • Parameters • Quark action • Parameters
Extracting Values from a Facet (1/3) • Extract text values • Collaboration: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, … • Project name: 2+1 DWF 2+1 Dynamical AsqTAD Baryon Resonances Dynamical FLIC Studies Electromagnetic Form Factors FLIC Overlap Studies Flux Tube Test Gluon Propagator Long_aqstad_run Pentaquark Volume Dependence … • Need substring extraction • Date: 2000, 2005, 2006, 2007, 2008, e.g. from <date>2007-02-26T21:39:33+09:00</date>
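For the date facet, the substring extraction amounts to keeping the four-digit year at the start of the timestamp. A minimal sketch, assuming the date text always begins with an ISO-style `YYYY-MM-DDT` prefix as in the example above (the function name `extract_year` is hypothetical):

```python
import re
from typing import Optional

# Year, month, and day followed by the 'T' separator, as in 2007-02-26T21:39:33+09:00.
_DATE_PREFIX = re.compile(r"^\s*(\d{4})-\d{2}-\d{2}T")


def extract_year(date_text: str) -> Optional[str]:
    """Return the year part of a QCDml-style date string, or None if it does not parse."""
    m = _DATE_PREFIX.match(date_text)
    return m.group(1) if m else None
```

Returning `None` for unparseable text lets the caller decide whether to skip the ensemble for this facet or file it under an "unknown" value.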
Extracting Values from a Facet (2/3) • Need text value generation • Lattice size e.g. • 12 / 12 / 12 / 24
<physics>
  <size>
    <elem> <name>X</name> <length>12</length> </elem>
    <elem> <name>Y</name> <length>12</length> </elem>
    <elem> <name>Z</name> <length>12</length> </elem>
    <elem> <name>T</name> <length>24</length> </elem>
    …
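Generating the lattice-size value means collecting the four `length` elements and joining them in X, Y, Z, T order. The sketch below uses the standard library on a namespace-free copy of the fragment above; real QCDml files carry an XML namespace, so the element lookups would need adjusting, and the function name `lattice_size_label` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the <size> fragment shown on the slide.
SIZE_XML = """\
<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>
"""


def lattice_size_label(size_el: ET.Element, order=("X", "Y", "Z", "T"), sep="/") -> str:
    """Build a single facet value such as '12/12/12/24' from a <size> element."""
    lengths = {}
    for elem in size_el.findall("elem"):
        axis = elem.findtext("name", "").strip()
        lengths[axis] = elem.findtext("length", "").strip()
    # Raises KeyError if an axis is missing, which flags incomplete metadata early.
    return sep.join(lengths[axis] for axis in order)
```

Looking the axes up by name rather than by position keeps the label stable even if the `elem` entries appear in a different order in the XML.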
Extracting Values from a Facet (3/3) • Gluon action / Quark action • An element name itself represents a value • Extract the element name as the value of a facet
<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://www.jldg.org/JLDG/...
<action>
  <gluon>
    <DBW2GluonAction>
      <glossary>www.lqcd.org/ildg/pla...
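When the element name is itself the value, extraction reduces to reading the tag of the first child under `gluon` or `quark`. A minimal sketch with the standard library: the sample fragment is a closed, namespace-free version of the first example above, its glossary URL is a placeholder, and `action_facet` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Completed, namespace-free variant of the slide fragment; the URL is a placeholder.
ACTION_XML = """\
<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://example.org/glossary</glossary>
    </iwasakiRGGluonAction>
  </gluon>
</action>
"""


def action_facet(action_el: ET.Element, kind: str) -> Optional[str]:
    """Return the name of the action element under <gluon> or <quark>, if any."""
    parent = action_el.find(kind)
    if parent is None or len(parent) == 0:
        return None  # no such section, or the section has no child element
    # Strip a '{namespace}' prefix in case the document declares one.
    return parent[0].tag.rsplit("}", 1)[-1]
```

For the sample, `action_facet(ET.fromstring(ACTION_XML), "gluon")` yields `iwasakiRGGluonAction`, while asking for `"quark"` yields `None` because the fragment has no quark section.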
QCDml Faceted Navigation I/F: System Configuration
[Figure: system configuration. Ensemble XML is downloaded from the ILDG regional grids (USQCD, JLDG, LDG, UKQCD, CSSM) into an XML DB (eXist) holding QCDml ensemble (ILDG) and configuration (JLDG) metadata; facet extraction (XQuery) populates the facet database in an RDBMS (MySQL); the facet navigation system (PHP + SQL + XQuery) runs on a web server (Apache).]
Database Design (1/2) • Use RDBMS for quick response • Use fixed relational schema for extensibility
*************************** 1. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
value: cssm
*************************** 2. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
value: CSSM
*************************** 3. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
value: Dynamical FLIC Studies
*************************** 4. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
value: 2007
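The (uri, property, value) layout shown in these rows keeps the schema fixed: a new facet only adds rows, never columns. The sketch below reproduces the idea with Python's built-in `sqlite3` as a stand-in for MySQL. The table name `facet`, the second ensemble URI, and the two query helpers are assumptions made for illustration; only the column names and the first four rows come from the output above.

```python
import sqlite3
from typing import Dict, List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "rgrid", "cssm"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "collaboration", "CSSM"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "projectName", "Dynamical FLIC Studies"),
        ("mc://cssm/su3b08500k1300s12t24DBW2FLIC", "date", "2007"),
        # Hypothetical second ensemble, added so the counts are not trivial.
        ("mc://example/other", "collaboration", "CP-PACS"),
        ("mc://example/other", "date", "2007"),
    ],
)


def facet_values(prop: str) -> List[Tuple[str, int]]:
    """Values of one facet together with the number of ensembles carrying each."""
    return conn.execute(
        "SELECT value, COUNT(DISTINCT uri) FROM facet "
        "WHERE property = ? GROUP BY value ORDER BY value",
        (prop,),
    ).fetchall()


def uris_matching(selections: Dict[str, str]) -> List[str]:
    """URIs of the ensembles that satisfy every selected (property, value) pair."""
    if not selections:
        rows = conn.execute("SELECT DISTINCT uri FROM facet ORDER BY uri")
        return [r[0] for r in rows]
    where = " OR ".join(["(property = ? AND value = ?)"] * len(selections))
    params = [x for pair in selections.items() for x in pair] + [len(selections)]
    rows = conn.execute(
        f"SELECT uri FROM facet WHERE {where} "
        "GROUP BY uri HAVING COUNT(DISTINCT property) = ? ORDER BY uri",
        params,
    )
    return [r[0] for r in rows]
```

The `GROUP BY ... HAVING` form intersects any number of selections with a single query, so the SQL does not change as facets are added.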
Database Design (2/2) • Store preformatted text for improving rendering performance
*************************** 1. row ***************************
collaboration: CSSM
size: 12/12/12/24
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
nf: 2
gact: DBW2GluonAction (beta=8.5)
qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
size: 8/8/8/16
uri: mc://cssm/su3b09836s8t16DBW2
nf:
gact: DBW2GluonAction (beta=9.836)
qact:
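The preformatted `gact` and `qact` strings can be produced once, at extraction time, so that rendering only has to print them. A small sketch of such a formatter, inferred from the two rows above; the function name `format_action` and the choice to pass parameters as an ordered mapping are assumptions.

```python
from typing import Mapping


def format_action(name: str, params: Mapping[str, str]) -> str:
    """Render an action as 'Name (k1=v1/k2=v2)'; empty name gives an empty cell."""
    if not name:
        return ""
    if not params:
        return name
    joined = "/".join(f"{key}={value}" for key, value in params.items())
    return f"{name} ({joined})"
```

Parameter values are passed as strings so that formatting such as `0.1300` is preserved exactly as it appears in the metadata.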
Conclusion and Future Work • Conclusion • Current Status of ILDG • A Development of New ILDG Client • Future work • Exploring more chances to apply data engineering techniques in various e-Science fields. • Data mining • Data integration • …
Thank you very much for your kind attention • Questions should be addressed to amagasa@cs.tsukuba.ac.jp