Toshiyuki Amagasa is a prominent researcher at the University of Tsukuba, focusing on data engineering and database systems. His work encompasses XML databases, parallel XML query processing, and OLAP analysis for XML. He is also involved in developing web information extraction techniques and faceted navigation for scientific applications linked to lattice QCD. Amagasa plays a crucial role in the Japan Lattice Data Grid (JLDG), facilitating efficient data sharing among research institutions and enhancing collaboration in particle physics.
Grid Activity at CCS Toshiyuki Amagasa Center for Computational Sciences, University of Tsukuba
About Myself • Name • Toshiyuki Amagasa • Affiliation: • Division of Computational Informatics, Center for Computational Sciences • Department of Computer Science, Graduate School of Systems and Information Engineering • Area of research • Data engineering • Database system • Recent topics • XML databases • Parallel XML query processing • OLAP analysis for XML • Web information extraction for XML • Databases in scientific applications • Faceted navigation for QCDml • Meteorological database
ILDG-JP Members • Prof. Mitsuhisa Sato (Director, CCS) • Prof. Tomoteru Yoshie (CCS) • Prof. Osamu Tatebe (CCS) • Dr. Naoya Ukita (CCS) • Prof. Toshiyuki Amagasa (CCS)
Talk Outline • Current Status of ILDG • A Brief History of JLDG • An Overview of JLDG • A Development of New ILDG Client • Faceted Navigation of QCDml • Conclusions and Future Work
A Brief History of JLDG (1/3) • Hepnet-J/sc 2002- (SINET GbE private network) • Widely-distributed file system • Network backbone: Super SINET VPN • Institutes / Universities: KEK, U. Tsukuba, Kyoto U., Osaka U., Hiroshima U., and Kanazawa U. • Objective and Implementation • Data sharing among institutes / universities whose administrative policies are not homogeneous, while maintaining security • Mirroring among FSs attached to SCs with administrative [Figure: Hepnet-J/sc connecting file servers at CCP @ Tsukuba, CRC @ KEK, YITP @ Kyoto, and RCNP @ Osaka; machines shown: CP-PACS, SR8000, SX-5, SX-5]
A Brief History of JLDG (2/3) • Problems • Growing cost of managing data location • A dataset may be distributed across several disks. • It is hard for users to remember the locations of data and mirrors. • No concept of users and user groups • Hard to support multiple research groups. • Necessary functionalities • A flat data sharing system with no space limit (or one that can be extended at any time) • User and user group management across several organizations • Japan Lattice Data Grid (JLDG) • Project launched in November 2005 • Operation started in March 2007
A Brief History of JLDG (3/3) • JLDG v1 started operation in May 2008 • Available datasets • CP-PACS Nf=2 QCD configuration • 8,000 files, 1.5 TBytes • CP-PACS/JLQCD Nf=2+1 QCD configuration • 21,000 files, 6 TBytes • PACS-CS Nf=2+1 32^3 x 64 lattice QCD configuration • 2,600 files, 3 TBytes • JLDG v2 started operation in December 2009 • Storing and sharing research data generated in daily research activities • Data sharing within a research group
An Overview of JLDG • A widely-distributed file system with 100 TB-scale storage for domestic researchers in particle physics • Sharing simulation data computed on SCs over several months to several years. • Data files are distributed; replicas are created if necessary. • Users do not need to know file locations. Files can be accessed very quickly if the local site has replicas. • Storage space can be added incrementally during operation. [Figure: Gfarm file system spanning Kanazawa, KEK, Kyoto, Hiroshima, Tsukuba, and Osaka over the SINET3 network, connected to ILDG; www.jldg.org]
Software Components • Globus Toolkit V4 (ANL) www.globus.org • GSI authentication, Proxy user certificate creation • GridFTP server / client • VOMS (EDG) • VO management • Naregi-CA (Naregi) www.naregi.org • User / host certificate creation • Gfarm file system (U. of Tsukuba) datafarm.apgrid.org • Widely-distributed file system • Uberftp (NCSA) • http://dims.ncsa.uiuc.edu/set/uberftp/ • Interactive GridFTP client
Gfarm Distributed File System • An open-source distributed file system • A global namespace to unify storage systems • Scalable I/O performance exploiting data access locality • Automated replica selection for fault-tolerance and load-balancing [Figure: global namespace rooted at /gfarm (directories ggf, jp, aist, gtrc; files file1 to file4) mapped onto the Gfarm file system, with replica creation]
Summary • JLDG • A brief history • An overview • Used as an infrastructure for daily research activity • Hands-on meeting on 27 Jan. 2009, successfully held with 19 attendees
Int’l Lattice Data Grid (ILDG) • A data grid for sharing Lattice QCD configurations • File Formats in ILDG • Configuration binary • LIME (Lattice QCD Interchange Message Encapsulation) • Metadata (QCDml) • ensemble XML • configuration XML • LFN (Logical File Name) • Identifier for configuration binary [Figure: one ensemble XML linked via markovChainURI to several configuration XML documents, each pointing through an LFN to a configuration binary]
QCDml Ensemble XML <markovChain xmlns="…"> <markovChainURI>mc://JLDG/CP-PACS/RCNF2/RC12x24-B1800K014090C1600</markovChainURI> <management> <revisions>1</revisions> <collaboration>CP-PACS</collaboration> <projectName>RCNF2 (Nf=2 full QCD with iwasaki RG gauge and tadpole improved clover quark action)</projectName> <ensembleLabel>B1800</ensembleLabel> <reference>Phys.Rev. D65 (2002) 054505 (hep-lat/0105015), Erratum-ibid. D67 (2003) 059901</reference> <archiveHistory> <elem> <revision>1</revision> <revisionAction>add</revisionAction> <participant> <name>T.Yoshie</name> <institution>Center for Computational Sciences, University of Tsukuba</institution>
Typical Use Case of ILDG [Figure: workflow involving LFN (Logical File Name), SURL (Site URL), TURL (Transfer URL), and VOMS authentication]
Difficulties in Finding Desired Configuration • Directly use query language (XQuery / XPath) • A simple example: /markovChain[descendant::node()[local-name() = 'beta'][number(text()) > 4] and descendant::node()[local-name() = 'collaboration'][text() = 'CSSM']] • Knowledge of XML, QCDml, and XQuery (XPath) is needed. • Hard to get the whole picture of available data. • Hierarchical list • Easy to use. • Needs a huge screen to show the entire list. • Still difficult to get the whole picture of the data.
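To make the XPath example on this slide concrete, here is a minimal Python sketch of the same filter: keep an ensemble document whose root is `markovChain` if it has a `beta` descendant greater than a threshold and a `collaboration` descendant equal to a given name. The sample document, its nesting, and the namespace URI are hypothetical and are not taken from a real QCDml file; unlike the XPath, this sketch also strips surrounding whitespace before comparing.

```python
import xml.etree.ElementTree as ET

# Hypothetical ensemble document; nesting and namespace are assumptions.
SAMPLE = """<markovChain xmlns="urn:example:qcdml">
  <management><collaboration>CSSM</collaboration></management>
  <physics><action><gluon><DBW2GluonAction>
    <beta>8.5</beta>
  </DBW2GluonAction></gluon></action></physics>
</markovChain>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def matches(ensemble_xml, collaboration, min_beta):
    """Return True if the document passes both conditions of the query."""
    root = ET.fromstring(ensemble_xml)
    if local_name(root.tag) != "markovChain":
        return False
    has_collaboration = False
    has_beta = False
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if name == "collaboration" and text == collaboration:
            has_collaboration = True
        elif name == "beta":
            try:
                if float(text) > min_beta:
                    has_beta = True
            except ValueError:
                pass  # non-numeric beta never satisfies the condition
    return has_collaboration and has_beta


print(matches(SAMPLE, "CSSM", 4.0))
```

Even in this form the user still has to know the element names in advance, which is the difficulty the slide points out.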
Basic Idea • Apply a faceted-navigation interface to browse QCDml ensemble XML data.
Faceted-Navigation • What is “faceted-navigation”? • A scheme for browsing objects with attributes. • Successfully used in some applications, such as Apple iTunes. • Procedure • A user selects a value in a facet • This selects a set of objects of interest • The system updates the list of objects, the list of facets, and their respective values • (Repeat) • Example • The Flamenco Search http://flamenco.berkeley.edu/
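The procedure on this slide can be sketched in a few lines of Python: selecting a value narrows the object list, and the facet values with their counts are recomputed from whatever remains. The records below are illustrative only (the facet names and the combinations of values are assumptions, not actual ensemble data).

```python
from collections import Counter

# Illustrative objects; each key is a facet, each value a facet value.
ENSEMBLES = [
    {"collaboration": "CSSM", "nf": "2", "size": "12/12/12/24"},
    {"collaboration": "CSSM", "nf": "0", "size": "8/8/8/16"},
    {"collaboration": "CP-PACS", "nf": "2", "size": "12/12/12/24"},
]


def select(objects, facet, value):
    """Keep only the objects whose facet has the chosen value."""
    return [obj for obj in objects if obj.get(facet) == value]


def facet_counts(objects):
    """For every facet, count how many objects carry each value."""
    counts = {}
    for obj in objects:
        for facet, value in obj.items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts


# One navigation step: the user picks collaboration = CSSM.
remaining = select(ENSEMBLES, "collaboration", "CSSM")
counts = facet_counts(remaining)
for facet, values in counts.items():
    print(facet, dict(values))
```

Repeating `select` on `remaining` with another facet corresponds to the "(Repeat)" step; the counts are what lets the interface show each available value along with its population.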
Faceted-Navigation • Good features • Users are free to choose any facet • cf. Hierarchical list • Gives a big picture of the dataset • Available values along with their population • Effective • Busch’s Law: 4 facets of 10 values each are enough to deal with 10,000 objects.
Technical Challenges • How to define facets? • How to extract values according to the facets? • How to achieve quick response from the database for improving user experience?
Choosing the Facets • Discussion with Prof. Yoshie, Dr. Ishii, and Prof. Tatebe. • Selected elements from QCDml ensemble XML • Regional grid • Collaboration • Project name • Number of flavors • Time • Parameters • Lattice size • Gluon action • Parameters • Quark action • Parameters
Extracting Values from a Facet (1/3) • Extract text values • Collaboration: CP-PACS, CP-PACS+JLQCD, CSSM, LHPC, MILC, RBC-UKQCD, UKQCD, dik, etmc, gral, qcdsf, sesam, theta, txl, … • Project name: 2+1 DWF 2+1 Dynamical AsqTAD Baryon Resonances Dynamical FLIC Studies Electromagnetic Form Factors FLIC Overlap Studies Flux Tube Test Gluon Propagator Long_aqstad_run Pentaquark Volume Dependence … • Need substring extraction • Date: <date>2007-02-26T21:39:33+09:00</date> • Extracted years: 2000, 2005, 2006, 2007, 2008
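The substring extraction for the date facet can be illustrated with a small Python sketch. The function name `year_facet` is hypothetical, and the sketch assumes every `<date>` value starts with a four-digit year, as in the ISO 8601 timestamp shown on the slide. The system described in this talk performs facet extraction with XQuery; Python is used here only for illustration.

```python
def year_facet(date_text):
    """Return the leading four-digit year of an ISO 8601 timestamp."""
    year = date_text.strip()[:4]
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"no leading year in {date_text!r}")
    return year


print(year_facet("2007-02-26T21:39:33+09:00"))
```

Reducing the full timestamp to a year keeps the number of distinct values in the facet small enough to browse.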
Extracting Values from a Facet (2/3) • Need text value generation • Lattice size, e.g. 12 / 12 / 12 / 24 <physics> <size> <elem> <name>X</name> <length>12</length> </elem> <elem> <name>Y</name> <length>12</length> </elem> <elem> <name>Z</name> <length>12</length> </elem> <elem> <name>T</name> <length>24</length> </elem> …
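A minimal Python sketch of the text value generation for lattice size: the `<length>` of each `<elem>` under `<size>` is read in document order and joined into one string. The fragment below copies the `<size>` element from the slide without a namespace, which is an assumption; a real QCDml document is namespaced and would need namespace-aware lookups. The function name `lattice_size` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The <size> fragment from the slide, without namespace (an assumption).
SIZE_XML = """<size>
  <elem><name>X</name><length>12</length></elem>
  <elem><name>Y</name><length>12</length></elem>
  <elem><name>Z</name><length>12</length></elem>
  <elem><name>T</name><length>24</length></elem>
</size>"""


def lattice_size(size_elem, sep=" / "):
    """Join the <length> of every <elem> child in document order."""
    lengths = [
        elem.findtext("length", default="").strip()
        for elem in size_elem.findall("elem")
    ]
    return sep.join(lengths)


print(lattice_size(ET.fromstring(SIZE_XML)))
```

Passing `sep="/"` produces the compact form `12/12/12/24`.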
Extracting Values from a Facet (3/3) • Gluon action / Quark action • An element name itself represents a value • Extract the element name as the value of a facet <action> <gluon> <iwasakiRGGluonAction> <glossary>http://www.jldg.org/JLDG/... <action> <gluon> <DBW2GluonAction> <glossary>www.lqcd.org/ildg/pla...
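For the gluon and quark action facets, the value is the name of the child element rather than any text content. The Python sketch below shows one way to read it. The fragment is a closed, namespace-free variant of the slide's example, and the glossary URL in it is a placeholder, not the real one; the function name `action_name` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment; the glossary URL is a placeholder (assumption).
ACTION_XML = """<action>
  <gluon>
    <iwasakiRGGluonAction>
      <glossary>http://example.org/glossary</glossary>
    </iwasakiRGGluonAction>
  </gluon>
</action>"""


def action_name(action_elem, kind):
    """Return the tag of the first child of <gluon> or <quark>, if any."""
    container = action_elem.find(kind)
    if container is None or len(container) == 0:
        return None
    # Drop a '{namespace}' prefix in case the tag carries one.
    return container[0].tag.rsplit("}", 1)[-1]


print(action_name(ET.fromstring(ACTION_XML), "gluon"))
```

A missing `<quark>` element simply yields `None`, which corresponds to an empty facet value.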
QCDml Faceted Navigation I/F: System Configuration [Figure: ensemble XML is downloaded from the ILDG regional grids (USQCD, JLDG, LDG, UKQCD, CSSM) into an XML DB (eXist) holding QCDml ensemble (ILDG) and configuration (JLDG) metadata; facet extraction (XQuery) populates a facet database in an RDBMS (MySQL); the facet navigation system (PHP + SQL + XQuery) runs on a web server (Apache)]
Database Design (1/2) • Use RDBMS for quick response • Use fixed relational schema for extensibility
*************************** 1. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: rgrid
value: cssm
*************************** 2. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: collaboration
value: CSSM
*************************** 3. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: projectName
value: Dynamical FLIC Studies
*************************** 4. row ***************************
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
property: date
value: 2007
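The fixed (uri, property, value) schema can be sketched with Python's built-in `sqlite3` module. SQLite stands in for MySQL here, and the table name `facet` and the helper functions are hypothetical; the rows come from the sample output on these slides. The point of the design is that a new facet only adds rows, never columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (ensemble URI, facet name, facet value); assumed table name.
conn.execute("CREATE TABLE facet (uri TEXT, property TEXT, value TEXT)")

U1 = "mc://cssm/su3b08500k1300s12t24DBW2FLIC"
U2 = "mc://cssm/su3b09836s8t16DBW2"
conn.executemany(
    "INSERT INTO facet VALUES (?, ?, ?)",
    [
        (U1, "rgrid", "cssm"),
        (U1, "collaboration", "CSSM"),
        (U1, "projectName", "Dynamical FLIC Studies"),
        (U1, "date", "2007"),
        (U2, "collaboration", "CSSM"),
    ],
)


def value_counts(conn, prop):
    """List each value of a facet with the number of ensembles having it."""
    cur = conn.execute(
        "SELECT value, COUNT(DISTINCT uri) FROM facet "
        "WHERE property = ? GROUP BY value ORDER BY value",
        (prop,),
    )
    return cur.fetchall()


def uris_matching(conn, selections):
    """Return URIs that satisfy every selected (facet, value) pair."""
    uris = None
    for prop, value in selections.items():
        cur = conn.execute(
            "SELECT uri FROM facet WHERE property = ? AND value = ?",
            (prop, value),
        )
        found = {row[0] for row in cur}
        uris = found if uris is None else uris & found
    return sorted(uris or [])


print(value_counts(conn, "collaboration"))
print(uris_matching(conn, {"collaboration": "CSSM", "date": "2007"}))
```

Both queries touch a single narrow table, which is one plausible reason an RDBMS responds faster here than evaluating XQuery over the XML on every click.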
Database Design (2/2) • Store preformatted text for improving rendering performance
*************************** 1. row ***************************
collaboration: CSSM
size: 12/12/12/24
uri: mc://cssm/su3b08500k1300s12t24DBW2FLIC
nf: 2
gact: DBW2GluonAction (beta=8.5)
qact: fatLinkIrrelevantCloverQuarkAction (nf=2/kappa=0.1300)
*************************** 2. row ***************************
collaboration: CSSM
size: 8/8/8/16
uri: mc://cssm/su3b09836s8t16DBW2
nf:
gact: DBW2GluonAction (beta=9.836)
qact:
Conclusion and Future Work • Conclusion • Current Status of ILDG • A Development of New ILDG Client • Future work • Exploring more opportunities to apply data engineering techniques in various e-Science fields. • Data mining • Data integration • …
Thank you very much for your kind attention. Questions should be addressed to amagasa@cs.tsukuba.ac.jp