Data Management

  1. Data Management Ron Trompert SARA Grid Tutorial, 18-19 September 2006

  2. Outline • Storage Infrastructures • SRM • Storage Elements in gLite • Low Level Data Management • LCG File Catalog (LFC) • Data Management CLIs and APIs • Examples • FTS Grid Tutorial, RC RUG, 18-19 September 2006

  3. Storage Infrastructures • Disk-only • Hierarchical storage management (HSM) • Policy-based management of file backup and archiving that uses storage devices economically, without the user needing to be aware of when files are retrieved from or stored on backup storage media. • The hierarchy represents different types of storage media, such as disk systems, optical storage, or tape, each type representing a different level of cost and speed of retrieval. For example, as a file ages in an archive, it can be automatically moved to a slower but less expensive form of storage. • HSM software: TSM, DMF, CASTOR, Enstore, HPSS, …
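The age-based migration described on this slide can be pictured with a small sketch. The tier names and age thresholds are illustrative assumptions and do not reflect the policy of TSM, DMF, CASTOR, Enstore, or HPSS.

```python
# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
# The minimum ages (in days) are made up for illustration.
TIERS = [
    ("disk", 0),
    ("optical", 30),
    ("tape", 180),
]


def select_tier(age_days: int) -> str:
    """Return the slowest (cheapest) tier whose minimum age the file has reached."""
    chosen = TIERS[0][0]
    for name, min_age in TIERS:
        if age_days >= min_age:
            chosen = name
    return chosen
```

A real HSM would typically also weigh access frequency and free disk space; age alone is used here only because it is the example the slide gives.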

  4. Storage Infrastructures • HSM example at SARA [figure]

  5. SRM • SRM standard • SRM (Storage Resource Manager) implementations provide uniform access to heterogeneous storage resources on the Grid • SRM is a control protocol for: • Space reservation • File management • Pinning • Lifetime management • Replication • Protocol negotiation

  6. SRM • SRM implementation • The SRM interface is implemented as a web service • Implementations: • dCache (disk/HSM) • DPM (disk) • CASTOR (HSM) • SRB (disk/HSM) • … • SRM examples • srmRm • srmLs • srmPrepareToPut • srmBringOnline • srmCopy • srmGetTransferProtocols • …

  7. Storage Elements in gLite • Classic SE: no SRM (will become deprecated in the autumn of this year, 2006); transfer protocols: gridftp; storage type: disk • DPM: SRM; transfer protocols: gridftp, secure rfio; storage type: disk • dCache: SRM; transfer protocols: gridftp, gsidcap; storage type: disk, HSM
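Choosing a transfer protocol that both the client and a given SE type support can be sketched as follows. The protocol lists are taken from the slide; the dictionary keys, the shorthand "rfio" for secure rfio, and the function name are assumptions made for this illustration, and the sketch is not the SRM protocol-negotiation call itself.

```python
# Transfer protocols per Storage Element type, as listed on the slide.
SE_PROTOCOLS = {
    "classic": ["gridftp"],
    "dpm": ["gridftp", "rfio"],
    "dcache": ["gridftp", "gsidcap"],
}


def negotiate(client_protocols, se_type):
    """Return the first client-preferred protocol the SE type supports, or None."""
    supported = SE_PROTOCOLS[se_type]
    for proto in client_protocols:
        if proto in supported:
            return proto
    return None
```

The client's list is treated as a preference order, so a client that prefers gsidcap still falls back to gridftp on a DPM.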

  8. Low Level Data Management
      • GridFTP (all SEs)
          globus-url-copy file:///home/ron/file \
              gsiftp://srm.grid.sara.nl/pnfs/grid.sara.nl/data/dteam/file
      • Third-party transfer
          globus-url-copy gsiftp://hostA/pathA gsiftp://hostB/pathB
      • Also edg-gridftp-ls, edg-gridftp-rm, edg-gridftp-mkdir, etc.
      • UberFTP
      • Interactive gridftp client
      • ftp commands
      • GSI authentication
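The difference between the two globus-url-copy invocations on this slide lies only in the URL schemes of source and destination. A minimal sketch that makes this explicit is given below; the function name and the returned labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_transfer(src: str, dst: str) -> str:
    """Classify a globus-url-copy style transfer by its URL schemes."""
    s, d = urlparse(src).scheme, urlparse(dst).scheme
    if s == "gsiftp" and d == "gsiftp":
        return "third-party"  # data flows directly between the two servers
    if s == "file" and d == "gsiftp":
        return "upload"
    if s == "gsiftp" and d == "file":
        return "download"
    return "unsupported"
```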

  9. Low Level Data Management
      • Gsidcap (dCache SEs)
          dccp -p 20000:25000 /tmp/file \
              gsidcap://srm.grid.sara.nl:22128/pnfs/grid.sara.nl/data/dteam/file
      • 20000:25000 is derived from the GLOBUS_TCP_PORT_RANGE environment variable
      • Secure rfio
          rfcp /path/myfile \
              t2se01.physics.ox.ac.uk:/dpm/physics.ox.ac.uk/home/dteam/file
      • srmcp (not for Classic SEs)
          srmcp file:////tmp/file \
              srm://srm.grid.sara.nl:8443//pnfs/grid.sara.nl/data/dteam/file
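Deriving the dccp port argument from GLOBUS_TCP_PORT_RANGE could look like the sketch below. It assumes the variable holds two port numbers separated by a comma or a space (for example "20000,25000"); the function name is hypothetical.

```python
import os


def dccp_port_range(env=None):
    """Convert GLOBUS_TCP_PORT_RANGE ("min,max" or "min max") to dccp's "min:max" form."""
    env = os.environ if env is None else env
    raw = env.get("GLOBUS_TCP_PORT_RANGE")
    if not raw:
        return None  # variable not set: let the caller decide on a default
    parts = raw.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError(f"unexpected port range: {raw!r}")
    low, high = int(parts[0]), int(parts[1])
    if low > high:
        raise ValueError(f"port range is reversed: {raw!r}")
    return f"{low}:{high}"
```

The result can then be passed as the value of the `-p` option shown in the dccp example.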

  10. Information system
      • LDAP-based
      • LDAP servers running on service nodes (GRIS/BDII)
      • LDAP servers collecting the information for a site (site BDII)
      • LDAP servers collecting the information for all sites (BDII)
      • The environment variable LCG_GFAL_INFOSYS must be set, and it must point to a BDII
      • lcg-infosites
      • Example: finding an SE:
          > lcg-infosites --vo tutor se
          Avail Space(Kb)  Used Space(Kb)  Type  SEs
          ----------------------------------------------------------
          214632           1901097784      n.a   tbn15.nikhef.nl
          626880000        1163120000      n.a   tbn18.nikhef.nl
          488106596        368854044       n.a   mu2.matrix.sara.nl
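Output like the lcg-infosites listing on this slide is easy to post-process, for instance to pick the SE with the most available space. The sketch below parses the sample output from the slide; the function names are hypothetical, and it assumes data rows consist of two numeric columns, a type column, and a hostname.

```python
SAMPLE = """\
Avail Space(Kb) Used Space(Kb) Type SEs
----------------------------------------------------------
214632 1901097784 n.a tbn15.nikhef.nl
626880000 1163120000 n.a tbn18.nikhef.nl
488106596 368854044 n.a mu2.matrix.sara.nl
"""


def parse_infosites(text):
    """Return (avail_kb, used_kb, hostname) tuples from `lcg-infosites ... se` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly four fields, the first two being numbers;
        # the header and the separator line do not match and are skipped.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append((int(fields[0]), int(fields[1]), fields[3]))
    return rows


def most_free_space(text):
    """Return the hostname of the SE with the most available space, or None."""
    rows = parse_infosites(text)
    return max(rows)[2] if rows else None
```

Rows whose space columns are not numeric are silently ignored in this sketch.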

  11. Information system
      • lcg-info
      • For more advanced searches. For example, finding out where to put your files:
          > lcg-info --list-se --query 'SE=mu2.matrix.sara.nl' --attrs Path
          - SE: mu2.matrix.sara.nl
            - Path /flatfiles/SE00/tutor
      • ldapsearch
      • For the real troopers among us

  12. LFC • LFC stands for LCG File Catalog • LCG stands for LHC Computing Grid • LHC stands for Large Hadron Collider • Users and programs produce and require data • The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox. Not recommended for large amounts of data • Data is stored on the grid • Located in Storage Elements • Several replicas of one file at different sites • Accessible by Grid users and applications from “anywhere” • Locatable by the WMS/RB (data requirements in JDL) • Also… • Data may be copied between local filesystems (WNs, UIs) and the Grid, or opened remotely on the SE (GFAL, gsidcap, rfio).

  13. LFC • Keeps track of the location of copies (replicas) of files on the Grid

  14. Name conventions • Logical File Name (LFN) • An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile” • Globally Unique Identifier (GUID) • A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6” • Site URL (SURL) (or Physical File Name (PFN) or Site FN) • The location of an actual piece of data on a storage system, e.g. “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM) or “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE) • Transport URL (TURL) • Locator of a replica plus access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
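The four kinds of names can be told apart by their prefix. The following sketch classifies the examples from the slide; the function name and the exact set of schemes treated as TURLs are assumptions made for illustration.

```python
def classify_name(name: str) -> str:
    """Classify a grid file name as LFN, GUID, SURL or TURL by its prefix."""
    scheme = name.split(":", 1)[0].lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme in ("srm", "sfn"):
        return "SURL"
    # Access protocols mentioned elsewhere in this tutorial.
    if scheme in ("rfio", "gsiftp", "gsidcap", "dcap"):
        return "TURL"
    return "unknown"
```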

  15. Naming conventions • How do they fit together? • LFC holds the mapping LFN-GUID-SURL • [Figure: LFN 1 … LFN i map to one GUID; the GUID maps to SURL 1 … SURL j; each SURL maps to one or more TURLs (TURL 11 … TURL 1k, …, TURL j1 … TURL jl)]
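The LFN-GUID-SURL mapping can be modelled with two dictionaries. This is a toy in-memory illustration of the relationships only; the class and method names are hypothetical and are not part of the LFC API.

```python
class ToyCatalog:
    """In-memory illustration of the LFN -> GUID -> SURL mapping (not the LFC API)."""

    def __init__(self):
        self.lfn_to_guid = {}    # several LFNs may point at one GUID
        self.guid_to_surls = {}  # one GUID may have several replicas (SURLs)

    def register(self, lfn, guid, surl):
        """Record that `lfn` names `guid`, and that `surl` is a replica of it."""
        self.lfn_to_guid[lfn] = guid
        self.guid_to_surls.setdefault(guid, []).append(surl)

    def add_alias(self, new_lfn, existing_lfn):
        """Add a second logical name for the same GUID."""
        self.lfn_to_guid[new_lfn] = self.lfn_to_guid[existing_lfn]

    def replicas(self, lfn):
        """Return all SURLs reachable from a logical file name."""
        return list(self.guid_to_surls[self.lfn_to_guid[lfn]])
```

TURLs are left out on purpose: they are obtained from the SE for a given SURL and access protocol rather than stored in the catalog mapping named on the slide.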

  16. LFC [figure]

  17. LFC • LFN acts as main key in the database. It has: • Symbolic links to it (additional LFNs) • Unique Identifier (GUID) • System metadata • Information on replicas • One field of user metadata

  18. LFC • Two kinds of LFC • Central LFC: for each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid. • Local LFC: local catalogs record only the file replicas stored at that site's SEs.

  19. LFC • Provides: • User-exposed transactional C/C++ API (with automatic rollback on failure) • Python wrapper provided (python module lfc) • Command line tools with administrative functionality • Hierarchical Unix-like namespace and namespace operations for LFNs • lfn:/grid/<vo name>/mydir/myfile • lfc-mkdir, lfc-chmod • Integrated GSI authentication and authorization • Access Control Lists (Unix permissions and POSIX ACLs) • Checksums • Sessions (multiple operations inside a single transaction) • Bulk operations (inside transactions)

  20. LFC • Summary of the LFC Catalog commands [table]

  21. LFC C/C++ API • Low level methods (many POSIX-like): lfc_access, lfc_aborttrans, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown, lfc_closedir, lfc_creat, lfc_delcomment, lfc_delete, lfc_deleteclass, lfc_delreplica, lfc_endtrans, lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd, lfc_getpath, lfc_lchown, lfc_listclass, lfc_listlinks, lfc_listreplica, lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir, lfc_queryclass, lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr, lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf, lfc_setfsize, lfc_starttrans, lfc_stat, lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc

  22. LFC Interfaces • Integration with GFAL and lcg_utils APIs: lcg-utils/GFAL access the catalog in a transparent way • Integration with the WMS • The RB can locate Grid files: allows for data-based match-making • JDL file: InputData = "lfn:/grid/tutor/MyFile";

  23. Data Management CLIs & APIs • lcg_utils: lcg-* commands + lcg_* API calls • Provide (all) the functionality needed by the LCG user • Transparent interaction with file catalogs and storage interfaces when needed • Abstraction from technology of specific implementations • Grid File Access Library (GFAL): API • Adds file I/O and explicit catalog interaction functionality • Still provides the abstraction and transparency of lcg_utils

  24. Data Management CLIs & APIs • lcg-utils commands: Replica Management [table] • lcg-utils commands: File Catalog Interaction [table]

  25. Data Management CLIs & APIs • lcg-utils C/C++ API [table]

  26. Data Management CLIs & APIs • GFAL • Grid storage interactions today require using some existing software components: • The file catalog services, to locate valid replicas of files in order to: • Download them to the user's local machine • Move them from one SE to another • Enable jobs running on worker nodes to access and manage files stored on remote storage elements • The SRM software, to ensure: • File existence on disk or in a disk pool (files are recalled from mass storage if necessary) • Space allocation on disk for new files (which may be migrated to mass storage later)

  27. Data Management CLIs & APIs • The GFAL features • Hides interactions with the SRM from the end user • Provides a POSIX-like interface for file I/O operations • POSIX calls prefixed with gfal_ • Based on shared libraries (both threaded and unthreaded versions) • Needs only one header file (gfal_api.h) to write C applications • Supports the following protocols: • file for local access, also lfn/guid • dcap, gsidcap and kdcap for the dCache access protocol • rfio for the CASTOR access protocol • SRM • Access to SRMs in secure mode, i.e. using a valid Grid proxy obtained with the voms-proxy-init command.

  28. Examples • Using lcg utils and lfc commands: • Define the server hostname • The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS) • Set the environment variable: export LFC_HOST=<LFC_server_hostname> • $LFC_HOST must be set
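A script that wraps the lcg-* and lfc-* commands may want to verify the two environment variables named in this tutorial before running anything. A minimal sketch follows; the function name and the example hostnames in the usage note are hypothetical.

```python
import os

# Variables this tutorial says must be set before using the catalog tools.
REQUIRED_VARS = ("LCG_GFAL_INFOSYS", "LFC_HOST")


def missing_grid_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, `missing_grid_env({"LFC_HOST": "lfc.example.org"})` reports that only `LCG_GFAL_INFOSYS` is missing.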

  29. Examples
      Listing the entries of an LFC directory
          lfc-ls [-cdiLlRTu] [--class] [--comment] [--deleted] [--display_side] [--ds] path…
      where path specifies the LFN pathname (mandatory)
      • Remember that LFC has a directory tree structure
      • /grid/<VO_name>/<you create it>: /grid/<VO_name> is the LFC namespace, the rest is defined by the user
      • All members of a VO have read-write permissions under their directory
      • You can set LFC_HOME to use relative paths
          > lfc-ls /grid/tutor/me
          > export LFC_HOME=/grid/tutor
          > lfc-ls -l me
          > lfc-ls -l -R /grid
      • -l : long listing
      • -R : list the contents of directories recursively. Don't use it!
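How a relative path combines with LFC_HOME can be sketched as a plain path join. This only models the behaviour shown in the example commands (`me` resolving under `/grid/tutor`); the function name is hypothetical and the real tools may apply further rules.

```python
import posixpath


def resolve_lfc_path(path, lfc_home=None):
    """Resolve an LFC path: absolute paths are kept, relative ones are joined to LFC_HOME."""
    if path.startswith("/"):
        return posixpath.normpath(path)
    if not lfc_home:
        raise ValueError("relative path given but LFC_HOME is not set")
    return posixpath.normpath(posixpath.join(lfc_home, path))
```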

  30. Examples
      Creating directories in the LFC
          lfc-mkdir [-m mode] [-p] path...
      • Where path specifies the LFC pathname
      • Remember that while registering a new file (using lcg-cr, for example) the corresponding destination directory must be created in the catalog beforehand.
      • Examples:
          > lfc-mkdir /grid/tutor/me
        You can check the directory with:
          > lfc-ls -l /grid/tutor/me
          drwxr-xrwx 0 19122 1077 0 Jun 14 11:36 demo

  31. Examples
      Let us copy and register a file using lcg-utils
          > lcg-cr --vo tutor -l me/test -d mu2.matrix.sara.nl file:`pwd`/test
          guid:7b4efaef-bb0f-42a3-bb6f-bbe35080d105
          > lcg-lr --vo tutor lfn:me/test
          sfn://mu2.matrix.sara.nl/flatfiles/SE00/tutor/generated/2006-09-18/file378fc829-351f-4558-8679-9d2ce530cbb4
          > lfc-ls -l me
          -rw-rw-r-- 1 30010 2024 114 Sep 18 10:33 test
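When the copy-and-register step is driven from a script, it helps to build the argument list explicitly rather than concatenating strings. The sketch below uses only the lcg-cr options that appear in the example on this slide (`--vo`, `-l`, `-d`); the function name and the local path `/home/ron/test` are hypothetical.

```python
import shlex


def lcg_cr_command(vo, lfn, dest_se, local_path):
    """Build the argv for an lcg-cr call using only the options shown on the slide."""
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", dest_se, f"file:{local_path}"]


argv = lcg_cr_command("tutor", "me/test", "mu2.matrix.sara.nl", "/home/ron/test")
# Print a shell-quoted form for inspection; the list itself can be passed to subprocess.run.
print(shlex.join(argv))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with unusual file names.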

  32. Examples
      Creating a symbolic link
          lfc-ln -s file linkname
          lfc-ln -s directory linkname
      Creates a link named linkname to the specified file or directory
      • Examples:
          > lfc-ln -s /grid/tutor/me/test /grid/tutor/aLink
        Let's check the link using lfc-ls with long listing (-l):
          > lfc-ls -l
          lrwxrwxrwx 1 30010 2024 0 Sep 18 10:38 aLink -> /grid/tutor/me/test
        (aLink is the symbolic link; /grid/tutor/me/test is the original file)

  33. Examples
      Adding/deleting metadata information
          lfc-setcomment path comment
          lfc-delcomment path
      • lfc-setcomment adds/replaces a comment associated with a file/directory in the LFC Catalog
      • lfc-delcomment deletes a comment previously added
      • This is the only metadata (one field) supported by the catalog
      • Examples:
          > lfc-setcomment me/test "nice file"
      • Let's see what happened:
          > lfc-ls --comment /grid/tutor/me/test
          /grid/tutor/me/test nice file

  34. Examples
      Deleting the file
      • lfc-rm removes a file/link/directory from the catalog only
      • lcg-del removes the file from the SEs and the LFNs/links from the catalog
      • Example, delete all replicas:
          > lcg-del -a --vo tutor guid:8e413879-7cb3-4260-af9f-6964392da7e8
      • Example, delete only one replica (-s selects the SE, so -a is not used):
          > lcg-del --vo tutor -s mu2.matrix.sara.nl guid:8e413879-7cb3-4260-af9f-6964392da7e8
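The two lcg-del modes, deleting every replica versus deleting only the replica on one SE, can be illustrated with a toy function over a list of SURLs. It models the outcome only and does not call any grid tool; the function name and the second hostname in the usage note are hypothetical.

```python
from urllib.parse import urlparse


def remaining_replicas(surls, se=None):
    """Return the replicas left after a deletion.

    se=None models deleting all replicas; a hostname models deleting
    only the replica(s) stored on that SE.
    """
    if se is None:
        return []
    return [s for s in surls if urlparse(s).hostname != se]
```

For instance, with replicas on `mu2.matrix.sara.nl` and `tbn18.nikhef.nl`, selecting `mu2.matrix.sara.nl` leaves only the Nikhef replica.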

  35. File Transfer Service • A batch system for submitting data transfer jobs • For data-intensive sciences • Currently in use in the LCG project

  36. FTS • Allows for • Managed transfers by means of channels to sites • Channels are between sites, e.g. CERN-SARA • Site admins can adapt the configuration of incoming channels to their site, switch their channel off, etc. • Setting priorities for different VOs • Optimisation of network tuning parameters per channel
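The channel concept can be pictured with a toy model: a channel connects two sites, can be switched off, and serves queued jobs according to per-VO priorities. Everything below (class, field names, priority values, job tuples) is a hypothetical illustration and not the FTS data model or scheduling algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    source: str
    destination: str
    active: bool = True  # a site admin can switch the channel off
    vo_priority: dict = field(default_factory=dict)  # VO name -> priority, higher served first

    @property
    def name(self):
        """Channel name in the source-destination form, e.g. CERN-SARA."""
        return f"{self.source}-{self.destination}"

    def next_job(self, queue):
        """Pick the queued (vo, job_id) with the highest VO priority, or None if off or empty."""
        if not self.active or not queue:
            return None
        return max(queue, key=lambda job: self.vo_priority.get(job[0], 0))
```

VOs without an explicit priority default to 0 in this sketch, and ties go to the job that was queued first.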

  37. FTS • Command line interface • glite-transfer-cancel • Cancels a file transfer job • glite-transfer-list • Lists ongoing data transfer jobs • glite-transfer-status • Displays the status of an ongoing data transfer job • glite-transfer-submit • Submits a new data transfer job
