
Data management and LFC: The LCG File Catalog


  1. Data management and LFC: The LCG File Catalog. Stefano Cozzini (slides from Antonio Delgado Peris)

  2. Introduction
     • Users and programs produce and require data
     • Data may be stored in Grid datasets (files)
         • Located in Storage Elements (SEs)
         • Several replicas of one file in different sites
         • Accessible by Grid users and applications from “anywhere”
         • Locatable by the WMS (data requirements in JDL)
     • Also…
         • The Resource Broker can send (small amounts of) data to/from jobs: Input and Output Sandbox
         • Data may be copied between local filesystems (WNs, UIs) and the Grid
     ICTP/INFM-Democritos workshop on porting scientific application on computational grids

  3. Name conventions
     • Logical File Name (LFN)
         • An alias created by a user to refer to some item of data, e.g. “lfn:cms/20030203/run2/track1”
     • Globally Unique Identifier (GUID)
         • A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
     • Site URL (SURL) (or Physical File Name (PFN) or Site FN)
         • The location of an actual piece of data on a storage system, e.g.
           “srm://pcrd24.cern.ch/flatfiles/cms/output10_1” (SRM)
           “sfn://lxshare0209.cern.ch/data/alice/ntuples.dat” (Classic SE)
     • Transport URL (TURL)
         • Temporary locator of a replica + access protocol, understood by an SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
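The scheme prefix alone is enough to tell these four kinds of names apart. A minimal sketch, assuming only the prefixes shown on the slide plus gsiftp:// (which appears later in the transfer logs as a copy URL); `classify_name` is a hypothetical helper, not part of lcg-utils:

```shell
#!/bin/sh
# classify_name: hypothetical helper that maps a Grid file name to its kind
# by looking only at the scheme prefix.
classify_name() {
    case "$1" in
        lfn:*)               echo "LFN" ;;
        guid:*)              echo "GUID" ;;
        srm://*|sfn://*)     echo "SURL" ;;
        # Assumption: rfio:// and gsiftp:// are treated as transport URLs.
        rfio://*|gsiftp://*) echo "TURL" ;;
        *)                   echo "unknown" ;;
    esac
}

classify_name "lfn:cms/20030203/run2/track1"
classify_name "guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6"
classify_name "sfn://lxshare0209.cern.ch/data/alice/ntuples.dat"
classify_name "rfio://lxshare0209.cern.ch//data/alice/ntuples.dat"
```

The four calls use the example names from the slide and print LFN, GUID, SURL and TURL in that order.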

  4. File Catalogs in LCG
     • File catalogs in LCG:
         • They keep track of the location of copies (replicas) of Grid files
         • The DM tools and APIs and the WMS interact with them
     • EDG’s Replica Location Service (RLS)
         • Catalogs in use in LCG-2
         • Replica Metadata Catalog (RMC) + Local Replica Catalog (LRC)
         • Some performance problems detected during Data Challenges
     • New LCG File Catalog (LFC)
         • In production from the LCG-2.6.0 release
         • Coexistence with RLS; migration tools provided: http://goc.grid.sinica.edu.tw/gocwiki/How_to_migrate_the_RLS_entries_into_the_LCG_File_Catalog_%28LFC%29
         • Accessible by defining: $LCG_CATALOG_TYPE=lfc and $LFC_HOST
         • Better performance and scalability
         • Provides new features: security, hierarchical namespace, transactions...

  5. The RLS
     • RMC:
         • Stores LFN-GUID mappings
         • Accessible by edg-rmc CLI + API
     • LRC:
         • Stores GUID-SURL mappings
         • Accessible by edg-lrc CLI + API
     • Main weaknesses:
         • Insecure (anyone can delete catalog entries)
         • Bad performance (Java clients…)
     [Diagram: DM tools and the RLS catalogs; labels: DM, RLS, RMC]

  6. The LFC
     • One single catalog
     • LFN acts as main key in the database. It has:
         • Symbolic links to it (additional LFNs)
         • Unique Identifier (GUID)
         • System metadata
         • Information on replicas
         • One field of user metadata

  7. The LFC (II)
     • Fixes EDG catalogs performance and scalability problems
         • Cursors for large queries
         • Timeouts and retries from the client
     • Provides more features than the EDG Catalogs
         • User exposed transaction API (+ auto rollback on failure)
         • Hierarchical namespace and namespace operations (for LFNs)
         • Integrated GSI Authentication + Authorization
             -> Mapping with local UID/GID problem being solved (pool of accounts)
         • Access Control Lists (Unix Permissions and POSIX ACLs)
         • Checksums

  8. Setting up the LFC client
     • Very simple installation (also included in YAIM):
         • Install a single RPM on WN, UI, RB
         • Specify the host of the server (required for the moment!)
             > export LFC_HOST=<LFC_server_hostname>
         • Test the client
     • Using lcg_utils and GFAL:
         • Define the catalog to use: $LCG_CATALOG_TYPE=lfc
         • Define the server hostname
             • The LFC server must be published in the BDII ($LCG_GFAL_INFOSYS)
             • Or use the environment variable: $LFC_HOST=<LFC_server_hostname>
     • Environment variable LFC_HOME
         • Can be set to use relative LFNs
         • With LFC_HOME=/grid/gilda/myDir, /grid/gilda/myDir/myFile becomes myFile
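The effect of LFC_HOME on relative LFNs can be illustrated without contacting a catalog. The sketch below sets the variables named on the slide and mimics the resolution rule in a hypothetical helper, `resolve_lfn`; the real clients perform this resolution internally, and the hostname is the GILDA catalog used later in the exercises:

```shell
#!/bin/sh
# Client-side environment, using the variable names from the slide.
export LFC_HOST=lfc-gilda.ct.infn.it   # catalog host used in the hands-on session
export LCG_CATALOG_TYPE=lfc
export LFC_HOME=/grid/gilda/myDir

# resolve_lfn: hypothetical helper that mimics how a relative LFN is
# interpreted against LFC_HOME. Absolute paths are returned unchanged.
resolve_lfn() {
    case "$1" in
        /*) echo "$1" ;;
        *)  if [ -n "${LFC_HOME:-}" ]; then
                echo "${LFC_HOME%/}/$1"
            else
                echo "$1"
            fi ;;
    esac
}

resolve_lfn myFile                      # prints /grid/gilda/myDir/myFile
resolve_lfn /grid/gilda/other/file      # prints /grid/gilda/other/file
```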

  9. LFC Troubleshooting
     • Environment variables:
         • $LFC_HOST not set and catalog not published in BDII
             -> lfc-ls… reports “send2nsd: NS009 - fatal configuration error: Host unknown: …”
             -> lcg-lr… returns nothing (or “No such file or directory”)
         • $LCG_CATALOG_TYPE wrongly set or not set (default “edg”)
             -> Files that appear and disappear: lcg-lr… returns nothing (or “No such file or directory”)
             -> Unsupported VOs: lcg-lr… returns “Invalid argument” (and “LRC, RMC endpoint not found”)
     • Other configuration errors
         • VO directory not defined by root in the LFC hierarchy
         • Unsupported VOs -> lcg-lr… returns “Invalid argument” (and “LRC, RMC endpoint not found”)
     • Attention!
         • lcg_utils do not create directories automatically (feature)
             -> explicit use of lfc-mkdir required (as user)
             -> $LFC_HOST must be set
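Most of the failures listed above come from two environment variables, so a quick check before running any lcg-* or lfc-* command can save time. A minimal sketch, assuming a hypothetical helper `check_lfc_env`; it only inspects the local environment and cannot tell whether the catalog is published in the BDII:

```shell
#!/bin/sh
# check_lfc_env: hypothetical pre-flight check for the variables named in
# the troubleshooting list. Returns non-zero if something looks wrong.
check_lfc_env() {
    rc=0
    if [ -z "${LFC_HOST:-}" ]; then
        echo "LFC_HOST is not set (lfc-* commands such as lfc-mkdir need it)" >&2
        rc=1
    fi
    # The slide states that the default catalog type is "edg".
    if [ "${LCG_CATALOG_TYPE:-edg}" != "lfc" ]; then
        echo "LCG_CATALOG_TYPE is '${LCG_CATALOG_TYPE:-edg}', expected 'lfc'" >&2
        rc=1
    fi
    return "$rc"
}

check_lfc_env || echo "Fix the environment before using the catalog commands."
```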

  10. LFC interfaces
     [Diagram of the interfaces between LFC server and LFC client; labels: LCG UTIL, GFAL, Python, C API, DLI, WMS, CLI (lfc-ls, lfc-mkdir, lfc-setacl, …)]

  11. LFC Interfaces (II)
     • LFC client commands
         • Provide administrative functionality
         • Unix-like
         • LFNs seen as a Unix filesystem (/grid/<VO>/ … )
     • LFC C API
         • Alternative way to administer the catalog
         • Python wrapper provided
     • Integration with GFAL and lcg_util APIs complete
         -> lcg-utils access the catalog in a transparent way
     • Integration with the WMS completed
         • The RB can locate Grid files: allows for data based match-making
         • Using the Data Location Interface (DLI)

  12. Data Management CLIs & APIs
     • lcg_utils: lcg-* commands + lcg_* API calls
         • Provide (all) the functionality needed by the LCG user
         • Transparent interaction with file catalogs and storage interfaces when needed
         • Abstraction from technology of specific implementations
     • Grid File Access Library (GFAL): API
         • Adds file I/O and explicit catalog interaction functionality
         • Still provides the abstraction and transparency of lcg_utils
     • edg-gridftp tools: CLI
         • Complete the lcg_utils with low level GridFTP operations
         • Functionality available as API in GFAL
         • May be generalized as lcg-* commands

  13. lcg-utils commands
     [Table of commands in two groups: Replica Management, File Catalog Interaction]

  14. LFC C API
     Low level methods (many POSIX-like):
     lfc_access, lfc_aborttrans, lfc_addreplica, lfc_apiinit, lfc_chclass, lfc_chdir, lfc_chmod, lfc_chown,
     lfc_closedir, lfc_creat, lfc_delcomment, lfc_delete, lfc_deleteclass, lfc_delreplica, lfc_endtrans,
     lfc_enterclass, lfc_errmsg, lfc_getacl, lfc_getcomment, lfc_getcwd, lfc_getpath, lfc_lchown,
     lfc_listclass, lfc_listlinks, lfc_listreplica, lfc_lstat, lfc_mkdir, lfc_modifyclass, lfc_opendir,
     lfc_queryclass, lfc_readdir, lfc_readlink, lfc_rename, lfc_rewind, lfc_rmdir, lfc_selectsrvr,
     lfc_setacl, lfc_setatime, lfc_setcomment, lfc_seterrbuf, lfc_setfsize, lfc_starttrans, lfc_stat,
     lfc_symlink, lfc_umask, lfc_undelete, lfc_unlink, lfc_utime, send2lfc

  15. LFC commands: Summary of the LFC Catalog commands

  16. Practicals on LFC and lcg-utils. Tony Calanducci

  17. LFC Catalog commands: listing the entries of an LFC directory
     lfc-ls [-cdiLlRTu] [--comment] path…
     where path specifies the LFC pathname (mandatory)
     • Remember that LFC has a directory tree structure
         • /grid/<VO_name>/<you create it>
           (diagram labels on this path: “LFC Namespace”, “Defined by the user”)
     • All members of a given VO have read-write permissions under their directory
     • -l (a lowercase “L”) outputs a long listing
     • -R lists the contents of directories recursively (use it sparingly!)
     • You can set LFC_HOME to use relative paths:
       with LFC_HOME=/grid/gilda/myDir, /grid/gilda/myDir/myFile becomes myFile

  18. LFC Catalog commands: creating a symbolic link
     lfc-ln -s file linkname
     lfc-ln -s directory linkname
     Creates a link named linkname to the specified file or directory.
     • Example:
         $ lfc-ln -s /grid/gilda/user.example /grid/gilda/trieste/linkToUser.ex
     • Check the link using lfc-ls with long listing (-l):
         $ lfc-ls -l /grid/gilda/trieste
         lrwxrwxrwx 1 4404 4400 0 Jul 17 12:06 linkToUser.ex -> /grid/gilda/user.example
     [Diagram labels: Original File, Symbolic link]

  19. LFC Catalog commands: creating directories in the LFC
     lfc-mkdir [-m mode] [-p] path...
     • Where path specifies the LFC pathname
     • Remember that when registering a new file (using lcg-cr, for example), the corresponding destination directory must already exist in the catalog
     • Example:
         $ lfc-mkdir /grid/gilda/Examples
     • You can check the directory with:
         $ lfc-ls -l /grid/gilda
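Because the destination directory must exist before a file is registered, it helps to derive it from the LFN that will be passed to lcg-cr. A minimal sketch, assuming a hypothetical helper `lfn_parent_dir`; the lfc-mkdir call is left as a comment since it needs a reachable catalog, and the -p behaviour (creating missing parents) is assumed to match the Unix mkdir convention:

```shell
#!/bin/sh
# lfn_parent_dir: hypothetical helper that strips the "lfn:" prefix and
# returns the catalog directory that must exist before registration.
lfn_parent_dir() {
    path=${1#lfn:}
    dirname "$path"
}

lfn="lfn:/grid/gilda/vico/note.txt"
lfn_parent_dir "$lfn"        # prints /grid/gilda/vico

# On a configured UI the directory could then be created first, e.g.:
#   lfc-mkdir -p "$(lfn_parent_dir "$lfn")"
```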

  20. LFC Catalog commands: adding/deleting metadata information
     lfc-setcomment path comment
     lfc-delcomment path
     • lfc-setcomment adds or replaces a comment associated with a file/directory in the LFC Catalog
     • lfc-delcomment deletes a comment previously added
     • Example:
         $ lfc-setcomment /grid/gilda/user.example "Hello Trieste"
     • Check the result with:
         $ lfc-ls --comment /grid/gilda/user.example
         /grid/gilda/user.example Hello Trieste

  21. LFC Catalog commands
     • Example:
         $ lfc-delcomment /grid/gilda/user.example
     • Check the result with:
         $ lfc-ls -l --comment /grid/gilda/user.example
         -rw-rw-r-- 1 4401 4400 0 Jun 21 09:38 /grid/gilda/user.example

  22. Hands-on Session
     Exercise No.1:
     • Log onto a UI and initialize your proxy credentials if not already done
     • Check that the environment variables are set to use the lfc-gilda.ct.infn.it catalog
     • Have a look inside the catalog
     • Create a directory named after your surname
     • Put a link to an existing file inside the directory you just created
     • Add a comment to that file and verify it

  23. LFC Catalog commands: Summary of the LFC Catalog commands

  24. lcg-utils
     • The LCG Data Management tools (usually called lcg-utils) allow users to copy files between UI, CE, WN and a SE, to register entries in the File Catalogs and to replicate files between SEs.
     • Check that the LCG_GFAL_INFOSYS environment variable is correctly set to the local GILDA Information Index (BDII):
         $ export LCG_GFAL_INFOSYS=grid004.ct.infn.it:2170

  25. lcg-utils: lcg-cr
     Upload a file to a SE and register it into the catalog
     lcg-cr -d dest_file | dest_host [-g guid] [-l lfn] [-v | --verbose] --vo vo src_file
     where
     • dest_host is the fully qualified hostname of the destination SE
     • dest_file is a valid SURL (both sfn:// and srm:// formats are valid)
     • guid specifies the Grid Unique IDentifier. If this option is not present, a GUID is generated internally
     • lfn specifies the Logical File Name associated with the file
     • vo specifies the Virtual Organization the user belongs to
     • src_file specifies the source file name: the protocol can be file:/// or gsiftp:///

  26. lcg-utils: lcg-cr
     • To discover which SEs the user is allowed to use, remember that you can use the lcg-infosites command:
         $ lcg-infosites --vo gilda se
       The output is a list of SEs and related information on available/used space
     • lcg-cr usage example:
         $ lcg-cr -v -d grid-se.bio.dist.unige.it -l lfn:/grid/gilda/vico/note.txt --vo gilda file:///home/local/note.txt
         Using grid catalog type: lfc
         Source URL: file:///home/local/note.txt
         File size: 51
         Destination specified: grid-se.bio.dist.unige.it
         Destination URL for copy: gsiftp://grid-se.bio.dist.unige.it/flatfiles/SE00/gilda/generated/2005-07-17/file1f0e73d8-7e3f-47d1-bc95-c03c92aae569
         # streams: 1
         Alias registered in Catalog: lfn:/grid/gilda/vico/note.txt
         Transfer took 11320 ms
         Destination URL registered in Catalog: sfn://grid-se.bio.dist.unige.it/flatfiles/SE00/gilda/generated/2005-07-17/file1f0e73d8-7e3f-47d1-bc95-c03c92aae569
         guid:4c10a8e3-2244-4c38-bc98-ed98ae7cb94e
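The GUID printed at the end of the lcg-cr output is needed later (for lcg-aa, for instance), so scripts usually capture it. A minimal sketch that works on saved output shaped like the example above; `extract_guid` is a hypothetical helper and the sample text is copied from the slide rather than produced by a live command:

```shell
#!/bin/sh
# extract_guid: hypothetical helper that reads lcg-cr output on stdin and
# prints the last "guid:<36-character id>" token it finds.
extract_guid() {
    sed -n 's/.*\(guid:[0-9a-fA-F-]\{36\}\).*/\1/p' | tail -n 1
}

# Sample output lines taken from the usage example on the slide.
sample_output='Alias registered in Catalog: lfn:/grid/gilda/vico/note.txt
Transfer took 11320 ms
guid:4c10a8e3-2244-4c38-bc98-ed98ae7cb94e'

printf '%s\n' "$sample_output" | extract_guid
```

On a configured UI the same filter could be applied to the live command, e.g. `guid=$(lcg-cr ... | extract_guid)`.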

  27. lcg-utils: lcg-aa and lcg-la
     Adding an alias for a given GUID
     lcg-aa --vo vo guid lfn
     where
     • vo specifies the Virtual Organization the user belongs to
     • guid specifies the Grid Unique Identifier of the file you want to add the alias to
     • lfn specifies the new alias
     • Example:
         $ lcg-aa --vo gilda guid:4c10a8e3-2244-4c38-bc98-ed98ae7cb94e lfn:/grid/gilda/vico/aliasToNote.txt
     • To check whether the previous command was successful, use the lcg-la command to list the aliases for a given LFN, GUID or SURL:
         $ lcg-la --vo gilda lfn:/grid/gilda/vico/aliasToNote.txt
         lfn:/grid/gilda/vico/note.txt
         lfn:/grid/gilda/vico/aliasToNote.txt

  28. Hands-on session
     Exercise No.2:
     • Verify that your LCG_GFAL_INFOSYS is correctly set up
     • Create a dummy file
     • Check the available storage elements
     • Copy and register the file you just created into the directory you created earlier
     • Add an alias to the file you just uploaded
     • Check that the alias was assigned correctly

  29. lcg-utils commands for replicas (I)
     Copying a file from one SE to another and registering it in the Catalog
     lcg-rep -d dest_file | dest_host [-v | --verbose] --vo vo src_file
     where
     • dest_host is the fully qualified hostname of the destination SE
     • dest_file is a valid SURL (both sfn:// and srm:// are valid)
     • vo specifies the Virtual Organization the user belongs to
     • src_file specifies the source file name: it can be an LFN, GUID or SURL. An SURL scheme can be sfn: for a classical SE or srm:
     • Example:
         $ lcg-rep -v -d grid009.ct.infn.it --vo gilda lfn:/grid/gilda/vico/note.txt
         Using grid catalog type: lfc
         Source URL: lfn:/grid/gilda/vico/note.txt
         File size: 51
         Destination specified: grid009.ct.infn.it
         Source URL for copy: gsiftp://grid-se.bio.dist.unige.it/flatfiles/SE00/gilda/generated/2005-07-17/file1f0e73d8-7e3f-47d1-bc95-c03c92aae569
         Destination URL for copy: gsiftp://grid009.ct.infn.it/flatfiles/SE00/gilda/generated/2005-07-17/file4f3b4cb2-b5fe-467e-9a3e-1ef602465a17
         # streams: 1
         Transfer took 2410 ms
         Destination URL registered in LRC: sfn://grid009.ct.infn.it/flatfiles/SE00/gilda/generated/2005-07-17/file4f3b4cb2-b5fe-467e-9a3e-1ef602465a17

  30. lcg-utils commands for replicas (II)
     Listing the replicas for a given LFN, GUID or SURL
     lcg-lr --vo vo file
     where
     • vo specifies the Virtual Organization the user belongs to
     • file specifies the Logical File Name, the Grid Unique IDentifier or the Site URL. An SURL scheme can be sfn: for a classical SE or srm:
     • Example:
         $ lcg-lr --vo gilda lfn:/grid/gilda/vico/note.txt
         sfn://grid-se.bio.dist.unige.it/flatfiles/SE00/gilda/generated/2005-07-17/file1f0e73d8-7e3f-47d1-bc95-c03c92aae569
         sfn://grid009.ct.infn.it/flatfiles/SE00/gilda/generated/2005-07-17/file4f3b4cb2-b5fe-467e-9a3e-1ef602465a17
     • The same output is obtained using the GUID:
         $ lcg-lr --vo gilda guid:4c10a8e3-2244-4c38-bc98-ed98ae7cb94e
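Since each line of lcg-lr output is an SURL, the SEs holding a replica can be read off the host part. A minimal sketch that works on saved output; `surl_hosts` is a hypothetical helper and the two SURLs are the ones from the example above:

```shell
#!/bin/sh
# surl_hosts: hypothetical helper that reads SURLs on stdin (one per line)
# and prints the SE hostname of each, in input order.
surl_hosts() {
    sed -n 's|^[a-z]*://\([^/]*\)/.*|\1|p'
}

# Replica list copied from the lcg-lr example on the slide.
replicas='sfn://grid-se.bio.dist.unige.it/flatfiles/SE00/gilda/generated/2005-07-17/file1f0e73d8-7e3f-47d1-bc95-c03c92aae569
sfn://grid009.ct.infn.it/flatfiles/SE00/gilda/generated/2005-07-17/file4f3b4cb2-b5fe-467e-9a3e-1ef602465a17'

printf '%s\n' "$replicas" | surl_hosts
```

The hostnames printed this way are the values that lcg-del accepts with its -s option when removing a single replica.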

  31. lcg-utils commands for replicas (III)
     Deleting replicas
     lcg-del [ -a ] [ -s se ] [ -v | --verbose ] --vo vo file
     where
     • -a is used to delete all replicas of the given file
     • se specifies the SE from which you want to remove the replica
     • vo specifies the Virtual Organization the user belongs to
     • file specifies the Logical File Name, the Grid Unique IDentifier or the Site URL. An SURL scheme can be sfn: for a classical SE or srm:
     Example:
     • Delete one replica:
         $ lcg-del --vo gilda -s grid009.ct.infn.it lfn:/grid/gilda/vico/note.txt
     • Delete all the replicas:
         $ lcg-del -a --vo gilda lfn:/grid/gilda/vico/note.txt
     • Check whether the previous command was successful:
         $ lcg-lr --vo gilda lfn:/grid/gilda/vico/note.txt
         lcg_lr: No such file or directory
     • Or use lfc-ls /grid/gilda/vico (note.txt and its alias are no longer listed)

  32. lcg-utils: lcg-cp
     Downloading a Grid file from an SE to a local destination
     lcg-cp [ -v | --verbose ] --vo vo src_file dest_file
     where
     • vo specifies the Virtual Organization the user belongs to
     • src_file specifies the source file name: it can be an LFN, GUID, SURL or local file. An SURL scheme can be sfn: for a classical SE or srm:
     • dest_file specifies the destination. The protocol can be file:/// or gsiftp:///
     Example:
         $ lcg-cp --vo gilda lfn:/grid/gilda/vico/note2.txt file:/home/local/note2.txt
         Source URL: lfn:/grid/gilda/vico/note2.txt
         File size: 51
         Source URL for copy: gsiftp://gilda-se-01.pd.infn.it/shared/gilda/generated/2005-07-17/file06c3b28c-465f-489c-be3c-b68728e1ca16
         Destination URL: file:/home/local/note2.txt
         # streams: 1
         Transfer took 1060 ms

  33. Hands-on session
     Exercise No.3:
     • Create two replicas of the file you previously uploaded (you can also refer to it by its alias)
     • Check whether the operation was successful
     • Download the file back to your UI
     • Delete just one replica and verify the result
     • Delete all the replicas and verify the result
     • Verify whether the entry is still in the catalog

  34. Final exercise
     • GOAL: Submit a job that does data management: it will retrieve a file previously registered into the catalog.
     • Steps to follow:
         • Create a new file in your UI and put some data into it
         • Choose an SE to upload the file to (hint: use lcg-infosites) and use the appropriate command to perform the upload:
             lcg-cr -v --vo gilda -l lfn:/grid/gilda/vico/<choose an lfn> -d <an SE host> file:`pwd`/<your new file>
         • Create a script.sh file with the following content:
             #!/bin/sh
             /bin/hostname
             # Change the LFN_NAME to download from the Catalog.
             echo "Start to download.."
             lcg-cp --vo gilda lfn:/grid/gilda/vico/<lfn you choose> file:`pwd`/output.dat
             echo "Done.."

  35. Final exercise (II)
     • Create the JobWithData.jdl:
         Type = "job";
         JobType = "Normal";
         Executable = "/bin/sh";
         Arguments = "script.sh";
         VirtualOrganisation = "gilda";
         StdOutput = "std.out";
         StdError = "std.err";
         InputSandbox = {"script.sh"};
         OutputSandbox = {"std.out","std.err","output.dat"};
     • Submit it to the grid
     • Retrieve the output and verify the content of output.dat

  36. Summary of lcg-utils commands
     [Table of commands in two groups: Replica Management, File Catalog Interaction]

  37. Bibliography
     • Information on the file catalogs
         • LFC, gfal, lcg-utils: “Evolution of LCG-2 Data Management” (J-P Baud, J. Casey), http://indico.cern.ch/contributionDisplay.py?contribId=278&sessionId=7&confId=0
     • LFC installation, administration, migration from RLS:
         • Wiki entries indicated through the presentation:
             • http://goc.grid.sinica.edu.tw/gocwiki/How_to_set_up_an_LFC_service
             • http://goc.grid.sinica.edu.tw/gocwiki/How_to_migrate_the_RLS_entries_into_the_LCG_File_Catalog_%28LFC%29
     • LFC contacts:
         • Jean-Philippe.Baud@cern.ch
         • Sophie.Lemaitre@cern.ch
