1 / 70

gLite in Practice

gLite in Practice . Gorgi Kakasevski, gorgik@feit.ukim.edu.mk Ivica Dimitrovski, ivicad@feit.ukim.edu.mk Faculty of Electrical Engineering and Information Technology 12 February 2008 Skopje, Macedonia. Overview. tutorial - gLite with CLI (Command Line Interface) Grid security

micheal
Download Presentation

gLite in Practice

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. gLite in Practice Gorgi Kakasevski, gorgik@feit.ukim.edu.mk Ivica Dimitrovski,ivicad@feit.ukim.edu.mk Faculty of Electrical Engineering and Information Technology 12 February 2008 Skopje, Macedonia

  2. Overview • tutorial - gLite with CLI (Command Line Interface) • Grid security • Job submission • Data Management • Grid Information System

  3. Grid security (1) • Each grid user needs valid credentials to use the grid • gLite uses X.509 certificates as credentials • Certificates are issued by a Certification Authority (CA) • X.509 credentials consist of • The certificate (public key) stored as ~/.globus/usercert.pem • The private key stored as ~/.globus/userkey.pem • The private key is protected by a passphrase • To improve security, proxy certificates are used • The private key is NOT protected by a passphrase • A proxy certificate has a limited lifetime (default: 12 hours) • Proxy certificates are stored together with their private key

  4. Grid security (2) • MyProxy credential repository • MyProxy servers store proxy certificates • The user first uploads the certificate to the server • A proxy can then be retrieved by the user when needed • Used for • Automatic proxy renewal for long-running jobs • Grid portals (e.g. GILDA, P-Grade Portal)

  5. Myproxy-init Command line MyProxy server Retrieve proxy Web browser Grid portal Login Submit job Computing element Grid security (3) Proxy renewal

  6. Grid security (4) • In the following exercise you will: • Create a proxy certificate • Upload the proxy to the MyProxy server

  7. Exercise 1: Grid security (0) • Login to your local workstation • User name: ... • Password: ...

  8. Exercise 1: Grid security (1) • Initialize your grid proxy certificate • Login to using ssh (putty) > ssh gridXX@grid-ui.feit.ukim.edu.mk • Substitute XX by your account number (01 to 25) • Use the password "GridXX" • Create a proxy certificate > voms-proxy-init --voms sgdemo • Provide the passphrase „keyforcert" to decrypt the user key • Verify the proxy certificate > voms-proxy-info -all • After the practicals, delete your temporary proxy > voms-proxy-destroy If successful the output will be "Your proxy is valid until…“ This will show you subject, issuer, etc. of your local proxy certificate

  9. Exercise 1: Grid security (2) • Upload your proxy certificate to the MyProxy server: • Initialize and upload your MyProxy certificate > myproxy-init -s grid001.ct.infn.it –o gilda –c 120 • Provide the passphrase „keyforcert" to decrypt the user key • Choose a passphrase for the proxy certificate on the server • Verify the MyProxy certificate on the server > myproxy-info -s grid001.ct.infn.it • After the practicals, remove your proxy certificate from the server > myproxy-destroy -s grid001.ct.infn.it If successful the output will be "A proxy valid for .. hours…“ Shows you details about your proxy on the MyProxy server

  10. Workload Management System • Workload Management System (WMS) comprises a set of Grid middleware components responsible for distribution and management of tasks across Grid resources • applications are conveniently, efficiently and effectively executed. • Comparable services from other grid projects are, among others, the EDG WMS, Condor and the Eurogrid-Unicore resource broker.

  11. Workload Management System (2)

  12. Workload Management System (3) Job management requests (submission, cancellation) expressed via the Job Description Language (JDL)

  13. Workload Management System (4) Keeps submission requests Requests are kept for a while if no matching resources available

  14. Workload Management System (5) Repository of resource information available to matchmaker Updated via notifications and/or active polling on sources

  15. Workload Management System (6) Finds an appropriate CE for each submission request, taking into account job requests and preferences, Grid status, utilization policies on resources

  16. Workload Management System (7) Performs the actual job submission and monitoring

  17. Job State Machine Overview of EGEE

  18. Job State Machine (1/9) Submitted: job is entered by the user to the User Interface but not yet transferred to Network Server for processing Overview of EGEE

  19. Job State Machine (2/9) Waiting: job was accepted by NS and is waiting for Workload Manager processing or being processedby WMHelper modules. Overview of EGEE

  20. Job State Machine (3/9) Ready: job processed by WM and its Helper modules (CE found) but not yet transferred to the CE (local batch system queue) via JC and CondorC.. Overview of EGEE

  21. Job State Machine (4/9) Scheduled:job waiting in the queue on the CE. Overview of EGEE

  22. Job State Machine (5/9) Running:job is running on CE’s queuing system (inside one of the worker nodes) Overview of EGEE

  23. Job State Machine (6/9) Done:job exited or considered to be in a terminal state by CondorC (e.g., submission to CE has failed in an unrecoverable way). Overview of EGEE

  24. Job State Machine (7/9) Aborted: job processing was aborted by WMS (waiting in the WM queue or CE for too long, over-use of quotas, expiration of user credentials). Overview of EGEE

  25. Job State Machine (8/9) Cancelled:job has been successfully canceled on user request. Overview of EGEE

  26. Job State Machine (9/9) Cleared:output sandbox was transferred to the user or removed due to the timeout. Overview of EGEE

  27. Job submission • In the following exercise you will: • Create a program to be run in the grid (shell script) • Create a job description using the JDL • Submit the job to the grid • Inspect the output of the job

  28. Exercise 3: Job submission (1) • Create a file named ‘myhelloworld.sh’. This will be the job to be run in the grid. #!/bin/sh MYNAME="Your Name" WORKER_NODE=$(hostname) echo "Hello ${MYNAME}" echo "Greetings from ${WORKER_NODE}!"

  29. Exercise 3: Job submission (2) • Create a JDL file named ‘myhelloworld.jdl’ [ Executable="myhelloworld.sh"; StdOutput="std.out"; StdError="std.err"; VirtualOrganisation=„sgdemo"; InputSandbox={"myhelloworld.sh"}; OutputSandbox={"std.out", "std.err"}; RetryCount=10; ]

  30. Exercise 3: Job submission (3) Finding available resources • Using the command line interface: • Login to your account using ssh. • Create .sh and .jdl files • Issue the following command: • > glite-wms-job-list-match -a <JDL-file> • > glite-wms-job-list-match -o list.txt -a myhelloworld.jdl

  31. Exercise 3: Job submission (4) Submitting the job • Submit the job using the command line interface by > glite-wms-job-submit -i list.txt -a myhelloworld.jdl > glite-wms-job-submit -i list.txt -o jobs.txt -a myhelloworld.jdl The glite-job-submit command returns the job identifier which is going to be used in subsequent slides. Example: https://wms.phy.bg.ac.yu:9000/vzTIBW1j7I-PmWjOJ8Swww

  32. Exercise 3: Job submission (5) • Checking the job status • Query the job status • The "Status" changes from "Ready" to "Scheduled" to "Running", and eventually to "Done„ • It is possible to cancel the job while it is not done. • Commands on the command line. > glite-wms-job-status https://wms.phy.bg.ac.yu:9000/vzTIBW1j7I-PmWjOJ8Swww > glite-wms-job-status -i jobs.txt > glite-wms-job-cancel -i jobs.txt

  33. Exercise 3: Job submission (6) Retrieving the job output • On the command line you can do this with > glite-wms-job-output -i jobs.txt • The output is stored in the /tmp/glite/glite-ui/ directory. The exact location is given to you as the command returns. • Go there and inspect the files.

  34. Submitting your own program • In the following exercise you will: • Write and compile a C++ program to be run in the grid • Create a JDL file specifying the requirements for that job • Submit the job to be run in the grid • Inspect the output of the job

  35. Exercise 4: Submitting your own Program (1) • Create a file "myprogram.cpp" with the following content: #include <iostream> int main() { std::cout << "Hello World!" << std::endl; } • Compile and run > g++ -o myprogram myprogram.cpp

  36. Exercise 4: Submitting your own Program (2) • Create the JDL file ‚myprogram.jdl‘ with the following content: [ Executable="myprogram"; StdOutput="std.out"; StdError="std.err"; InputSandbox={"myprogram"}; OutputSandbox={"std.out","std.err"}; RetryCount=10; ] • Submit it to the Grid • Check the status • Fetch and inspect the output

  37. Data Catalogues Data Storage I/O Store SE Data Movement Scheduler Queue User Interface Data Management Services • Storage Element • Storage Resource Manager • POSIX-I/O • Accessprotocols • Catalogues • File Catalogue • Replica Catalogue • File Authorization Service • Metadata Catalogue • File Transfer • Data Scheduler (not implemented yet) • File Transfer Service • File Placement Services • User Interface

  38. Data Management Services • Each file has a unique identifier • Files/directories are organized on a Catalogue • Similar to a filesystem (Logical File Name) • There is one Catalogue per VO • The data can be stored on several Storage Elements (SE) • The Catalogue hides the actual location Catalogue Logical File Name LFN : /grid/gilda/hagenberg/file.txt Storage Resource Manager srm://trigrid-ce01.unime.it/dpm/unime.it/home/gilda/generated/ 2006-09-20/filef026441a-5834-431f-b28d-06cb7e4c784f Physical Filename /home/gilda/generated/2006-09-20/filef026441a-5834-431f-b28d- 06cb7e4c784f SE SE SE SE SE

  39. Replica Catalog • Keeps information at a site • (Meta Data Catalog) • Attributes of files on the logical level • Boundary between generic middleware and application layer Metadata Catalog Metadata GUID Site ID LFN File Catalog Site ID Replica Catalog Site A Replica Catalog Site B LFN LFN GUID SURL GUID SURL SURL SURL gLite Catalogs • File Catalog • Filesystem-like view on logical file names • Keeps track of sites where data is stored • Conflict resolution

  40. Control SRM interface POSIXAPI File I/O User rfio dcap chirp aio dCache DPM Castor Disk Storage Element Interfaces • SRM interface • Management and control • SRM (with possible evolution) • Posix-like File I/O • File Access • Open, read, write • Not real posix (like rfio)

  41. Data management • GUID - Grid Unique Identifier • guid:bbffd56a-eff0-4b59-af7b-2a235986001c • LFN - Logical File Name • lfn:TestFile.dat • SURL - Storage URL • sfn://grid-se.etf.ukim.edu.mk/grid/sgdemo/data.dat • TURL - Transport URL • gsiftp://grid-se.etf.ukim.edu.mk/grid/sgdemo/data.dat

  42. Data management • In the following exercise you will: • Browse the LFC catalogue • Create and remove directories in the catalogue • Publish files in the catalogue

  43. Exercise 5: Data management (1) • Browse the file catalog using > lfc-ls -l /grid/sgdemo • Create a new directory in the catalog > lfc-mkdir /grid/sgdemo/feitXX • Explore the properties of the newly created directory > lfc-ls –ld /grid/sgdemo/feitXX > lfc-getacl /grid/sgdemo/feitXX • Some more commands: lfc-chown, lfc-ln, lfc-rename, lfc-rm

  44. Exercise 5: Data management (2) • Store a local file on the storage element and assign it a LFN (Logical File Name) in the catalogue > lcg-cr --vo sgdemo -l /grid/sgdemo/feitXX/myhelloworld.jdl file:/home/gorgik/tmpgrid/myhelloworld.jdl • Browse the file catalog again > lfc-ls -l /grid/sgdemo/feitXX • Retrieve the file from the storage element > lcg-cp --vo sgdemo lfn:/grid/sgdemo/feitXX/myhelloworld.jdl \ file://$HOME/myhelloworld.jdl.bak • Remove it from the storage element and the catalogue > lcg-del -a --vo sgdemo lfn:/grid/sgdemo/feitXX/myhelloworld.jdl • Remove the directory from the catalog > lfc-rm -r /grid/sgdemo/feitXX • Browse the file catalog > lfc-ls -l /grid/sgdemo/feitXX

  45. Job Description Language (1) • The JDL is used to define • Job characteristics • Job requirements • Data requirements • Complete JDL reference • http://glite.web.cern.ch/glite/documentation/

  46. Job Description Language (2) • The supported attributes are grouped into two categories: • Job Attributes • Define the job itself • Resources • Taken into account by the Workload Manager for carrying out the matchmaking algorithm (to choose the “best” resource where to submit the job) • Computing Resource • Used to build expressions of Requirements and/or Rank attributes by the user • Have to be prefixed with “other.” • Data and Storage resources • Input data to process, Storage Element (SE) where to store output data, protocols spoken by application when accessing SEs

  47. Relevant Attributes (1) • JobType (mandatory) • Normal (simple, sequential job), DAG, Interactive, MPICH, Checkpointable • Executable (mandatory) • The command name • Arguments (optional) • Job command line arguments • StdInput, StdOutput, StdError (optional) • Standard input/output/error of the job • Environment (optional) • List of environment settings

  48. Relevant Attributes (2) • InputSandbox (optional) • List of files on the UI’s local disk needed by the job for running • The listed files will be staged to the remote resource automatically • OutputSandbox (optional) • List of files, generated by the job, which have to be retrieved • VirtualOrganisation (mandatory) • The virtual organisation the user submitting the job is working for • Can be omitted if preconfigured on the UI or in the VOMS proxy

  49. Relevant Attributes (3) • Requirements (optional) • Job requirements on computing resources • Specified using attributes of resources published in the Information Service • Rank (optional) • Expresses preference (how to rank resources that have already met the Requirements expression) • Specified using attributes of resources published in the Information Service • If not specified, default value defined in the UI configuration file is considered • Default: other.GlueCEStateEstimatedResponseTime (the lowest estimated traversal time) • Default: other.GlueCEStateFreeCPUs (the highest number of free CPUs) for parallel jobs (see later)

  50. Relevant Attributes (4) • InputData (optional) • Refers to data used as input by the job: these data are published in the Replica Catalog and stored in the Storage Elements • LFNs and/or GUIDs • DataAccessProtocol (mandatory if InputData has been specified) • The protocol or the list of protocols which the application is able to speak with for accessing InputData on a given Storage Element • OutputSE (optional) • The Uniform Resource Identifier of the output Storage Element • RB uses it to choose a Computing Element that is compatible with the job and is close to the Storage Element

More Related