
Generic policy rules and principles



Presentation Transcript


  1. Generic policy rules and principles
     Jean-Yves Nief

  2. Talk overview
     • An introduction to CC-IN2P3 activity.
     • iRODS in production:
        • Why are we using it?
        • Who is using it?
        • Prospects.
     • iRODS rule policies through examples:
        • Resource Monitoring System.
        • Biomedical applications:
           • Human data.
           • Animal data.
        • Arts and Humanities.
        • Other rules: Mass storage system interface, access rights.
     • Pitfalls.
     • Future usages.
     Repository workshop - Garching

  3. CC-IN2P3 activities
     • Federate computing needs of the French scientific community in:
        • Nuclear and particle physics.
        • Astrophysics and astroparticles.
     • Computing services to international collaborations:
        • CERN (LHC), Fermilab, SLAC, …
     • Opened now to biology, Arts & Humanities.

  4. iRODS @ CC-IN2P3: why are we using it?
     • National and international collaborations.
     • Users spread geographically (Europe, America, Australia…).
     • Need for storage virtualization:
        • federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…).
        • transparent data access for end users.
        • middleware working on heterogeneous OS.
        • common logical name space.
        • virtual organization (access rights, groups, etc.).
        • metadata search.
     • Easy interface with any kind of client application (APIs, drivers).

  5. iRODS @ CC-IN2P3: why are we using it?
     • SRB in use since 2003:
        • 3 PB handled for 10 different experiments (HEP, astro, biology).
        • Decommissioning: end of 2012?
     • Limitations:
        • no centralized data management (DM).
        • no enforcement of DM policy.
     • iRODS rule-based policy:
        • adequate solution.
        • from the user point of view: virtualization of data management policy.

  6. iRODS @ CC-IN2P3: who is using it?
     • Arts and Humanities (Adonis):
        • Long term data preservation.
        • Web and batch jobs access.
     • Biology (phylogenetic), fluid mechanics:
        • grid jobs.
     • Biomedical applications:
        • Human and animal imagery.
     • High Energy physics:
        • Neutrino experiment.

  7. iRODS @ CC-IN2P3: who is going to use it?
     • Astrophysics experiments:
        • LSST …
     • Other biomedical, physics projects.
     • iRODS will be part of French NGI.
     • All the SRB instances to be moved to iRODS.
     → 1 PB should be reached soon.

  8. Rules examples: Arts and Humanities
     • Ex: archival and data publication of audio files (CRDO).
        • Data transfer: CRDO → CINES (Montpellier).
        • Archived at CINES.
        • iRODS transfer to CC-IN2P3: iput file.tar
        • Automatic untar at Lyon + checksum.
        • Automatic registration in Fedora-commons (delayed rule).
     [Diagram: CRDO, CINES (archive), CC-IN2P3, Fedora]
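
The "automatic untar + checksum" step of slide 8 could look roughly like the sketch below. This is only an illustration in Python, not the micro-service actually used at CC-IN2P3: the function names, the choice of SHA-256 and the returned dictionary layout are all assumptions.

```python
import hashlib
import tarfile
from pathlib import Path


def file_checksum(path, algorithm="sha256"):
    """Hash a file in 1 MiB chunks (the algorithm is an assumption)."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def untar_and_checksum(archive, dest):
    """Unpack `archive` into `dest` and return {relative path: checksum}."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        # Use the safe "data" extraction filter when this Python provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return {
        str(path.relative_to(dest)): file_checksum(path)
        for path in sorted(dest.rglob("*"))
        if path.is_file()
    }
```

The checksums returned here are what a later step (for example the delayed registration rule) could compare against the values recorded at the source.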

  9. Rules examples: biomedical data
     • Human and animal data (fMRI, PET, MEG, etc.).
     • Usually in DICOM format.
     • Main issues for human data:
        • Need to be anonymized!
        • Need to do metadata search on DICOM files.
     • Rule:
        • Check for anonymization of the file: send a warning if not true.
        • Extract a subset of metadata (based on a list stored in iRODS) from DICOM files.
        • Add these metadata as user-defined metadata in iRODS.
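
The three steps of the rule on slide 9 can be sketched as follows. The sketch assumes the DICOM header has already been parsed into a plain dictionary keyed by attribute keyword; the list of identifying keywords, the function names and the "empty means anonymized" test are illustrative assumptions, not the checks performed by the real script.

```python
# Assumption: these keywords are the ones that must be blank in an
# anonymized file. A real policy would likely cover more attributes.
IDENTIFYING_KEYWORDS = ("PatientName", "PatientID", "PatientBirthDate")


def is_anonymized(header):
    """True when every identifying attribute is absent or blank."""
    for keyword in IDENTIFYING_KEYWORDS:
        value = header.get(keyword)
        if value is not None and str(value).strip():
            return False
    return True


def select_metadata(header, wanted):
    """Keep only the attributes named in `wanted` (the list stored in iRODS)."""
    return [(keyword, str(header[keyword])) for keyword in wanted if keyword in header]


def process_dicom_header(header, wanted):
    """Return (warnings, attribute/value pairs to attach as user metadata)."""
    warnings = []
    if not is_anonymized(header):
        warnings.append("file is not anonymized")
    return warnings, select_metadata(header, wanted)
```

The returned pairs correspond to what would be attached to the file as user-defined metadata, and a non-empty warning list is the trigger for the warning mentioned on the slide.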

  10. Rules examples: resource monitoring system
      1. Ask each server for its metrics: rule engine cron task (msi).
      2. Performance script launched on each server.
      3. Results sent back to the iCAT.
      4. Store metrics into iCAT.
      5. Compute a "quality factor" for each server, stored in another table: rule engine cron task (msi).
      [Diagram: iRODS iCAT server with its DB, and four iRODS data servers each running a perf script]
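
The five-step monitoring loop of slide 10 can be mimicked with a toy model. The slide does not say which metrics are collected or how the quality factor is computed, so the metric names (`cpu_load`, `disk_free`) and the averaging formula below are invented purely for illustration.

```python
def collect_metrics(servers, probe):
    """Steps 1 to 4: query each server and gather the results centrally.

    `probe` stands in for the performance script run on each data server.
    """
    return {server: probe(server) for server in servers}


def quality_factor(metrics):
    """Step 5, with a hypothetical formula.

    Averages the idle CPU fraction and the free disk fraction, both
    expected in the range 0.0 to 1.0.
    """
    cpu_idle = 1.0 - metrics["cpu_load"]
    return round((cpu_idle + metrics["disk_free"]) / 2, 3)


def rank_servers(table):
    """Return (servers sorted best first, {server: quality factor})."""
    scores = {server: quality_factor(m) for server, m in table.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

A ranking of this kind is what a rule could consult when choosing a resource, although the slide does not describe how the stored quality factor is consumed.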

  11. Other rules
      • Mass Storage System integration:
         • Using compound resources: iRODS disk cache + tapes.
         • Data on the disk cache is replicated into the MSS asynchronously (1h later) using a delayExec rule.
         • Recovery mechanism: retries until success; the delay between retries is doubled at each round.
      • ACL management:
         • Rules needed for fine-grained access rights management.
         • E.g.:
            • 3 groups of users (admins, experts, users).
            • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r
            • ACLs on all other subcollections => admins + experts: r/w, users: r
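
Both ideas on slide 11 are easy to express outside the rule language. The sketch below shows the doubling-delay retry and the path-based ACL choice in Python. It is not the delayExec rule or the ACL micro-service themselves: the function names, the optional `max_attempts` guard and the injectable `sleep` are assumptions added to keep the example testable.

```python
import time


def retry_with_doubling_delay(action, initial_delay, sleep=time.sleep, max_attempts=None):
    """Call `action` until it succeeds, doubling the wait after each failure.

    `max_attempts` is an optional safety limit that the slide does not
    mention; the default (None) retries until success.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        try:
            return action()
        except Exception:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay *= 2


def acl_for(path, zone):
    """Pick the ACL set for a collection path, following the slide's example.

    Matches /<zone>/<anything>/rawdata and everything below it. Every other
    path gets the second ACL set, which is a simplification.
    """
    parts = path.strip("/").split("/")
    is_rawdata = len(parts) >= 3 and parts[0] == zone and parts[2] == "rawdata"
    if is_rawdata:
        return {"admins": "rw", "experts": "r", "users": "r"}
    return {"admins": "rw", "experts": "rw", "users": "r"}
```

With an initial delay of 3600 seconds, a replication that fails twice would be retried after one hour and then after two hours.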

  12. Developments needed
      • Scripts/binaries:
         • Metadata extraction from DICOM files.
         • Registration of files into Fedora-Commons.
         • …
         → Needed whatever storage system is used underneath.
      • Micro-services:
         • ACLs, tar/untar of archive files, …
         • APIs already available; did not require a large amount of work (parts of the iRODS distribution).
         • Resource Monitoring System: bigger development, includes modification of the iCAT schema.
      • Rules:
         • Most of them are simple.
         • Some require more work (Adonis project), with a more complex workflow.

  13. Pitfalls and bugs
      • Writing complex rules:
         • Avoid writing them directly using the .irb syntax.
         • Becomes difficult to debug, especially with nested actions.
         • Solution: use ruleGen to generate rules in a more user-friendly manner.
      • Some memory leaks found with irodsReServer with Oracle as a backend:
         → Fixed in 2.4.
      • delayExec syntax bugs:
         • Fixed in 2.4 and 2.4.1.
      • Rules are in a configuration file at the moment:
         • Must be consistent on all the iRODS servers.
         → Will be in the iCAT database in the future.

  14. Prospects
      • Rules for database interaction (in progress):
         • Will be used by DTM (developed at CC-IN2P3):
            • DTM manages a list of tasks to be processed by a batch cluster.
            • DTM requires a database to manage the tasks.
         • A rule launched by the client will interact with the DTM database through iRODS:
            • More security: iRODS used as a proxy server (database behind a firewall, use of iRODS authentication).
            • Database schema upgrades are transparent for the client (no SQL code launched on the client side).
      • Xmessaging system (part of iRODS):
         • Allows messages to be exchanged between different iRODS processes or clients.
         • E.g.: could be used to monitor job status in a distributed computing environment.

  15. Acknowledgement
      • Thanks to:
         • Pascal Calvat.
         • Yonny Cardenas.
         • Thomas Kachelhoffer.
         • Pierre-Yves Jallud.
      iRODS at CC-IN2P3
