This overview discusses the role of iRODS (Integrated Rule-Oriented Data System) in improving data management and access in several fields, including nuclear physics, astrophysics and biomedical research. It highlights the advantages of iRODS, such as resource monitoring, virtualization of data management policies, and integration with existing storage systems. The talk covers the use of iRODS in national and international research collaborations, illustrates its rule-based policies with concrete examples from the arts and humanities, biomedical and scientific applications, and outlines planned future uses.
Generic policy rules and principles, Jean-Yves Nief
Talk overview • An introduction to CC-IN2P3 activities. • iRODS in production: • Why are we using it? • Who is using it? • Prospects. • iRODS rule-based policies through examples: • Resource Monitoring System. • Biomedical applications: • Human data. • Animal data. • Arts and Humanities. • Other rules: mass storage system interface, access rights. • Pitfalls. • Future uses. Repository workshop - Garching
CC-IN2P3 activities • Federates the computing needs of the French scientific community in: • Nuclear and particle physics. • Astrophysics and astroparticle physics. • Provides computing services to international collaborations: • CERN (LHC), Fermilab, SLAC, …. • Now open to biology and Arts & Humanities.
iRODS @ CC-IN2P3: why use it? • National and international collaborations. • Users spread geographically (Europe, America, Australia…). • Need for storage virtualization: • federation of heterogeneous storage (disks, tapes) and data access systems (MSS, databases…). • transparent data access for end users. • middleware working on heterogeneous OSes. • a common logical namespace. • virtual organization (access rights, groups etc…). • metadata search. • Easy interfacing with any kind of client application (APIs, drivers).
iRODS @ CC-IN2P3: why use it? • SRB in use since 2003: • 3 PB handled for 10 different experiments (HEP, astro, biology). • Decommissioning: end of 2012? • Limitations: • no centralized data management (DM). • no enforcement of DM policies. • iRODS rule-based policies: • an adequate solution. • from the user's point of view: virtualization of the data management policy.
iRODS @ CC-IN2P3: who is using it? • Arts and Humanities (Adonis): • Long-term data preservation. • Web and batch job access. • Biology (phylogenetics), fluid mechanics: • grid jobs. • Biomedical applications: • Human and animal imagery. • High Energy Physics: • Neutrino experiment.
iRODS @ CC-IN2P3: who is going to use it? • Astrophysics experiments: • LSST … • Other biomedical and physics projects. • iRODS will be part of the French NGI. • All the SRB instances are to be moved to iRODS. 1 PB should be reached soon.
Rule examples: Arts and Humanities • Example: archival and publication of audio files (CRDO). • Data transfer: CRDO → CINES (Montpellier); files archived at CINES. • iRODS transfer to CC-IN2P3: iput file.tar. • Automatic untar at Lyon + checksum. • Automatic registration in Fedora Commons (delayed rule). (Diagram: data flow between CRDO, the CINES archive, CC-IN2P3 and Fedora.)
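The "automatic untar + checksum" step on the Lyon side can be sketched in plain Python, outside iRODS. This is a minimal sketch assuming the archive is available as a local tar file and that MD5 is the checksum to compare; the helper names `file_md5` and `untar_and_checksum` are hypothetical and are not the CC-IN2P3 rule or an iRODS micro-service. The Fedora Commons registration is left out.

```python
import hashlib
import tarfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def untar_and_checksum(archive: Path, dest: Path) -> dict[str, str]:
    """Hypothetical helper: extract `archive` into `dest`, return {member name: MD5}."""
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        members = tar.getmembers()
        # Refuse members that would land outside dest (absolute paths, "..").
        for member in members:
            if not (dest / member.name).resolve().is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):  # extraction filters are available
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # Checksum every regular file so it can be compared with the sender's values.
    return {m.name: file_md5(dest / m.name) for m in members if m.isfile()}
```

Validating member paths before extraction matters here because the archive comes from an external partner; the returned dictionary is what a rule would compare against checksums supplied by the sender.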
Rule examples: biomedical data • Human and animal data (fMRI, PET, MEG etc…). • Usually in DICOM format. • Main issues for human data: • Files need to be anonymized! • Metadata search on DICOM files is needed. • Rule: • Check that the file is anonymized; send a warning if it is not. • Extract a subset of metadata (based on a list stored in iRODS) from the DICOM files. • Add these metadata as user-defined metadata in iRODS.
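The rule's logic (warn if not anonymized, then extract a configured subset of attributes) can be sketched independently of any DICOM library. This sketch assumes the header has already been parsed into a dict keyed by DICOM attribute keyword (a real deployment would use a DICOM parser); the set `IDENTIFYING_KEYWORDS` and the function names are hypothetical examples, not the attribute list actually stored in iRODS at CC-IN2P3.

```python
import warnings
from typing import Any

# Hypothetical set of attributes that must be empty in an anonymized file.
IDENTIFYING_KEYWORDS = ("PatientName", "PatientID", "PatientBirthDate", "PatientAddress")


def is_anonymized(header: dict[str, Any]) -> bool:
    """True if none of the identifying attributes carries a non-empty value."""
    return all(not str(header.get(k, "")).strip() for k in IDENTIFYING_KEYWORDS)


def extract_metadata(header: dict[str, Any], wanted: list[str]) -> list[tuple[str, str]]:
    """(attribute, value) pairs for the keywords in `wanted` present in the header.

    In the rule described above, `wanted` comes from a list stored in iRODS and
    each pair would become a user-defined metadata entry (AVU) on the file.
    """
    return [(k, str(header[k])) for k in wanted if k in header]


def process_dicom(path: str, header: dict[str, Any], wanted: list[str]) -> list[tuple[str, str]]:
    if not is_anonymized(header):
        # Like the rule, only warn; the file is not rejected.
        warnings.warn(f"{path}: DICOM header is not anonymized")
    return extract_metadata(header, wanted)
```

Keeping the list of extracted attributes as data (rather than hard-coding it) mirrors the slide's point that the list lives in iRODS and can change without rewriting the rule.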
Rule examples: resource monitoring system 1. Ask each server for its metrics: rule engine cron task (micro-service, msi). 2. A performance script is launched on each server. 3. Results are sent back to the iCAT server. 4. Metrics are stored in the iCAT. 5. A «quality factor» is computed for each server and stored in another table: rule engine cron task (msi). (Diagram: the iCAT server and its database polling several iRODS data servers, each running a performance script.)
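The slides do not give the formula for the quality factor, so the following is only a sketch of step 5 under stated assumptions: the metric names (`cpu_load`, `free_space_ratio`, `reachable`), the linear weighting and the `rank_servers` helper are all hypothetical, chosen to show how stored metrics could be turned into one score per server.

```python
from dataclasses import dataclass


@dataclass
class ServerMetrics:
    """Hypothetical metrics returned by the performance script on one data server."""
    name: str
    cpu_load: float          # load normalised to [0, 1] (1 = saturated)
    free_space_ratio: float  # free disk space / total disk space, in [0, 1]
    reachable: bool          # did the server answer the metrics request?


def quality_factor(m: ServerMetrics, w_cpu: float = 0.5, w_disk: float = 0.5) -> float:
    """Hypothetical quality factor in [0, 1]; unreachable servers score 0."""
    if not m.reachable:
        return 0.0
    cpu = min(max(m.cpu_load, 0.0), 1.0)
    disk = min(max(m.free_space_ratio, 0.0), 1.0)
    return w_cpu * (1.0 - cpu) + w_disk * disk


def rank_servers(metrics: list[ServerMetrics]) -> list[tuple[str, float]]:
    """(server, quality factor) pairs, best first, as a periodic task might store them."""
    scored = [(m.name, quality_factor(m)) for m in metrics]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

Separating metric collection (steps 1 to 4) from scoring (step 5) means the weighting can be tuned without touching the scripts running on the data servers.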
Other rules • Mass Storage System integration: • Using compound resources: iRODS disk cache + tapes. • Data on the disk cache are replicated into the MSS asynchronously (1 h later) using a delayExec rule. • Recovery mechanism: retries until success; the delay between retries is doubled at each round. • ACL management: • Rules are needed for fine-grained access rights management. • E.g.: • 3 groups of users (admins, experts, users). • ACLs on /<zone-name>/*/rawdata => admins: r/w, experts + users: r • ACLs on all other subcollections => admins + experts: r/w, users: r
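Both policies on this slide can be sketched in a few lines of Python. The retry loop shows the "delay doubled at each round" recovery mechanism (exponential backoff); `initial_delay`, `max_attempts` and the injectable `sleep` are hypothetical parameters, not delayExec options. The ACL function encodes the three-group example, assuming "/<zone-name>/*/rawdata" means any collection whose third path component is "rawdata", including everything beneath it.

```python
import time
from typing import Callable, Optional


def replicate_with_backoff(
    replicate: Callable[[], bool],
    initial_delay: float = 60.0,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `replicate` until it returns True, doubling the wait after each failure.

    Returns the number of attempts. `max_attempts=None` means retry until success.
    """
    delay = initial_delay
    attempt = 0
    while True:
        attempt += 1
        if replicate():
            return attempt
        if max_attempts is not None and attempt >= max_attempts:
            raise RuntimeError(f"replication still failing after {attempt} attempts")
        sleep(delay)
        delay *= 2  # the delay between retries doubles at each round


def acl_for(path: str, zone: str) -> dict[str, str]:
    """Group -> permission for a collection, following the three-group example."""
    parts = path.strip("/").split("/")
    if parts[0] != zone:
        raise ValueError(f"{path} is not in zone {zone}")
    if len(parts) >= 3 and parts[2] == "rawdata":
        return {"admins": "r/w", "experts": "r", "users": "r"}
    return {"admins": "r/w", "experts": "r/w", "users": "r"}
```

Injecting `sleep` keeps the backoff testable without real waiting; in the iRODS setup the waiting is done by rescheduling a delayed rule rather than by a blocking loop.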
Developments needed • Scripts/binaries: • Metadata extraction from DICOM files. • Registration of files into Fedora Commons. • … Needed whatever storage system is used underneath. • Micro-services: • ACLs, tar/untar of archive files, … • APIs already available; did not require a large amount of work (part of the iRODS distribution). • Resource Monitoring System: a bigger development, including a modification of the iCAT schema. • Rules: • Most of them are simple. • Some require more work (Adonis project), with more complex workflows.
Pitfalls and bugs • Writing complex rules: • Avoid writing them directly in the .irb syntax. • They become difficult to debug, especially with nested actions. • Solution: use ruleGen to generate rules in a more user-friendly manner. • Some memory leaks found in irodsReServer with Oracle as a backend: fixed in 2.4. • delayExec syntax bugs: • Fixed in 2.4 and 2.4.1. • Rules are stored in configuration files for now: • They must be kept consistent on all the iRODS servers. They will be stored in the iCAT database in the future.
Prospects • Rules for database interaction (in progress): • Will be used by DTM (developed at CC-IN2P3): • DTM manages a list of tasks to be processed by a batch cluster. • DTM requires a database to manage the tasks. • Rules launched by the client will interact with the DTM database through iRODS: • More security: iRODS is used as a proxy server (database behind a firewall, iRODS authentication). • Database schema upgrades are transparent for the client (no SQL code runs on the client side). • Xmessaging system (part of iRODS): • Allows exchanging messages between different iRODS processes or clients. • E.g.: could be used to monitor job status in a distributed computing environment.
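The proxy idea (clients name an operation, the server owns the SQL) can be sketched with Python's built-in sqlite3. The `TaskProxy` class, the `tasks` table and the operation names are hypothetical; they illustrate the pattern behind "no SQL code launched on the client side", not DTM's schema or any iRODS database micro-service.

```python
import sqlite3


class TaskProxy:
    """Server-side dispatcher: clients send an operation name and parameters, never SQL.

    Because the SQL lives only here, the schema can change without touching clients.
    """

    # Hypothetical whitelist of operations; nothing else can be executed.
    _QUERIES = {
        "add_task": "INSERT INTO tasks (name, status) VALUES (?, 'pending')",
        "next_task": "SELECT id, name FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1",
        "set_status": "UPDATE tasks SET status = ? WHERE id = ?",
    }

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL)"
        )

    def call(self, user: str, operation: str, *params):
        # In the setup described above, authentication is done by iRODS;
        # here a non-empty `user` is only a placeholder check.
        if not user:
            raise PermissionError("unauthenticated request")
        if operation not in self._QUERIES:
            raise ValueError(f"unknown operation: {operation}")
        with self.conn:  # commit on success, roll back on error
            cur = self.conn.execute(self._QUERIES[operation], params)
            return cur.fetchall()
```

The whitelist plus bound parameters is what keeps the database safe behind the firewall: a client can only trigger predefined statements, never arbitrary SQL.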
Acknowledgements • Thanks to: • Pascal Calvat. • Yonny Cardenas. • Thomas Kachelhoffer. • Pierre-Yves Jallud. iRODS at CC-IN2P3