
DQ2 discussion on future features

BNL workshop, October 4, 2007.





Presentation Transcript


  1. DQ2 discussion on future features, BNL workshop, October 4, 2007

  2. DQ2 0.4.x
     • Continue to optimize the DB schema to cope with higher load
     • Channel allocation to follow the ‘Dataset Subscription policy’
     • Hiro/Patrick are also asking for a locally configurable, ordered list of preferred sources within the cloud
       • implications for channel allocation
       • How much should a T1 be ‘preferred’ before going to a T2 for a replica? Right now, the shortest queue wins…
     • Distinguish files that are unlikely ever to have replicas (bad subscriptions)
       • particularly in the local monitoring
     • Remove ‘holes’ in the system (growing backlogs)
     • Reduce load (better GSI session reuse)
     • Goal: O(100K) file transfers/day/site
       • or whatever the SRM/storage limitations allow; these need to be better understood outside DQ2
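The open question of how much to prefer a T1 over a T2 can be framed as a penalty added to a T2's queue length, with a local preference list consulted first. The sketch below is illustrative only, not DQ2 code: `Replica`, `choose_source`, `t2_penalty` and the site names are all hypothetical. With `t2_penalty=0` it reduces to today's "shortest queue wins".

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Replica:
    site: str          # hypothetical site name
    tier: int          # 1 for a T1, 2 for a T2
    queue_length: int  # files already queued on the channel from this site


def choose_source(replicas: Sequence[Replica],
                  preferred: Iterable[str] = (),
                  t2_penalty: int = 0) -> Replica:
    """Pick a source replica for one file transfer.

    1. The first site in the locally configured `preferred` list that holds
       a replica wins outright.
    2. Otherwise the shortest effective queue wins, where a T2 source is
       charged `t2_penalty` extra queued files.
    """
    if not replicas:
        raise ValueError("no replicas available")
    by_site = {r.site: r for r in replicas}
    for site in preferred:
        if site in by_site:
            return by_site[site]

    def effective_queue(r: Replica) -> int:
        return r.queue_length + (t2_penalty if r.tier == 2 else 0)

    # min() keeps the first replica on ties, so input order breaks ties.
    return min(replicas, key=effective_queue)


replicas = [Replica("T1_X", 1, 50), Replica("T2_A", 2, 10)]
print(choose_source(replicas).site)                      # T2_A (shortest queue)
print(choose_source(replicas, t2_penalty=100).site)      # T1_X (50 < 10 + 100)
print(choose_source(replicas, preferred=["T1_X"]).site)  # T1_X (local preference)
```

The penalty is expressed in queued files so that "prefer the T1 unless its queue is more than N files longer" becomes a single tunable number per site.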

  3. Local monitoring of site services

  4. Staging…
     • We did not recognize that this was a problem for OSG
     • … it is very hard to do with remote storage without SRM
     • FTS 2 + SRMv2 move in the right direction but are not there yet
     • Could do a local mechanism for T1->T2 transfers in the same cloud
       • provided the site services for the T2 run “close” to the T1 storage
       • … but not for cross-T1 transfers

  5. Hierarchies: current thoughts, for discussion
     • Hierarchical datasets would be a special kind of dataset
       • They would have only 2 states: open and frozen
       • They would not have versions
       • The constituents of a hierarchical dataset could only be closed dataset versions or frozen datasets
     • Not sure whether the following commands should be provided explicitly:
       • list files in a hierarchical dataset directly?
         • or only list the datasets in a hierarchical dataset, forcing the user to loop over the results?
       • subscribe to an open hierarchical dataset?
         • or only allow listing the datasets in an open hierarchical dataset, forcing the user to subscribe to the sub-units manually
         • the issue is having to loop over OPEN hierarchies (likely manageable)
       • locations of a hierarchical dataset?
         • or only allow listing the locations of the individual datasets in the hierarchical dataset?
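The proposed rules (two states, no versions, only closed versions or frozen datasets as constituents) can be captured in a small data model. This is a minimal sketch under those assumptions; the class and method names are hypothetical, and `list_files` simply performs the loop over `list_datasets` that the slide says users might otherwise have to write.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    OPEN = "open"
    CLOSED = "closed"
    FROZEN = "frozen"


@dataclass
class DatasetVersion:
    """A plain dataset version; `files` holds logical file names."""
    name: str
    version: int
    state: State
    files: List[str] = field(default_factory=list)


@dataclass
class HierarchicalDataset:
    """Unversioned container that is only ever OPEN or FROZEN."""
    name: str
    state: State = State.OPEN
    constituents: List[DatasetVersion] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.state not in (State.OPEN, State.FROZEN):
            raise ValueError("a hierarchical dataset is either open or frozen")

    def add(self, ds: DatasetVersion) -> None:
        if self.state is State.FROZEN:
            raise ValueError(f"{self.name} is frozen")
        if ds.state not in (State.CLOSED, State.FROZEN):
            raise ValueError("only closed versions or frozen datasets may be added")
        self.constituents.append(ds)

    def freeze(self) -> None:
        self.state = State.FROZEN

    def list_datasets(self) -> List[DatasetVersion]:
        return list(self.constituents)

    def list_files(self) -> List[str]:
        # The loop users would otherwise write over list_datasets().
        return [f for ds in self.constituents for f in ds.files]
```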

  6. Merging
     • Not much to do from the DQ2 side here beyond providing an attribute for each dataset
       • “merged” Y/N (or the protocol: zip, tar?)
     • DQ2 does 3rd-party transfers only
       • it does not actually ‘see’ the data

  7. Checksums
     • Not much for DQ2 here beyond enforcing checksums in the central catalogues, together with their protocol prefix
       • ‘md5:’ for MD5
     • adler32 is frequently discussed as a better checksum candidate
       • but this is not relevant to DQ2; it matters to the sites and the production people
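A prefixed checksum string lets the catalogue record which algorithm produced the value. The sketch below uses Python's standard `hashlib.md5` and `zlib.adler32`; the ‘md5:’ prefix comes from the slide, while the ‘adler32:’ prefix and the function names are hypothetical choices for illustration. Data is processed in chunks so large files need not fit in memory.

```python
import hashlib
import zlib
from typing import Iterable, Iterator


def checksum(chunks: Iterable[bytes], scheme: str = "md5") -> str:
    """Return a prefixed checksum string such as 'md5:<hex>'."""
    if scheme == "md5":
        h = hashlib.md5()
        for chunk in chunks:
            h.update(chunk)
        return "md5:" + h.hexdigest()
    if scheme == "adler32":
        value = 1  # Adler-32 starting value
        for chunk in chunks:
            value = zlib.adler32(chunk, value)
        # 'adler32:' prefix is a hypothetical convention
        return "adler32:" + format(value & 0xFFFFFFFF, "08x")
    raise ValueError(f"unsupported checksum scheme: {scheme}")


def verify(chunks: Iterable[bytes], expected: str) -> bool:
    """Check data against a catalogue entry, using its prefix to pick the algorithm."""
    scheme, sep, _ = expected.partition(":")
    if not sep:
        raise ValueError("checksum has no protocol prefix")
    return checksum(chunks, scheme.lower()) == expected.lower()


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in blocks of `size` bytes."""
    with open(path, "rb") as f:
        yield from iter(lambda: f.read(size), b"")


print(checksum([b"abc"]))                      # md5:900150983cd24fb0d6963f7d28e17f72
print(checksum([b"Wiki", b"pedia"], "adler32"))  # adler32:11e60398
```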

  8. Subscription lifetime
     • Increasingly important…
     • Would clean up what no one is cleaning up now… (some sites have O(100K) files in impossible situations)
     • Discussion from yesterday:
       • allow waitForSources to be set only by users with the production role?
         • avoids creating looping subscriptions in the system
       • forbid subscriptions to datasets with more than X files, unless a production user is requesting?
       • forbid more than Y subscriptions per user, unless a production user?
       • ignore a subscription, regardless of its state, after more than 3 months?
         • the subscription is marked as broken
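The policies under discussion can be expressed as a check at subscription time plus a periodic expiry pass. This is a hypothetical sketch: the limits standing in for X, Y and "3 months" are placeholder values, and `Subscription`, `refusal_reasons` and `expire` are illustrative names, not DQ2 APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List

# Hypothetical values for the slide's X, Y and "3 months".
MAX_FILES_NON_PRODUCTION = 10_000
MAX_SUBSCRIPTIONS_PER_USER = 100
SUBSCRIPTION_LIFETIME = timedelta(days=90)


@dataclass
class Subscription:
    user: str
    production: bool        # requester holds the production role
    n_files: int            # files in the subscribed dataset
    wait_for_sources: bool
    created: datetime
    broken: bool = False


def refusal_reasons(sub: Subscription, active_for_user: int) -> List[str]:
    """Return why a new subscription would be refused; empty means accept."""
    if sub.production:
        return []  # production users are exempt from all three limits
    reasons = []
    if sub.wait_for_sources:
        reasons.append("waitForSources requires the production role")
    if sub.n_files > MAX_FILES_NON_PRODUCTION:
        reasons.append(f"dataset has more than {MAX_FILES_NON_PRODUCTION} files")
    if active_for_user >= MAX_SUBSCRIPTIONS_PER_USER:
        reasons.append(f"user already has {active_for_user} subscriptions")
    return reasons


def expire(subs: Iterable[Subscription], now: datetime) -> int:
    """Mark subscriptions older than the lifetime as broken, whatever their state."""
    count = 0
    for sub in subs:
        if not sub.broken and now - sub.created > SUBSCRIPTION_LIFETIME:
            sub.broken = True
            count += 1
    return count
```

Returning a list of reasons rather than a boolean lets the user see every rule a request violates at once.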

  9. Central catalogues
     • [ as mentioned yesterday ]
     • Main changes are for scalability only:
       • dropping VUIDs (a version becomes DUID + version number)
       • DUID becomes a timestamp-oriented UUID so that the backend is partitioned in time
         • and highly optimized UUID storage in ORACLE, meaning a shorter index
       • ORACLE partitioning, redirect service…
     • … but fully backward compatible with 0.3 clients
     • Many queries become much faster
       • list files in dataset is a query by DUID as opposed to a query by N VUIDs
       • ORACLE IOTs (index-organized tables) guarantee that listing the files of a dataset [version] reads close-to-sequential blocks on disk
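A version-1 UUID embeds its creation time, which is one way to make a DUID "timestamp-oriented". The sketch below assumes Python's standard `uuid.uuid1()` and derives a monthly partition name from the decoded timestamp; the helper names and the `YYYY_MM` partition naming are hypothetical. Storing the 16 raw bytes (for instance in an ORACLE `RAW(16)` column) instead of the 36-character text form is our assumption about how the shorter index is obtained.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Version-1 UUID timestamps count 100-ns intervals since 1582-10-15 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def new_duid() -> uuid.UUID:
    """A time-based (version 1) UUID for a new dataset."""
    return uuid.uuid1()


def duid_timestamp(duid: uuid.UUID) -> datetime:
    if duid.version != 1:
        raise ValueError("DUID is not a time-based (version 1) UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=duid.time // 10)


def partition_key(duid: uuid.UUID) -> str:
    """Monthly partition name derived from the creation time, e.g. '2007_10'."""
    # The standard v1 byte layout stores the low timestamp bits first, so raw
    # UUIDs do not sort by time; partition on the decoded timestamp instead.
    return duid_timestamp(duid).strftime("%Y_%m")


def version_key(duid: uuid.UUID, version: int) -> Tuple[uuid.UUID, int]:
    """Identify a dataset version as DUID + version number (no separate VUID)."""
    return (duid, version)


duid = new_duid()
print(partition_key(duid), len(duid.bytes), len(str(duid)))  # 16 raw bytes vs 36 chars
```

Because every version of a dataset shares one DUID, "list files in dataset" becomes a lookup on a single key rather than on N VUIDs.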

  10. Location catalogue
     • [ as mentioned yesterday ]
     • The location catalogue will be populated asynchronously with:
       • information on missing files
       • (re)marking of complete/incomplete locations for existing datasets, for consistency
     • Missing files are extra information made available to users on a ‘best-effort’ basis
       • derived from a request by Ganga
     • This is populated by the ‘tracker’ service
       • which was being reworked for the site services
       • The tracker service is a ‘stronger’ Fetcher (as existing in the site services), used to find content on site vs. content missing on site, which is one of the site services' performance bottlenecks
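At its core, the tracker compares a dataset's catalogue content with what a site actually holds and records the difference. This minimal sketch assumes the site's content is already available as a list of file names (how it is obtained from storage is outside the sketch); `track_location` and `track_site` are hypothetical names.

```python
from typing import Dict, Iterable, Mapping


def track_location(catalogue_files: Iterable[str],
                   site_files: Iterable[str]) -> Dict[str, object]:
    """Compare a dataset's catalogue content with what a site actually holds.

    Returns whether the location is complete plus the sorted list of missing
    files, the two pieces of information the location catalogue would store.
    """
    expected = set(catalogue_files)
    missing = expected - set(site_files)
    return {"complete": not missing, "missing": sorted(missing)}


def track_site(datasets: Mapping[str, Iterable[str]],
               site_files: Iterable[str]) -> Dict[str, Dict[str, object]]:
    """Track every dataset at one site against a single storage listing."""
    on_site = set(site_files)  # build the (expensive) site view once, reuse it
    return {name: track_location(files, on_site) for name, files in datasets.items()}
```

Reusing one site listing for all datasets matters because finding content on site is the expensive step named on the slide.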

  11. Dashboard
     • Relatively big update coming soon
       • distinguish errors source/destination
       • display messages on the dashboard for all sites
       • alarms supported
       • more overview of site services state from a central place
         • e.g. states of files (based also on new site services monitoring)

  12. ToA
     • More and more info there…
       • Blacklist/whitelist
       • Preferred site connections
         • This is a cache file, same style as ToA
         • but independent file from ToA cache since it is more dynamic
     • ToA renewal much stronger
       • I’d claim it is the most reliable info system so far on the Grid :-)
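Keeping the more dynamic information in its own cache file means it can be refreshed on a shorter cycle than the ToA cache. The sketch below assumes a hypothetical JSON layout and refresh interval; `SiteConnectionsCache` and its methods are illustrative only and do not reflect the real ToA file format.

```python
import json
import time
from typing import Any, Dict, List


class SiteConnectionsCache:
    """Reloads a small JSON cache file once it is older than `max_age` seconds.

    Hypothetical file layout:
      {"blacklist": [...], "whitelist": [...],
       "preferred": {"DEST_SITE": ["SRC1", "SRC2"]}}
    """

    def __init__(self, path: str, max_age: float = 300.0) -> None:
        self.path = path
        self.max_age = max_age
        self._data: Dict[str, Any] = {}
        self._loaded_at = None  # monotonic time of the last load

    def _config(self) -> Dict[str, Any]:
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self.max_age:
            with open(self.path) as f:
                self._data = json.load(f)
            self._loaded_at = now
        return self._data

    def site_allowed(self, site: str) -> bool:
        cfg = self._config()
        if site in cfg.get("blacklist", []):
            return False
        whitelist = cfg.get("whitelist", [])
        return not whitelist or site in whitelist  # empty whitelist allows all

    def preferred_sources(self, destination: str) -> List[str]:
        return list(self._config().get("preferred", {}).get(destination, []))
```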

  13. Communication…
     • … still not working:
       • e.g. we did not recognize staging as a problem
       • e.g. 0.3.2 apparently not deployed on OSG T2s
         • quite bad, as 0.3.1 had a simple bug where agents could die whenever a glitch happened in the central catalogue connection
         • glitches are “common” given the central catalogue request rate, but harmless and OK to retry
     • … what to do here?
       • Jabber chatroom :-)
         • ddmdev@conference.jabber.org
         • ask me (msbranco@gmail.com) or atlas-dq2-dev@cern.ch to be authorized
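Since central catalogue glitches are described as harmless and safe to retry, an agent can wrap its catalogue calls in a retry loop instead of dying. This is a generic sketch of capped exponential backoff with jitter, not the 0.3.2 fix itself; `TransientCatalogueError` and `call_with_retry` are hypothetical names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientCatalogueError(Exception):
    """Hypothetical error raised for a harmless glitch talking to the central catalogue."""


def call_with_retry(fn: Callable[[], T], attempts: int = 5,
                    base_delay: float = 1.0, max_delay: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying transient failures with capped exponential backoff.

    Any other exception, or the last transient one, propagates to the caller.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except TransientCatalogueError:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out agents that all hit the same glitch at once.
            sleep(delay * random.uniform(0.5, 1.0))
    raise ValueError("attempts must be at least 1")
```

Only the designated transient error is retried; genuine bugs still surface immediately rather than being masked by the loop.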
