Update on FAX Redirection Topology Detection and Verification by Wei Yang. Redundant redirectors established for EU, UK, DE, and FR regions with plans for more. Detailed redirector topology, verification methods, and handling abnormal situations.
Update on FAX Redirection Topology Detection & Verification
Wei Yang
Redirector hardware at CERN
• Redundant redirectors for EU, UK, DE, FR
• Redundant (the “+” sign below) VMs
• More to come
• atlas-xrd-eu.cern.ch+
  • xrootd port 1094, cmsd port 1098
• atlas-xrd-uk.cern.ch+
  • Reports to the EU redirector
  • xrootd port 1094, cmsd port 1098?
• Same for DE and FR redirectors
Redirector topology
[Diagram: tree of redirectors. Legend labels: "cmsd & xrootd redirection", "Mature xrootd redirection". Nodes: EU rdr, US rdr, DE rdr, UK rdr, US rdr, SLAC test machine, Glasgow rdr, Edinburgh (ECDF) rdr, Middle West rdr, Site 1 rdr, Site 2 rdr, Site A rdr, Site B rdr]
• cmsd-based redirection searches the branch under the redirector
• xrootd-based redirection is used to jump to the upper level
  • if the cmsd search returns nothing
• US can either report to the EU redirector or act as a peer of EU
  • depending on needs, latency, or performance
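The two redirection modes can be illustrated with a small model. This is a minimal sketch, not the xrootd/cmsd implementation: the `Redirector` class, the `locate` function, and the file paths are hypothetical, and the US redirector is assumed to report to the EU redirector (one of the two options on the slide). A downward search stands in for cmsd, and a jump to the parent stands in for xrootd-based redirection.

```python
class Redirector:
    """Hypothetical node in the redirector tree (illustration only)."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)   # files visible directly at this node
        self.children = []
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def search_down(node, path):
    """cmsd-style search: look only in the branch under `node`."""
    if path in node.files:
        return [node.name]
    for child in node.children:
        found = search_down(child, path)
        if found:
            return [node.name] + found
    return None


def locate(start, path):
    """Search the branch under `start`; if nothing is found, jump to the
    upper level (xrootd-style) and search again from there."""
    hops = []
    node = start
    while node is not None:
        found = search_down(node, path)
        if found:
            return hops + found
        hops.append(node.name)   # nothing below: go one level up
        node = node.parent
    return None


# Assumed layout: US reports to EU; file names are made up.
eu = Redirector("EU rdr")
uk = eu.add(Redirector("UK rdr"))
us = eu.add(Redirector("US rdr"))
glasgow = uk.add(Redirector("Glasgow rdr", files={"/atlas/glasgow-only"}))
ecdf = uk.add(Redirector("Edinburgh (ECDF) rdr"))
slac = us.add(Redirector("SLAC", files={"/atlas/slac-only"}))

print(locate(eu, "/atlas/glasgow-only"))
print(locate(glasgow, "/atlas/slac-only"))
```

The second call starts at the Glasgow redirector, finds nothing below it or below the UK redirector, and only succeeds after climbing to the EU level.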
Topology verification
Goal: diagnose broken nodes in the topology
• Deploy a site-specific (small) file with a known checksum
  • Access it from the global redirector
  • Test the full redirection chain + N2N
• For every lower-level redirector
  • Test a file that does not exist in its domain
How do we find out what the actual topology is?
• We know how to do this manually
• We need to do it automatically and produce a graph
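One possible way to "produce a graph" is to merge the observed redirection chains into a set of directed edges and emit Graphviz DOT text. The sketch below assumes the chains have already been collected; the function names are hypothetical, and the two example chains use host names that appear in the xrdcp tests in this talk.

```python
def chains_to_edges(chains):
    """Collect the unique (source, destination) hops from all chains."""
    edges = set()
    for chain in chains:
        for src, dst in zip(chain, chain[1:]):
            edges.add((src, dst))
    return sorted(edges)


def edges_to_dot(edges, name="fax_topology"):
    """Render the edges as Graphviz DOT text (graph name is arbitrary)."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


chains = [
    # top-down test: unique file at Glasgow
    ["atlas-xrd-eu.cern.ch:1094", "xrdfed01.cern.ch:1094",
     "svr025.gla.scotgrid.ac.uk:11000", "disk034.gla.scotgrid.ac.uk:1095"],
    # bottom-up test: unique file at SLAC
    ["svr025.gla.scotgrid.ac.uk:11000", "atlas-xrd-uk.cern.ch:1094",
     "atlas-xrd-eu.cern.ch:1094", "atl-prod08.slac.stanford.edu:1094"],
]

edges = chains_to_edges(chains)
dot = edges_to_dot(edges)
print(dot)
```

The DOT text can be rendered with any Graphviz tool. Host aliases (for example xrdfed01.cern.ch and atlas-xrd-uk.cern.ch) would need to be merged before drawing; this sketch does not do that.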
Testing Topology with xrdcp
$ xrdcp -f -d 1 root://atlas-xrd-eu.cern.ch//atlas/dq2/user/wbhimji/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root /dev/null
This is a unique file at Glasgow
• Received redirection to [xrdfed01.cern.ch:1094]. (A.K.A. atlas-xrd-uk.cern.ch, the UK redirector at CERN)
• Received redirection to [svr025.gla.scotgrid.ac.uk:11000].
• Received redirection to [disk034.gla.scotgrid.ac.uk:1095].
EU rdr -> UK rdr -> Glasgow rdr -> Glasgow data server
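The redirection chain can be extracted from the debug output automatically. A minimal sketch, assuming the output contains lines of the form shown on this slide ("Received redirection to [host:port]."); the exact wording may differ between xrdcp versions, and `parse_redirections` is a hypothetical helper name.

```python
import re

# Matches "Received redirection to [host:port]" as shown on the slide.
REDIRECT_RE = re.compile(r"Received redirection to \[([^\]\s]+):(\d+)\]")


def parse_redirections(log_text):
    """Return the ordered list of (host, port) hops found in the log."""
    return [(host, int(port)) for host, port in REDIRECT_RE.findall(log_text)]


# Sample text copied from the slide, not captured from a live run.
SAMPLE = """\
Received redirection to [xrdfed01.cern.ch:1094].
Received redirection to [svr025.gla.scotgrid.ac.uk:11000].
Received redirection to [disk034.gla.scotgrid.ac.uk:1095].
"""

for host, port in parse_redirections(SAMPLE):
    print(host, port)
```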
Testing Bottom-Up Redirection
Topology: EU rdr -> UK rdr -> Glasgow rdr -> Glasgow data server
$ xrdcp -f -d 1 root://svr025.gla.scotgrid.ac.uk:11000//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.wt2/user.HironoriIto.xrootd.wt2-1M /dev/null
A unique file at SLAC
• Received redirection to [atlas-xrd-uk.cern.ch:1094]. Token=[]]. Opaque=[tried=+fedredir_atlas@svr025.gla.scotgrid.ac.uk]. (tried: "already tried myself, please exclude me")
• Received redirection to [atlas-xrd-eu.cern.ch:1094]. Token=[]]. Opaque=[tried=+1098localhost]. (a small misconfiguration)
• Received redirection to [atl-prod08.slac.stanford.edu:1094]. Token=[]]. Opaque=[].
RdrSeq: Glasgow rdr -> UK rdr -> EU rdr -> SLAC data server
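The "tried" value on each hop can be pulled out of the same output to spot entries like the localhost one above. This is a sketch under the assumption that each hop is printed on one line in the format shown on this slide; the helper names and the "contains localhost" heuristic are illustrative, not an xrootd rule.

```python
import re

TARGET_RE = re.compile(r"Received redirection to \[([^\]\s]+)\]")
TRIED_RE = re.compile(r"Opaque=\[tried=([^\]]*)\]")


def parse_hops(log_text):
    """Return (target, tried) per hop; tried is None when absent."""
    hops = []
    for line in log_text.splitlines():
        target = TARGET_RE.search(line)
        if not target:
            continue
        tried = TRIED_RE.search(line)
        hops.append((target.group(1), tried.group(1) if tried else None))
    return hops


def suspicious_tried(hops):
    """Heuristic (assumption): a tried entry naming localhost is suspect."""
    return [(target, tried) for target, tried in hops
            if tried and "localhost" in tried]


# Sample text copied from the slide, not captured from a live run.
SAMPLE = """\
Received redirection to [atlas-xrd-uk.cern.ch:1094]. Token=[]]. Opaque=[tried=+fedredir_atlas@svr025.gla.scotgrid.ac.uk].
Received redirection to [atlas-xrd-eu.cern.ch:1094]. Token=[]]. Opaque=[tried=+1098localhost].
Received redirection to [atl-prod08.slac.stanford.edu:1094]. Token=[]]. Opaque=[].
"""

hops = parse_hops(SAMPLE)
print(suspicious_tried(hops))
```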
Dealing with Abnormal Situations
$ xrdcp -f -d 1 root://atlas-xrd-eu.cern.ch//atlas/dq2/user/ilijav/HCtest/user.ilijav.HCtest.1/group.test.hc.NTUP_SMWZ.root /dev/null
• Received redirection to [xrdfed02.cern.ch:1094]. (UK redirector)
• Received redirection to [srm.glite.ecdf.ed.ac.uk:11000]. (ECDF @ Edinburgh)
• GoToAnotherServer: Error connecting to [srm.glite.ecdf.ed.ac.uk:11000].
• Received redirection to [xrdfed02.cern.ch:1094].
• Received redirection to [srm.glite.ecdf.ed.ac.uk:11000].
• …
• Set a timeout to detect abnormal situations like this
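A test harness could guard against this kind of loop in two ways: a wall-clock timeout around the command, and a check for a hop that repeats in the parsed chain. The sketch below uses Python's standard `subprocess` timeout; `run_with_timeout` and `find_loop` are hypothetical helper names, and the timeout value is left to the caller.

```python
import subprocess
import sys


def find_loop(hops):
    """Return the repeating segment if any hop is visited twice, else None."""
    seen = {}
    for i, hop in enumerate(hops):
        if hop in seen:
            return hops[seen[hop]:i]
        seen[hop] = i
    return None


def run_with_timeout(cmd, seconds):
    """Run cmd; return (returncode, output), or (None, "") on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=seconds)
        return done.returncode, done.stdout + done.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising
        return None, ""


# Hops copied from the slide above.
hops = [
    "xrdfed02.cern.ch:1094",
    "srm.glite.ecdf.ed.ac.uk:11000",
    "xrdfed02.cern.ch:1094",
    "srm.glite.ecdf.ed.ac.uk:11000",
]
print(find_loop(hops))
```

In practice `cmd` would be the xrdcp command line as a list of arguments; a return code of `None` then marks the test as abnormal.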