
Hosting Large-scale e-Infrastructure Resources


Presentation Transcript


  1. Hosting Large-scale e-Infrastructure Resources Mark Leese mark.leese@stfc.ac.uk

  2. Contents • Speed dating introduction to STFC • Idyllic life, pre-e-Infrastructure • Sample STFC hosted e-Infrastructure projects • RAL network re-design • Other issues to consider

  3. STFC • One of seven publicly funded UK Research Councils • Formed from 2007 merger of CCLRC and PPARC • STFC does a lot, including… • awarding research, project & PhD grants • providing access to international science facilities through its funded membership of bodies like CERN • sharing its expertise in areas such as materials and space science with academic and industrial communities • …but it is mainly recognised for hosting large scale scientific facilities, inc. High Performance Computing (HPC) resources

  4. Harwell Oxford Campus • STFC major shareholder in Diamond Light Source • Electron beam accelerated to near light speed within ring • Resulting light (X-Ray, UV or IR) interacts with samples being studied • ISIS • ‘super-microscope’ employing neutron beams to study materials at atomic level

  5. Harwell Oxford Campus • STFC’s Rutherford Appleton Lab is part of the Harwell Oxford Science and Innovation Campus with UKAEA and a commercial campus management company • Co-locates hi-tech start-ups and multi-national organisations alongside established scientific and technical expertise • Similar arrangement at Daresbury in Cheshire • Both within George Osborne Enterprise Zones: • Reduced business rates • Government support for roll-out of superfast broadband

  6. Previous Experiences

  7. Large Hadron Collider • LHC at CERN: 16.5 mile ring • Search for the elementary but hypothetical Higgs boson particle • Two proton (hadron) beams • Four experiments (particle detectors): ATLAS, CMS, ALICE, LHCb • Detector electronics generate data during collisions

  8. LHC and Tier-1 • After initial processing, the four experiments generated 13 PetaBytes of data in 2010 (> 15m GB or 3.3m single layer DVDs) • In last 12 months, Tier-1 received ≈ 6 PBs from CERN and other Tier-1s • GridPP contributes equivalent of 20,000 PCs
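To give a rough sense of what receiving ≈ 6 PB in 12 months means in network terms, here is a back-of-the-envelope sketch (a minimal illustration, assuming decimal petabytes and a perfectly even transfer rate, which real traffic never is):

```python
# Back-of-the-envelope: average rate needed to move ~6 PB in 12 months.
# Assumes decimal units (1 PB = 10**15 bytes) and perfectly even transfers;
# real traffic is bursty, so peak provisioning must be much higher.

PETABYTE = 10**15                      # bytes
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

data_bits = 6 * PETABYTE * 8
avg_rate_gbps = data_bits / SECONDS_PER_YEAR / 1e9

print(f"Average sustained rate: {avg_rate_gbps:.2f} Gbps")
# roughly 1.5 Gbps averaged over the whole year, before any burst headroom
```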

  9. UK Tier-1 at RAL • Individual Tier-1 hosts route data to Router A or the UKLight router as appropriate • Config pushed out with the Quattor grid/cluster management tool • Access Control Lists of IP addresses on the Site Access Router, UKLight router and/or hosts replace firewall security • As Tier-2 (universities) network capabilities increase, so must RAL’s (10 → 20 → 30 Gbps) [Diagram: the site “front door”, the Janet Site Access Router with primary and backup links, plus the firewall and internal distribution Router A, carries “normal” data (PetaBytes?!?) between the RAL site/Tier-1 and the Tier-2s (universities); LHC data bypasses the firewall over a 10 Gbps lightpath from the UKLight router onto the CERN LHC Optical Private Network (OPN), to Tier-0 & other Tier-1s]
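To make the "ACLs instead of a firewall" idea concrete, here is a minimal illustrative sketch, not RAL's actual configuration: the prefixes below are invented placeholders standing in for LHC OPN peers, and in a real deployment the check lives as ACL entries on the Site Access Router / UKLight router rather than in application code.

```python
import ipaddress

# Hypothetical example prefixes standing in for LHC OPN peers (CERN Tier-0,
# other Tier-1s). These are documentation ranges, not real OPN addresses.
ALLOWED_OPN_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),     # placeholder prefix
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder prefix
]

def permitted(source_ip: str) -> bool:
    """Return True if the source address falls inside a permitted OPN prefix."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED_OPN_PREFIXES)

print(permitted("192.0.2.17"))   # True:  inside an allowed prefix
print(permitted("203.0.113.5"))  # False: would be dropped by the ACL
```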

  10. LOFAR • LOw Frequency Array • World's largest and most sensitive radio telescope • Thousands of simple dipole antennas, 38 European arrays • 1st UK array opened at Chilbolton, Sept 2010 • 7 PetaBytes a year raw data generated (> 1.5m DVDs) • Data transmitted in real-time to the IBM BlueGene/P supercomputer at Uni of Groningen • Data processed & combined in software to produce images of the radio sky

  11. LOFAR • 10 Gbps Janet Lightpath • Janet → GÉANT → SURFnet • Big leap from FedEx’ing data tapes or drives • 2011 RCUK e-IAG: “Southampton and UCL make specific reference ... quicker to courier 1TB of data on a portable drive” • Funded by LOFAR-UK • cf. LHC: centralised not distributed processing • Expected to pioneer approach for other projects, e.g. Square Kilometre Array
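The courier comparison is easy to quantify. A minimal sketch (assuming a 1 TB payload, decimal units, and ignoring protocol overhead) of how long 1 TB takes at various effective end-to-end rates:

```python
# How long does 1 TB take at different effective end-to-end rates?
# Decimal units (1 TB = 10**12 bytes); protocol overhead ignored for simplicity.

TERABYTE_BITS = 10**12 * 8

for label, rate_bps in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    hours = TERABYTE_BITS / rate_bps / 3600
    print(f"{label:>8}: {hours:5.1f} hours")

# ~22 hours at 100 Mbps, ~2.2 hours at 1 Gbps, ~13 minutes at 10 Gbps;
# a poorly tuned end-to-end path really can lose to a courier.
```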

  12. Sample STFC e-Infrastructure Projects

  13. ICE-CSE • International Centre of Excellence for Computational Science and Engineering • Was going to be Hartree Centre, now DFSC • STFC Daresbury Laboratory, Cheshire • Partnership with IBM • Mission to provide HPC resources and develop software • DL previously hosted HPCx, the big academic HPC service before HECToR • IBM BlueGene/Q supercomputer • 114,688 processor cores, 1.4 Petaflops peak performance • Partner IBM’s tests were the first time a Petaflop application had been run in the UK (one thousand trillion calculations per second) • 13th in this year’s TOP500 worldwide list • Rest of Europe appears five times in the Top 10 • DiRAC and HECToR (Edinburgh) 20th and 32nd

  14. ICE-CSE • DL network upgraded to support up to 8 × 10 Gbps lightpaths to the current regional Janet deliverer, Net North West, in Liverpool and Manchester • Same optical fibres, different colours of light: • 10G JANET IP service (primary) • 10G JANET IP service (secondary) • 10G DEISA (consortium of European supercomputers) • 10G HECToR (Edinburgh) • 10G ISIC (STFC-RAL) • More expected as part of the IBM-STFC collaboration • Feasible because NNW rents its own dark (unlit) fibre network • NNW ‘simply’ changes the optical equipment on each end of the dark fibre • Key aim is for machine and expertise to be available to commercial companies • How? Over Janet? • A Strategic Vision for UK e-Infrastructure estimates that 1,400 companies could make use of HPC, with 300 quite likely to do so • So even if some instead go for the commercial “cloud” option...

  15. JASMIN & CEMS • Joint Analysis System Meeting Infrastructure Needs • JASMIN and CEMS funded by BIS, through NERC and through UKSA & ISIC respectively • Compute and storage cluster for the climate and earth system modelling community

  16. Big compute and storage cluster • 4.6 PetaBytes of fast disc storage • JASMIN will talk internally to other STFC resources • JASMIN will talk to its satellite systems (150 TB; compute + 500 TB; 150 TB) • JASMIN will talk to the Netherlands, the Met Office & Edinburgh over UKLight

  17. CEMS in the ISIC • Climate and Environmental Monitoring from Space • Essentially JASMIN for commercial users • Promote use of ‘space’ data and technology within new market sectors • Four consortia have already won funding from the publicly funded ‘Space for Growth’ competition (run by UKSA, TSB and SEEDA) • Hosted in the International Space Innovation Centre • A ‘not-for-profit’ formed by industry, academia and government • Part of the UK’s Space Innovation and Growth Strategy to grow the sector’s turnover • ISIC is an STFC ‘Partner Organisation’ in terms of the Janet Eligibility Policy • So... Janet-BCE (Business and Community Engagement) for network access related to academic and ISIC partners • Commercial ISP for network access related to commercial customers • As the industrial collaboration agenda is pushed, this needs to be controlled and applicable elsewhere in STFC

  18. [Diagram: Janet carries Janet & Janet-BCE traffic, BT carries commercial traffic; the RAL infrastructure/JASMIN Rtr has a 10 Gbps fibre on which no CEMS traffic is permitted, while the ISIC Sw/Rtr reaches CEMS over 10 Gbps fibre carrying separate commercial-customer and Janet-BCE VLANs] • JASMIN and CEMS connected at 10 Gbps… • …but no Janet access for CEMS via JASMIN • Keeping Janet ‘permitted’ traffic as a separate BCE VLAN allows tighter control • Customers will access CEMS on different IP addresses depending on who they are (academia, partners, commercials) • This could be enforced

  19. RAL Network Re-Design & Other Issues

  20. RAL Network Re-Design Two main aims: • Resilience: reduce serial paths and single points of failure. • Scalability and flexibility: remove need for special cases. Make adding bandwidth and adding ‘clouds’ (e.g. Tier-1 or tenants) a repeatable process with known costs. [Diagram: current layout; Janet/the outside world reach the Site Access Router at the RAL PoP; the firewall and internal distribution Router A carry “normal” data for the RAL site, ISIS and Admin; the UKLight router carries LHC data over the CERN LHC OPN for the Tier-1, with JASMIN alongside]

  21. Site Access & Distribution [Diagram: proposed design; external connectivity (Janet, CERN LHC OPN, commercial ISP) arrives over primary and backup links into the RAL PoP campus access & distribution layer (Rtr 1/Sw 1 and Rtr 2/Sw 2, plus a virtual firewall for security); internal site distribution (Rtr A) serves the rest of the RAL site and visitors; project/facility/department routers, the Tier-1 and tenants attach directly, where an implicit trust relationship = bypass firewall]

  22. Rtr 1 & 2, Sw 1 & 2 • Lots of 10 Gigs: • clouds and new providers can be readily added • bandwidth readily added to existing clouds • clouds can be dual connected • Front: 48 ports 1/10 GbE (SFP+) • Back: 4 ports 40 GbE (QSFP+)

  23. RAL Site Resilience [Map: primary connection towards Reading, backup towards London]

  24. User Education • The belief that you can plug a node or cluster into “the network” and immediately start firing lots of data all over the world is a fallacy • Over-provisioning is not a complete solution • Having invested £m’s elsewhere, most network problems that do arise are within the last mile: campus network → individual devices → applications • On the end systems... • Network Interface Card • Hard disc • TCP configuration • Poor cabling • Does your application use parallel TCP streams? • What protocols does your application use for data transfer (GridFTP, HTTP...)? • Know what to do on your end systems • Know what questions to ask of others
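As a small illustration of the "know your end system" point, the sketch below (Linux-specific, read-only, and covering only one corner of the tuning space) reads a few of the kernel TCP settings that commonly limit wide-area throughput; it is a starting point for investigation, not a recommendation.

```python
# Read a few Linux TCP tuning parameters that often limit wide-area throughput.
# Linux-only; on other systems each entry simply reports as unavailable.

from pathlib import Path

SETTINGS = [
    "net/core/rmem_max",                # max receive socket buffer (bytes)
    "net/core/wmem_max",                # max send socket buffer (bytes)
    "net/ipv4/tcp_rmem",                # min/default/max TCP receive buffer
    "net/ipv4/tcp_wmem",                # min/default/max TCP send buffer
    "net/ipv4/tcp_congestion_control",  # congestion control algorithm in use
    "net/ipv4/tcp_window_scaling",      # must be 1 for large windows
]

for name in SETTINGS:
    path = Path("/proc/sys") / name
    try:
        value = path.read_text().strip()
    except OSError:
        value = "(not available on this system)"
    print(f"{name}: {value}")
```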

  25. User Support • 2010 example: CMIP5 - RAL Space sharing environmental data with Lawrence Livermore (West coast US) and DKRZ (Germany) • ESNet, California → GÉANT, London: 800 Mbps • ESNet, California → RAL Space: 30 Mbps • RAL Space → DKRZ, Germany: 40 Mbps • So RAL is the problem, right? Not necessarily... • DKRZ, Germany → RAL Space: up to 700 Mbps • Involved six distinct parties: RAL Space, STFC Networking, Janet, DANTE, ESNet, LLNL • Difficult, although the experiences probably fed into the aforementioned JASMIN • Tildesley’s Strategic Vision for UK e-Infrastructure talks of “the additional effort to provide the skills and training needed for advice and guidance on matching end-systems to high-capacity networks”
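One common reason numbers like these diverge so sharply is the TCP bandwidth-delay product: a single stream can carry at most its window size divided by the round-trip time. The sketch below uses illustrative RTTs and window sizes, not measurements from the CMIP5 case, to show why a buffer that is adequate across Europe collapses on a transatlantic path.

```python
# TCP bandwidth-delay product: max single-stream throughput ~= window / RTT.
# RTTs below are illustrative round-trip times, not measurements from CMIP5.

def max_throughput_mbps(window_bytes: float, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput in Mbit/s."""
    return window_bytes * 8 / rtt_seconds / 1e6

DEFAULT_WINDOW = 64 * 1024        # a historically common default: 64 KiB
TUNED_WINDOW = 16 * 1024 * 1024   # 16 MiB, typical of a tuned transfer node

for label, rtt in [("UK <-> Germany (~20 ms RTT)", 0.020),
                   ("UK <-> US west coast (~160 ms RTT)", 0.160)]:
    print(label)
    print(f"  64 KiB window: {max_throughput_mbps(DEFAULT_WINDOW, rtt):8.1f} Mbps max")
    print(f"  16 MiB window: {max_throughput_mbps(TUNED_WINDOW, rtt):8.1f} Mbps max")

# With a 64 KiB window one stream tops out near 26 Mbps over 20 ms of RTT and
# only ~3 Mbps over 160 ms; larger windows (or parallel streams, as GridFTP
# can use) are needed before a long path can approach link capacity.
```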

  26. I’ll do anything for a free lunch • Access Control and Identity Management • During DTI’s e-Science programme access to resources was often controlled using personal X.509 certificates • Is that scalable? • Will you run or pay for a PKI? • Resource providers may want to try Moonshot • extension of eduroam technology • users of e-Infrastructure resources authenticated with user credentials held by their employer • Will the Janet Brokerage be applicable to HPC e-Infrastructure resources?

  27. Conclusions From the STFC networking perspective: • Adding bandwidth should be a repeatable process with known costs • Networking is now a core utility, just like electricity: plan for resilience on many levels • Plan for commercial interaction • In all the excitement, don’t forget security • e-Infrastructure funding is paying for capital investments - be aware of the recurrent costs
