
DRI Grant impact at the smaller sites


Presentation Transcript


  1. DRI Grant impact at the smaller sites Pete Gronbech September 2012 GridPP29 Oxford

  2. Target Areas • Internal Cluster networking • Cluster to JANET interconnect • Resilience and redundancy

  3. Cluster Networking • Most sites' clusters have been interconnected at 1Gb/s • As storage servers grew from ~20TB to 40TB, and on to even larger 36-bay units with usable capacities of ~70TB, the network links had to be increased to cope with the number of simultaneous connections from worker nodes • Many sites decided to use trunked or bonded 1Gb/s links, working on the basis of roughly one 1Gb/s link per 10TB (see the sketch below) • This no longer scales for the very largest servers, which end up with 6 bonded links • The cost of gigabit networking starts to look high when the number of ports on a switch has to be divided by 6 • 10Gbit switch prices are coming down
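
To make the rule of thumb concrete, here is a minimal sketch (not from the talk; the server sizes and the 48-port switch are illustrative assumptions) of how one 1Gb/s link per 10TB translates into bonded links and consumed switch ports:

```python
# Illustrative sketch of the "one 1Gb/s link per 10TB" rule of thumb.
# Server sizes and the 48-port switch are assumptions, not figures from the talk.

def bonded_links_needed(usable_tb, tb_per_link=10):
    """Bonded 1Gb/s links suggested by the rule of thumb (ceiling division)."""
    return max(1, -(-usable_tb // tb_per_link))

switch_ports = 48
for usable_tb in (20, 40, 70):
    links = bonded_links_needed(usable_tb)
    print(f"{usable_tb:>3} TB server: {links} x 1Gb/s bonded links, "
          f"so a {switch_ports}-port gigabit switch serves only "
          f"{switch_ports // links} such servers")

# The largest units come out at roughly 6-7 links per server, which is why a
# single 10Gb/s port per server starts to look simpler and more cost effective.
```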

  4. DRI Grant • Has allowed the sites to make the jump to 10Gbit switches in the cluster earlier than they would otherwise have planned • Has allowed some degree of future-proofing by providing enough ports to cover expected cluster expansion over the next few years • Replacing bonded gigabit with 10Gbit simplifies and tidies up the cabling and configuration (less to go wrong, hopefully)

  5. Campus connectivity • Many Grid clusters had 1 or 2Gbit connections to the campus WAN • Many sites have used grant funding to install routers that connect to the campus backbone at 10Gbit • If the campus backbone is made up of 10Gbit links, the danger is that the grid cluster could saturate some of these links, blocking other traffic to the JANET connection • Links have to be doubled up on the route to the campus router • The JANET connection to the university has to be increased, or the Grid link capped, so that both Grid and campus traffic can flow unhindered (see the sketch below) • The alternative is to install a bypass link directly to the JANET router
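
A toy back-of-envelope check of the capacity question above; all bandwidth figures are assumptions for illustration, and the real numbers vary by site:

```python
# Hypothetical check: can the grid uplink and normal campus traffic share the
# JANET connection, or does the grid link need a cap or a bypass?
# All bandwidth figures below are illustrative assumptions.

janet_gbps = 10         # assumed university JANET connection
campus_peak_gbps = 6    # assumed non-grid campus traffic at peak
grid_uplink_gbps = 10   # grid cluster uplink after the upgrade

headroom_gbps = janet_gbps - campus_peak_gbps
if grid_uplink_gbps > headroom_gbps:
    print(f"Grid traffic can saturate JANET: cap the grid link at "
          f"~{headroom_gbps} Gb/s, increase the JANET connection, or add a "
          f"bypass link directly to the JANET router.")
else:
    print("Grid and campus traffic can flow unhindered.")
```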

  6. Resilience • Where network upgrades could be purchased for less than anticipated, some funds were used to upgrade critical service nodes or infrastructure • Storage server head nodes, caching servers, UPS units or improved firewalls were the items chosen by different institutes • All sites were allocated some funds to purchase monitoring nodes; these were originally intended to run Gridmon, but the plan changed to use PerfSonar • The end result is that the Grid clusters at the sites are in a much stronger position than before and will provide a more robust service

  7. Careful planning required!

  8. Edinburgh

  9. Glasgow

  10. Lancaster That Network Upgrade... The mad-scramble network uplift plan for Lancaster took a 3-pronged approach. 1. Upgrade & shanghai the University's backup link: 10G (mostly) just for us. 2. Increase connectivity to the campus backbone & thus between the two "halves" of the grid cluster and the local HEP cluster. 3. Add capacity for 10G networking to our cluster using a pair of Z9000 core switches & half a dozen S4810 rack switches. This frees up some of the current switches, which can be retasked to improve the HEP cluster networking.

  11. Liverpool

  12. Oxford

  13. QMUL

  14. RHUL • Now have 2x1Gb/s links to JANET, trunked; the second link, added 7th March, cannot be utilised until the old 1Gb/s firewall is replaced • Network upgraded from a stack of 8x Dell PC6248 (1Gb/s) switches to a 2x F10 S4810 10Gb/s spine, with the PC6248s attached as leaves by 2x10Gb/s to each F10 • The old 1Gb/s firewall is out of warranty/support and is to be replaced soon with a Juniper SRX650 (7Gb/s max)

  15. Sheffield

  16. Sussex • 4x 36-port InfiniBand switches • IB switches arranged in a fat-tree topology (see the sketch below)
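
A rough sketch of the port arithmetic behind a fat tree built from 36-port switches; the exact Sussex wiring is not given on the slide, so the leaf/spine split below is an assumption:

```python
# Fat-tree port arithmetic with 36-port InfiniBand switches (illustrative).
# In a non-blocking two-level fat tree each leaf switch splits its ports
# half to nodes and half to spine uplinks.

k = 36                    # ports per switch
leaf_down = k // 2        # 18 node-facing ports per leaf
leaf_up = k - leaf_down   # 18 uplinks per leaf

# Largest non-blocking two-level fat tree built only from k-port switches:
print(f"Maximum fabric size: {k * k // 2} node ports (k^2 / 2)")

# A small 4-switch build, e.g. 2 leaves + 2 spines (assumed layout):
leaves, spines = 2, 2
print(f"{leaves} leaves x {leaf_down} node ports = {leaves * leaf_down} "
      f"non-blocking ports, with {leaf_up // spines} uplinks from each leaf "
      f"to each of the {spines} spines")
```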

  17. Common Themes • Well-planned cluster networking, balanced and future-proof • A vast improvement on the ad hoc, cost-limited designs they replaced • They have brought tangible benefits

  18. FTS Transfer Rates • To Oxford • From Oxford

  19. Benefits • In August 2012, transfers of files to Oxford hit the 5Gbit/s rate cap for several hours (see the arithmetic below)
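
As a quick illustration of the data volume behind that figure (the 4-hour duration is an assumed example; the slide says only "several hours"):

```python
# Rough data-volume arithmetic for sustained transfers at the 5 Gbit/s cap.
# The 4-hour duration is an assumption for illustration.

rate_gbps = 5
hours = 4
tb_moved = rate_gbps / 8 * 3600 * hours / 1000  # Gbit/s -> GB/s -> GB -> TB
print(f"{rate_gbps} Gbit/s sustained for {hours} h ≈ {tb_moved:.1f} TB")
# ~9 TB moved; the same volume would take about 10 hours over a 2 Gbit/s link.
```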

  20. Performance Tuning / Future • Now need to concentrate on improving FTS transfers to the remaining slow sites • Good monitoring is required, both locally and nationally • PerfSonar is being installed across the sites (see next talk) • Work with JANET and site networking to increase JANET connectivity where required
