Grid Research and Development

This research and development project covers the hardware, middleware and applications of the Glasgow Grid, which supports the massive computing requirements of the LHC. LHC computing facilities will be distributed across hundreds of institutes and thousands of users, and the project aims to optimize CPU and storage resources for efficient data processing. Key components include the ScotGRID processing nodes and storage, the CDF equipment, and the GridPP Glasgow elements. The project timeline covers 2002 to 2005, with research and operations carried out in collaboration with the LCG and EDG initiatives.

Presentation Transcript


  1. Grid Research and Development: Hardware, Middleware, Applications
     Glasgow Grid R&D Investment

  2. LHC Computing at a Glance
     • The investment in LHC computing will be massive
        • LHC Review estimated 240 MCHF
        • 80 MCHF/y afterwards
     • These facilities will be distributed, for political as well as sociological and practical reasons
        • Europe: 267 institutes, 4603 users
        • Elsewhere: 208 institutes, 1632 users

  3. Rare Phenomena – Huge Background
     [Figure: rate of all interactions compared with the Higgs signal, spanning 9 orders of magnitude]

  4. CPU Requirements
     • Complex events
        • Large number of signals
        • "Good" signals are covered with background
     • Many events
        • 10^9 events/experiment/year
        • 1–25 MB/event raw data
        • Several passes required
     • Need world-wide: 7×10^6 SPECint95 (3×10^8 MIPS)
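As a rough check, the slide's event count and event size can be multiplied out. The sketch below is a back-of-the-envelope calculation using only the numbers on this slide; it assumes decimal units (1 PB = 10^9 MB) and a single pass over the raw data, whereas the slide notes that several passes are required.

```python
# Back-of-the-envelope raw-data volume from the CPU Requirements slide.
# Assumes decimal units and a single pass over the raw data.
EVENTS_PER_YEAR = 10**9      # events per experiment per year
RAW_MB_PER_EVENT = (1, 25)   # smallest and largest raw event size, in MB
MB_PER_PB = 10**9            # 1 PB = 10^9 MB (decimal)


def raw_volume_pb(events: int, mb_per_event: float) -> float:
    """Raw data volume in PB for one pass over `events` events."""
    return events * mb_per_event / MB_PER_PB


low, high = (raw_volume_pb(EVENTS_PER_YEAR, size) for size in RAW_MB_PER_EVENT)
print(f"Raw data per experiment per year: {low:.0f} to {high:.0f} PB")

# The world-wide CPU figure also implies a SPECint95-to-MIPS ratio.
mips_per_specint95 = 3e8 / 7e6
print(f"Implied MIPS per SPECint95: ~{mips_per_specint95:.0f}")
```

On these inputs each experiment produces of order 1 to 25 PB of raw data per year before any reprocessing, and the CPU estimate corresponds to roughly 43 MIPS per SPECint95.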

  5. LHC Computing Challenge
     • 1 TIPS = 25,000 SpecInt95; PC (1999) = ~15 SpecInt95
     • One bunch crossing per 25 ns; 100 triggers per second; each event is ~1 MByte
     • Online System: ~PBytes/sec in, ~100 MBytes/sec out to the Offline Farm (~20 TIPS)
     • Tier 0: CERN Computer Centre (>20 TIPS, HPSS), fed at ~100 MBytes/sec
     • Tier 1: RAL, US, Italian and French Regional Centres (each with HPSS), linked to Tier 0 at ~Gbits/sec or by air freight
     • Tier 2: Tier2 Centres of ~1 TIPS each (ScotGRID++ ~1 TIPS)
     • Tier 3: Institutes (~0.25 TIPS) with a physics data cache, linked at ~Gbits/sec
     • Tier 4: Workstations, linked at 100–1000 Mbits/sec
     • Physicists work on analysis "channels". Glasgow has ~10 physicists working on one or more channels, and data for these channels is cached by the Glasgow server.
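Several of the rates in the tier diagram follow from the slide's own numbers. This sketch reproduces them, assuming exactly 25 ns between bunch crossings, exactly 100 triggered events per second and exactly 1 MByte per event; the slide gives the last two only approximately.

```python
# Rates implied by the LHC Computing Challenge slide (approximate inputs).
BUNCH_SPACING_S = 25e-9      # one bunch crossing per 25 ns
TRIGGERS_PER_S = 100         # events kept by the trigger each second
EVENT_SIZE_MB = 1.0          # each event is ~1 MByte
SPECINT95_PER_TIPS = 25_000  # 1 TIPS = 25,000 SpecInt95
SPECINT95_PER_PC = 15        # a 1999 PC is ~15 SpecInt95

crossings_per_s = 1 / BUNCH_SPACING_S
trigger_rejection = crossings_per_s / TRIGGERS_PER_S
offline_mb_per_s = TRIGGERS_PER_S * EVENT_SIZE_MB
pcs_per_tips = SPECINT95_PER_TIPS / SPECINT95_PER_PC

print(f"Bunch crossings per second: {crossings_per_s:.1e}")
print(f"Trigger keeps about 1 crossing in {trigger_rejection:.1e}")
print(f"Data rate into the offline farm: {offline_mb_per_s:.0f} MB/s")
print(f"1999-era PCs per TIPS: ~{pcs_per_tips:.0f}")
print(f"1999-era PCs for a ~0.25 TIPS institute: ~{0.25 * pcs_per_tips:.0f}")
```

The 100 MB/s result matches the ~100 MBytes/sec links on the slide, and the PC count shows that a ~1 TIPS Tier 2 centre meant well over a thousand PCs of 1999 vintage.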

  6. Starting Point

  7. CPU Intensive Applications
     Numerically intensive simulations:
     • Minimal input and limited output data
     • ATLAS Monte Carlo (gg → H → bb): 182 sec per 3.5 Mb event on a 1000 MHz Linux box
     Standalone physics applications:
     1. Simulation of neutron/photon/electron interactions for 3D detector design
     2. NLO QCD physics simulation
     Commodity processor approach:
     • Connected via a Grid
     • General applicability
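Because these simulations need minimal input and produce limited output, events can be generated independently on separate processors, so throughput scales with the number of CPUs. A rough sketch of that scaling using the 182 s/event benchmark; it assumes every CPU matches the 1000 MHz benchmark machine and ignores I/O, scheduling and other overheads. The count of 118 CPUs corresponds to the 59 dual-processor X Series 330 nodes listed for ScotGRID.

```python
# Simulated-event throughput for independent Monte Carlo jobs.
# Assumes each CPU matches the 1000 MHz benchmark box and has no overheads.
SECONDS_PER_DAY = 24 * 60 * 60
SEC_PER_EVENT = 182  # ATLAS gg -> H -> bb event on a 1000 MHz Linux box


def events_per_day(n_cpus: int, sec_per_event: int = SEC_PER_EVENT) -> int:
    """Whole events completed per day, with each CPU running its own job."""
    return n_cpus * (SECONDS_PER_DAY // sec_per_event)


print(events_per_day(1))       # one processor: 474
print(events_per_day(2 * 59))  # 59 dual-processor nodes: 55932
```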

  8. ScotGRID Hardware
     • ScotGRID processing nodes at Glasgow
        • 59 IBM X Series 330, dual 1 GHz Pentium III, 2 GB memory
        • 2 IBM X Series 340, dual 1 GHz Pentium III, 2 GB memory, dual Ethernet
        • 3 IBM X Series 340, dual 1 GHz Pentium III, 2 GB memory, 100 + 1000 Mbit/s Ethernet
        • 1 TB disk
        • LTO/Ultrium tape library
        • Cisco Ethernet switches
     • ScotGRID storage at Edinburgh
        • IBM X Series 370, PIII Xeon, 512 MB memory; 32 x 512 MB RAM
        • 70 x 73.4 GB IBM FC hot-swap HDD
     • Griddev test rig at Glasgow
        • 4 x 233 MHz Pentium II
     • CDF equipment at Glasgow
        • 8 x 700 MHz Xeon IBM xSeries 370, 4 GB memory, 1 TB disk

  9. Timelines (2002–2005, by quarter)
     • IBM equipment arrived at Edinburgh and Glasgow for Phase 1.
     • Phase 0: equipment is tested and set up in a basic configuration, networking the two sites.
     • Phase 1: prototyping of the integrated local computing fabric, with emphasis on scaling, reliability and resilience to errors.
     • IBM equipment arrives at Edinburgh and Glasgow for Phase 2.
     • LHC Global Grid TDR
     • "50% prototype" (LCG-3) available
     • ScotGRID: ~300 CPUs + ~50 TBytes
     • LCG-1 reliability and performance targets
     • First Global Grid Service (LCG-1) available

  10. Glasgow within the Grid
     "Current" snapshot, accessible via the web.

  11. GridPP Glasgow Elements
     • £17m 3-year project funded by PPARC, started 1/9/02
     • Funding: CERN £5.67m; DataGrid £3.78m; Tier-1/A £3.66m; Applications £1.99m; Operations £1.88m
     • CERN - LCG (start-up phase): funding for staff and hardware...
     • EDG - UK contributions: Architecture, Testbed-1, Network Monitoring, Certificates & Security, Storage Element, R-GMA, LCFG, MDS deployment, GridSite, SlashGrid, Spitfire, Optor, GridPP Monitor Page
     • Applications (start-up phase): BaBar, CDF+D0 (SAM), ATLAS/LHCb, CMS, (ALICE), UKQCD
     • www.gridpp.ac.uk
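The funding components on this slide can be checked against the £17m headline. The sketch below simply sums them; the pairing of £1.99m with Applications and £1.88m with Operations is inferred from the order of the labels on the slide and may not match the original chart.

```python
# GridPP funding components as read from the slide, in GBP millions.
# The Applications/Operations pairing is inferred from label order.
budget_m = {
    "CERN": 5.67,
    "DataGrid": 3.78,
    "Tier-1/A": 3.66,
    "Applications": 1.99,
    "Operations": 1.88,
}

total = round(sum(budget_m.values()), 2)
print(f"Total: £{total}m")
for item, amount in sorted(budget_m.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:>12}: £{amount:.2f}m ({amount / total:.0%})")
```

The components add up to £16.98m, consistent with the quoted £17m, and CERN and DataGrid together account for more than half of it.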

  12. Sequential Access via Metadata (SAM)
     • The SAM system went into "Production Mode" for CDF on June 3, 2002.
     • "Treat WAN as an abundant file transfer resource…" (Rick St Denis, CDF Remote Analysis and WAN talk, Run II Computing Review, June 4–6, 2002)
     • "Grid" theme: metadata is required to enable distributed resources, e.g. CDF@Ggo, to work coherently.

  13. Spitfire: Security Mechanism
     1. The client sends an HTTP + SSL request with a client certificate to the Servlet Container (SSLServletSocketFactory).
     2. TrustManager: is the certificate signed by a trusted CA (Trusted CAs)? If not, the request is rejected.
     3. Has the certificate been revoked (Revoked Certs repository)? If so, the request is rejected.
     4. Security Servlet and Authorization Module: does the user specify a role? If not, find the default role. Is the role ok (Role repository)?
     5. Translator Servlet: map the role to a connection ID (Role–Connection mappings) and request a connection ID from the ConnectionPool, which connects to the RDBMS.
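The decision sequence in the Spitfire diagram can be expressed as a single authorization function. This is a minimal Python sketch of the control flow only, not Spitfire's Java implementation: the `Certificate` class, the `authorize` function, the dictionaries standing in for the Trusted CAs, Revoked Certs, role and Role–Connection repositories, and all example names are hypothetical, and signature checking is reduced to an issuer lookup instead of real X.509 verification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for an X.509 client certificate."""
    subject: str  # user's distinguished name
    issuer: str   # CA that signed the certificate
    serial: int   # serial number assigned by the issuer


class AccessDenied(Exception):
    pass


def authorize(cert: Certificate,
              requested_role: Optional[str],
              trusted_cas: Set[str],
              revoked: Set[Tuple[str, int]],
              default_roles: Dict[str, str],
              allowed_roles: Dict[str, Set[str]],
              role_to_connection: Dict[str, str]) -> str:
    """Return a connection ID for the request, or raise AccessDenied."""
    # Is the certificate signed by a trusted CA? (simplified to an issuer lookup)
    if cert.issuer not in trusted_cas:
        raise AccessDenied("certificate not signed by a trusted CA")
    # Has the certificate been revoked?
    if (cert.issuer, cert.serial) in revoked:
        raise AccessDenied("certificate has been revoked")
    # Does the user specify a role? If not, find the default role.
    role = requested_role if requested_role is not None else default_roles.get(cert.subject)
    # Is the role ok for this user?
    if role is None or role not in allowed_roles.get(cert.subject, set()):
        raise AccessDenied(f"role {role!r} not permitted")
    # Map the role to a database connection ID.
    return role_to_connection[role]


# Hypothetical example data.
TRUSTED_CAS = {"CN=Example CA"}
REVOKED = {("CN=Example CA", 7)}
DEFAULT_ROLES = {"CN=Alice": "reader"}
ALLOWED_ROLES = {"CN=Alice": {"reader", "writer"}}
ROLE_TO_CONNECTION = {"reader": "conn-ro", "writer": "conn-rw"}

alice = Certificate("CN=Alice", "CN=Example CA", 1)
print(authorize(alice, None, TRUSTED_CAS, REVOKED, DEFAULT_ROLES,
                ALLOWED_ROLES, ROLE_TO_CONNECTION))  # conn-ro
print(authorize(alice, "writer", TRUSTED_CAS, REVOKED, DEFAULT_ROLES,
                ALLOWED_ROLES, ROLE_TO_CONNECTION))  # conn-rw
```

In this sketch the certificate checks run first, so a request from an untrusted or revoked certificate is rejected before any role lookup or connection mapping takes place.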

  14. Optor: replica optimiser simulation
     • Simulate a prototype Grid.
     • Input site policies and experiment data files.
     • Introduce a replication algorithm:
        • Files are always replicated to the local storage.
        • If necessary, the oldest files are deleted.
     • Even a basic replication algorithm significantly reduces network traffic and program running times.
     • New economics-based algorithms are under investigation.
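The basic replication rule on this slide can be sketched as a small simulation of one site's storage. This is an illustrative Python sketch, not Optor's code: it assumes all files are the same size, measures capacity in files, and interprets "oldest" as the replica that was created earliest (first in, first out). An access-time (LRU) variant would also move a file to the back of the queue on every local hit. The names `LocalStorage` and `count_remote_transfers` are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


class LocalStorage:
    """Site storage that always keeps a replica, deleting the oldest when full."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least one file")
        self.capacity = capacity
        self.replicas = OrderedDict()  # insertion order: oldest replica first
        self.remote_transfers = 0      # files fetched over the network

    def access(self, name: str) -> bool:
        """Read a file; return True if it was already held locally."""
        if name in self.replicas:
            return True
        self.remote_transfers += 1     # fetch the file from a remote site
        if len(self.replicas) >= self.capacity:
            self.replicas.popitem(last=False)  # delete the oldest replica
        self.replicas[name] = None     # always replicate to local storage
        return False


def count_remote_transfers(requests: Iterable[str], capacity: int) -> int:
    storage = LocalStorage(capacity)
    for name in requests:
        storage.access(name)
    return storage.remote_transfers


REQUESTS = ["a", "b", "a", "c", "a", "b", "d", "a"]
print(len(REQUESTS))                        # no replication: every read is remote (8)
print(count_remote_transfers(REQUESTS, 3))  # replicate locally, room for 3 files (5)
```

With room for three files the site makes 5 remote transfers for these 8 reads instead of 8, a small-scale illustration of the reduction in network traffic reported on the slide.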

  15. Prototypes
     • Real world... simulated world…
     • Tools: Java Analysis Studio over TCP/IP
     • Instantaneous CPU/disk/network usage
     • Scalable architecture
     • Individual node info

  16. Summary
     • Long tradition in PPE computing
     • Strong system management team
     • ~£1.2m investment over 3 years for hardware and middleware
     • New Grid Data Management (GDM) group, fundamental to Grid development
     • University strategic investment: refurbishment and running costs
     • Software prototyping (GDM) and stress-testing (CDF)
     • Long-term commitment (LHC era)
     • ATLAS/CDF/LHCb software development/deployment
     • Partnerships with Computing Science, Edinburgh, IBM…
