
IHEP Computing Center Site Report



Presentation Transcript


  1. IHEP Computing Center Site Report. Shi, Jingyan (shi.jingyan@ihep.ac.cn), Computing Center, IHEP

  2. IHEP at a Glance • ~1000 staff, two thirds of whom are scientists and engineers • The largest fundamental research center in China, with research fields including: • Experimental particle physics • Theoretical particle physics • Astrophysics and cosmic rays • Accelerator technology and applications • Synchrotron radiation and applications • Nuclear analysis techniques • Computing and network applications

  3. Computing Environment in IHEP

  4. Computing Resources • ~8000 CPU cores; will reach 10,000 cores in two months • SL5.5 (64-bit) for WLCG, BES-III, YBJ and Daya Bay • SL4.5 (32-bit) for BES-III; will be upgraded in two months • Blade systems from IBM/HP/Dell • Blades link via GigE/IB; chassis link to the central switch (Force10 E1200) with 10GigE • [Photo: PC farm built with blades]

  5. Batch System • Torque 2.5.5 + Maui 3.2.6p21 • Torque integrated with AFS • Client/server architecture • Fake AFS tokens dispatched by the server • ActiveMQ message passing (see the sketch below)
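
The slide does not show how the ActiveMQ message passing is wired up. Below is a minimal sketch of how a worker node might receive token-refresh notifications over ActiveMQ's STOMP interface using the stomp.py library; the broker host, queue name, and message format are assumptions for illustration, not the actual IHEP configuration.

```python
import time
import stomp  # stomp.py (v8+): pip install stomp.py

# Hypothetical broker address and queue name; the real IHEP
# setup is not described in the slides.
BROKER = [("mq.example.ihep.ac.cn", 61613)]
QUEUE = "/queue/afs.token.refresh"

class TokenListener(stomp.ConnectionListener):
    def on_message(self, frame):
        # A real listener would install the fake AFS token carried
        # in the message body on the worker node at this point.
        print("token refresh request:", frame.body)

conn = stomp.Connection(BROKER)
conn.set_listener("tokens", TokenListener())
conn.connect("user", "password", wait=True)
conn.subscribe(destination=QUEUE, id=1, ack="auto")

time.sleep(60)  # listen for a minute, then shut down
conn.disconnect()
```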

  6. File System - Lustre • 3 MDSs, 31 OSSs, 300+ OSTs, 800 client nodes, 100 million files • Lustre version: 1.8.5 (upgraded in July) • Capacity: 1.7 PB (little changed since May) • All login clients have been upgraded to 64-bit, which has reduced crashes of the login nodes • IHEP is considering binding Lustre to CASTOR 1.7 using the HSM function provided by Lustre 2.x
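
As a concrete illustration of keeping watch over a filesystem of this size, the sketch below polls Lustre's standard `lfs df` command and flags OSTs above a usage threshold. The mount point and threshold are made-up values; `lfs df` itself is the stock Lustre client tool.

```python
import subprocess

MOUNT = "/lustre"   # hypothetical mount point
THRESHOLD = 90.0    # flag OSTs above 90% full (arbitrary)

# `lfs df` reports per-MDT/OST usage in 1K blocks:
#   UUID  1K-blocks  Used  Available  Use%  Mounted on
out = subprocess.check_output(["lfs", "df", MOUNT], text=True)

for line in out.splitlines():
    fields = line.split()
    if len(fields) >= 5 and fields[0].endswith("_UUID") and "OST" in fields[0]:
        used_pct = float(fields[4].rstrip("%"))
        if used_pct > THRESHOLD:
            print(f"{fields[0]} is {used_pct:.0f}% full")
```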

  7. HSM Deployment • Hardware • Two IBM 3584 tape libraries • ~5800 slots, with 26 LTO-4 tape drives • 10 tape servers and 10 disk servers with a 200 TB disk pool • Software • Customized version based on CASTOR 1.7.1.5 • Supports the new types of hardware • Optimizes the performance of tape read and write operations • The stager was rewritten • Network • 10 Gbps links between disk servers and tape servers
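
A quick back-of-the-envelope check on that layout: LTO-4 drives stream at roughly 120 MB/s native, so the aggregate drive bandwidth can be compared against the 10 Gbps server links. The arithmetic below uses the published LTO-4 native rate; spreading the drives evenly across the tape servers is an assumption.

```python
# Back-of-the-envelope: can the 10 Gbps links feed the tape drives?
LTO4_NATIVE_MBPS = 120   # MB/s, published LTO-4 native rate
DRIVES = 26              # from the slide
TAPE_SERVERS = 10        # from the slide
LINK_GBPS = 10           # per-server link speed

drives_per_server = DRIVES / TAPE_SERVERS            # ~2.6, assumed even spread
demand_mbps = drives_per_server * LTO4_NATIVE_MBPS   # ~312 MB/s per server
link_mbps = LINK_GBPS * 1000 / 8                     # 1250 MB/s per server

print(f"per-server drive demand:  {demand_mbps:.0f} MB/s")
print(f"per-server link capacity: {link_mbps:.0f} MB/s")
# ~312 MB/s is well under 1250 MB/s, so the network is not the bottleneck.
```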

  8. Network Connection • [Diagram: IHEP wide-area connectivity. Labels include: USA, Europe and ASGC reached via GLORIAD (10G, 2.5G) and TEIN3 through Hong Kong; Beijing peers Tsinghua, EDU.CN and CSTNet at 10G/2.5G/1G; 155M IPv4 and 10G IPv6; links to YBJ, Daya Bay and Shenzhen (45M)]

  9. [Diagram: IHEP computing services overview. Labels include: 8000 CPU cores of work nodes, 2 PB+ disk storage, 5 PB tape library, document management, web content management, and IaaS/PaaS/SaaS cloud services]

  10. BEIJING-LCG2 Site Report

  11. BEIJING-LCG2 Site Report

  12. Reliability and Availability

  13. dCache Migration • dCache at IHEP • Total capacity: 320 TB • 3 head nodes and 8 pool nodes • dCache server version 1.9.5-25 • Migrated from pnfs to Chimera • The migration took about 40 hours (a post-migration sanity check is sketched below)
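
The slides do not say how the result was verified. One simple post-migration check is to count the entries in Chimera's namespace database and compare them against the old pnfs inventory. The sketch below assumes Chimera on MySQL with its standard t_inodes table; the host, credentials, and expected count are placeholders, not IHEP's real setup.

```python
import mysql.connector  # pip install mysql-connector-python

# Placeholder connection details; not the real IHEP configuration.
conn = mysql.connector.connect(
    host="chimera-db.example", user="dcache",
    password="secret", database="chimera",
)

EXPECTED = 1_234_567  # file count taken from the pnfs dump (placeholder)

cur = conn.cursor()
# t_inodes is the namespace table in Chimera's standard schema.
cur.execute("SELECT COUNT(*) FROM t_inodes")
(actual,) = cur.fetchone()
conn.close()

print(f"chimera inodes: {actual}, expected: {EXPECTED}")
if actual < EXPECTED:
    print("WARNING: fewer entries than the pnfs inventory; investigate")
```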

  14. DPM Upgrade • DPM at IHEP • Total capacity: 320 TB • 1 head node and 8 pool nodes • DPM version: upgraded from 1.7.4 to 1.8 • Upgraded the DPM server OS from SL4 to SL5 • Reinstalled DPM and restored the DPNS database (a backup/restore sketch follows below)
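
A reinstall of this kind hinges on dumping the namespace database before wiping the OS and restoring it afterwards. Below is a minimal sketch of that dump/restore step with mysqldump; cns_db is the conventional DPNS database name, but treat it and the credentials as assumptions to verify locally.

```python
import subprocess

DB = "cns_db"  # conventional DPNS database name; verify on your head node
DUMP = "/root/cns_db-backup.sql"

# Dump the namespace database before reinstalling the head node.
# (-p makes mysqldump/mysql prompt for the password on the terminal.)
with open(DUMP, "w") as f:
    subprocess.run(["mysqldump", "-u", "root", "-p", DB],
                   stdout=f, check=True)

# ... reinstall the OS and DPM packages, then restore:
with open(DUMP) as f:
    subprocess.run(["mysql", "-u", "root", "-p", DB],
                   stdin=f, check=True)
```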

  15. CVMFS Deployed in IHEP • Deployed the CVMFS client on all the work nodes • Set up a Squid server as an HTTP proxy for the clients • Client version: 2.0.3-1 • Supported VOs: ATLAS, CMS, BES (a per-node check is sketched below)
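
With the client on every worker node, a quick per-node health check is useful. The sketch below wraps the standard `cvmfs_config probe` tool and reports failures; the Squid hostname in the comment is a placeholder, not IHEP's actual proxy.

```python
import subprocess

# A typical client config (/etc/cvmfs/default.local) points at the
# local Squid, e.g.:
#   CVMFS_HTTP_PROXY="http://squid.example.ihep.ac.cn:3128"  # placeholder
#   CVMFS_REPOSITORIES=atlas.cern.ch,cms.cern.ch

# `cvmfs_config probe` mounts each configured repository and checks
# that it answers; a non-zero exit code means at least one failed.
result = subprocess.run(["cvmfs_config", "probe"],
                        capture_output=True, text=True)
print(result.stdout)
if result.returncode != 0:
    print("CVMFS probe failed on this node:", result.stderr)
```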

  16. Cooling System • The air cooling system has reached 75% of its capacity • A cold-air partition was built in 2009 and 2010 • New machines are coming

  17. Cooling System Monitoring • Blade racks run very hot under the heavy job load

  18. Cooling System Upgrade • In progress • Water-cooled racks for the running blade server racks • Power capacity: 800 kW -> 800 kW × 2 • Power supply for one row (10 racks): 100 kW -> 270 kW, i.e. the per-rack budget rises from 10 kW to 27 kW • Six companies have entered the bid • Will be finished by the end of the year

  19. Conclusion • The farm works fine, but more machines are coming • The cooling system needs to be upgraded as soon as possible • The unstable 32-bit OS will be abandoned • More machines => new problems?

  20. Thank you!
