Holistic, Energy Efficient Design @ Cardiff
Going Green Can Save Money
Dr Hugh Beedie, CTO, ARCCA & INSRV
Introduction
• The Context
• Drivers to Go Green
• Where does all the power go?
  • Before the equipment
  • In the equipment
• What should we do about it?
• What Cardiff University is doing about it
The Context (1)
• Cardiff University receives a £3M grant to purchase a new supercomputer
• A new room is required to house it, with appropriate power, cooling, etc.
• 2 tenders:
  • Data Centre construction
  • Supercomputer
The Context (2)
• INSRV Sustainability Mission: to minimise CU's IT environmental impact and to be a leader in delivering sustainable information services
• Some current & recent initiatives:
  • University INSRV Windows XP image default settings
  • Condor – saving energy, etc. compared to a dedicated supercomputer
  • ARCCA & INSRV new Data Centre
  • PC power-saving project – standby 15 minutes after logout (being implemented this session)
Drivers – Why Do Green IT?
• Increasing demand for CPU & storage
• Lack of space
• Lack of power
• Increasing energy bills (oil prices have doubled)
• Enhancing the reputation of Cardiff University & attracting better students
• Sustainable IT
• Because we should (for the planet)
Congress Report, Aug 2007
• US data centre electricity demand doubled 2000–2006
• Trend toward 20kW+ per rack
• Large scope for efficiency improvement:
  • Obvious – more efficiency at each stage
  • A holistic approach is necessary – facility and component improvements
  • Less obvious – virtualisation (up to 5X)
Where does all the power go? (1)
“Up to 50% is used before getting to the Server” – Report to US Congress, Aug 2007
Loss = £50,000 p.a. for every 100kW supplied to the room (a rough cross-check below)
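A quick back-of-the-envelope check of that headline figure, as a Python sketch; the ~£0.11/kWh electricity tariff is an assumption for illustration, not a number from the slides:

```python
# Sketch: annual cost of the power lost before it reaches the servers,
# assuming "up to 50%" overhead and an illustrative tariff of
# ~0.11 GBP/kWh (the tariff is an assumption, not from the slides).
supplied_kw = 100                 # per the slide: every 100kW supplied
overhead_fraction = 0.5           # "up to 50% used before the server"
price_per_kwh = 0.11              # assumed UK tariff, GBP
hours_per_year = 24 * 365

lost_kwh = supplied_kw * overhead_fraction * hours_per_year
print(f"Annual overhead cost: £{lost_kwh * price_per_kwh:,.0f}")
# -> about £48,000 p.a., consistent with the £50,000 headline figure
```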
Where does all the power go? (3)
How?
• Power conversion – before power even reaches your room, some is lost in the HV → LV transformer
  • A low-loss transformer is 98% efficient rather than 95%
• Return on Investment (ROI)?
  • New installation: ROI = 1 month
  • Replacement: ROI = 1 year
  • Lifetime of the investment = 20+ years!
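The saving behind those ROI figures, sketched below; the 400 kW site load and the tariff are illustrative assumptions:

```python
# Sketch: annual saving from specifying a 98%-efficient HV->LV
# transformer instead of a 95% one. The 400 kW downstream load and
# the 0.11 GBP/kWh tariff are assumptions for illustration.
load_kw = 400
price_per_kwh = 0.11
hours_per_year = 24 * 365

def input_power_kw(load_kw, efficiency):
    """Power drawn on the HV side to deliver load_kw downstream."""
    return load_kw / efficiency

saving_kw = input_power_kw(load_kw, 0.95) - input_power_kw(load_kw, 0.98)
saving_gbp = saving_kw * hours_per_year * price_per_kwh
print(f"Saving: {saving_kw:.1f} kW continuous, £{saving_gbp:,.0f} p.a.")
# ~12.9 kW, roughly £12k p.a. - the same order as the £10k p.a.
# transformer saving quoted in the Cost Savings Summary slide
```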
Where does all the power go? (4)
How?
• Cooling infrastructure
  • Typical markup (cooling power on top of IT load): 75%
  • Lowest markup: 25–30%?
  • Est. ROI 2–3 years (lifetime 8 years)
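What that markup difference is worth, as a sketch; reading "markup" as cooling power drawn per unit of IT load, with an assumed 200 kW IT load and the same assumed tariff:

```python
# Sketch: cost of the cooling "markup" (read here as cooling power
# drawn per unit of IT load). The 200 kW IT load and the tariff are
# assumptions for illustration.
it_load_kw = 200
price_per_kwh = 0.11
hours_per_year = 24 * 365

for markup in (0.75, 0.30):       # typical vs best-case overhead
    cooling_kw = it_load_kw * markup
    cost_gbp = cooling_kw * hours_per_year * price_per_kwh
    print(f"markup {markup:.0%}: {cooling_kw:.0f} kW, £{cost_gbp:,.0f} p.a.")
# The gap between the two cases is the saving that repays the more
# efficient cooling plant within the quoted 2-3 years.
```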
Where does all the power go? (5)
How?
• Backup power (UPS) – efficiency varies with % load
  • Efficiency = 80–95%
  • Est. ROI for a new installation: <1 year
  • Replacement is less attractive – a UPS's life is only 3–5 years?
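Why the "% load vs % efficiency" point matters: double-conversion UPSs are least efficient at light load. A toy curve, purely illustrative rather than measured data:

```python
# Sketch: illustrative UPS efficiency-vs-load curve (not measured
# data) showing why a lightly loaded UPS wastes proportionally more.
def ups_efficiency(load_fraction):
    """Toy curve: ~80% at very light load, ~95% near full load."""
    return 0.95 - 0.15 * (1 - load_fraction) ** 2

for pct in (10, 25, 50, 75, 100):
    print(f"{pct:>3}% load -> {ups_efficiency(pct / 100):.1%} efficient")
# This is why sizing the UPS to its real load matters.
```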
Where does it go? – Bull view
[Chart: cumulative data centre power consumption – IT loads account for the first ~40%, power delivery takes the total to ~80%, and cooling to 100%]
Where does it go? – Intel view (source: Intel Corp.)
• CPU, memory, drives, I/O (the load): 100W (36.4%)
• Room cooling system: 70W (25.5%)
• PSU: 50W (18.2%)
• UPS + PDU: 20W (7.3%)
• Voltage regulators: 20W (7.3%)
• Server fans: 15W (5.5%)
• Total: 275W
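A small sketch that reproduces the Intel breakdown and makes the key ratio explicit: only about a third of the power drawn does useful computing:

```python
# Sketch: the Intel per-server power breakdown from the slide, with
# each component's share of the 275 W total recomputed.
breakdown_w = {
    "CPU, memory, drives, I/O": 100,
    "Room cooling system": 70,
    "PSU losses": 50,
    "UPS + PDU": 20,
    "Voltage regulators": 20,
    "Server fans": 15,
}
total_w = sum(breakdown_w.values())                      # 275 W
for part, watts in breakdown_w.items():
    print(f"{part:<26} {watts:>4} W  {watts / total_w:.1%}")
print(f"Useful compute fraction: {100 / total_w:.1%}")   # ~36.4%
```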
Options for Cardiff (1)
• Carry on as before – dual-core HPC solutions
• Wait for quad core:
  • Better flops per watt
  • Saves on infrastructure (fewer network ports)
  • Saves on management (fewer nodes)
  • Saves on space
  • Saves on power
Options for Cardiff (2)
• High-density solution
  • Needs specialist cooling above 8kW per rack
• Carry on as before (6–8kW per rack)
  • Probably a higher TCO
• Low-density solution (typically)
  • BT-style free-air cooling
  • Relies on allowing a wider operating temperature range – warranty issues?
  • Not applicable here (no space)
What did Cardiff do? (1)
• Ran 2 projects:
  • HPC equipment
  • Environment
• TCO as a key evaluation criterion
  • Plus the need to measure and report on usage
• Problems:
  • Finger-pointing (mitigated by strong project management)
  • Scheduling (keep everyone in the loop)
What did Cardiff do? (2)
• Bought Bull R422 servers for HPC:
  • 80W quad-core Harpertown
  • 2 dual-socket, quad-core servers in 1U – common PSU
  • Larger fans (not on the CPU)
• Other project in the same room:
  • IBM BladeCentres
  • Some pizza boxes
Backup Power
• APC 160kW Symmetra UPS:
  • Full- and half-load efficiency: 92%
  • Scalable & modular – could grow as we grew
  • Strong environment-management options
    • Integrated with the cooling units
    • SNMP
  • Bypass capability
• Bought 2 (not fully populated):
  • 1 for compute nodes
  • 1 for management and another project
• Enhanced the existing standby generator
Cooling Inside the Room
• APC NetShelter airflow
• APC InRow RC units:
  • Provide residual cooling to the room
  • Resilient against loss of an RC unit
  • Cool hot air without mixing it with cold air
[Diagram: front view of server rows flanked by InRow cooling units]
Cooling – Outside the Room
• 3 Airedale Ultima Compact Free Cool 120kW chillers:
  • Quiet model
  • Variable-speed fans
  • N+1 arrangement
Free-Cooling vs Mechanical Cooling
• 100% free-cooling: 12% of the year (coldest ambient temperatures)
• Partial free-cooling, topped up by mechanical (compressor) cooling: 50% of the year
• Mechanical cooling only: 38% of the year (warmest ambient temperatures)
[Chart: cooling load vs ambient temperature, with tick marks at −7°C, 3.5°C and 12.5°C]
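A sketch of the mode selection implied by the chart. Mapping the −7°C and 12.5°C ticks to the mode boundaries is my assumption (the chart also marks 3.5°C); the real changeover logic belongs to the chiller:

```python
# Sketch: choosing a cooling mode from ambient temperature. The
# boundary values are read off the slide's chart; which tick bounds
# which mode is an assumption, not vendor data.
def cooling_mode(ambient_c: float) -> str:
    if ambient_c <= -7.0:
        return "100% free-cooling"      # ambient coils reject all heat
    if ambient_c <= 12.5:
        return "partial free-cooling"   # coils pre-cool, compressors trim
    return "mechanical cooling only"    # too warm for free-cooling

for t in (-10, 0, 8, 20):
    print(f"{t:>4}°C ambient -> {cooling_mode(t)}")
```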
Cost Savings Summary
• Low-loss transformer: £10k p.a.
• UPS: £20k p.a.
• Cooling: £50k p.a. (estimated)
• Servers (80W part): £20k p.a.
  • Quad core – same power but twice the ‘grunt’
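Totalling the slide's headline figures (a trivial sketch; the total itself is not stated on the slide):

```python
# Sketch: sum of the headline savings quoted on the summary slide.
savings_gbp_pa = {
    "Low-loss transformer": 10_000,
    "UPS": 20_000,
    "Cooling (estimated)": 50_000,
    "Servers (80W parts)": 20_000,
}
print(f"Total: £{sum(savings_gbp_pa.values()):,} p.a.")  # £100,000 p.a.
```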
Lessons Learned from the SRIF3 HPC Procurement – Summary
• Strong project management essential
• IT and Estates liaison essential, but difficult
• Good supplier relationship essential
• Major savings possible:
  • Infrastructure (power & cooling)
  • Servers (density and efficiency)