
Thermal Management of Heterogeneous Data Centers

David Bendit, System Administrator, Mars Space Flight Facility, Arizona State University


Presentation Transcript


  1. Thermal Management of Heterogeneous Data Centers • David Bendit • System Administrator • Mars Space Flight Facility • Arizona State University

  2. Introduction • Mars Space Flight Facility • Located in the Moeur Building (west of the MU) • Medium-sized server room • Roughly 12’ × 25’ • Over 100 physical servers • Small cluster • 26 physical machines, each running 4 Xen instances • Total of 104 nodes (26 × 4) when all are available

  3. Motivation • Chiller had a bad habit of dying over weekends • Come in Monday morning to a 90°F+ server room • IMPACT research has studied the HPCI Cluster • HPCI is more homogeneous, dedicated to cluster processing • Proper cooling setup • More standard layout • The MSFF server room is a sub-optimal setup, a kind of environment that has received much less research

  4. Problem Statement • What qualities are important in selecting a thermal-aware job scheduling algorithm for heterogeneous data centers? • Based on those qualities, which algorithms fare the best? • How does the heterogeneity of the data center affect the performance of the algorithms chosen?

  5. Approach • Physical • Temperature sensors throughout the server room • Wireless, single-hop sensor network • Logging of cluster jobs and as many physical server temps as feasible • Virtual • Simulation of various scheduling strategies using FloVENT • Will take into account server room layout and air flow

  6. Difficulties • Physical • Originally supposed to work with another team to create the temperature sensor network • My contact on that team ended up dropping the class • Logging cluster job traces • Limited experience with the new scheduling software at MSFF • Logging individual server temperatures • Heterogeneity of hardware means that not all of it supports reporting temperatures over SNMP or other means

  7. Difficulties, cont. • Virtual • Server room floor plans • We’ve placed equipment wherever it fits, with little planning beyond general ideas • Had to create the floor plan from scratch, and it leaves out important details (namely, air vents and the like) • Simulation software • License has expired and needs to be renewed before any simulations can be run

  8. Difficulties, cont. • General • MSFF is NASA funded • As a US Government facility, we’re not able to let foreign nationals into the server room • Frequent changes to the server room itself • As data storage requirements, etc. change, we change equipment, so things can change week-to-week • Results may not be implementable • Because of very specific user requirements, useful changes may not be implemented anyway

  9. Moving Forward • Project will continue past the length of this course • We will correlate cluster activity with machine and ambient temperatures • End goal • Reduce overall strain on the chiller • My (rather pessimistic) prediction • Because of the large number of other computing resources in the server room, and the always-on nature of our setup, the cluster has negligible impact on overall temperature • Need more wide-ranging changes for real temperature reduction

  10. Questions?
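
The correlation step planned on slide 9 ("correlate cluster activity with machine and ambient temperatures") could be sketched roughly as below. This is only an illustration, not the project's actual analysis: the function name `pearson`, the choice of the Pearson coefficient, and the sample readings are all assumptions, and the numbers are made up for the example rather than measured at MSFF.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical samples taken at the same timestamps (NOT real MSFF data):
    # number of busy cluster nodes (0..104) and ambient temperature in °F.
    busy_nodes = [0, 20, 45, 80, 104, 60]
    ambient_f = [71.8, 72.1, 72.0, 72.6, 72.9, 72.3]
    print(f"r = {pearson(busy_nodes, ambient_f):.3f}")
```

A coefficient near 0 would be consistent with the pessimistic prediction that the cluster has little effect on room temperature, while a value near 1 would suggest that scheduling changes could matter. Real data would also need aligned timestamps and probably a time lag between load and temperature, which this sketch ignores.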
