The Thoroughly Modern Mainframe

The Thoroughly Modern Mainframe. Dr. Michael Salsburg, NTSMF Users' Group, Dec 9, 2002. Agenda: Large Scale WINTEL Servers; Disruptive technology or trend?; Scale Up or Scale Out?; A Workload-motivated discussion of SMP and CC-NUMA; PCI-Based I/O; Consolidation; Emerging Technologies.

Presentation Transcript


  1. The Thoroughly Modern Mainframe Dr. Michael Salsburg NTSMF Users' Group Dec 9, 2002

  2. Agenda • Large Scale WINTEL Servers • Disruptive technology or trend? • Scale Up or Scale Out? • A Workload-motivated discussion of SMP and CC-NUMA • PCI-Based I/O • Consolidation • Emerging Technologies

  3. Server Industry Trends Source: IDC Intel will dominate server chip market Windows 2000 will be pervasive server OS

  4. The x440 Competition (Gartner, Oct 2002)

  5. A Comparison using Moore’s Law • Comparison of CPU speeds / tpmC for 4x CPU WINTEL systems

  6. TPC-C Top 10

  7. Scale Up or Scale Out? • Two of the 3 tiers in current application architectures use scale-out for growth • Increase # of Web servers • Increase # of Application Servers • Database back end cannot be scaled out • Scale up is needed for large database applications • Scale out has some inherent down sides • Additional administrative/management attention • More “headroom” needed for heavy traffic

  8. SMP / NUMA Workload Discussion • As code executes on the processor, memory is referenced. This can be broken into three regions • High Locality of Reference • Memory is immediately re-referenced (> 95%) • Working Set – the set of addresses on which the software primarily focuses • Persistent Storage – addresses that are stored on physical devices

  9. Scale Out- SMP or NUMA? Workload Interference • When two processes are running on the same system, their memory references will interfere. • It is preferable to only interfere at the persistent storage level • Interference at higher levels can decrease cache efficiency and slow down processing, effectively reducing the CPU power

  10. SMP / NUMA SMP Topology • A bank of CPUs shares a bank of memory • Each CPU has a local cache to optimize high locality of reference • A cache miss has a uniform latency to get data from memory • “Dirty” memory references require fetching the updated memory from another CPU’s cache • The CPU can “stall” waiting for a memory reference

  11. SMP / NUMA Workload Discussion • Percentages of references based on TPC-C workload profile • Relative time units show orders of magnitude between cache hit and persistent storage

      Reference Level       Percentage    Time Units
      CPU Cache             98.0%         1
      Main Memory           1.9%          100
      Remote Cache          (not shown)   200
      Persistent Storage    0.1%          10,000
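The reference mix on this slide can be turned into a quick weighted-average calculation. The sketch below is illustrative only: the function and variable names are made up here, and the Remote Cache row is left out because the slide shows no percentage for it.

```python
# Reference mix from the slide (TPC-C profile): (fraction of references, relative time units).
# Assumption: Remote Cache (200 units) is omitted because no percentage is shown for it.
REFERENCE_MIX = {
    "cpu_cache": (0.980, 1),
    "main_memory": (0.019, 100),
    "persistent_storage": (0.001, 10_000),
}


def average_reference_cost(mix):
    """Return the weighted-average cost of one reference, in relative time units."""
    return sum(fraction * cost for fraction, cost in mix.values())


def cost_shares(mix):
    """Return each level's share of the total time spent on references."""
    total = average_reference_cost(mix)
    return {level: fraction * cost / total for level, (fraction, cost) in mix.items()}


if __name__ == "__main__":
    print(f"average cost: {average_reference_cost(REFERENCE_MIX):.2f} time units")
    for level, share in cost_shares(REFERENCE_MIX).items():
        print(f"{level}: {share:.1%} of total time")
```

Under these numbers the average works out to 0.98 + 1.9 + 10 = 12.88 time units per reference, so the 0.1% of references that reach persistent storage account for most of the elapsed time.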

  12. SMP / NUMA NUMA (Non-Uniform Memory Access) • Overcome bus congestion and physical fabrication limitations found in a single bus architecture • Two memory latencies – near and far • The NUMA ratio is the ratio of far latency over near latency • Originally 30, now it is around 3
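To see why the drop in NUMA ratio matters, the sketch below computes an average memory latency from a near latency, a NUMA ratio, and the fraction of accesses served from near memory. The near latency of 100 units and the 75% local fraction are hypothetical inputs chosen for illustration; only the ratios 30 and 3 come from the slide.

```python
def numa_ratio(near_latency, far_latency):
    """NUMA ratio as defined on the slide: far latency over near latency."""
    return far_latency / near_latency


def average_memory_latency(near_latency, ratio, local_fraction):
    """Average latency when local_fraction of memory accesses go to near memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    far_latency = near_latency * ratio
    return local_fraction * near_latency + (1.0 - local_fraction) * far_latency


if __name__ == "__main__":
    NEAR = 100.0           # hypothetical near latency, in relative time units
    LOCAL_FRACTION = 0.75  # hypothetical share of accesses that stay local
    for ratio in (30, 3):  # "originally 30, now around 3"
        latency = average_memory_latency(NEAR, ratio, LOCAL_FRACTION)
        print(f"NUMA ratio {ratio}: average latency {latency:.0f} units")
```

With these assumed inputs, a ratio of 30 gives an average of 825 units, while a ratio of 3 gives 150 units, much closer to the uniform 100-unit case.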

  13. SMP / NUMA Hybrid (Unisys ES7000) • Another level of cache is introduced • Memory accesses can be non-uniform when comparing Next Level Cache hits to memory references • Overcomes the fabrication/congestion problems of a single bus architecture

  14. PCI-Based I/O Cellular MultiProcessing (CMP) Architecture

  15. PCI-Based I/O [Chart: maximum bus bandwidth in GB/sec (per direction) by year, 2001 & earlier through 2005. Labels on the chart: PCI 66; PCI-X 133, 266, 533MHz; PCI-Express 1X, 4X, 8X, 16X (or 8X @ 5Gb); HyperTransport; Scalability Port SP1 and SP2. Labeled values: 0.8 GB and 6.4 GB.]

  16. Enterprise-Level Backup / Restore

  17. Enterprise-Level Backup / Restore • Complete recovery of a 2.5 terabyte database: • From tape, the database was recovered in only 88 minutes with a sustained throughput during restore of 2.2 TB/hr. • From the hardware snapshot, the same database was recovered in only 11 minutes. • Complete backup of a 2.5 terabyte database: • Backup to tape took only 68 minutes with minimal impact on online operations and sustained throughput of 2.6 TB/hr.

  18. Consolidation • "[Our] servers were multiplying like rabbits," says Jeff Smith, manager of corporate network services at La-Z-Boy Inc., a Monroe, Mich.-based residential furniture producer that just completed a Windows NT server consolidation project. "Our distributed environment was becoming more and more difficult to manage." • "Thinning The Server Ranks," Computerworld, Aug 26, 2002

  19. Consolidation • How do you stuff over 130 CPUs’ worth of workload into a 32x CPU system? • Veeerrrry carefully…… • Why are current server farms filled with under-utilized servers? • Web Hosting Sites • “New web servers are installed when Peak CPU utilization reaches above 35%.” • “Speed and reliability are very important to your web site. All of our servers are maintained at less than 15% CPU utilization. This ensures that your web site downloads as fast as possible!” 
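The arithmetic behind this slide can be sketched as follows. This is a rough illustration, not the presenter's method: it assumes that "130 CPUs' worth" means 130 single-CPU servers, that each runs at the quoted utilization, and that all CPUs are equally fast. The function names are invented here.

```python
def busy_cpu_equivalents(source_cpus, source_utilization):
    """CPUs' worth of actual work being done across the source servers."""
    return source_cpus * source_utilization


def consolidated_utilization(source_cpus, source_utilization, target_cpus):
    """Utilization of the target system if it absorbs all of the source work.

    Assumes equal CPU speeds and no consolidation overhead (both simplifications).
    """
    return busy_cpu_equivalents(source_cpus, source_utilization) / target_cpus


if __name__ == "__main__":
    # 15% and 35% are the utilization figures quoted from the web hosting sites.
    for utilization in (0.15, 0.35):
        result = consolidated_utilization(130, utilization, 32)
        print(f"source at {utilization:.0%}: 32x system at {result:.0%}")
```

Under these assumptions, 130 CPUs at 15% amount to about 19.5 busy CPUs, or roughly 61% of a 32x system. At a sustained 35% the same farm would need about 45.5 CPUs and would not fit, which is one reading of "very carefully".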

  20. Consolidation Responsive Consolidation • Which would you prefer – an average queue size of 0.2 on a 1x or a 32x system?
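One way to explore the question on this slide is with a textbook M/M/c queueing model. The sketch below assumes Poisson arrivals, exponential service times, and that "queue size" means jobs waiting (not counting those in service); the slide states none of these. It uses the Erlang C formula to find the per-CPU utilization at which the mean queue length reaches 0.2.

```python
def erlang_c(servers, offered_load):
    """Probability that an arriving job has to wait in an M/M/c queue."""
    # Erlang B via the standard recursion, then convert to Erlang C.
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    rho = offered_load / servers
    return blocking / (1.0 - rho * (1.0 - blocking))


def mean_queue_length(servers, utilization):
    """Mean number of waiting jobs (Lq) at the given per-server utilization."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    offered_load = servers * utilization
    return erlang_c(servers, offered_load) * utilization / (1.0 - utilization)


def utilization_for_queue_length(servers, target, tol=1e-9):
    """Bisect for the utilization at which the mean queue length equals target."""
    low, high = 0.0, 1.0 - 1e-12
    while high - low > tol:
        mid = (low + high) / 2.0
        if mean_queue_length(servers, mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for cpus in (1, 32):
        rho = utilization_for_queue_length(cpus, 0.2)
        print(f"{cpus}x system: queue of 0.2 reached at {rho:.1%} utilization")
```

For a single CPU the model reduces to Lq = ρ²/(1 − ρ), so a queue of 0.2 corresponds to about 36% utilization. The 32x system reaches the same queue length at a considerably higher utilization, which is the sense in which a larger system needs less headroom for the same responsiveness.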

  21. Consolidation Benefits • Simplified Management / Administration • Higher Utilization (less “headroom”) • Less Variability of Service • Less Overall CPU Overhead • Fewer software licenses

  22. Emerging CPU Technologies 32x INTEL CPU TPC-C Results

  23. Itanium II What’s so great about 64 bits? • For transaction processing, the addressable memory range is larger, so more main memory can be used • The top 5 TPC-C results were achieved using 64-bit computing • TPC-C is a large database application – this is a sweet spot for 64-bit commercial computing • Bigger is DEFINITELY Better!!
