What to do and why in microprocessor research

Presentation Transcript


  1. What to do and why in microprocessor research Mario Nemirovsky University of California, Santa Cruz XStream Logic, Inc. mario@ieee.org

  2. Agenda • Microcontroller era • PC era • Post-PC era • Directions and challenges Mario Nemirovsky

  3. Microcontroller era • Intel • 4004 => 8008 => 8080 => 8085 • Motorola • 6800 => 68000 Mario Nemirovsky

  4. Real time systems • First microprocessors used as controllers • Late 70’s to early 80’s • Delco Electronics (General Motors): #1 microprocessor user and producer • TIO: first real-time multithreaded microprocessor • Motorola dominated the market • CISC needed, why? Mario Nemirovsky

  5. The PC era • 68K used for workstations • Apollo and the MMU issue • Apple and the A-trap • Intel 80x86 • IBM introduces the PC Mario Nemirovsky

  6. RISC vs. CISC • RISC values • simplicity • fast design cycles • small area • CISC values • small footprint • fewer instruction fetches • small register file? Mario Nemirovsky

  7. General purpose microprocessors • Performance • For the past 10 years, average annual performance growth has been 1.59x! • Architectural directions • Exploiting instruction level parallelism • Memory hierarchies • Special purpose micros not needed! Mario Nemirovsky

  8. Today’s Uniprocessor • Hardware techniques: pipelining, dynamic issue (i.e. superscalar), dynamic multistreaming (SMT), dynamic scheduling, dynamic branch prediction, dynamic disambiguation, dynamic “super”speculation, dynamic recompilation • Software techniques: static scheduling, static issue (i.e. VLIW), static branch prediction, alias/pointer analysis, static speculation Mario Nemirovsky

  9. Limit of ILP Mario Nemirovsky

  10. IPC on a real machine Mario Nemirovsky

  11. Multistreamed Superscalar Processor • Exploit thread level parallelism • Interleaved execution of instructions from distinct threads • Multiple hardware contexts (streams) • Improve performance by making better use of processor resources Mario Nemirovsky
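
A minimal sketch of the idea on this slide, not the simulator behind the results that follow: several hardware contexts share one wide front end, and each cycle's fetch/issue slots are filled from whichever streams have work, so one stalled or empty stream does not waste the cycle. Stream names, the fetch width, and the round-robin sweep are illustrative assumptions.

```python
def interleaved_fetch(streams, fetch_width=4):
    """Each cycle, fill up to fetch_width slots with instructions drawn
    from distinct hardware contexts; leftover slots go to streams that
    still have work instead of staying empty."""
    order = list(streams)
    schedule, start = [], 0
    while any(streams[s] for s in order):
        slots = []
        # sweep the contexts round-robin until the slots are full
        # or no context has anything left to offer this cycle
        while len(slots) < fetch_width:
            progressed = False
            for i in range(len(order)):
                sid = order[(start + i) % len(order)]
                if streams[sid] and len(slots) < fetch_width:
                    slots.append((sid, streams[sid].pop(0)))
                    progressed = True
            if not progressed:
                break
        start = (start + 1) % len(order)   # rotate priority between cycles
        schedule.append(slots)
    return schedule

# Three threads sharing a 4-wide front end
streams = {"A": ["a0", "a1", "a2", "a3"], "B": ["b0", "b1"], "C": ["c0"]}
for cycle, slots in enumerate(interleaved_fetch(streams)):
    print(cycle, slots)
# cycle 0 -> [('A','a0'), ('B','b0'), ('C','c0'), ('A','a1')]
# cycle 1 -> [('B','b1'), ('A','a2'), ('A','a3')]
```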

  12. Multistreaming Work • The beginning: the CDC6600 - J.E.Thornton • Early 80’s: the HEP – B.Smith • Mid 80’s: Delco TIO – M.Nemirovsky • Late 80’s: UCSB DISC – M.Nemirovsky • Early 90’s: UCSB MSP (SMT) – M.Nemirovsky & M.Serrano • In ISCA91 “Simultaneous Instruction Issuing” – H.Hirata • In HICSS94 “Performance Estimation of Multistreamed, Superscalar Processors” – W.Yamamoto & M.Nemirovsky et al • In ISCA95 “Simultaneous Multithreading” – D.Tullsen • In PACT95 “Increasing superscalar performance through multistreaming” – W.Yamamoto & M.Nemirovsky Mario Nemirovsky

  13. Multistreamed, Superscalar Processor (PACT’95, Yamamoto & Nemirovsky) Mario Nemirovsky

  14. Performance Regions • Linear • Performance limited by workload parallelism • Saturation • Performance limited by machine parallelism Mario Nemirovsky

  15. Limits on Performance • Machine Parallelism (mp) • Determined by the functional unit configuration and the dynamic instruction mix • Example: 2 integer, 60%; 1 memory, 40% • Workload Parallelism • Characteristic of a program • Compiler dependence Mario Nemirovsky
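
The two limits above can be illustrated with a short, hypothetical calculation. The min-over-unit-classes formula below is an assumption consistent with the slide's example, not a formula taken from the talk.

```python
# Machine parallelism assumed to be set by the most heavily loaded
# functional-unit class: mp = min over classes of (units / share of the mix).
def machine_parallelism(units, mix):
    return min(units[c] / mix[c] for c in units if mix.get(c, 0) > 0)

# The slide's example: 2 integer units with a 60% integer mix,
# 1 memory unit with a 40% memory mix.
mp = machine_parallelism({"int": 2, "mem": 1}, {"int": 0.6, "mem": 0.4})
print(mp)  # 2.5 -> sustained IPC cannot exceed ~2.5 for this mix

# Delivered IPC then follows the two regions of the previous slide:
# linear while workload parallelism (wp) is below mp, saturated above it.
def ipc(wp, mp):
    return min(wp, mp)
```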

  16. Functional Unit Effect on Performance (Ph.D. Dissertation (UCSB), March’94, M.Serrano) Mario Nemirovsky

  17. Execution Profiles 1 stream 2 streams (PACT’95, Yamamoto & Nemirovsky) Mario Nemirovsky

  18. Execution Profiles 3 streams 4 streams (PACT’95, Yamamoto & Nemirovsky) Mario Nemirovsky

  19. Caches • Caches are shared among the streams • Miss rate increases due to interstream conflicts • Individual thread performance decreases • Overall performance increases • Bus utilization increases • The increase is the product of the speedup and the miss rate increase • Design to maximize speedup while minimizing the miss rate increase Mario Nemirovsky
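
A worked example of the bus-utilization point above, with illustrative numbers rather than measurements from the talk: bus traffic scales with the product of the multistreaming speedup and the relative miss-rate increase.

```python
def bus_traffic_factor(speedup, miss_rate_single, miss_rate_multi):
    # bus traffic grows with both the extra instruction throughput and
    # the extra misses per instruction caused by interstream conflicts
    return speedup * (miss_rate_multi / miss_rate_single)

# e.g. a 1.5x speedup while the miss rate rises from 5.0% to 5.9% (+18%)
print(bus_traffic_factor(1.5, 0.050, 0.059))  # ~1.77x more bus traffic
```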

  20. Extrinsic Misses • Extrinsic misses make up a significant portion of the miss rate (direct-mapped, 16-byte line) (MTEAC’98, Nemirovsky & Yamamoto) Mario Nemirovsky
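
One rough way to make the term concrete, as a sketch only: assume an extrinsic miss is a miss that would not have occurred had the stream owned the cache alone, and classify misses by running a private per-stream cache alongside the shared one. Only the direct-mapped, 16-byte-line configuration comes from the slide; the cache size and this classification method are illustrative assumptions.

```python
LINE_BYTES = 16    # 16-byte lines, as in the plot
NUM_SETS = 64      # illustrative size: 64 sets x 16 B = 1 KB, direct-mapped

def lookup(cache, addr):
    """Direct-mapped lookup; returns True on hit and installs the block."""
    block = addr // LINE_BYTES
    idx, tag = block % NUM_SETS, block // NUM_SETS
    hit = cache.get(idx) == tag
    cache[idx] = tag
    return hit

def classify_misses(accesses):
    """accesses: iterable of (stream_id, byte_address) pairs.
    Returns (intrinsic_misses, extrinsic_misses)."""
    shared = {}        # the real cache, shared by all streams
    private = {}       # stream_id -> the cache that stream would have alone
    intrinsic = extrinsic = 0
    for sid, addr in accesses:
        alone_hit = lookup(private.setdefault(sid, {}), addr)
        if not lookup(shared, addr):
            if alone_hit:
                extrinsic += 1   # caused only by interstream conflicts
            else:
                intrinsic += 1   # the stream would have missed anyway
    return intrinsic, extrinsic
```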

  21. The new era • Even if the large performance gains of the last 15 years can be sustained (which may be very hard), there are new applications that are growing even faster • New applications other than PC-centric ones • Larger diversity of requirements Mario Nemirovsky

  22. “Post-Desktop Era”? • Information appliances • Multiple computers per person • Internet and web centric • Access to services is “one of” the killer apps • 3-D is “one of” the killer apps, … Mario Nemirovsky

  23. Applications Fueling the Growth of the Internet [chart: required throughput (MB/s), log scale 1 to 10000, vs. year 1990-2003] • Streaming video: video on demand • Telephony: voice over IP (DSL) • Transactions: e-commerce (v.90 access at home) • Graphics: web browsing (direct connections at work) • Text: e-mail, ftp, news (low-speed connections) Mario Nemirovsky

  24. Future • Larger growth outside desktop PC • New performance metrics • “DoomMarks” vs. SPECmarks, MPPs vs. MFLOPs • Wider spectrum of requirements • Performance • Power • TTM • Reliability • Real Time • Cost Mario Nemirovsky

  25. Opportunities • Application specific processors vs. GP • “Multiple” high-end CPU designs • Low-power architectures • Better CAD support • Fault-tolerant systems • Real Time architectures • Integration - System on a chip Mario Nemirovsky

  26. Conclusions • Processors will have new constraints • “Multiple” general-purpose processors • Stream data • Light threads • New interfaces • Cache friendliness • Internet and communication will dominate • Reliability Mario Nemirovsky

  27. Multithreading Work in 87 • Multiprocessor Systems • Fine grained instruction interleaving (HEP) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller. The TIO has up to 33 streams active simultaneously; each stream controls spark, fuel, and other functions per cylinder. Mario Nemirovsky

  28. Multistreaming Work in 90 • Multiprocessor Systems • Fine grained instruction interleaving (TERA) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller • Fine grained, dynamic instruction interleaving (DISC) DISC uses dynamic interleaving where the instruction dispatch algorithm dynamically reallocates throughput to the unblocked streams. This algorithm eliminates data and control hazards without degrading single stream latency. Mario Nemirovsky
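
The dispatch idea described on this slide can be pictured with a toy loop. This is a hypothetical sketch, not the DISC hardware algorithm: each cycle the single issue slot goes to the next unblocked stream in rotation, so a blocked stream's slots are reallocated instead of turning into pipeline bubbles. With only one stream this degenerates to ordinary in-order dispatch, which is why single-stream latency is not degraded.

```python
def dynamic_interleave(queues, is_blocked, cycles):
    """queues: stream_id -> list of pending instructions.
    is_blocked(stream_id, cycle) -> True if that stream is stalled."""
    order = list(queues)
    ptr, issued = 0, []
    for cycle in range(cycles):
        for step in range(len(order)):
            sid = order[(ptr + step) % len(order)]
            if queues[sid] and not is_blocked(sid, cycle):
                issued.append((cycle, sid, queues[sid].pop(0)))
                ptr = (ptr + step + 1) % len(order)   # rotate past the winner
                break
        else:
            issued.append((cycle, None, "bubble"))    # every stream blocked
    return issued

# cycle 1: stream B is blocked, so its slot goes to A instead of a bubble
queues = {"A": ["a0", "a1"], "B": ["b0", "b1"]}
print(dynamic_interleave(queues, lambda sid, c: sid == "B" and c == 1, cycles=4))
# [(0,'A','a0'), (1,'A','a1'), (2,'B','b0'), (3,'B','b1')]
```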

  29. Multistreaming Work in 92 • Multiprocessor Systems • Fine grained instruction interleaving (TERA) • Coarse grained instruction interleaving (Sparcle) • Embedded Real Time Control • GM engine controller • Fine grained, dynamic instruction interleaving (DISC) • Multistreamed, Superscalar Processors • Fine grained, dynamic instruction interleaving. • Each stream is a logical superscalar processor • Multiple functional unit design Mario Nemirovsky

  30. Multistream Performance 1 stream Mario Nemirovsky

  31. Multistream Performance 2 streams Mario Nemirovsky

  32. Multistream Performance 3 streams Mario Nemirovsky

  33. Multistream Performance 4 streams Mario Nemirovsky

  34. Multistream Performance • Performance Bounds • Workload parallelism: 1-2 streams • Machine parallelism: 3-4 streams • Data cache miss rate increased by 18% when moving from a single stream to 2 streams Mario Nemirovsky

  35. Interference • Associativity reduces interference • Increasing capacity reduces interference for large associative caches Mario Nemirovsky

  36. Interference • Increasing the line size increases interference Mario Nemirovsky

  37. Interference • Increasing the number of streams increases interference (2-way set associative) Mario Nemirovsky

  38. Overall Miss Rate • Increasing the line size: • decreases the miss rate for large caches • increases the miss rate for small caches • Multistreaming favors smaller line sizes Mario Nemirovsky

  39. Individual Thread Performance • Round Robin Scheduling • Streams share the throughput equally • Individual thread execution time increased by 13% for 2 streams • Priority Scheduling • Streams are assigned a priority • Individual thread execution time increased by 2% for 2 streams • Lower priority stream executed at 73% of single stream performance Mario Nemirovsky
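
The two policies can be contrasted in a few lines. This is a hypothetical sketch with made-up stream names, not the scheduler used for the 13% / 2% / 73% figures: both pick one stream per slot, but round-robin rotates the starting point every cycle, while priority always prefers the highest-priority ready stream and gives lower streams only the slots it cannot use.

```python
def round_robin_pick(streams, cycle, stalled):
    """Rotate the starting stream every cycle so ready streams share slots equally."""
    n = len(streams)
    for i in range(n):
        sid = streams[(cycle + i) % n]
        if sid not in stalled:
            return sid
    return None                      # nothing ready: pipeline bubble

def priority_pick(streams, cycle, stalled):
    """streams is ordered highest priority first; lower-priority streams
    only get slots when the higher-priority ones are stalled."""
    for sid in streams:
        if sid not in stalled:
            return sid
    return None

streams = ["hi", "lo"]
print([round_robin_pick(streams, c, stalled=set()) for c in range(4)])
# ['hi', 'lo', 'hi', 'lo'] -> equal shares, so each thread runs slower
print([priority_pick(streams, c, stalled={"hi"} if c == 2 else set()) for c in range(4)])
# ['hi', 'hi', 'lo', 'hi'] -> 'lo' only runs when 'hi' is stalled
```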

  40. Better ways to exploit parallelism? • Key to improving architectural gain per transistor • More SW & algorithmic involvement may be required! • Think about high-level forms of parallelism • More explicit, but a gentle slope is crucial • Can speculative multithreading help? • More evolutionary: the microarchitecture level • Reduce the importance of binary compatibility? • Multi-purpose ISAs rather than general-purpose? • A single architecture adapts to different applications • Possible directions • More static pipeline structures (LIW, VLIW) • Easier adoption of multiprocessing? • “Configurable” architectures (multi-use vs. general purpose) Mario Nemirovsky
