
Hardware Design, Synthesis, and Verification of a Multicore Communication API

Ben Meakin, Ganesh Gopalakrishnan {meakin, ganesh}@cs.utah.edu

Services provided by modern computer systems: computation oriented (fast, low power cost) and communication oriented (slow, high power cost).


Presentation Transcript


  1. Hardware Design, Synthesis, and Verification of a Multicore Communication API
     Ben Meakin, Ganesh Gopalakrishnan {meakin, ganesh}@cs.utah.edu
     University of Utah School of Computing

     [Figure: Services provided by modern computer systems. Computation oriented: fast, low power cost. Communication oriented: slow, high power cost.]

     [Figure: MIPS Core Data-path]

     Project Objectives
     • Objectives of this project
       • Research and implement efficient means of performing on-chip communication
       • Evaluate the impact of instruction set extensions enabling explicit data transfer
       • Apply these to a modern communication API
       • Study the use of semi-formal HW verification tools to verify realistic multicore HW

     Multicore Communication API
     • Multicore Association Communication API (MCAPI)
       • Lightweight messaging API designed for embedded multicore systems
     • Implementation
       • Messages and packet channels use pointers to shared memory
       • Scalar channels copy data
       • Uses in-line assembly code

     Hardware Verification
     • Application of IBM's SixthSense (SXS) semi-formal verification tool to complex multicore hardware
       • Promises simulator usability with much higher coverage
       • Ability to verify large designs due to non-exhaustive state space exploration
     [Figure labels: Simulation, Formal Verification]

     Custom On-Chip Network Synthesis
     • Workload-driven synthesis of a network-on-chip (NoC) given a model of an MCAPI target application
       • Paper under review for HiPEAC '10
     • Algorithmic objectives
       • Generate a custom topology to minimize average hops / flit for the application
       • Synthesize deadlock-free routing tables based on shortest path
       • Given approximate node sizes, find a physical placement such that average wire distance is minimized

     Semi-Formal Verification
     • Cache coherence protocol verification at RTL
     • Can SXS find bugs not found by simulation?
     • Further application to pipeline control
     • Work in progress...
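The first two algorithmic objectives under Custom On-Chip Network Synthesis (minimizing average hops per flit, and deriving routing tables from shortest paths) can be illustrated with a small sketch. This is not the authors' synthesis algorithm: the function names, the four-node topologies, and the traffic numbers are all hypothetical, and plain shortest-path routing as shown here does not by itself guarantee deadlock freedom.

```python
from collections import deque


def build_routing_tables(adjacency):
    """Breadth-first search from every node.

    Returns (next_hop, hops) where next_hop[src][dst] is the neighbor that
    src forwards to on a shortest path, and hops[src][dst] is the hop count.
    """
    next_hop, hops = {}, {}
    for src in adjacency:
        dist = {src: 0}
        first = {}  # first[dst] = first neighbor on the path src -> dst
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    first[nbr] = nbr if node == src else first[node]
                    queue.append(nbr)
        next_hop[src] = first
        hops[src] = dist
    return next_hop, hops


def average_hops_per_flit(hops, traffic):
    """Traffic-weighted mean hop count; traffic maps (src, dst) -> flits."""
    total_flits = sum(traffic.values())
    total_hops = sum(n * hops[src][dst] for (src, dst), n in traffic.items())
    return total_hops / total_flits


# Hypothetical workload model: most flits travel between nodes 0 and 2.
traffic = {(0, 2): 100, (1, 3): 10, (0, 1): 50}

# Baseline 4-node ring versus a custom topology that adds a 0-2 link.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
custom = {0: [1, 2, 3], 1: [0, 2], 2: [1, 3, 0], 3: [2, 0]}

for name, topology in (("ring", ring), ("custom", custom)):
    _, hop_counts = build_routing_tables(topology)
    print(name, average_hops_per_flit(hop_counts, traffic))
```

In this toy case the extra link serves the heaviest flow directly, which is the intuition behind tailoring the topology to an application's traffic. A real flow would also have to weigh the hardware cost of each added link.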
     8-Core MIPS System-on-Chip
     • 8 processor tiles on a Xilinx Virtex5 FPGA
     • 16-bit MIPS cores (6-stage pipelines)
     • Private 2KB instruction and 2KB data caches
     • Shared 4KB slice of L2 data cache
     • Network interface unit
     • NUCA
     • MSI directory-based cache coherence
     • Various I/O interfaces

     Cache Architecture
     • Direct mapped, 8 words per block
     • L2 physically distributed/logically shared (NUCA)
     • L1 private
     • MSI directory coherence protocol
     • Write invalidate policy
     • Simplified form of modern architecture

     Implementing Inter-core Communication
     • Physical transport layer
       • Asynchronous network-on-chip
       • Dual networks: one for user, one for cache controllers
     • MIPS instruction set extension
       • Enables explicit data transfer
       • Reduces some hardware complexity

     Custom On-Chip Network Synthesis: Results
     • Results highly encouraging
     • From baseline, our algorithms achieved for specific application (> 16 cores)
       • ~50% reduction in avg. hops / flit
       • ~50% reduction in avg. wire distance / flit
       • ~17% increase in throughput
       • Comparable hardware cost
     • Performed at least as well as baseline for general purpose
     • Better scalability

     Future Work
     • Evaluation of SXS and other tools as applied to multicore RTL descriptions
     • Extensive benchmarking of MCAPI implementation and interconnect technology
     • Research additional applications of proposed ISA extension in parallel programming methods
     • Research hardware mechanisms for increasing observability of multicore processors
       • Deterministic replay

     More Information
     Wiki page with link to read-only SVN checkout: www.cs.utah.edu/formal_verification/mediawiki
     -Under “MCAPI Hardware Implementation”
     Ben Meakin's web-page: www.cs.utah.edu/~meakin
     Multicore Association web-page: www.multicore-association.com
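The MSI directory coherence protocol with a write-invalidate policy, listed under the cache architecture, can be sketched as a small state machine for a single cache block. This is a textbook-style model, not the poster's RTL: the class and method names are hypothetical, and transient states, network messages, and the write-back data path are left out.

```python
class DirectoryEntry:
    """Tracks the per-core MSI state of one cache block.

    Each core holds the block in one of three states:
    "M" (Modified), "S" (Shared), or "I" (Invalid).
    """

    def __init__(self, num_cores):
        self.states = ["I"] * num_cores

    def read(self, core):
        """Core issues a load. A miss fetches the block in Shared state."""
        if self.states[core] == "I":
            # A Modified owner elsewhere must write back and downgrade.
            for other, state in enumerate(self.states):
                if state == "M":
                    self.states[other] = "S"
            self.states[core] = "S"
        # A hit in S or M changes nothing.

    def write(self, core):
        """Core issues a store. Write-invalidate: all other copies become I."""
        if self.states[core] != "M":
            for other in range(len(self.states)):
                if other != core:
                    self.states[other] = "I"
            self.states[core] = "M"

    def is_coherent(self):
        """Invariant: at most one Modified copy, and never alongside Shared."""
        modified = self.states.count("M")
        shared = self.states.count("S")
        return modified == 0 or (modified == 1 and shared == 0)


# Hypothetical trace on a 4-core system (the poster's system has 8 tiles).
block = DirectoryEntry(num_cores=4)
block.read(0)   # core 0 loads: I -> S
block.read(1)   # core 1 loads: I -> S
block.write(1)  # core 1 stores: core 0 is invalidated, core 1 -> M
block.read(2)   # core 2 loads: core 1 downgrades M -> S
print(block.states, block.is_coherent())
```

The `is_coherent` check expresses the kind of invariant a coherence verification effort would target. Here it is evaluated only on one simulated trace, not proven over all reachable states.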
