
Thesis Proposal for the Degree of MS in Computing: Computer Engineering

Hardware Design, Synthesis, and Verification of a Multicore Communication API. Student: Ben Meakin. Committee: Ganesh Gopalakrishnan, Ken Stevens, Rajeev Balasubramonian.

Presentation Transcript


  1. Thesis Proposal for the Degree of MS in Computing: Computer Engineering • Hardware Design, Synthesis, and Verification of a Multicore Communication API • Student: • Ben Meakin • Committee: • Ganesh Gopalakrishnan • Ken Stevens • Rajeev Balasubramonian

  2. Problems and Objectives • Parallel Program Execution • [Figure: execution timeline for PE 1 to PE 4, showing computation and communication phases, with the communication intervals labeled λ] • Primary Objective: Minimize λ

  3. Problems and Objectives • Implementation of various synchronization and communication mechanisms varies greatly across diverse embedded SoC platforms • Secondary Objective • Unify primary objective possibilities under a standard API

  4. Accomplished Work • Papers • “Hardware Design, Synthesis, and Verification of a Multicore Communication API”, Techcon 2009. • Overview of MCAPI and its implementation • Introduces XUM: eXtensible Utah Multicore • “Workload Driven Synthesis of On-Chip Networks for Embedded Multicore Systems”, under review for DAC 2010. • Presents algorithms for custom topological synthesis of irregular NoCs • Generates a routing function and identifies heavily used links to assist in the design of a custom heterogeneous NoC • Presentations • Poster presented at Multicore Expo 2009 • Poster and slides presented at SRC Techcon 2009

  5. XUM: eXtensible Utah Multicore • Hardware • 8 MIPS processing elements • 6-stage pipeline • 64KB I-cache (private) • 16KB L1 (private) • 256KB L2 (shared) • ISA extended with comm primitives • Custom NoC • Separate user & memory system networks • I/O over UART • Software • MCAPI: embedded MPSoC message passing library • GNU tool-chain compatible • Simple multicore RTOS based on MCAPI

  6. Hardware Architecture • [Figure: block diagram. Labels: Ethernet, SDRAM, FLASH, UART; MIPS Core, L1 I-Cache, L1 D-Cache, L2 D-Cache, Network Interfaces, User Router, Memory Router; Ack Rtr; Asynchronous Router]

  7. Hardware Architecture • [Figure: same block diagram as slide 6]

  8. Hardware Architecture • [Figure: same block diagram as slide 6]

  9. Hardware Architecture • [Figure: same block diagram as slide 6]

  10. MIPS ISA Extension

      | Instruction | Opcode | Funct | Type | Operation | Description |
      |---|---|---|---|---|---|
      | sndhd.b | 0x14 | 0x00 | R | Net = [1,rs[4:0],000,funct[2:0],rt[4:0]] | Send header of buffer type packet |
      | sndhd.p | 0x14 | 0x01 | R | | Send header of packet channel type |
      | sndhd.s | 0x14 | 0x02 | R | | Send header of 16-bit scalar channel |
      | sndhd.i | 0x14 | 0x03 | R | | Send header of 32-bit scalar channel |
      | sndhd.l | 0x14 | 0x04 | R | | Send header of 64-bit scalar channel |
      | sndack | 0x1F | 0x00 | R | | Send an acknowledgement to the sender |
      | rechd | 0x15 | 0x00 | R | R[rd] = netdata | Receive the packet header |
      | sndw | 0x16 | 0x00 | R | Net = [0,rs[15:0]] | Send word (16-bit) of data |
      | recw | 0x15 | 0x00 | R | R[rd] = netdata | Receive word (16-bit) of data |
      | sndtl | 0x1C | 0x00 | R | Net = [10000010000000000] | Send tail to close packet |
      | getid | 0x1D | 0x00 | R | R[rd] = nodeid | Get local node identifier |
      | getfl | 0x1E | 0x00 | I | R[rt] = netflag[ZeroExtImm] | Get specified operational flag |
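
The Operation entries on slide 10 describe 17-bit network words: a leading control bit followed by either header fields or 16 bits of data. The C sketch below packs those three word formats in software, purely as an illustration of the bit layout. The function names are hypothetical and not part of XUM, and the slides do not say what the rs and rt values represent in a header, so they are treated here as opaque register values.

```c
#include <stdint.h>

/* Hypothetical helpers (not from the slides) that mirror the bit layouts
   listed in the Operation column of slide 10. Each network word is 17 bits. */

/* Header word: [1, rs[4:0], 000, funct[2:0], rt[4:0]] */
uint32_t xum_header_flit(uint32_t rs, uint32_t funct, uint32_t rt)
{
    return (1u << 16)             /* bit 16: control bit set          */
         | ((rs & 0x1Fu) << 11)   /* bits 15..11: rs[4:0]             */
                                  /* bits 10..8: 000 for a header     */
         | ((funct & 0x7u) << 5)  /* bits 7..5: funct[2:0]            */
         | (rt & 0x1Fu);          /* bits 4..0: rt[4:0]               */
}

/* Data word: [0, rs[15:0]], control bit clear, 16 bits of payload */
uint32_t xum_data_flit(uint32_t rs)
{
    return rs & 0xFFFFu;
}

/* Tail word: the literal 1 00000 100 000 00000 given for sndtl */
uint32_t xum_tail_flit(void)
{
    return 0x10400u;
}
```

Under this layout, the tail word differs from a header only in the three bits that a header always sends as 000, which is presumably how the network tells the two apart.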

  11. MCAPI Implementation • Attributes • Used to define different memory domains and to control zero-copy data transfer • Connectionless Messages • Blocking and non-blocking • Flexible, but must arbitrate through network • Copy or pointer pass depending on endpoint attributes • Connected Channels (Packet) • Same as messages, but arbitration is done by executing sndhd beforehand • Faster, but network resources are used until channel is closed • Connected Channels (Scalar) • Same as packet channels except scalar values are always copied • Fastest access to small chunks of data
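
Slide 11 says a message is delivered by copy or by pointer pass depending on endpoint attributes, and that attributes define memory domains. The C sketch below shows one way that decision could look. It is an assumption, not the XUM implementation: the `endpoint_t`, `delivery_t`, and `deliver` names are hypothetical, and the rule "same memory domain means pointer pass, otherwise copy" is inferred from the slides rather than stated by them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical endpoint attribute: which memory domain the endpoint lives in. */
typedef struct {
    int mem_domain;
} endpoint_t;

/* Result of a delivery: where the receiver should read, and how it got there. */
typedef struct {
    const void *data;   /* location of the payload for the receiver */
    size_t      len;    /* number of valid bytes at data            */
    int         copied; /* 1 if the payload was copied, 0 if passed */
} delivery_t;

/* Assumed rule: endpoints in the same memory domain share the sender's
   buffer (zero-copy); otherwise the payload is copied into rx_buf,
   truncated to rx_cap bytes if it does not fit. */
delivery_t deliver(const endpoint_t *src, const endpoint_t *dst,
                   const void *payload, size_t len,
                   void *rx_buf, size_t rx_cap)
{
    delivery_t d;
    if (src->mem_domain == dst->mem_domain) {
        d.data = payload;
        d.len = len;
        d.copied = 0;
    } else {
        size_t n = len < rx_cap ? len : rx_cap;
        memcpy(rx_buf, payload, n);
        d.data = rx_buf;
        d.len = n;
        d.copied = 1;
    }
    return d;
}
```

With pointer passing the sender must not reuse its buffer until the receiver is done with it, which is one reason a real implementation would pair this with some form of acknowledgement.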

  12. NoC Synthesis • Input is a workload modeled as a set of pairs of communication endpoints and the desired bandwidth between them • Output is a topology that minimizes hops while meeting a given router radix bound • Also outputs routing tables • And total desired BW on individual links
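
The synthesis algorithm itself is not described on slide 12, but the bookkeeping behind its last output (total desired bandwidth per link) can be sketched. The C code below assumes the topology and routing table have already been produced, walks each workload pair hop by hop, and accumulates its bandwidth on every link it crosses. The names `flow_t`, `N_NODES`, and `accumulate_link_bw`, and the fixed 4-node size, are illustrative assumptions.

```c
#include <stddef.h>

#define N_NODES 4  /* illustrative network size, not from the slides */

/* One workload entry: a pair of endpoints and the desired bandwidth. */
typedef struct {
    int    src;
    int    dst;
    double bw;
} flow_t;

/* next_hop[cur][dst] gives the node to forward to from cur toward dst.
   Fills link_bw[a][b] with the summed bandwidth on the directed link a->b
   and returns the total hop count over all flows, or -1 if an endpoint is
   out of range or a route does not reach its destination. */
int accumulate_link_bw(const flow_t *flows, size_t n_flows,
                       int next_hop[N_NODES][N_NODES],
                       double link_bw[N_NODES][N_NODES])
{
    int total_hops = 0;

    for (int a = 0; a < N_NODES; a++)
        for (int b = 0; b < N_NODES; b++)
            link_bw[a][b] = 0.0;

    for (size_t f = 0; f < n_flows; f++) {
        int cur = flows[f].src;
        int dst = flows[f].dst;
        int hops = 0;

        if (cur < 0 || cur >= N_NODES || dst < 0 || dst >= N_NODES)
            return -1;

        while (cur != dst) {
            int nxt = next_hop[cur][dst];
            /* Reject invalid entries and routes that loop forever. */
            if (nxt < 0 || nxt >= N_NODES || ++hops > N_NODES)
                return -1;
            link_bw[cur][nxt] += flows[f].bw;
            cur = nxt;
        }
        total_hops += hops;
    }
    return total_hops;
}
```

Links with the largest accumulated values are the heavily used ones that a designer might give wider or faster channels in a heterogeneous NoC.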

  13. Proposed Work to Complete • Tasks • Benchmarking of XUM and MCAPI Implementation with Bare-metal Apps • Complete by Jan. 22nd • Write a paper on XUM (conferences?) • Complete by Jan. 29th • Sixthsense FV of MemCtrl and Router • Complete by Feb. 19th • Port MT RTOS to XUM • Complete by Mar. 5th • Write documentation/tutorials for XUM and other tools developed • Complete by graduation • Thesis defense sometime in April

  14. XUM Benchmarking • Video processing application • Each pixel equals the average of itself and the pixels surrounding it • Pipeline: Input, Processing, Output • Different memory domains mean that each stage operates on a different frame; cores in the processing task use pointer passing to synchronize • Benchmark in terms of frame rate • [Figure: Input Task (Core 0, Mem Domain 1) sends a message to the Processing Task (Cores 1 - 6, Mem Domain 2, messages used internally), which sends a message to the Output Task (Core 7, Mem Domain 3)]
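
The per-pixel operation on slide 14 (each pixel becomes the average of itself and its surrounding pixels) is a 3x3 mean filter. A minimal single-core C sketch follows. It assumes 8-bit grayscale frames stored row by row, separate input and output buffers, and border pixels averaged over only the neighbors that exist; none of those details are given in the slides, and the function name is hypothetical.

```c
#include <stdint.h>

/* 3x3 mean filter over an 8-bit grayscale frame stored in row-major order.
   in and out must be distinct buffers of width * height bytes.
   Border handling (assumed): average only the in-bounds neighbors. */
void box_filter(const uint8_t *in, uint8_t *out, int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned sum = 0;
            unsigned count = 0;

            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy;
                    int nx = x + dx;
                    if (ny < 0 || ny >= height || nx < 0 || nx >= width)
                        continue;  /* neighbor falls outside the frame */
                    sum += in[ny * width + nx];
                    count++;
                }
            }
            /* count is at least 1 because the pixel itself is in bounds. */
            out[y * width + x] = (uint8_t)(sum / count);
        }
    }
}
```

In the pipelined benchmark, the processing cores could each apply a loop like this to a band of rows of the current frame.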

  15. Operating System Kernel • OS API • int os_init(); • int sleep(int usec); • int thread_create(thrd_main_t func, int arg); • int exit(int status); • int thread_join(int tid, int* result); • int sem_init(sem_t sem, int count); • int get_tid(); • int sem_signal(sem_t sem); • int set_share(int share); • int sem_wait(sem_t sem); • Master Kernel • Create threads • Identify execution device • Perform load balancing • Deliver TCB • Slave Kernel • Receive TCB • Schedule threads based on share/priority • Notify master when thread completes • Thread Control Blocks • tid • *function • cpu_share • virtual_clock • targetdevice_id • ... • [Figure: Master (Core 0) with All Threads, Ready Q, and Wait Q, connected by MCAPI messages to Slave Core 1, Slave Core 2, and Slave Core 7, each with its own Ready Q and Wait Q]
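
Slide 15 lists `cpu_share` and `virtual_clock` among the thread control block fields and says slaves schedule threads based on share/priority, but it does not say how the two fields interact. One common way to use such fields is virtual-time (stride-style) scheduling: run the ready thread with the smallest virtual clock, then advance its clock by the time used divided by its share. The C sketch below illustrates that technique as an assumption about the design; `tcb_t`, `pick_next`, and `charge` are hypothetical names.

```c
#include <stddef.h>

/* Reduced thread control block with only the fields this sketch needs. */
typedef struct {
    int           tid;
    int           cpu_share;      /* relative weight; larger runs more often */
    unsigned long virtual_clock;  /* weighted CPU time consumed so far       */
} tcb_t;

/* Return the index of the ready thread with the smallest virtual clock,
   preferring the lowest index on ties, or -1 if the queue is empty. */
int pick_next(const tcb_t *ready, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 || ready[i].virtual_clock < ready[best].virtual_clock)
            best = (int)i;
    }
    return best;
}

/* Charge a thread for one quantum: a larger share advances the clock more
   slowly, so the thread is picked again sooner. Shares below 1 count as 1. */
void charge(tcb_t *t, unsigned long quantum)
{
    unsigned long share = t->cpu_share > 0 ? (unsigned long)t->cpu_share : 1ul;
    t->virtual_clock += quantum / share;
}
```

With two ready threads of shares 1 and 3 and a quantum of 300, repeatedly calling `pick_next` and `charge` gives the second thread three quanta for every one the first receives.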

  16. Conclusion • This thesis provides... • a body of work that will enable researchers to perform more interesting experiments related to multicore systems • the first hardware assisted implementation of MCAPI • a case study of Sixthsense • a case study of multicore embedded system design including complex hardware and software
