Fast Communication for Multi-Core SOPC

Fast Communication for Multi-Core SOPC. Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Spring 2007. End Project Presentation. Supervisor: Evgeny Fiksman. Performed by: Moshe Bino, Alex Tikh. One-year project.

Presentation Transcript


  1. Fast Communication for Multi-Core SOPC. Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Spring 2007. End Project Presentation. Supervisor: Evgeny Fiksman. Performed by: Moshe Bino, Alex Tikh. One-year project.

  2. Table of Contents: Introduction, Hardware Design, Software Design, Debug Process, Results, Future Research

  4. Problem statement (Introduction)
     • A single CPU is reaching its technological limits, e.g. heat dissipation and cost/power ratio.
     • Parallel computing therefore evolved, using the multi-core processor paradigm.
     • Three major inter-core communication techniques: shared memory, remote procedure calls, and message passing (MPI).

  5. Project Overview (Introduction)
     • Mesh topology NoC
       • Routing nodes
       • Leaf processor cores
     • MPI logically defines clusters
       • Comm
       • Rank
     • The number of cores is limited only by chip resources

  6. Project goals (Introduction)
     The following components are to be implemented:
     • Quad-core system.
     • NoC router (4 ports) and infrastructure for fast communication in a multi-core system.
     • Selected MPI functions, written in C.
     • Software application demonstrating the advantages of a parallel system (written in C).

  7. Block diagram (Introduction)
     • Multi-core IPs
     • Bi-directional FSL link
     • Local memory
     • Main core connected to I/O
     • Multiple clock domains
     • System on programmable chip (SOPC) implemented on an FPGA

  8. System specifications (Introduction)
     Constraints:
     • FPGA (V2P) maximum clock frequency: 400 MHz.
     • MicroBlaze (MB) core maximum frequency: 100 MHz.
     • Router maximum frequency: 200 MHz.
     • Processor memory size: 64 kbyte (code + data).
     • Processor-to-FSL access time: 3 clock cycles.
     • Maximum FSL buffer depth: 128 words (0.5 kbyte).
     • Interrupt handling time: 20 clock cycles (no interrupt nesting).
     Measurement system:
     • Router runs at 100 MHz.
     • MB runs at 25 MHz.
     • FSL depth is 64 words (0.25 kbyte).
     • The router is designed for relatively small messages (max. 1 kbyte) because of the memory size of the processors and the FPGA chip.
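The buffer sizes quoted on this slide follow from the 32-bit FSL word width (4 bytes per FIFO entry). A minimal sketch of that arithmetic is given here; the function names are illustrative and do not come from the project.

```c
#include <assert.h>
#include <stdint.h>

/* An FSL link carries 32-bit words, so each FIFO entry holds 4 bytes. */
enum { FSL_WORD_BYTES = 4 };

/* Capacity of an FSL FIFO in bytes for a given depth in words. */
uint32_t fsl_fifo_bytes(uint32_t depth_words)
{
    return depth_words * FSL_WORD_BYTES;
}

/* How many times the FIFO must be filled to move a whole message
 * (rounded up). Hypothetical helper, for illustration only. */
uint32_t fifo_fills_for_message(uint32_t msg_bytes, uint32_t depth_words)
{
    assert(depth_words > 0);
    uint32_t capacity = fsl_fifo_bytes(depth_words);
    return (msg_bytes + capacity - 1) / capacity;
}
```

With the measurement-system depth of 64 words, a maximum-size 1 kbyte message spans four FIFO fills, which is one way to see why the router targets small messages.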

  10. Router Implementation (Hardware Design)

  11. Crossbar (Hardware Design)
     • Two main units:
       • Permission Unit
       • Port FSM
     • Time limited
     • Round-robin arbiter
     • Port-to-port and broadcast transfers
     • Smart connectivity
       • R-R
       • R-Core
     • Modular design
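The round-robin arbitration named on this slide can be sketched in software as a rotating priority scan. This is only a behavioural illustration in C, not the router's HDL; the function name and the request-array encoding are assumptions.

```c
#include <assert.h>

enum { NUM_PORTS = 4, NO_GRANT = -1 };

/* Pick the next port to serve. The scan starts at the port after the
 * one granted last, so every requesting port is reached within
 * NUM_PORTS steps. request[i] != 0 means port i has data waiting.
 * Pass last_granted = -1 before any grant has been made. */
int rr_next_grant(const int request[NUM_PORTS], int last_granted)
{
    assert(last_granted >= -1 && last_granted < NUM_PORTS);
    for (int step = 1; step <= NUM_PORTS; ++step) {
        int port = (last_granted + step) % NUM_PORTS;
        if (request[port]) {
            return port;
        }
    }
    return NO_GRANT; /* no port is requesting */
}
```

Because the previously granted port is checked last, a port that always has data cannot starve the others.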

  13. System software layers (Software Design)
     • Application
       • MPI functions interface
     • Network
       • Hardware-independent implementation
     • Data
       • Relies on message structure
     • Physical
       • Designed for the FSL bus
     Diagram: MicroBlaze#1 and MicroBlaze#2 each run the same stack (Application, Network, Data, Physical) and communicate through MPI.
     Design modularity in hardware and software.
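One way to keep the network layer hardware independent, as the layering above requires, is to hand it the physical layer as a small table of function pointers. The sketch below assumes such an interface; `phys_ops`, `net_send` and the in-memory loopback FIFO are hypothetical names used for illustration, with the loopback standing in for a real FSL port so the code runs on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Physical-layer interface: the only part that knows about the link. */
typedef struct {
    int (*put_word)(uint32_t word); /* returns 0 on success */
} phys_ops;

/* Network-layer send: pushes words through whatever link it is given.
 * Returns the number of words actually accepted by the link. */
size_t net_send(const phys_ops *phy, const uint32_t *words, size_t count)
{
    assert(phy != NULL && phy->put_word != NULL);
    size_t sent = 0;
    while (sent < count && phy->put_word(words[sent]) == 0) {
        ++sent;
    }
    return sent;
}

/* Stand-in physical layer for host testing: an in-memory FIFO
 * (64 words deep) used in place of an FSL port. */
enum { LOOPBACK_DEPTH = 64 };
uint32_t loopback_fifo[LOOPBACK_DEPTH];
size_t loopback_count = 0;

int loopback_put(uint32_t word)
{
    if (loopback_count == LOOPBACK_DEPTH) {
        return -1; /* FIFO full */
    }
    loopback_fifo[loopback_count++] = word;
    return 0;
}
```

Swapping `loopback_put` for a routine that writes to the FSL bus would leave `net_send` unchanged, which is the modularity the slide describes.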

  14. MPI function set (Software Design)

  16. Debug: Simulation (Debug Process)
     • The test bench writes messages to the FSL pipe (MB output side) and reads messages from the pipe (MB input side).
     • Signals can also be viewed in ModelSim.

  17. Debug process: Real time (Debug Process)
     • Software debug used a two-MicroBlaze (MB) system.
     • Debugging was done mainly with the printf function, printing results to a monitor through the UART.
     • Hardware debug used the ChipScope application and LEDs for indication.

  19. Example application: Matrix-vector multiplication (Results)
     A typical example of a highly parallel application.
     • The root processor broadcasts the vector.
     • The root sends a selected matrix row to each processor.
     • Each processor computes and returns its result.
     • The root processor combines the computed results into a vector.
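The division of work in this example can be sketched on a single host as follows. This is not the project's MPI program: the loop over `rank` stands in for the broadcast, send and receive steps, and the names and the fixed size of four rows (one per core) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 4 }; /* matrix dimension; one row per core in a quad-core system */

/* Work done by one core: dot product of its row with the broadcast vector. */
int row_times_vector(const int row[N], const int vec[N])
{
    assert(row != NULL && vec != NULL);
    int sum = 0;
    for (int i = 0; i < N; ++i) {
        sum += row[i] * vec[i];
    }
    return sum;
}

/* Root-side view: row `rank` goes to core `rank`, and the scalar that
 * comes back becomes element `rank` of the result vector. On the real
 * system each iteration would run on a different core. */
void matvec_by_rows(const int matrix[N][N], const int vec[N], int result[N])
{
    for (int rank = 0; rank < N; ++rank) {
        result[rank] = row_times_vector(matrix[rank], vec);
    }
}
```

Each row computation is independent of the others, which is what makes the problem a good fit for one row per core.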

  20. Example application: Matrix-vector multiplication (Results). Diagram: four cores, each running MPI, connected through the router.

  21. Matrix-vector multiplication: results (Results)
     • For simple operations a single processor is preferred.
     * Time = ticks / clock frequency

  22. Matrix-vector multiplication: results (Results)
     • When the operation takes longer than the send and receive time, the router becomes efficient.
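The trade-off stated here can be illustrated with a rough cost model. The model below is an assumption and not taken from the measurements: it charges one fixed communication cost per job and assumes the rows are spread evenly over the cores.

```c
#include <assert.h>

/* Returns 1 if spreading `rows` independent operations over `cores`
 * is estimated to finish sooner than running them all on one core.
 * op_clks:   clocks for one row operation
 * comm_clks: total clocks spent sending and receiving (assumed fixed) */
int parallel_pays_off(unsigned long op_clks, unsigned rows,
                      unsigned cores, unsigned long comm_clks)
{
    assert(cores > 0);
    unsigned long serial = op_clks * rows;
    unsigned long rounds = (rows + cores - 1) / cores; /* rows per core, rounded up */
    unsigned long parallel = op_clks * rounds + comm_clks;
    return parallel < serial;
}
```

With four rows on four cores and 100 clocks of communication, a 10-clock operation stays faster on a single core (40 versus 110 clocks), while a 1000-clock operation benefits from the split (4000 versus 1100 clocks).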

  23. Router statistics (Results)
     • The theoretical lower limit on transfer time is 8 clock cycles, calculated from:
       • 3 clks for putfsl
       • 2 clks for router read and write
       • 3 clks for getfsl
       • Total = 8 clks
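The 8-clock limit is the sum of the three stages listed on this slide. A small sketch of that sum follows; scaling it linearly with the number of transfers is an assumption, since the slide does not state the unit the limit applies to.

```c
#include <assert.h>
#include <limits.h>

/* Per-stage costs as listed on the slide. */
enum {
    PUTFSL_CLKS = 3, /* processor writes to the FSL */
    ROUTER_CLKS = 2, /* router read and write */
    GETFSL_CLKS = 3, /* processor reads from the FSL */
    TRANSFER_CLKS = PUTFSL_CLKS + ROUTER_CLKS + GETFSL_CLKS
};

/* Lower bound in clocks for `transfers` back-to-back transfers,
 * assuming no overlap between stages (hypothetical helper). */
unsigned long min_transfer_clks(unsigned long transfers)
{
    assert(transfers <= ULONG_MAX / TRANSFER_CLKS); /* avoid overflow */
    return transfers * TRANSFER_CLKS;
}
```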

  24. Router statistics (Results)
     • Bcast takes more time than send.
     • The slope value (8) comes from the transfer time limit.

  25. Router statistics (Results)

  27. Future directions (Future Research)
     • Improve router performance to ~400 MHz
     • Expand the network to more than 4 processors

  28. Questions?

  29. Message payload (Introduction)
     • The header consists of the fields:
     • The tail consists of the fields:
     * Empty fields were left to allow network and functionality extensions.

  30. Example 1 (Hardware Design)
     • In each time slot, part of the message is sent to its destination as long as the destination port is not busy.
     • When a port is busy, the next requesting port is serviced (no delay).

  31. Example 2 (Hardware Design)
     • If one port has no data (port 2), the other ports are serviced in order.

  32. Example 3 (Hardware Design)
     • Handling the BCAST command and port arbitration while 2 ports have the same destination.
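The three examples describe what happens within one time slot: idle ports are skipped, a source whose destination is already taken is passed over without delaying the others, and a broadcast competes with ordinary transfers. A behavioural sketch in C is given below. The names, the encoding of destinations, and the rule that a broadcast needs every other output port free are all assumptions; the slides do not spell out the router's actual policy.

```c
#include <assert.h>

enum { NUM_PORTS = 4, DEST_NONE = -1, DEST_BCAST = -2 };

/* Decide which source ports may transmit in one time slot.
 * dest[s]:    destination requested by source s, DEST_NONE if s has
 *             no data, or DEST_BCAST for a broadcast
 * start:      source port with the highest priority in this slot
 * granted[s]: set to 1 if source s may send, otherwise 0
 * Returns the number of sources granted. */
int schedule_slot(const int dest[NUM_PORTS], int start, int granted[NUM_PORTS])
{
    int busy[NUM_PORTS] = {0}; /* output ports claimed so far */
    int count = 0;

    assert(start >= 0 && start < NUM_PORTS);
    for (int s = 0; s < NUM_PORTS; ++s) {
        granted[s] = 0;
    }
    for (int step = 0; step < NUM_PORTS; ++step) {
        int src = (start + step) % NUM_PORTS;
        int d = dest[src];

        if (d == DEST_NONE) {
            continue; /* nothing to send: the next port is served */
        }
        if (d == DEST_BCAST) {
            int all_free = 1;
            for (int p = 0; p < NUM_PORTS; ++p) {
                if (p != src && busy[p]) {
                    all_free = 0;
                }
            }
            if (!all_free) {
                continue; /* assumed rule: wait for a later slot */
            }
            for (int p = 0; p < NUM_PORTS; ++p) {
                if (p != src) {
                    busy[p] = 1;
                }
            }
        } else {
            assert(d >= 0 && d < NUM_PORTS);
            if (busy[d]) {
                continue; /* destination taken: skip without stalling others */
            }
            busy[d] = 1;
        }
        granted[src] = 1;
        ++count;
    }
    return count;
}
```

Rotating `start` from slot to slot gives the round-robin behaviour, so two ports that want the same destination take turns.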

  33. Sending the message (Software Design)
     • MPI_Send: composes the header and tail, and sends them with the message (body).
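The framing step can be sketched as wrapping the body between a header word and a tail word. The layout below is hypothetical: the transcript does not list the real header and tail fields, so the one-word header (source, destination, length) and the tail marker value are placeholders chosen only to make the idea concrete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL framing: one header word, the body, one tail word. */
enum { FRAME_OVERHEAD_WORDS = 2 };
#define HYP_TAIL_WORD 0xFFFFFFFFu /* placeholder tail marker */

/* Build a frame in `out`. Returns the frame length in words,
 * or 0 if `out` is too small to hold it. */
size_t frame_message(uint32_t src, uint32_t dest,
                     const uint32_t *body, size_t body_words,
                     uint32_t *out, size_t out_words)
{
    assert(body_words <= 0xFFFFu); /* length must fit the 16-bit field */
    size_t total = body_words + FRAME_OVERHEAD_WORDS;
    if (out_words < total) {
        return 0;
    }
    /* Hypothetical header: source[31:24] | destination[23:16] | length[15:0] */
    out[0] = ((src & 0xFFu) << 24) | ((dest & 0xFFu) << 16) | (uint32_t)body_words;
    for (size_t i = 0; i < body_words; ++i) {
        out[1 + i] = body[i];
    }
    out[total - 1] = HYP_TAIL_WORD;
    return total;
}
```

The resulting words would then be written to the link one at a time by the lower layers.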

  34. Receiving the message (Software Design)
     • Interrupt vector: receives incoming messages and stores them in a suitable linked list.

  35. Return received message (Software Design)
     • MPI_Recv: receives the message details from the user, then looks for this message in the linked list of already received messages.
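The receive path described on the last two slides (store on arrival, look up on request) can be sketched with a singly linked list. The structure and function names are hypothetical, matching on source and tag is an assumption about what the "message details" are, and a single `int` stands in for the message body. On the real system the store step runs in the interrupt handler, so the list would also need protection against concurrent access, which this host-side sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One received message waiting to be claimed (hypothetical layout). */
typedef struct msg_node {
    int source;
    int tag;
    int payload; /* stand-in for the message body */
    struct msg_node *next;
} msg_node;

/* Arrival side: append at the tail so arrival order is preserved.
 * Returns 0 on success, -1 if memory is exhausted. */
int queue_store(msg_node **head, int source, int tag, int payload)
{
    assert(head != NULL);
    msg_node *node = malloc(sizeof *node);
    if (node == NULL) {
        return -1;
    }
    node->source = source;
    node->tag = tag;
    node->payload = payload;
    node->next = NULL;
    while (*head != NULL) {
        head = &(*head)->next;
    }
    *head = node;
    return 0;
}

/* Request side: unlink the oldest message matching source and tag.
 * Returns 1 and writes *payload if found, 0 otherwise. */
int queue_take(msg_node **head, int source, int tag, int *payload)
{
    assert(head != NULL && payload != NULL);
    for (; *head != NULL; head = &(*head)->next) {
        msg_node *node = *head;
        if (node->source == source && node->tag == tag) {
            *payload = node->payload;
            *head = node->next;
            free(node);
            return 1;
        }
    }
    return 0;
}
```

Keeping already received messages in a list lets a receive call for one sender succeed even when messages from other senders arrived first.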
