
Paper Review I Coarse Grained Reconfigurable Arrays

Presented By: Matthew Mayhew, I.D.# 0234815, ENG*6530, Tues, June 10, 2008.


Presentation Transcript


  1. Paper Review I: Coarse Grained Reconfigurable Arrays. Presented By: Matthew Mayhew, I.D.# 0234815, ENG*6530, Tues, June 10, 2008

  2. References • Link 2: Chapter 2: Coarse-Grained Reconfigurable Architectures • Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.; MorphoSys: A Coarse Grain Reconfigurable Architecture for Multimedia Applications, Euro-Par 2002 Parallel Processing. 8th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol. 2400), 2002, p 844-8

  3. References Cont. • Sadasivam, M.; Hong, S.; Application Specific Coarse-Grained FPGA for Processing Element in Real-Time Parallel Particle Filters, Proceedings 3rd IEEE International Workshop on System-on-Chip for Real-Time Applications, 2003, p 116-19 • Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.; Custom Implementation of the Coarse-Grained Reconfigurable ADRES Architecture for Multimedia Purposes, Proceedings. 2005 International Conference on Field Programmable Logic and Applications (IEEE Cat. No.05EX1155), 2005, p 106-11

  4. Overview • Introduction • Basic Concepts • Classifications • General Architectures • Research Architectures • MorphoSys • Architecture for Dynamically Reconfigurable Embedded System (ADRES) • Coarse Grained FPGA for parallel particle processing • Project Summary

  5. Problems with Fine Grained FPGAs • Wide datapaths constructed of bit level elements to allow for processing on individual bits. • Requires a high volume of reconfiguration data for the processing elements and routing switches. • Difficulty in mapping from high level languages due to the difference in granularity.

  6. Coarse Grained Architectures • Constructed from multi-bit wide datapaths and complex operators. • Wide datapath allows for the implementation of complex operators, reducing routing overhead. • Connections in CGRA processing elements are multiple bits wide. As such, each connection takes more area, but fewer connections are needed.

  7. Classification of Architectures • Coarse Grained Architectures are classified based on three criteria: • Interconnect Structure: Mesh-based, Linear Array, or Crossbar • Datapath Width: a tradeoff between flexibility and area consumption • Reconfiguration Method: Static or Dynamic

  8. Basic Architectures: Mesh-Based • Processing Elements arranged in a rectangular array with horizontal and vertical connections.

  9. Mesh-Based Continued • Structure allows for good parallelism and use of communication resources. • Requires good tools for Place and Route. • Arrangement encourages Nearest Neighbour (NN) links, but generally has lines for longer connections.
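
As a rough illustration of the nearest-neighbour links described above, the sketch below enumerates the NN connections of a processing element in a rectangular mesh and counts the links in the whole array. The function names and the 4x4 array size are assumptions made for this example; the longer-distance lines mentioned on the slide are not modelled.

```python
def mesh_neighbours(row, col, rows, cols):
    """Return the nearest-neighbour coordinates of the PE at (row, col)."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # Keep only the positions that actually lie inside the array.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def count_nn_links(rows, cols):
    """Count undirected NN links: horizontal links plus vertical links."""
    return rows * (cols - 1) + cols * (rows - 1)


if __name__ == "__main__":
    print(mesh_neighbours(0, 0, 4, 4))  # a corner PE has two neighbours
    print(mesh_neighbours(1, 1, 4, 4))  # an interior PE has four neighbours
    print(count_nn_links(4, 4))
```

Corner and edge elements have fewer neighbours than interior ones, which is one reason placement quality matters in a mesh.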

  10. Basic Architectures: Linear Array • Processing elements arranged in a linear fashion with neighbours generally connected. • Generally designed for the implementation of pipelined processes.
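
The pipelined use of a linear array can be sketched as a cycle-by-cycle simulation in which each processing element applies one stage and passes its result to the next neighbour. This is only a behavioural model under assumed names (`run_pipeline`, the three example stages); it uses `None` to mark an empty pipeline slot, so samples and stage results must not be `None`.

```python
def run_pipeline(stages, samples):
    """Simulate a linear array: one stage function and one output register per PE."""
    regs = [None] * len(stages)  # output register of each PE, None means empty
    outputs = []
    # Feed every sample, then feed empty slots so the pipeline drains.
    for item in list(samples) + [None] * len(stages):
        # Collect whatever the last PE produced in the previous cycle.
        if regs[-1] is not None:
            outputs.append(regs[-1])
        # Update from the last PE to the first so each PE reads its
        # neighbour's value from the previous cycle.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](item) if item is not None else None
    return outputs


if __name__ == "__main__":
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(stages, [1, 2, 3]))
```

Once the pipeline is full, one result leaves the array per cycle even though each sample passes through every stage.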

  11. Basic Architectures: Crossbar • All Processing Elements connected by a matrix of switches, allowing for arbitrary connections. • Simple routing task. • Due to implementation restrictions, a reduced crossbar, in which clusters of processing elements are connected, is more common.
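
To give a feel for why a full crossbar becomes impractical, the sketch below compares switch counts under a simple assumed cost model: a full crossbar needs one switch per input/output pair, while the reduced version uses a full crossbar inside each cluster plus one crossbar between clusters with a single port per cluster. The model, the function names, and the 64-element example are illustrative assumptions, not figures from the reviewed papers.

```python
def full_crossbar_switches(n_pe):
    """One switch for every (source PE, destination PE) pair."""
    return n_pe * n_pe


def clustered_crossbar_switches(n_pe, cluster_size):
    """Assumed model: local crossbar per cluster plus one inter-cluster crossbar."""
    if n_pe % cluster_size != 0:
        raise ValueError("n_pe must be a multiple of cluster_size")
    n_clusters = n_pe // cluster_size
    local = n_clusters * cluster_size ** 2  # crossbar inside every cluster
    inter = n_clusters ** 2                 # one port per cluster (assumption)
    return local + inter


if __name__ == "__main__":
    print(full_crossbar_switches(64))
    print(clustered_crossbar_switches(64, 8))
```

The saving comes at the cost of flexibility, since elements in different clusters share the limited inter-cluster ports.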

  12. MorphoSys • Designed to handle multimedia applications. • Due to varied tasks and a large amount of input/output data, ASIC solutions are generally expensive to develop and general purpose processors (GPPs) are inefficient. • Currently in version M2, with research ongoing.

  13. System Architecture • The system level architecture of the MorphoSys system is shown in a figure (not included in this transcript) from Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.

  14. RC Cell Architecture • The layout of an individual reconfigurable cell is shown in a figure (not included in this transcript) from Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.

  15. Benefits of MorphoSys • Combination of both fine and coarse grained reconfigurable elements allows for customization and optimization depending on the application. • Memory structure designed to accommodate the high demand for data movement in multimedia applications.

  16. Evaluation • Tested with several operations common in multimedia and DSP applications. • Tested against dedicated DSP boards. (Source: Parizi, H.; Niktash, A.; Bagherzadeh, N.; Kurdahi, F.)

  17. ADRES • Designed to achieve specified performance and power consumption targets for portable wireless media applications. • Test application for the architecture was an H.264/AVC decoder. • The ADRES architecture consists of a VLIW processor coupled with an array of coarse grained processing cells for acceleration.

  18. ADRES Architecture • VLIW processor optimized for load/store and control operations. • The accelerator component optimized for data-flow with branching supported. • Each reconfigurable cell contains a local register file, allowing for iterative data processing and data delay. • Each reconfigurable cell can communicate with all cells in its row and column, as well as neighbouring cells within its quadrant.

  19. System Level View • When running in acceleration mode, an 8x8 array can be formed by configuring the VLIW elements. (Figure source: Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.)

  20. ADRES Reconfigurable Cell • While the configuration memory is assumed to be static during execution, dynamic reconfiguration is possible using a pointer. (Figure source: Veredes, F.; Scheppler, M.; Moffat, W.; Mei, B.)

  21. Performance and Implementation • ADRES found to be 88% faster overall in a full decoding cycle than a standard VLIW processor. • Layout study performed using 0.13 μm technology standard cells. • Each reconfigurable cell consumes approximately 0.196 mm². • Configuration memory accounts for around 50% of a cell, with 83% of the area in the full implementation used for various storage elements.
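
The area figures above can be turned into a quick back-of-the-envelope estimate. The sketch below takes the per-cell area and the configuration-memory share from the slide; the cell count of 64 is an assumption based on the 8x8 array mentioned earlier, and the total ignores the VLIW part, routing, and any shared memories, so it is a lower bound on the cell area only.

```python
CELL_AREA_MM2 = 0.196         # per reconfigurable cell, from the slide
CONFIG_MEMORY_FRACTION = 0.5  # "around 50% of a cell", from the slide


def config_memory_area(cell_area=CELL_AREA_MM2, fraction=CONFIG_MEMORY_FRACTION):
    """Approximate configuration-memory area inside one cell, in mm^2."""
    return cell_area * fraction


def cells_area(n_cells, cell_area=CELL_AREA_MM2):
    """Approximate total area of n_cells reconfigurable cells, in mm^2."""
    return n_cells * cell_area


if __name__ == "__main__":
    print(f"config memory per cell: {config_memory_area():.3f} mm^2")
    # 64 cells is an assumed count, not a figure reported in the review.
    print(f"64 cells: {cells_area(64):.3f} mm^2")
```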

  22. Parallel Particle Filter Processor • Particle filters are used in non-linear problems where the goal is to track or detect dynamic signals. • Target application of designed system is the real-time tracking of a ball-bearing, where the goal is to determine the coordinates and velocity of the target using a given input angle. • Need to generate new particles, determine appropriate weights, and resample.

  23. Operations • Both the generation of new particles and determining the weights are performed using processing elements. • This involves the calculation of w(m), which is the weight of a particle, and f(m), which is determined by the application.
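
The generate, weight, and resample cycle can be sketched in software as below. This is a minimal bearings-only example under stated assumptions, not the authors' hardware design: the constant-velocity motion model, the Gaussian weighting of the angle error, the noise parameters, and the multinomial resampling are all choices made for illustration. Here the predicted angle plays the role of the application-specific f(m), and the normalised likelihood plays the role of the weight w(m).

```python
import math
import random


def particle_filter_step(particles, measured_angle, rng, dt=1.0,
                         process_noise=0.05, angle_noise=0.1):
    """One generate / weight / resample cycle; a particle is (x, y, vx, vy)."""
    # 1. Generate new particles: constant-velocity motion plus random perturbation.
    moved = []
    for x, y, vx, vy in particles:
        vx += rng.gauss(0.0, process_noise)
        vy += rng.gauss(0.0, process_noise)
        moved.append((x + vx * dt, y + vy * dt, vx, vy))

    # 2. Weight each particle by how well its predicted angle fits the measurement.
    weights = []
    for x, y, _, _ in moved:
        predicted = math.atan2(y, x)
        diff = measured_angle - predicted
        # Wrap the angle error into [-pi, pi] before scoring it.
        err = math.atan2(math.sin(diff), math.cos(diff))
        weights.append(math.exp(-0.5 * (err / angle_noise) ** 2))
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0 / len(moved)] * len(moved)
    else:
        weights = [w / total for w in weights]

    # 3. Resample: draw particles with probability proportional to their weight.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return resampled, weights


if __name__ == "__main__":
    rng = random.Random(0)
    start = [(1.0 + 0.1 * i, 1.0, 0.0, 0.0) for i in range(50)]
    survivors, w = particle_filter_step(start, math.pi / 4, rng)
    print(len(survivors), round(sum(w), 6))
```

Steps 1 and 2 are independent per particle, which is what makes them natural candidates for parallel processing elements; resampling needs all the weights and is the sequential part.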

  24. System Level Architecture • Consists of both parallel and sequential data flow, with a buffer to synchronize their behaviour. (Figure source: Sadasivam, M.; Hong, S.)

  25. Sequential Flow Reconfigurable Slice (SFRS) • Responsible for the calculation of f(m), with direct access to the buffer unit. (Figure source: Sadasivam, M.; Hong, S.)

  26. Parallel Flow Reconfigurable Slice (PFRS) • Handles updating, creating, and outputting the particles. (Figure source: Sadasivam, M.; Hong, S.)

  27. Reconfiguration • The architecture can be altered by changing the way in which particles are generated, the way in which particles update, and the output method. • The update of particles can be altered by reconfiguring the CORDIC unit used in the calculation of f(m), which also stores needed constants and MUX controls. • The control unit is used to control the interconnects in the SFRS to implement the desired function.
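
For readers unfamiliar with CORDIC, the sketch below shows the general rotation-mode algorithm computing sine and cosine with only additions, sign tests, and scalings by powers of two (shifts in hardware). It is a floating-point model for illustration; the slides do not say which CORDIC mode or word length the reviewed design uses, so the function name, iteration count, and input range are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos)."""
    # Precomputed constants, analogous to values a hardware unit would store.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    # Start on the x-axis, pre-scaled so the final vector has unit length.
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate towards the remaining angle
        # Micro-rotation by atan(2^-i); the multiplications by 2^-i are shifts in hardware.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(abs(s - math.sin(0.5)) < 1e-5, abs(c - math.cos(0.5)) < 1e-5)
```

Changing the stored constants or the direction rule changes the function computed, which is in the spirit of the reconfiguration described on the slide.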

  28. Performance • Tested against both a DSP processor and a general purpose FPGA. • It should be noted that the authors reported problems in terms of having enough logic elements to map all the required PEs on the general purpose FPGA. • The authors report results as calculation times for both f(m) and w(m); the results table is not included in this transcript.

  29. Conclusions • Coarse Grained reconfigurable architectures are generally used in either calculation or I/O heavy applications. • There is no single best design, with the architecture layout highly dependent on design goals. • Performance is generally favourable when compared to dedicated processors and general purpose FPGAs.

  30. Project • Goal: Implementation of the Advanced Encryption Standard (AES) algorithm using VHDL. • Secondary Goal: Implement the algorithm in such a way as to reduce the area consumption and computation time.

  31. Progress • Algorithm examined in terms of where parallelism and alternative implementations can be considered. • While individual rounds must be performed sequentially, “blocks” of data within a given operation can be acted upon in parallel. • Implementation of the S-box and MixColumns operations crucial to a good application.
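
As a software reference for the MixColumns operation mentioned above, the sketch below implements the standard AES column transform over GF(2^8). The project itself targets VHDL; Python is used here only as an executable illustration, and the function names are chosen for this example. Each of the four state columns is transformed independently, which is the kind of within-operation parallelism the slide refers to.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # reduce modulo the AES polynomial
    return a & 0xFF


def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    # Multiplication by 3 is xtime(a) ^ a; multiplication by 1 is the byte itself.
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state_columns):
    """Transform every column; the columns do not depend on one another."""
    return [mix_column(col) for col in state_columns]


if __name__ == "__main__":
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because the transform needs only XORs and a conditional reduction, it maps to a small amount of combinational logic per column in hardware.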

  32. Thank you for your time. Questions?
