
AMULET3i - asynchronous SoC

AMULET3i - asynchronous SoC. Steve Furber - sfurber@cs.man.ac.uk. Agenda: AMULET3i, design tools, future problems. AMULET3: a third-generation asynchronous ARM; performance comparable with ARM9; radically new internal organisation based on a reorder buffer; Harvard core, unified I/D memory.


Presentation Transcript


  1. AMULET3i - asynchronous SoC Steve Furber - sfurber@cs.man.ac.uk • Agenda: • AMULET3i • Design tools • Future problems

  2. AMULET3 • a third generation asynchronous ARM • performance comparable with ARM9 • radically new internal organisation • based on reorder buffer • Harvard core, unified I/D memory • under development within the OMI ATOM project • first application as part of a telecommunications controller

  3. AMULET3i SoC organisation

  4. AMULET3i - physical layout

  5. AMULET3 core organisation • Harvard core • forward from reorder buffer • out-of-order completion • in-order register update • aborts handled at writeback
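
The reorder-buffer behaviour listed on this slide (forwarding, out-of-order completion, in-order register update, aborts handled at writeback) can be illustrated with a toy software model. This is only a sketch under assumptions: the class, method names and tag scheme are hypothetical and say nothing about how the AMULET3 hardware is actually built.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    dest: int                    # destination register number
    value: Optional[int] = None  # filled in when the result arrives
    abort: bool = False          # set if the instruction faulted


class ReorderBuffer:
    """Toy model: results arrive in any order, registers update in program order."""

    def __init__(self):
        self.slots = {}          # tag -> Entry; dict order is program order
        self.next_tag = 0
        self.regs = {}           # architectural register file

    def issue(self, dest):
        """Allocate a slot at issue time and return its (hypothetical) tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.slots[tag] = Entry(dest)
        return tag

    def complete(self, tag, value=None, abort=False):
        """Out-of-order completion: any in-flight slot may be filled at any time."""
        entry = self.slots[tag]
        entry.value, entry.abort = value, abort

    def forward(self, reg):
        """Read reg, preferring the newest in-flight result over the register file."""
        for entry in reversed(list(self.slots.values())):
            if entry.dest == reg:
                return entry.value   # None means the result is still pending
        return self.regs.get(reg)

    def writeback(self):
        """Retire completed entries from the head, in program order."""
        while self.slots:
            tag = next(iter(self.slots))
            entry = self.slots[tag]
            if entry.abort:
                self.slots.clear()   # abort seen at writeback: drop it and all younger work
                break
            if entry.value is None:
                break                # head not finished, so younger results must wait
            self.regs[entry.dest] = entry.value
            del self.slots[tag]


rob = ReorderBuffer()
first = rob.issue(dest=1)
second = rob.issue(dest=2)
rob.complete(second, 20)         # the younger instruction finishes first
rob.writeback()                  # nothing retires: the head is still pending
early_read = rob.forward(2)      # the result is already visible by forwarding
rob.complete(first, 10)
rob.writeback()                  # now both retire, in program order
```

In this model the register file never sees a younger result before an older one, while dependent instructions can still pick up values early through `forward`.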

  6. AMULET3H local bus RAM • segmented memory • I & D ports arbitrate at each block • quad-word I & D line buffers

  7. AMULET3 tools • LARD • behavioural modelling tool for async design • 10x designer productivity vs Asim • Petrify • much enhanced FORCAGE descendant • can handle wider range of circuits • Balsa • synthesis tool used for DMA controller

  8. Tools - LARD • Language for Asynchronous Research and Development • parallel processes with communication primitives • extensive data types • modelling of elapsed time • used to model AMULET3 • available from AMULET web site

  9. Tools - LARD • Features • time view • block view • HLL debug • test generation • co-simulation • Platforms • UNIX/Linux

  10. Tools - BALSA • Synthesis system for asynchronous circuits • similar to Philips ‘Tangram’ • used for AMULET3H DMA controller • direct HLL to netlist compilation • syntax directed translation • peephole optimisation

  11. Tools - BALSA

  12. Tools - Petrify • Petri Net modelling tool • for low-level asynchronous circuits • speed-independent synthesis • technology mapping • very powerful • can be tricky to use • extensively used to design AMULET3 modules

  13. AMULET3 validation • workstations now powerful enough to run ARM validation suite under TimeMill • around 8 CPU-weeks total • testing full functionality now very hard • very complex system-on-chip • design aimed at high performance • timing margins much reduced • validation complex and uncertain

  14. AMULET3 - problems • high performance target • timing margins must be small • timing is hard to verify • very dependent on accurate extraction, models • modelling tools are imperfect • e.g. crosstalk • bus wire delay 1.5ns +/- 1ns crosstalk • careful layout gives 0.9ns +/- 0.15ns • how can we be sure such factors are OK?
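
The two crosstalk figures on this slide can be turned into a rough bundled-data timing budget. The sketch below assumes the simplest possible model, in which the wire delay lies anywhere in nominal +/- crosstalk and the matched delay on the request line must cover the worst case; the function and key names are hypothetical.

```python
def bundled_data_budget(nominal_ns, crosstalk_ns):
    """Delay figures for one bus wire whose delay is nominal +/- crosstalk."""
    worst = nominal_ns + crosstalk_ns   # slowest the data can arrive
    best = nominal_ns - crosstalk_ns    # fastest the data can arrive
    return {
        "matched_delay_min_ns": worst,             # request delay must cover this
        "idle_wait_ns": worst - best,              # time lost when data arrives early
        "uncertainty": crosstalk_ns / nominal_ns,  # relative spread of the delay
    }


default_layout = bundled_data_budget(1.5, 1.0)    # figures from the slide
careful_layout = bundled_data_budget(0.9, 0.15)   # figures from the slide

for name, budget in (("default", default_layout), ("careful", careful_layout)):
    print(name, {key: round(val, 3) for key, val in budget.items()})
```

Under this model the default layout needs a matched delay of at least 2.5 ns, against roughly 1.05 ns for the careful layout, which is why layout quality feeds directly into performance.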

  15. The Future • timing accuracy is getting harder • wire delays will become more significant • crosstalk will get worse • on-chip transistor variance will increase • higher speeds will lead to higher noise • will delay-matching be viable? • alternatives are dual-rail or other DI codes • incur significant area and power overheads

  16. Gate vs (2mm) wire delays, ps

  17. Alternatives to bundled data • Delay-insensitive codes • timing encoded in data • dual-rail encoding • 100% area overhead cf. bundled data • significant power cost • e.g. NCL from Theseus • deal just announced with Motorola • use conventional synthesis tools • timing closure ceases to be an issue
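
As a minimal illustration of "timing encoded in data", the sketch below models dual-rail encoding in software: each bit uses two wires, and the receiver knows a word has arrived once every pair has exactly one wire high. The rail ordering and the all-zero spacer are common conventions assumed here, and the function names are hypothetical.

```python
SPACER = (0, 0)   # both rails low: no data present on this bit


def encode(bits):
    """Map each bit to a (true_rail, false_rail) pair."""
    return [(1, 0) if bit else (0, 1) for bit in bits]


def is_complete(pairs):
    """Completion detection: every pair has exactly one rail high."""
    return all(true_rail != false_rail for true_rail, false_rail in pairs)


def decode(pairs):
    """Recover the bits once the whole word is valid."""
    if not is_complete(pairs):
        raise ValueError("word not yet valid")
    return [true_rail for true_rail, _ in pairs]


word = encode([1, 0, 1, 1])
in_flight = [word[0], SPACER, word[2], SPACER]   # only some bits have arrived
```

Because validity is carried by the wires themselves, no matched delay is needed; the cost is two wires per bit, which is the 100% overhead quoted on the slide.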

  18. Alternatives to bundled data • Delay-insensitive codes • N-of-M codes • 3-of-6 code • 50% area overhead • 3 transitions to send 4 bits • 2-of-7 code • 75% area overhead • 2 transitions to send 4 bits • well-suited to inter-chip communication • may suit on-chip buses
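
The overhead figures on this slide follow from simple counting, sketched below. The assumptions are that overhead is measured against the 4 data wires of a bundled-data bus (request and acknowledge wires ignored) and that data values are assigned to codewords in an arbitrary order; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def code_stats(n, m, data_bits=4):
    """Return (number of codewords, wire overhead) for an n-of-m code."""
    codewords = comb(m, n)                    # ways to raise n of m wires
    if codewords < 2 ** data_bits:
        raise ValueError("code too small for the requested data width")
    overhead = (m - data_bits) / data_bits    # extra wires relative to bundled data
    return codewords, overhead


def build_codebook(n, m, data_bits=4):
    """Assign each data value an m-wire pattern with exactly n wires high."""
    book = {}
    for value, high in zip(range(2 ** data_bits), combinations(range(m), n)):
        book[value] = tuple(1 if wire in high else 0 for wire in range(m))
    return book


three_of_six = code_stats(3, 6)   # 20 codewords on 6 wires
two_of_seven = code_stats(2, 7)   # 21 codewords on 7 wires
book = build_codebook(3, 6)
```

Both codes have more than the 16 codewords needed for 4 bits, and the number of wires that switch per symbol (3 or 2) is the transition count quoted on the slide.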

  19. Conclusions • complex async design is feasible • standard tools are just about survivable • additional tools improve productivity • ideal design flow: • LARD-like specification • formal verification of high-level properties • automated synthesis onto module library • timing closure is the major problem • may ultimately rule out bundled data
